Deactivated

AI Code Reviewer

Semantic Code Review System

Java, Quarkus, Qdrant, OpenAI GPT-4o, Docker

This project has been deactivated. It served as an exploration of LLM-powered developer tooling and semantic search pipelines.

Motivation

This project explored how LLMs can power smarter developer tooling. It analysed submitted code and returned contextual feedback, similar to the pull request comments an experienced engineer would write.

How it Worked

01

Data Collection: A custom web scraper collected real-world code diffs and PR comments from GitHub, supplemented by public datasets.

02

Embedding & Storage: Each code snippet was embedded using OpenAI embeddings and stored in a Qdrant vector database alongside its associated review comments.

03

Semantic Search: When a user submitted code, the system embedded it and ran a semantic similarity search to retrieve the most relevant code and comment pairs.

04

Review Generation: The retrieved pairs were passed to GPT-4o, which generated professional-style code reviews based on the contextually similar examples.
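
The semantic search step can be illustrated with a minimal in-memory sketch. It is not the project's actual code: the real system delegated similarity search to Qdrant, and the `ReviewRetriever` and `ReviewExample` names below are hypothetical. The sketch assumes cosine similarity as the distance metric and Java 17 or later.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory stand-in for the vector store (the project used Qdrant). */
final class ReviewRetriever {

    /** One stored example: an embedding plus the code and review comment it came from. */
    record ReviewExample(float[] embedding, String code, String comment) {}

    /** Cosine similarity between two vectors of equal length; 0.0 if either is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the k stored examples most similar to the query embedding, best match first. */
    static List<ReviewExample> topK(float[] query, List<ReviewExample> store, int k) {
        return store.stream()
                .sorted(Comparator
                        .comparingDouble((ReviewExample e) -> cosine(query, e.embedding()))
                        .reversed())
                .limit(k)
                .toList();
    }
}
```

A brute-force scan like this is only practical for small collections; a vector database such as Qdrant replaces it with an indexed search.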

Key Learnings

Semantic Search Pipelines

Structuring retrieval pipelines with OpenAI embeddings for code-level similarity.

Vector Database Management

Querying and managing Qdrant for vector-based retrieval at scale.
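
As a rough illustration of what a Qdrant query can look like from Java, the sketch below builds (but does not send) a request against Qdrant's REST search endpoint, `POST /collections/{name}/points/search`. The class name, the `code_reviews` collection, and the base URL are assumptions made for the example, not details from the project, and newer Qdrant versions also offer a separate query endpoint, so check the API reference for the version in use.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.StringJoiner;

/** Hypothetical helper that prepares a Qdrant vector search request. */
final class QdrantSearchRequest {

    /** Serialises the search body: query vector, result limit, and a flag to return payloads. */
    static String searchBody(float[] vector, int limit) {
        StringJoiner values = new StringJoiner(",", "[", "]");
        for (float v : vector) {
            values.add(Float.toString(v));
        }
        return "{\"vector\":" + values + ",\"limit\":" + limit + ",\"with_payload\":true}";
    }

    /** Builds the HTTP request; baseUrl and collection are supplied by the caller. */
    static HttpRequest build(String baseUrl, String collection, float[] vector, int limit) {
        URI uri = URI.create(baseUrl + "/collections/" + collection + "/points/search");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(searchBody(vector, limit)))
                .build();
    }
}
```

Calling `QdrantSearchRequest.build("http://localhost:6333", "code_reviews", embedding, 5)` would prepare a request for the five nearest stored snippets; sending it with `java.net.http.HttpClient` and parsing the returned payloads is left out to keep the sketch short.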

Prompt Engineering

Crafting prompts for natural, contextual feedback using LLMs.
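
One way to turn retrieved code and comment pairs into a prompt is to present them as few-shot examples ahead of the submitted code. The sketch below is an assumption about how such a prompt could be assembled, not the project's actual prompt; the `ReviewPromptBuilder` class and the wording of the instructions are hypothetical.

```java
import java.util.List;

/** Hypothetical few-shot prompt builder for review generation. */
final class ReviewPromptBuilder {

    /** A retrieved example: a code snippet and the review comment written about it. */
    record Example(String code, String comment) {}

    /** Places the retrieved examples first and the submitted code last. */
    static String build(String submittedCode, List<Example> examples) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("You are an experienced engineer reviewing a pull request.\n");
        prompt.append("Use the example reviews below as a guide to tone and level of detail.\n\n");
        for (int i = 0; i < examples.size(); i++) {
            Example example = examples.get(i);
            prompt.append("Example ").append(i + 1).append(" code:\n");
            prompt.append(example.code()).append("\n");
            prompt.append("Example ").append(i + 1).append(" review:\n");
            prompt.append(example.comment()).append("\n\n");
        }
        prompt.append("Code to review:\n").append(submittedCode).append("\n");
        prompt.append("Review:\n");
        return prompt.toString();
    }
}
```

Putting the submitted code last keeps the model's output directly after the thing it is asked to review, while the examples anchor the style of the feedback.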

Production Deployment

Building a scalable Java + Quarkus API and deploying via Docker.

Tech Stack

Java, Quarkus, JavaScript, Qdrant, OpenAI GPT-4o, OpenAI Embeddings, Docker