1. Purpose of System

This system is to autonomously review and enhance Python code using a multi-agent architecture. The agents collaborate in a loop to generate, critique, test, and refine code until it meets a predefined quality threshold. The goal is to simulate a collaborative programming environment that continuously improves the code through structured feedback and testing.

2. System Architecture

The system is composed of four core agents:

Critic agent: evaluates code quality and returns a score, and improvement suggestions.
Code generator agent: generates or revises code based on feedback.
Test generator agent: creates test cases from the code.
Execution agent: runs the code against test cases to evaluate its correctness.

These agents are orchestrated in a LangGraph workflow, which continues iterating until both the critic score and test pass rate surpasses a threshold, or a maximum number of iteration is reached.

3. Used Tools

LangGraph: manages the agent workflow and conditional loops.
OpenAI GPT: powers the agents (code generation, test case generation, and critique).
Python 3 Standard Libraries: used for test execution and validation.