OpenAI’s New Codex AI Agent Can Do Coding Tasks for You
OpenAI has released a research preview of Codex, a new AI coding agent powered by a version of its o3 model that has been tuned specifically for software engineering.
Built to Execute and Test Code Autonomously
The new Codex agent runs on codex-1, a variant of the o3 model optimized to write cleaner code and follow instructions more precisely. Unlike earlier AI assistants, Codex can test its own output and iterate until the tests pass. It runs in a sandboxed virtual machine hosted in the cloud and can connect to GitHub to preload a user's repositories. OpenAI says tasks such as bug fixes, test creation, or feature implementation take between 1 and 30 minutes.
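The test-and-iterate behavior described above can be pictured as a simple loop. The sketch below is illustrative only and is not OpenAI's implementation: `iterate_until_passing`, `toy_propose`, and `toy_run_tests` are hypothetical names, and the "model" is a stub that returns a wrong answer first and a corrected one once it receives test feedback.

```python
from typing import Callable, Optional, Tuple


def iterate_until_passing(
    propose: Callable[[Optional[str]], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Optional[str]:
    """Ask for a candidate, test it, and feed failures back until tests pass."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = propose(feedback)            # hypothetical model call
        passed, feedback = run_tests(candidate)  # hypothetical test runner
        if passed:
            return candidate
    return None  # give up after max_attempts failed candidates


def toy_propose(feedback: Optional[str]) -> str:
    # Stub standing in for a model: first try is wrong, the retry is right.
    return "a - b" if feedback is None else "a + b"


def toy_run_tests(candidate: str) -> Tuple[bool, str]:
    # Stub test suite: the expression must add 2 and 3 to get 5.
    result = eval(candidate, {"a": 2, "b": 3})
    return result == 5, f"expected 5, got {result}"


if __name__ == "__main__":
    print(iterate_until_passing(toy_propose, toy_run_tests))
```

The attempt cap is a design choice in this sketch: without it, a loop of this kind could run indefinitely on a task the model cannot solve.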
Users can keep working on their own computers while Codex handles tasks in the background. The agent can work on multiple software engineering tasks at once and is now available to ChatGPT Pro, Team, and Enterprise users through the ChatGPT sidebar.
Codex Interface and Workflow
In ChatGPT, users can assign tasks to Codex by typing a prompt and selecting “Code” or “Ask,” depending on whether they want implementation or insight. Assigned tasks appear in a sidebar, allowing users to monitor progress. OpenAI’s research lead Josh Tobin explained that the goal is for Codex to eventually function as a “virtual teammate,” capable of autonomously completing complex tasks that typically take engineers several hours or even days.
Safety, Limitations, and Future Roadmap
Codex operates in an air-gapped environment and cannot access the public internet or third-party APIs, which reduces the potential for misuse but may also rule out certain use cases. OpenAI stated that Codex has safeguards in place to refuse requests to develop malicious software. The tool inherits many of the o3 model's safety mechanisms.
Despite the progress, OpenAI acknowledged the tool's limitations. AI coding agents remain prone to mistakes, especially during debugging. A recent Microsoft study found that many top-tier models, including Claude 3.7 Sonnet and o3-mini, failed to reliably fix broken code.
In addition, OpenAI is updating Codex CLI, its open-source terminal-based coding agent, with a version of the o4-mini model optimized for software engineering tasks. This model is now the default in Codex CLI and will also be available through OpenAI's API, priced at $1.50 per million input tokens and $6 per million output tokens.
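To put the quoted API prices in perspective, the following sketch estimates the cost of a single request. Only the two per-million-token rates come from the article; the function name `estimate_cost_usd` and the token counts in the usage example are assumptions chosen for illustration.

```python
# Rates quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = 1.50
OUTPUT_PRICE_PER_MILLION = 6.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens * INPUT_PRICE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 200,000 tokens of repository context in,
    # 50,000 tokens of generated code and explanation out.
    print(f"${estimate_cost_usd(200_000, 50_000):.2f}")
```

Because output tokens cost four times as much as input tokens at these rates, a request that generates a lot of code can cost more than one that only reads a large codebase.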

