Preprint 2026 · Survey

Code as Agent Harness

Toward Executable, Verifiable, and Stateful Agent Systems

A code-centered view of agentic AI: code is not only generated output, but the operational substrate for reasoning, acting, environment modeling, execution feedback, and multi-agent coordination.

Xuying Ning1†, Katherine Tieu1†, Dongqi Fu2†, Tianxin Wei1†, Zihao Li1†, Yuanchen Bei1†, Jiaru Zou3, Mengting Ai1, Zhining Liu1, Ting-Wei Li1, Lingjie Chen1, Yanjun Zhao1, Ke Yang1, Bingxuan Li1, Cheng Qian1, Gaotang Li1, Xiao Lin1, Zhichen Zeng1, Ruizhong Qiu1, Sirui Chen1, Yifan Sun1, Xiyuan Yang1, Ruida Wang1, Rui Pan1, Chenyuan Yang1, Dylan Zhang1, Liri Fang1, Zikun Cui2, Yang Cao2, Pan Chen2, Dorothy Sun2, Ren Chen2, Mahesh Srinivasan2, Nipun Mathur2, Yinglong Xia2, Hong Li2, Hong Yan2, Pan Lu3, Lingming Zhang1, Tong Zhang1, Hanghang Tong, Jingrui He

1University of Illinois Urbana-Champaign · 2Meta · 3Stanford University · †Core Contributor · §Corresponding Author

3 Connected Layers
6+ Application Areas
102 PDF Pages
450+ Cited Works

Code becomes the runtime medium for agents.

Recent LLMs have become strong code generators, but emerging agentic systems use code for more than final answers. This survey frames code as an agent harness: a unified infrastructure layer for agent reasoning, action, environment modeling, feedback-driven control, and verification. It studies how code connects agents to executable steps, durable state, reusable tools, tests, traces, repositories, and multi-agent workflows.

Three Layers of Code as Harness

01

Harness Interface

Code connects agents to reasoning, action, and environment modeling: executable reasoning traces, programmable actions, DOM/API interfaces, simulators, tests, and state representations.

  • Reasoning substrate
  • Action interface
  • Environment representation

02

Harness Mechanisms

Planning, memory, tool use, control, and optimization sustain agents over long-horizon execution. Failures become feedback for repair rather than dead ends.

  • Planning and decomposition
  • Working and long-term memory
  • Tests, traces, and static analysis
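
As an illustration of failures becoming feedback for repair, the following is a minimal sketch of an execute, test, and repair loop. It is not taken from the survey: the names `Attempt`, `run_candidate`, `repair_loop`, `toy_propose`, and `toy_test` are hypothetical, the proposer is a hard-coded stand-in for a model call, and plain `exec` stands in for a real sandbox.

```python
import traceback
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    source: str    # candidate code produced in this round
    passed: bool   # whether execution and the test both succeeded
    feedback: str  # error text handed back to the proposer on failure


def run_candidate(source: str, test: Callable[[dict], None]) -> Attempt:
    """Execute candidate source, then run the test against its namespace."""
    namespace: dict = {}
    try:
        # Assumption: exec stands in for a sandbox; do not use on untrusted code.
        exec(source, namespace)
        test(namespace)
        return Attempt(source, True, "")
    except Exception:
        return Attempt(source, False, traceback.format_exc(limit=1))


def repair_loop(
    propose: Callable[[Optional[str]], str],
    test: Callable[[dict], None],
    max_rounds: int = 3,
) -> List[Attempt]:
    """Ask for a candidate, run it, and feed any failure into the next round."""
    history: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        attempt = run_candidate(propose(feedback), test)
        history.append(attempt)
        if attempt.passed:
            break
        feedback = attempt.feedback
    return history


def toy_propose(feedback: Optional[str]) -> str:
    """Hypothetical proposer: buggy first draft, corrected once feedback arrives."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_test(namespace: dict) -> None:
    assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"


history = repair_loop(toy_propose, toy_test)
for i, attempt in enumerate(history, start=1):
    print(f"round {i}: passed={attempt.passed}")
```

The returned history keeps every attempt, so the failed first round stays available as a trace rather than being discarded.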

03

Scaling the Harness

Shared code artifacts allow multiple agents to coordinate, review, test, debate, red-team, and verify progress inside a common repository or workflow state.

  • Manager, planner, coder, reviewer, tester roles
  • Centralized and distributed workflows
  • Shared state and collective verification

Where the harness shows up

Coding Assistants · GUI / OS Agents · Embodied Agents · Scientific Discovery · Personalization · Recommendation · DevOps · Enterprise Workflows

Harness engineering is the hard part.

Evaluation beyond final success

Intermediate states, traces, repair attempts, and safety checks need first-class metrics.
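
To make the idea of first-class intermediate metrics concrete, here is a minimal sketch that summarizes a run from its trace rather than from the final outcome alone. The `Step` record, its `kind` labels, and the metric names are hypothetical choices for illustration, not definitions from the survey.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Step:
    kind: str  # hypothetical labels: "action", "repair", or "safety_check"
    ok: bool   # whether this step succeeded


def trace_metrics(trace: List[Step], final_success: bool) -> Dict[str, Union[bool, int, float]]:
    """Report intermediate signals alongside the final outcome."""
    safety_checks = [s for s in trace if s.kind == "safety_check"]
    return {
        "final_success": final_success,
        "steps": len(trace),
        "step_success_rate": sum(s.ok for s in trace) / len(trace) if trace else 0.0,
        "repair_attempts": sum(s.kind == "repair" for s in trace),
        "safety_violations": sum(not s.ok for s in safety_checks),
    }


trace = [
    Step("action", True),
    Step("action", False),       # a failed action that triggers a repair
    Step("repair", True),
    Step("safety_check", True),
    Step("action", True),
]
metrics = trace_metrics(trace, final_success=True)
print(metrics)
```

Two runs that both end in success can differ sharply on these fields, which is the distinction a final-success-only metric hides.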

Verification with incomplete feedback

Agents must act under partial tests, noisy execution signals, and hidden environment state.

Regression-free improvement

Harnesses should learn from failure without silently breaking previously working behavior.
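
A simple way to guard against silent breakage is a regression gate: a change is accepted only if every case that passed before still passes after. The sketch below assumes a harness behavior can be modeled as a single function over string inputs; `passing_cases`, `regression_gate`, and the example functions are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Case = Tuple[str, str, str]  # (case name, input, expected output)


def passing_cases(fn: Callable[[str], str], cases: List[Case]) -> Set[str]:
    """Return the names of cases the function handles correctly."""
    ok: Set[str] = set()
    for name, arg, expected in cases:
        try:
            if fn(arg) == expected:
                ok.add(name)
        except Exception:
            pass  # a crash counts as a failure for that case
    return ok


def regression_gate(
    old: Callable[[str], str],
    new: Callable[[str], str],
    cases: List[Case],
) -> Dict[str, object]:
    """Accept the new version only if nothing that used to pass now fails."""
    before = passing_cases(old, cases)
    after = passing_cases(new, cases)
    regressions = sorted(before - after)
    fixes = sorted(after - before)
    return {"accept": not regressions, "regressions": regressions, "fixes": fixes}


cases: List[Case] = [("trim", " a ", "a"), ("lower", "A", "a")]


def old(s: str) -> str:
    return s.strip()          # passes "trim", fails "lower"


def new_bad(s: str) -> str:
    return s.lower()          # fixes "lower" but breaks "trim"


def new_good(s: str) -> str:
    return s.strip().lower()  # fixes "lower" and keeps "trim"


rejected = regression_gate(old, new_bad, cases)
accepted = regression_gate(old, new_good, cases)
print(rejected)
print(accepted)
```

The gate compares per-case outcomes rather than aggregate pass counts, since `new_bad` passes as many cases as `old` while still breaking one that previously worked.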

Shared state across agents

Coordination depends on durable memory, repository state, review artifacts, and permissions.

BibTeX

@misc{ning2026codeagentharness,
  title         = {Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems},
  author        = {Xuying Ning and Katherine Tieu and Dongqi Fu and Tianxin Wei and Zihao Li and Yuanchen Bei and Jiaru Zou and Mengting Ai and Zhining Liu and Ting-Wei Li and Lingjie Chen and Yanjun Zhao and Ke Yang and Bingxuan Li and Cheng Qian and Gaotang Li and Xiao Lin and Zhichen Zeng and Ruizhong Qiu and Sirui Chen and Yifan Sun and Xiyuan Yang and Ruida Wang and Rui Pan and Chenyuan Yang and Dylan Zhang and Liri Fang and Zikun Cui and Yang Cao and Pan Chen and Dorothy Sun and Ren Chen and Mahesh Srinivasan and Nipun Mathur and Yinglong Xia and Hong Li and Hong Yan and Pan Lu and Lingming Zhang and Tong Zhang and Hanghang Tong and Jingrui He},
  year          = {2026},
  eprint        = {2605.18747},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2605.18747},
}