Preprint 2026 · Survey

Code as Agent Harness

Toward Executable, Verifiable, and Stateful Agent Systems

A code-centered view of agentic AI: code is not only generated output, but the operational substrate for reasoning, acting, environment modeling, execution feedback, and multi-agent coordination.



Xuying Ning¹†, Katherine Tieu¹†, Dongqi Fu²†, Tianxin Wei¹†, Zihao Li¹†, Yuanchen Bei¹†, Jiaru Zou³, Mengting Ai¹, Zhining Liu¹, Ting-Wei Li¹, Lingjie Chen¹, Yanjun Zhao¹, Ke Yang¹, Bingxuan Li¹, Cheng Qian¹, Gaotang Li¹, Xiao Lin¹, Zhichen Zeng¹, Ruizhong Qiu¹, Sirui Chen¹, Yifan Sun¹, Xiyuan Yang¹, Ruida Wang¹, Rui Pan¹, Chenyuan Yang¹, Dylan Zhang¹, Liri Fang¹, Zikun Cui², Yang Cao², Pan Chen², Dorothy Sun², Ren Chen², Mahesh Srinivasan², Nipun Mathur², Yinglong Xia², Hong Li², Hong Yan², Pan Lu³, Lingming Zhang¹, Tong Zhang¹, Hanghang Tong¹§, Jingrui He¹§

¹University of Illinois Urbana-Champaign · ²Meta · ³Stanford University · †Core Contributor · §Corresponding Author

3 Connected Layers

6+ Application Areas

102 PDF Pages

450+ Cited Works

Abstract

Code becomes the runtime medium for agents.

Recent LLMs have become strong code generators, but emerging agentic systems use code for more than final answers. This survey frames code as an agent harness: a unified infrastructure layer for agent reasoning, action, environment modeling, feedback-driven control, and verification. It studies how code connects agents to executable steps, durable state, reusable tools, tests, traces, repositories, and multi-agent workflows.

Taxonomy

Three Layers of Code as Harness

Harness Interface

Code connects agents to reasoning, action, and environment modeling: executable reasoning traces, programmable actions, DOM/API interfaces, simulators, tests, and state representations.

Reasoning substrate
Action interface
Environment representation
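
To make the action-interface idea concrete, here is a minimal sketch of a harness step that runs agent-written code and returns the result as an observation. It is an illustration only, not the survey's implementation: `Observation` and `run_action` are hypothetical names, and a plain subprocess is assumed in place of a real sandbox (it provides no isolation from the host).

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Observation:
    """What the harness hands back to the agent after one action (hypothetical type)."""
    ok: bool
    stdout: str
    stderr: str


def run_action(code: str, timeout_s: float = 5.0) -> Observation:
    """Run agent-written Python in a separate interpreter process.

    Assumption: a child process stands in for a sandbox. It is NOT a
    security boundary; a real harness would add isolation.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return Observation(ok=False, stdout="", stderr="timeout")
    return Observation(proc.returncode == 0, proc.stdout, proc.stderr)
```

Calling `run_action("print(2 + 3)")` yields an observation whose `stdout` carries the printed result, while code that raises comes back with `ok=False` and the traceback in `stderr`, so both outcomes can be fed to the agent as text.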

Harness Mechanisms

Planning, memory, tool use, control, and optimization sustain agents over long-horizon execution. Failures become feedback for repair rather than dead ends.

Planning and decomposition
Working and long-term memory
Tests, traces, and static analysis
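
The phrase "failures become feedback for repair" can be sketched as a bounded propose-check loop. Everything below is an assumption made for illustration: `repair_loop`, `check_add`, and `scripted_proposer` are hypothetical names, and the scripted proposer stands in for a model call.

```python
from typing import Callable, List, Optional, Tuple

Feedback = Optional[str]  # None means "all checks passed"


def repair_loop(
    propose: Callable[[Feedback], str],
    check: Callable[[str], Feedback],
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[Tuple[int, Feedback]]]:
    """Ask for a candidate, test it, and feed any failure into the next attempt."""
    feedback: Feedback = None
    trace: List[Tuple[int, Feedback]] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        feedback = check(candidate)
        trace.append((attempt, feedback))
        if feedback is None:
            return candidate, trace
    return None, trace


def check_add(source: str) -> Feedback:
    """Toy test: the candidate must define add(a, b) with add(2, 3) == 5."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only; not safe for untrusted code
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    if result != 5:
        return f"add(2, 3) returned {result!r}, expected 5"
    return None


def scripted_proposer(feedback: Feedback) -> str:
    """Stand-in for a model: buggy first draft, fixed once it sees feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"
```

Running `repair_loop(scripted_proposer, check_add)` takes two attempts: the first fails the test and produces a feedback string, and the second passes. The returned trace keeps the failed attempt, so the repair history remains inspectable.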

Scaling the Harness

Shared code artifacts allow multiple agents to coordinate, review, test, debate, red-team, and verify progress inside a common repository or workflow state.

Manager, planner, coder, reviewer, tester roles
Centralized and distributed workflows
Shared state and collective verification

Applications

Where the harness shows up

Coding Assistants · GUI / OS Agents · Embodied Agents · Scientific Discovery · Personalization · Recommendation · DevOps · Enterprise Workflows

Open Problems

Harness engineering is the hard part.

Evaluation beyond final success

Intermediate states, traces, repair attempts, and safety checks need first-class metrics.
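
As a rough illustration of what trace-level metrics could look like, the sketch below summarizes a run from its recorded steps instead of only its final outcome. The `Step` record, its three `kind` values, and the metric names are assumptions chosen for this example, not metrics defined by the survey.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One recorded harness step (hypothetical schema)."""
    kind: str  # assumed vocabulary: "act", "repair", or "safety_check"
    ok: bool


def trace_metrics(trace: list[Step]) -> dict[str, float]:
    """Summarize a run by its intermediate steps as well as its final outcome."""
    repairs = [s for s in trace if s.kind == "repair"]
    checks = [s for s in trace if s.kind == "safety_check"]
    repair_rate = sum(s.ok for s in repairs) / len(repairs) if repairs else 0.0
    return {
        "steps": float(len(trace)),
        "repair_attempts": float(len(repairs)),
        "repair_success_rate": repair_rate,
        "safety_violations": float(sum(not s.ok for s in checks)),
        "final_success": float(trace[-1].ok) if trace else 0.0,
    }
```

Two runs that both end in success can then be told apart: one that needed several failed repairs or tripped a safety check scores differently from one that succeeded directly.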

Verification with incomplete feedback

Agents must act under partial tests, noisy execution signals, and hidden environment state.

Regression-free improvement

Harnesses should learn from failure without silently breaking previously working behavior.
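
A simple way to make "without silently breaking previously working behavior" checkable is a regression gate over named test results. The sketch below assumes results are stored as name-to-pass mappings; `accept_update` and the example test names are hypothetical.

```python
from typing import Dict, List, Tuple


def accept_update(
    baseline: Dict[str, bool], candidate: Dict[str, bool]
) -> Tuple[bool, List[str]]:
    """Accept a harness change only if no previously passing test now fails.

    A test missing from the candidate results counts as a regression, so a
    check cannot disappear silently.
    """
    regressions = sorted(
        name
        for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )
    return (not regressions, regressions)
```

With a baseline of `{"parse": True, "plan": True, "repair": False}` and a candidate of `{"parse": True, "plan": False, "repair": True}`, the gate rejects the update and names `plan` as the regression, even though the candidate fixed `repair`.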

Shared state across agents

Coordination depends on durable memory, repository state, review artifacts, and permissions.

Citation

BibTeX

@misc{ning2026codeagentharness,
  title         = {Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems},
  author        = {Xuying Ning and Katherine Tieu and Dongqi Fu and Tianxin Wei and Zihao Li and Yuanchen Bei and Jiaru Zou and Mengting Ai and Zhining Liu and Ting-Wei Li and Lingjie Chen and Yanjun Zhao and Ke Yang and Bingxuan Li and Cheng Qian and Gaotang Li and Xiao Lin and Zhichen Zeng and Ruizhong Qiu and Sirui Chen and Yifan Sun and Xiyuan Yang and Ruida Wang and Rui Pan and Chenyuan Yang and Dylan Zhang and Liri Fang and Zikun Cui and Yang Cao and Pan Chen and Dorothy Sun and Ren Chen and Mahesh Srinivasan and Nipun Mathur and Yinglong Xia and Hong Li and Hong Yan and Pan Lu and Lingming Zhang and Tong Zhang and Hanghang Tong and Jingrui He},
  year          = {2026},
  eprint        = {2605.18747},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2605.18747},
}