Nov 2025
On the Fragility of Agentic Systems
When we began developing CASSIA, the premise was simple: Large Language Models (LLMs) are excellent reasoning engines. If we treat them as individual agents—one acting as a biologist, another as a critic, and another as a summarizer—we should be able to replicate the manual annotation workflow.
However, biological data is messy. Unlike code generation, where the output works or it doesn't, cell type annotation exists in a grey area. A cell might express markers for both T-cells and NK cells. An LLM agent, trained to be helpful, often tries to "force" a classification where ambiguity is the scientific reality.
The "Yes-Man" Problem
One of the primary failure modes we observed was agent agreeableness. If Agent A (the Proposer) hallucinated a cell type based on weak evidence, Agent B (the Critic) often failed to correct it, instead fabricating a justification for the error. This echo chamber effect is the single biggest hurdle in deploying autonomous agents in rigorous scientific pipelines.
The solution lies not in better prompting, but in grounding. By forcing the agents to query external, immutable knowledge graphs before debating, we reduce the hallucination window. But the struggle remains: how do we teach a system to say "I don't know"?
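A minimal sketch of what "grounding before debating" can look like in practice. Everything here is hypothetical — the function names and the marker table are illustrative stand-ins, not CASSIA's actual implementation: before any agent argues for a label, the candidate markers are scored against an immutable reference, and a near-tie between cell types returns "uncertain" rather than a forced classification.

```python
# Hypothetical sketch, not CASSIA's actual code: a stand-in for an
# external, immutable marker-gene reference that agents must query
# before debating a cell-type call.
MARKER_DB = {
    "T cell": {"CD3D", "CD3E", "CD2", "IL7R"},
    "NK cell": {"NKG7", "GNLY", "KLRD1", "NCR1"},
}

def grounded_call(observed_markers, min_support=0.5):
    """Accept a cell-type call only when the reference supports it;
    otherwise say 'uncertain' instead of forcing a label."""
    observed = set(observed_markers)
    scores = {
        cell_type: len(observed & ref) / len(ref)
        for cell_type, ref in MARKER_DB.items()
    }
    best_type, best = max(scores.items(), key=lambda kv: kv[1])
    ranked = sorted(scores.values(), reverse=True)
    # Ambiguity check: if the runner-up is nearly as well supported,
    # the honest answer is "uncertain", not a forced classification.
    if best < min_support or (len(ranked) > 1 and ranked[1] >= best - 0.1):
        return "uncertain"
    return best_type

print(grounded_call(["CD3D", "CD3E", "IL7R"]))          # → T cell
print(grounded_call(["CD3D", "CD3E", "NKG7", "GNLY"]))  # → uncertain
```

The point of this shape is that the refusal path lives in the scoring itself: the Critic never has to talk the Proposer down from an unsupported call, because an ungrounded or ambiguous proposal dies before the debate starts.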
Dec 2025
The New Productivity Metric: Agent-Hours per Human-Hour
I've been letting Claude Code run continuously for the past few weeks, and I've realized something fundamental has shifted in how we should measure productivity. The critical metric is no longer output per hour worked. It's how many agent-hours you can generate per human-hour invested.
This is surplus value (剩余价值) in its purest form. But unlike traditional capital, where surplus value is extracted from labor, here the surplus comes from autonomous systems working on your behalf. Every hour I spend designing a task architecture, writing clear specifications, or setting up workflows translates into 10, 20, sometimes 50 hours of agent work running in parallel.
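The metric can be stated concretely. As a toy illustration (the function and the numbers are purely hypothetical), agent-hours are the sum over parallel sessions of agents times wall-clock time, divided by the human hours invested in setting them up:

```python
# Illustrative only: a toy calculation of the "agent-hours per
# human-hour" leverage described above; all numbers are hypothetical.

def leverage(human_hours, agent_sessions):
    """Agent-hours generated per human-hour invested.

    agent_sessions: list of (parallel_agents, wall_clock_hours) tuples.
    """
    agent_hours = sum(n * h for n, h in agent_sessions)
    return agent_hours / human_hours

# e.g. 2 hours spent writing specs spawns three overnight runs:
sessions = [(5, 8), (3, 8), (2, 8)]   # 40 + 24 + 16 = 80 agent-hours
print(leverage(2, sessions))          # → 40.0 agent-hours per human-hour
```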
The beauty of continuous agent work is that it's fundamentally asymmetric. While I sleep, the agents are refactoring code, running experiments, writing documentation, searching through literature. The compounding effect is staggering. A single well-structured prompt can spawn days of productive work.
Here's the uncomfortable truth: if you're not exploiting this, you're losing your competitive edge. This isn't optional anymore. The researcher who can orchestrate 100 agent-hours per week will outpace the one doing everything manually by an order of magnitude. The startup that parallelizes development across autonomous agents will ship faster than the one relying solely on human velocity.
This is free capital sitting on the table. The infrastructure is here. The models are capable. The only bottleneck is learning to delegate effectively and trust the process. Those who master this will define the next decade of scientific and technical progress. Those who don't will wonder why they can't keep up.
We're entering an era where the limiting factor isn't intelligence or effort. It's the ability to architect work such that autonomous agents can execute it. That's the skill worth developing. That's the new literacy.