Evidence-ledger draft

Evidence Ledgers for Autonomous Research Harnesses

A public release of an evidence-ledger draft for autonomous research harnesses, including paper text, claim ledger, and auditable source artifacts.

Paper draft

Readable HTML version of the generated Markdown draft.

Claim ledger

5 claims traced to source paper IDs and evidence rows.
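As a rough illustration of what "traced to source paper IDs and evidence rows" could mean in practice, the sketch below models a ledger entry and reports claims that lack either link. The `Claim` class, its field names, the `untraced` helper, and the example claims are all hypothetical; they are not taken from the released ledger.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One ledger entry (hypothetical schema, for illustration only)."""
    claim_id: str
    text: str
    paper_id: str = ""  # source paper ID, e.g. "2504.08066v1"
    evidence_rows: list[int] = field(default_factory=list)  # evidence row indices


def untraced(claims: list[Claim]) -> list[str]:
    """Return the IDs of claims missing a paper ID or any evidence row."""
    return [c.claim_id for c in claims if not c.paper_id or not c.evidence_rows]


# Made-up example entries: the second has a paper ID but no evidence row yet.
ledger = [
    Claim("c1", "Example claim with a source and a row.", "2504.08066v1", [2]),
    Claim("c2", "Example claim without an evidence row.", "2507.23276v2"),
]

print(untraced(ledger))  # ['c2']
```

An empty result would indicate that every claim in the list is linked to both a paper and at least one evidence row.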

Audit status

Pass: 8 supported claims, 5 filled evidence rows.

Download evidence CSV

Release notes

Generated at 2026-05-08T02:31:06+00:00 from local artifacts in cs-ai/research-harnesses.
The site intentionally excludes the localhost tmux control UI; only static research artifacts are public.
Rows marked abstract-derived still require full-text audit before submission-level claims.
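The abstract-derived rule can be checked mechanically. The following is a minimal sketch, assuming the evidence CSV has `paper_id` and `notes` columns (both column names are assumptions, as is the `needs_full_text_audit` helper) and that abstract-derived rows are identified by the prefix of their note text.

```python
import csv
import io

# Two sample rows in the assumed layout; the real CSV columns may differ.
SAMPLE = """paper_id,notes
2605.03042v1,Full text read from arXiv PDF.
2504.08066v1,Abstract-derived evidence row; needs full-text audit before final submission.
"""


def needs_full_text_audit(csv_text: str) -> list[str]:
    """Return paper IDs whose notes mark the row as abstract-derived."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row["paper_id"]
        for row in rows
        if row["notes"].strip().lower().startswith("abstract-derived")
    ]


print(needs_full_text_audit(SAMPLE))  # ['2504.08066v1']
```

A check like this could gate submission-level claims: any non-empty result means at least one row still needs a full-text audit.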

Evidence-backed papers

Paper	Claim	Notes
`2605.03042v1`	Long-horizon single-agent research is unreliable by default; the central failure mode is plausible unsupported success, mitigated through persistent state, modular execution, and independent assurance via cross-family executor/reviewer separation.	Full text read from arXiv PDF. Integration implication: our harness should keep the evidence_matrix/claim ledger as first-class artifacts, add a research-harnesses taxonomy, and keep reviewer-independent audits as hard invariants.
`2504.08066v1`	AI Scientist-v2 reports an end-to-end agentic system that formulates hypotheses, designs and executes experiments, analyzes and visualizes data, and writes manuscripts; compared with v1 it removes reliance on human-authored code templates and uses progressive agentic tree search.	Abstract-derived evidence row; needs full-text audit before final submission.
`2603.28589v1`	Clinical autonomous research needs domain-specific evidence grounding; Medical AI Scientist transforms surveyed literature into actionable evidence and uses clinician-engineer co-reasoning to improve traceability of generated ideas.	Abstract-derived evidence row; full text required before citing numerical comparisons.
`2507.23276v2`	Current AI Scientist systems have visible achievements, but the field still needs clarity on bottlenecks and critical components before scientific agents can produce ground-breaking discoveries that solve grand challenges.	Abstract-derived evidence row; needs full text for taxonomy details.
`2511.04583v4`	Jr. AI Scientist follows a novice-student-like workflow starting from a baseline paper: analyze limitations, formulate hypotheses, iterate on experiments until they yield improvements, and write a paper reporting the results; the work also reports risks and limitations of current AI Scientist systems.	Abstract-derived evidence row; useful for risk and workflow comparison.