Evidence-ledger draft

Claim ledger

CSV-backed claim ledger tying each paper claim to its paper ID, evidence status, and audit status.

| paper_id | claim | evidence_status | audit_status |
| --- | --- | --- | --- |
| 2605.03042v1 | Long-horizon single-agent research is unreliable by default; the central failure mode is plausible unsupported success, mitigated through persistent state, modular execution, and independent assurance via cross-family executor/reviewer separation. | has evidence row | not reviewed |
| 2504.08066v1 | AI Scientist-v2 reports an end-to-end agentic system that formulates hypotheses, designs and executes experiments, analyzes and visualizes data, and writes manuscripts; compared with v1 it removes reliance on human-authored code templates and uses progressive agentic tree search. | has evidence row | not reviewed |
| 2603.28589v1 | Clinical autonomous research needs domain-specific evidence grounding; Medical AI Scientist transforms surveyed literature into actionable evidence and uses clinician-engineer co-reasoning to improve traceability of generated ideas. | has evidence row | not reviewed |
| 2507.23276v2 | Current AI Scientist systems have visible achievements, but the field still needs clarity on bottlenecks and critical components before scientific agents can produce ground-breaking discoveries that solve grand challenges. | has evidence row | not reviewed |
| 2511.04583v4 | Jr. AI Scientist follows a novice-student-like workflow from a baseline paper: analyze limitations, formulate hypotheses, iterate experiments until improvements, and write a result paper; the work also reports risks and limitations of current AI Scientist systems. | has evidence row | not reviewed |
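
Because the ledger is CSV-backed, its rows can be checked mechanically. The following is a minimal sketch, not the ledger's actual tooling: it assumes the four column names shown in the table and the two status strings that appear there ("has evidence row", "not reviewed"). The function names `load_ledger` and `pending_audit` and the inline sample are hypothetical, and the sample claims are abbreviated from the rows above.

```python
import csv
import io

# Hypothetical inline sample: two rows from the ledger, claims abbreviated.
SAMPLE_LEDGER = """paper_id,claim,evidence_status,audit_status
2605.03042v1,"Long-horizon single-agent research is unreliable by default.",has evidence row,not reviewed
2504.08066v1,"AI Scientist-v2 reports an end-to-end agentic system.",has evidence row,not reviewed
"""

# Column names taken from the ledger header.
REQUIRED_COLUMNS = ("paper_id", "claim", "evidence_status", "audit_status")


def load_ledger(text):
    """Parse ledger CSV text into a list of dicts, checking the header."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"ledger is missing columns: {missing}")
    return list(reader)


def pending_audit(rows):
    """Return paper IDs whose claims have an evidence row but no review yet.

    The two status strings are assumed to be the exact values used in the
    ledger; adjust them if the real vocabulary differs.
    """
    return [
        row["paper_id"]
        for row in rows
        if row["evidence_status"] == "has evidence row"
        and row["audit_status"] == "not reviewed"
    ]
```

Calling `pending_audit(load_ledger(SAMPLE_LEDGER))` lists the paper IDs that still await review. Quoting the claim field matters here because the claims themselves contain commas.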