Claim ledger

paper_id	claim	evidence_status	audit_status
2605.03042v1	Long-horizon single-agent research is unreliable by default; the central failure mode is plausible but unsupported success, mitigated through persistent state, modular execution, and independent assurance via cross-family executor/reviewer separation.	has evidence row	not reviewed
2504.08066v1	AI Scientist-v2 reports an end-to-end agentic system that formulates hypotheses, designs and executes experiments, analyzes and visualizes data, and writes manuscripts; compared with v1 it removes reliance on human-authored code templates and uses progressive agentic tree search.	has evidence row	not reviewed
2603.28589v1	Clinical autonomous research needs domain-specific evidence grounding; Medical AI Scientist transforms surveyed literature into actionable evidence and uses clinician-engineer co-reasoning to improve traceability of generated ideas.	has evidence row	not reviewed
2507.23276v2	Current AI Scientist systems have visible achievements, but the field still needs clarity on bottlenecks and critical components before scientific agents can produce ground-breaking discoveries that solve grand challenges.	has evidence row	not reviewed
2511.04583v4	Jr. AI Scientist follows a novice-student-like workflow starting from a baseline paper: analyze its limitations, formulate hypotheses, iterate on experiments until they yield improvements, and write a paper reporting the results; the work also reports risks and limitations of current AI Scientist systems.	has evidence row	not reviewed