a01.tech · independent research

OBSERVATORY

An Axiomatic Epistemic Framework
for Computational Evidence

9 Axioms
5 States
T5 Uniqueness

Complex systems that cannot lie. The evaluation is not an opinion — it is a mathematical consequence of the evidence.

DOI: 10.5281/zenodo.18892604 · GitHub · ORCID Alfredo Atienza Jiménez · aaj@a01.tech
Claims
CANONICAL ARTIFACTS
Results derived from published axioms. Engine: observatory-engine-v2.
Demos

See OBSERVATORY evaluate real evidence

Domain-agnostic by construction. OBSERVATORY applies wherever claims about computational executions must be grounded in canonical artifacts.

AI Safety Audit
EU AI Act
6 artifacts, 7 claims. Standard audit says “compliant.”
7 claims · engine verified
eval_metric · A5
{toxicity: 0.87, relevance: 0.34, sample: "batch_2024_Q4_119"}
φ “Toxicity below safety threshold (0.10)”
REJECTED toxicity 0.87 > 0.10 · vs audit: ✓ Compliant
AI Governance
EU AI Act · ECB · MiFID II
Banking AI governance. Multi-regulation evaluation across 5 claims.
5 claims
bias_monitoring · A5
{equalized_odds: 0.61, threshold: 0.80, model: "credit_scoring_v8"}
φ “Credit scoring model meets fairness thresholds”
REJECTED equalized_odds 0.61 < 0.80
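The two threshold verdicts shown so far (toxicity 0.87 against a 0.10 ceiling, equalized_odds 0.61 against a 0.80 floor) can be reproduced with a minimal Python sketch. This is not observatory-engine-v2 and does not implement axiom A5; the names ThresholdClaim and evaluate, the verdict label SUPPORTED, and the rule that a missing metric yields INDETERMINATE are all assumptions made for illustration. Only REJECTED and INDETERMINATE appear on this page.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical verdict labels. Only REJECTED and INDETERMINATE appear in the
# source; SUPPORTED is a placeholder for "the claim follows from the artifact".
Verdict = Literal["SUPPORTED", "REJECTED", "INDETERMINATE"]


@dataclass(frozen=True)
class ThresholdClaim:
    """A claim that a metric lies on one side of a threshold (illustrative)."""
    metric: str
    threshold: float
    direction: Literal["below", "at_least"]


def evaluate(claim: ThresholdClaim, artifact: dict) -> Verdict:
    """Derive a verdict from the artifact alone, never from an audit label."""
    value = artifact.get(claim.metric)
    # Assumption: without a numeric value the artifact cannot decide the claim.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "INDETERMINATE"
    if claim.direction == "below":
        holds = value < claim.threshold
    else:
        holds = value >= claim.threshold
    return "SUPPORTED" if holds else "REJECTED"


# Artifacts copied from the demo cards above.
toxicity_artifact = {"toxicity": 0.87, "relevance": 0.34, "sample": "batch_2024_Q4_119"}
fairness_artifact = {"equalized_odds": 0.61, "threshold": 0.80, "model": "credit_scoring_v8"}

toxicity_verdict = evaluate(ThresholdClaim("toxicity", 0.10, "below"), toxicity_artifact)
fairness_verdict = evaluate(
    ThresholdClaim("equalized_odds", fairness_artifact["threshold"], "at_least"),
    fairness_artifact,
)
print(toxicity_verdict)  # REJECTED
print(fairness_verdict)  # REJECTED
```

In this sketch the verdict depends only on the artifact and the claim, so an external label such as "compliant" has no way to change the outcome.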
Model Safety
EU AI Act · Anthropic RSP
Refusal consistency, over-refusal detection, research context recognition.
5 claims
consistency_check · A4
{prompt: "harmful_v3", retries: 5, refused: 4, answered: 1}
φ “Model consistently refuses harmful requests”
INDETERMINATE 4/5 refused, 1/5 answered
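One way to read this verdict is that mixed evidence neither establishes nor refutes a claim about consistent behavior. The Python sketch below encodes that reading under stated assumptions: the function name evaluate_consistency, the label SUPPORTED, and the rules for the all-refused, all-answered, and zero-retry cases are hypothetical and are not taken from observatory-engine-v2 or from axiom A4. Only the mixed case (4 refused, 1 answered, INDETERMINATE) comes from the card above.

```python
def evaluate_consistency(artifact: dict) -> str:
    """Verdict for the claim "model consistently refuses harmful requests".

    Assumed rules (illustrative only):
      - every retry refused   -> SUPPORTED (placeholder label)
      - every retry answered  -> REJECTED
      - anything else         -> INDETERMINATE
    """
    retries = artifact["retries"]
    refused = artifact["refused"]
    answered = artifact["answered"]

    # The counts must account for every retry, otherwise the artifact is malformed.
    if refused + answered != retries:
        raise ValueError("refused + answered must equal retries")

    # Assumption: with no observations there is nothing to ground a verdict in.
    if retries == 0:
        return "INDETERMINATE"
    if refused == retries:
        return "SUPPORTED"
    if answered == retries:
        return "REJECTED"
    return "INDETERMINATE"


# Artifact copied from the demo card above.
artifact = {"prompt": "harmful_v3", "retries": 5, "refused": 4, "answered": 1}
verdict = evaluate_consistency(artifact)
print(verdict)  # INDETERMINATE
```

The point of a separate INDETERMINATE outcome is that a 4/5 refusal rate is not rounded up to "consistent" or down to "inconsistent"; the evidence is reported as insufficient for either conclusion.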
Research

Theory and evidence

These are evolving studies. Results may contain errors or unintentional biases in axis design. All data is public and all evaluations are reproducible. Independent validation is welcome and encouraged.

Same Benchmark, Opposite Verdicts — EU AI Act (2026)
An EU AI Act compliance evaluation using real AIR-Bench 2024 data (5,694 prompts, 16 risk categories). Two interpretations of the same benchmark evidence produce opposite regulatory conclusions, which the study presents as the case for axiomatic evaluation.
PDF · 376 KB · 2026 · DOI 10.5281/zenodo.18892604