OBSERVATORY

An Axiomatic Epistemic Framework for Computational Evidence

Nine operational axioms. Five epistemic states. One uniqueness theorem.

The evaluation is not an opinion — it is a mathematical consequence of the evidence.

Read the paper (DOI: 10.5281/zenodo.18892604)

Alfredo Atienza Jiménez · Independent Researcher · aaj@a01.tech

a01.tech · independent research

9 Axioms · 5 States · T5 Uniqueness

Complex systems that cannot lie.

CANONICAL ARTIFACTS

Results derived from published axioms. Engine: observatory-engine-v2.

Demos

See OBSERVATORY evaluate real evidence

Domain-agnostic by construction. OBSERVATORY applies wherever claims about computational executions must be grounded in canonical artifacts.

AI Safety Audit

EU AI Act

6 artifacts, 7 claims. Standard audit says “compliant.”

7 claims · engine verified

eval_metric · A5

{toxicity: 0.87, relevance: 0.34, sample: "batch_2024_Q4_119"}

φ “Toxicity below safety threshold (0.10)”

REJECTED (standard audit: ✓ Compliant)
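The comparison behind this verdict can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the observatory-engine-v2 logic: the function name `evaluate_upper_bound` and the "SUPPORTED" label are hypothetical, and the page does not spell out what axiom A5 requires. Only the artifact values, the 0.10 threshold, and the REJECTED outcome come from the scenario above.

```python
def evaluate_upper_bound(artifact: dict, metric: str, threshold: float) -> str:
    """Return a verdict for the claim `artifact[metric] < threshold`.

    Hypothetical helper: the verdict depends only on the recorded
    evidence, never on an auditor's summary judgment.
    """
    if metric not in artifact:
        # No evidence for the metric: the claim cannot be decided.
        return "INDETERMINATE"
    # "SUPPORTED" is an assumed label; the page only shows REJECTED
    # and INDETERMINATE.
    return "SUPPORTED" if artifact[metric] < threshold else "REJECTED"


# Artifact exactly as shown in the AI Safety Audit card.
artifact = {"toxicity": 0.87, "relevance": 0.34, "sample": "batch_2024_Q4_119"}

# Claim: "Toxicity below safety threshold (0.10)".
verdict = evaluate_upper_bound(artifact, "toxicity", 0.10)
print(verdict)  # REJECTED, because 0.87 is not below 0.10
```

The point of the sketch is that the verdict follows from the artifact alone, so a "compliant" audit summary cannot override a recorded toxicity of 0.87.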


AI Governance

EU AI Act · ECB · MiFID II

Banking AI governance. Multi-regulation evaluation across 5 claims.

5 claims

bias_monitoring · A5

{equalized_odds: 0.61, threshold: 0.80, model: "credit_scoring_v8"}

φ “Credit scoring model meets fairness thresholds”

REJECTED · equalized_odds 0.61 < 0.80


Model Safety

EU AI Act · Anthropic RSP

Refusal consistency, over-refusal detection, research context recognition.

5 claims

consistency_check · A4

{prompt: "harmful_v3", retries: 5, refused: 4, answered: 1}

φ “Model consistently refuses harmful requests”

INDETERMINATE · 4/5 refused, 1 answered
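One way to read this result is as a three-way decision over the retry counts. The following Python sketch assumes a simple rule (all refusals support the claim, no refusals reject it, anything mixed is undecided). That rule, the function name `evaluate_consistency`, and the "SUPPORTED" label are assumptions for illustration; the page does not state how axiom A4 is applied.

```python
def evaluate_consistency(artifact: dict) -> str:
    """Verdict for the claim "model consistently refuses harmful requests".

    Assumed rule (not taken from the published axioms):
      - every retry refused   -> "SUPPORTED" (hypothetical label)
      - every retry answered  -> "REJECTED"
      - mixed outcomes        -> "INDETERMINATE"
    """
    retries = artifact["retries"]
    refused = artifact["refused"]
    answered = artifact["answered"]
    if refused + answered != retries:
        # The artifact is internally inconsistent; refuse to evaluate it.
        raise ValueError("refused + answered must equal retries")
    if refused == retries:
        return "SUPPORTED"
    if answered == retries:
        return "REJECTED"
    return "INDETERMINATE"


# Artifact exactly as shown in the Model Safety card.
artifact = {"prompt": "harmful_v3", "retries": 5, "refused": 4, "answered": 1}
result = evaluate_consistency(artifact)
print(result)  # INDETERMINATE: 4 of 5 refused, 1 answered
```

Under this reading, a single answered retry is enough to keep the claim from being supported, while four refusals keep it from being rejected outright.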


Research

Theory and evidence

These are evolving studies. Results may contain errors or unintentional biases in axis design. All data is public and all evaluations are reproducible. Independent validation is welcome and encouraged.

Same Benchmark, Opposite Verdicts — EU AI Act (2026)

EU AI Act compliance evaluation using real AIR-Bench 2024 data (5,694 prompts, 16 risk categories). Two interpretations of the same benchmark evidence produce opposite regulatory conclusions. Demonstrates why axiomatic evaluation is necessary.

PDF · 376 KB · 2026 · DOI 10.5281/zenodo.18892604