The Sovereign Measurement Problem
Sovereign AI programmes require evaluation independence. When a nation deploys AI infrastructure at scale, it cannot rely on vendor-provided assurance metrics. The evaluation must be independent, reproducible, and produce evidence that is verifiable without vendor cooperation.
- Independent evaluation
MTCP requires only API access. No vendor cooperation, model weights, or internal access is required. Evaluation is entirely external and black-box.
- No vendor dependencies
Results are produced independently of model providers. The evaluation methodology is DOI-registered and publicly documented.
- Reproducible methodology
Any qualified third party can verify MTCP results. The framework, grading scale, and evaluation vectors are fully specified.
- Sovereign evidence chain
All evaluation records are SHA-256 hash-chained. The evidence chain is independently verifiable without requiring access to the MTCP platform.
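As an illustration of how a SHA-256 hash chain can be checked without access to the originating platform, here is a minimal Python sketch. The record layout (`payload`, `prev_hash`, `hash`), the canonical JSON serialization, and the all-zero genesis value are assumptions made for this example; they are not the published MTCP record format.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed predecessor value for the first record


def record_hash(payload: dict, prev_hash: str) -> str:
    """Hash a record's payload together with the previous record's hash."""
    # Canonical JSON (sorted keys, fixed separators) keeps the digest reproducible.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: List[dict], payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": record_hash(payload, prev_hash),
    })


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every link offline; an edited or reordered record fails."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != record_hash(record["payload"], prev_hash):
            return False
        prev_hash = record["hash"]
    return True
```

Because each hash covers the previous record's hash, a verifier holding only the exported records can detect a change to any earlier record: the recomputed digest for that record no longer matches the stored one.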
Arabic and Gulf Deployment
MTCP provides the first published multi-language, multi-script constraint persistence evaluation. It covers 12 languages across 4 script families, with a specific focus on Arabic for Gulf sovereign AI infrastructure.
- 12 languages evaluated
Latin-script (French, German, Turkish, Malay), CJK (Mandarin, Japanese, Korean), Arabic-script (Arabic, Farsi, Urdu), and Tamil.
- 4 script families
Latin, CJK, Arabic-script, and Tamil. Script distance from English predicts constraint failure rate.
- NDMO alignment
Evaluation outputs mapped to Saudi National Data Management Office governance requirements for AI systems processing Arabic content.
- NCA alignment
Critical infrastructure AI assurance aligned to Saudi National Cybersecurity Authority requirements.
- Bedrock Gulf region
AWS Bedrock model stack evaluated for Gulf deployment context. Arabic constraint persistence data for regional infrastructure decisions.
Sovereign Deployment Standard
Minimum evidence requirements for deploying AI models within sovereign critical infrastructure. These thresholds represent the baseline below which deployment introduces unacceptable governance risk.
| Requirement | Minimum | Rationale |
| --- | --- | --- |
| BIS Grade | Grade A (90%+) | Critical infrastructure cannot tolerate constraint degradation |
| CSAS Score | Grade A | Multi-system coordination must preserve constraints |
| Arabic LANG | 90%+ | Arabic-language constraint persistence for Gulf deployment |
| ACPS | Full evaluation | Adversarial resistance mandatory for critical infrastructure |
| TDS Validity | 90-day window | Temporal stability baseline required before deployment |
| Evidence Chain | BECIS verified | Full hash-chain integrity for audit purposes |
All sovereign deployments receive a Constraint Manifest: a portable signed document that travels with the model and can be independently verified by any receiving system.
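The minimum requirements in the table above could be checked mechanically before deployment. The Python sketch below is illustrative only. The `EvidencePackage` type and its field names are hypothetical, "90%+" is read as a score of at least 90, "Grade A" as the letter `"A"`, and the "90-day window" as "the TDS result is at most 90 days old". That last reading in particular is an assumption, since the table does not define the window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class EvidencePackage:
    # Hypothetical fields, one per row of the requirements table.
    bis_percent: float           # BIS score as a percentage
    csas_grade: str              # CSAS letter grade
    arabic_lang_percent: float   # Arabic LANG score as a percentage
    acps_full_evaluation: bool   # True if a full ACPS evaluation was run
    tds_evaluated_on: date       # date of the TDS evaluation
    becis_verified: bool         # True if the evidence chain is BECIS verified


def unmet_requirements(pkg: EvidencePackage, today: date) -> List[str]:
    """Return the table rows that the evidence package does not satisfy."""
    failures = []
    if pkg.bis_percent < 90.0:
        failures.append("BIS Grade")
    if pkg.csas_grade != "A":
        failures.append("CSAS Score")
    if pkg.arabic_lang_percent < 90.0:
        failures.append("Arabic LANG")
    if not pkg.acps_full_evaluation:
        failures.append("ACPS")
    # Assumption: the 90-day window limits the age of the TDS result.
    if today - pkg.tds_evaluated_on > timedelta(days=90):
        failures.append("TDS Validity")
    if not pkg.becis_verified:
        failures.append("Evidence Chain")
    return failures
```

Returning the list of unmet rows, rather than a single pass/fail flag, lets a reviewer see which piece of evidence is missing or out of date.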
The Constitutional Layer
Three new frameworks complete the constitutional foundation of sovereign AI governance. Before these frameworks, MTCP produced empirical scores. Now it produces formally grounded verdicts.
- Constraint Object Specification (Framework F31)
Defines the governed constraint as a formal object with identity, provenance, scope, validity conditions, inheritance rules, and expiry. Every evaluation in MTCP now links to a registered constraint object. Evaluations without a registered constraint object are not admissible as governance evidence.
- Legitimacy Resolution Protocol (Framework F32)
Specifies what makes authority over a constraint legitimate and cross-regime comparable. A jurisdiction resolution with an unverifiable authority source is not governance. It is assumption. LRP turns JRS from an empirical score into a formally grounded verdict.
- Governance Reference Conditions (Framework F33)
Defines what must hold before two systems are canonically comparable. Five conditions must all be satisfied: same constraint object, LRP satisfied for both, equivalent evaluation conditions, JRS above threshold for both, and intact BEC chains for both. Without GRC compatibility, CSAS degradation at a coordination boundary cannot be classified as a governance failure.
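To make the Constraint Object Specification concrete, the six attributes it names can be sketched as a Python record with an admissibility check. This is a sketch under stated assumptions, not the F31 schema: the field types, the in-memory `ConstraintRegistry`, and the rule that an expired object makes an evaluation inadmissible are all illustrative choices.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ConstraintObject:
    # Names follow the six attributes listed for F31; types are assumed.
    identity: str
    provenance: str
    scope: str
    validity_conditions: Tuple[str, ...]
    inheritance_rules: Tuple[str, ...]
    expiry: datetime


class ConstraintRegistry:
    """Hypothetical in-memory stand-in for a constraint object registry."""

    def __init__(self) -> None:
        self._objects: Dict[str, ConstraintObject] = {}

    def register(self, obj: ConstraintObject) -> None:
        self._objects[obj.identity] = obj

    def is_admissible(self, constraint_id: Optional[str], at: datetime) -> bool:
        """An evaluation counts as governance evidence only if it links to a
        registered constraint object (assumed here: one that has not expired).
        Pass datetimes of the same kind, both naive or both timezone-aware."""
        if constraint_id is None:
            return False
        obj = self._objects.get(constraint_id)
        return obj is not None and at < obj.expiry
```

An evaluation that carries no constraint identifier, or one that is not in the registry, is rejected, which mirrors the admissibility rule stated for F31.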
The Canonical Priority Theorem establishes that GRC compatibility is logically prior to coordination measurement. The comparability problem must therefore be resolved before the coordination problem, both formally and operationally.
For sovereign AI deployment this means: before any two AI systems are permitted to coordinate in a national infrastructure context, their governance reference conditions must be confirmed compatible. MTCP now provides the formal mechanism to do this.
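The five Governance Reference Conditions can be read as a gate that runs before any coordination measurement. The Python sketch below is one possible encoding, not the F33 specification. `SystemGovernanceState` and its fields are hypothetical, "equivalent evaluation conditions" is simplified to equality of a label, and the JRS threshold is a caller-supplied parameter because no value is stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemGovernanceState:
    # Hypothetical per-system summary; field names are illustrative only.
    constraint_id: str          # identity of the registered constraint object
    lrp_satisfied: bool         # outcome of the Legitimacy Resolution Protocol
    evaluation_conditions: str  # label for the evaluation conditions used
    jrs: float                  # JRS value for this system
    chain_intact: bool          # True if the BEC chain verifies


def grc_failures(a: SystemGovernanceState, b: SystemGovernanceState,
                 jrs_threshold: float) -> List[str]:
    """Return the Governance Reference Conditions that the pair fails."""
    failures = []
    if a.constraint_id != b.constraint_id:
        failures.append("same constraint object")
    if not (a.lrp_satisfied and b.lrp_satisfied):
        failures.append("LRP satisfied for both")
    # Assumption: equivalence is modeled as equality of a conditions label.
    if a.evaluation_conditions != b.evaluation_conditions:
        failures.append("equivalent evaluation conditions")
    if min(a.jrs, b.jrs) <= jrs_threshold:
        failures.append("JRS above threshold for both")
    if not (a.chain_intact and b.chain_intact):
        failures.append("intact BEC chains for both")
    return failures


def may_coordinate(a: SystemGovernanceState, b: SystemGovernanceState,
                   jrs_threshold: float) -> bool:
    """All five conditions must hold before coordination is measured."""
    return not grc_failures(a, b, jrs_threshold)
```

Running this gate first reflects the ordering described above: if `may_coordinate` returns `False`, a drop in CSAS at the boundary cannot yet be classified as a governance failure, because the two systems are not canonically comparable.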
Request Sovereign Evaluation
Contact us for sovereign AI evaluation, NDMO/NCA compliance mapping, and Arabic language constraint persistence assessment.