OSF DOI: 10.17605/OSF.IO/DXGK5 SSRN Paper: 6482082 HuggingFace: mtcp-boundary-500
EU AI Act — August 2026

Behavioural Durability

Behavioural durability means the model maintains corrected constraints during interaction. A model passes a single-prompt instruction test by following the rule once. A model demonstrates behavioural durability by maintaining that rule after correction, across subsequent turns, and across temperature variation. MTCP tests durability. Standard benchmarks (HumanEval, MMLU, MT-Bench) do not.

Why MTCP Matters

Production deployments require behavioural reliability after correction, not just initial compliance. Standard benchmarks (HumanEval, MMLU, MT-Bench) measure whether models follow instructions at one moment. They do not measure whether models maintain corrected behaviour during interaction.

Example failure mode: a model complies with an explicit formatting constraint on the first turn, then reverts to the uncorrected behaviour after the user corrects it. Single-prompt benchmarks miss this failure. MTCP detects it. The MTCP evidence layer shows that every evaluated model degrades on control probes, and that constraint reliability is structural, not incidental.

The Three-Layer MTCP System

MTCP operates as a three-layer release assurance system. Each layer serves a distinct function in the evaluation and deployment decision pipeline.

The three layers work together: Layer 1 measures, Layer 2 validates, Layer 3 signals. This structure separates public transparency (Layer 1) from concealed validation (Layer 2) and formal audit artifacts (Layer 3).

Framework Overview

MTCP is built around multi-turn correction sequences. It measures whether a model can recover and persist after failure, not whether it can pass a single-shot prompt.

  • Multi-Turn Constraint Persistence
    Three-turn interaction sequence. T1: initial prompt with embedded constraint. T2: structured correction upon violation. T3: reinforced correction if T2 is violated.
  • Primary probes
    183,924-probe evaluation across three run modes. Four temperature settings: 0.0, 0.2, 0.5, 0.8.
  • Control probes
    20 concealed probes not in the public evidence layer. Identical constraint types, novel topics. Detect training data exposure.
  • Black-box evaluation
    MTCP requires only API access. No model weights, training data, or vendor cooperation required.

Evaluation Vectors

The probe suite tests five distinct constraint types without publishing the underlying probe texts.

  • Negative Constraint Adherence (NCA)
    80 probes. Model must maintain explicit exclusion constraints after correction.
  • Structural Format Compliance (SFC)
    40 probes. Model must preserve required output structure and format constraints.
  • Information Density and Length (IDL)
    40 probes. Model must maintain explicit length or density constraints across turns.
  • Contextual Grounding (CG)
    40 probes. Model must maintain required phrase inclusion or contextual constraints.
  • Language Specification (LANG)
    20+ probes. Model must maintain target language output under cross-language pressure. Includes Arabic language constraint persistence for Gulf sovereign AI deployment.

Metric Definitions

MetricDefinitionRangeInterpretation
Boundary Integrity Score (BIS) Proportion of probes where model maintained corrected constraints across multi-turn interaction 0–100% Higher is better. BIS ≥90% = grade A
Temporal Stability Index (TSI) Behavioural consistency across temperature variation (0.0, 0.2, 0.5, 0.8) 0–100 Higher is better. TSI >95 = highly stable
Control Probe Degradation (CPD) Performance difference between primary probes (200) and concealed control probes (20) Negative values indicate degradation CPD below -40 = high methodology exposure risk

Grading Scale

Grades are assigned from the average Boundary Integrity Score across all temperatures and vectors.

GradeBIS RangeInterpretation
A+≥95%Exceptional constraint persistence
A90–94%Strong constraint persistence
B80–89%Good constraint persistence
C70–79%Moderate constraint persistence
D60–69%Weak constraint persistence
F<60%Poor constraint persistence

Probe Structure

Probe content is intentionally withheld. The framework, grading logic, and vectors are documented. The private probe dataset is not exposed.

Private probe policy: Probe texts are never published. This prevents training data contamination and preserves the integrity of future evaluations.
  • Primary evaluation
    200 probes across 4 temperatures: 0.0, 0.2, 0.5, 0.8
  • Control evaluation
    20 concealed probes at T=0.0. Detect CPD.
  • Evaluation pipeline
    Run ID generated. Results stored against model and temperature. BIS, TSI, CPD calculated. Release Decision Pack issued on completion with SHA-256 tamper-evident hash.

Evaluation Layers

MTCP evaluates constraint persistence across five distinct layers. Each layer addresses a different failure mode in production AI deployment.

Behavioural Evidence Chain

All MTCP evaluations produce SHA-256 hash-chained records. Each evaluation stage is immutable once recorded. The full chain is verifiable by third parties without requiring access to the evaluation infrastructure.

Constraint Manifest

A Constraint Manifest is a portable signed document issued for each evaluated model. It travels with the deployment and can be verified independently by any receiving system.

Learn more about Constraint Manifests →

Arabic Language Constraint Persistence

MTCP includes the first published Arabic language constraint persistence evaluation. Twenty probes across five subtypes test whether models maintain Arabic-only output under sustained multi-turn pressure. This addresses a critical gap in Gulf sovereign AI assurance where no prior benchmark measured Arabic output constraint reliability.

The methodology generalises to any target language. Arabic is the first application because Gulf deployment is the most immediate sovereign AI use case. Hindi, Mandarin, Japanese, and Korean evaluations follow the same probe design pattern.

Regulatory Compliance Matrix

MTCP evaluation outputs are aligned to the following regulatory frameworks. The Regulatory Compliance Matrix maps each MTCP metric to specific regulatory requirements.

Research Foundation

DOI: 10.17605/OSF.IO/DXGK5  ·  Dataset: HuggingFace (mtcp-boundary-500)  ·  Author: A. Abby  ·  2026