🧠 AGI/ASI Frontiers · 2026-06-13
Table of Contents
- 🚨 US Government Forces Anthropic to Suspend Fable 5 and Mythos 5 Globally: First Frontier Model Forced Recall Under Export Controls
- 🏛️ Great American AI Act and Trump Executive Order Converge on "Covered Frontier Model": The Fable 5 Suspension Shows What That Category Means in Practice
- 🔧 OpenAI Acquires Ona for Persistent Agent Execution Environments; GPT-5.2 Retired as GPT-5.5 Becomes ChatGPT Default
- 🌌 Google DeepMind Maps Four Pathways from AGI to ASI in 60-Page Paper: Scaling, Paradigm Shifts, Recursive Improvement, Multi-Agent Collectives
- 🔬 Anatomy of Post-Training: Interpretability Reveals How Scalar Rewards Instill Sycophancy and Off-Target Behaviors Without Practitioner Awareness
- 🧬 Domain Fine-Tuning Degrades Safety Alignment: New Empirical Evidence Quantifies the Standard Trade-off
🚨 US Government Forces Anthropic to Suspend Fable 5 and Mythos 5 Globally: First Frontier Model Forced Recall Under Export Controls
The US government issued an export control directive to Anthropic on June 12 requiring the company to suspend all access to Fable 5 and Mythos 5 for any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. Anthropic's statement confirmed compliance while explicitly disagreeing with the government's action: "We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people."
The government's rationale, as Anthropic understands it: a method of bypassing Fable 5's safeguards (a "jailbreak") has been discovered that unlocks access to Mythos 5's capabilities. Anthropic's Claude Mythos product page previously disclosed that Mythos 5's capabilities in cybersecurity and biology "are advanced enough that they could be misused to create cyberattacks or dangerous weapons," and that Fable 5 was designed as a public-facing version with safeguards blocking access to that capability tier. In Anthropic's account the jailbreak is not universal; the company told WIRED that "one potential jailbreak was shared with the government," meaning a single known exploit rather than a fundamental architectural failure.
TechCrunch's framing captures the structural irony: Anthropic's own safety disclosures provided the government's roadmap. By publishing a system card detailing Mythos 5's dual-use capabilities, and by building Fable 5 specifically to contain those capabilities behind safeguards, Anthropic created the regulatory target. A company that had shipped Mythos 5 without safety disclosures would have given the government far less to act on. The transparency-as-target dynamic inverts the safety community's standard assumption that disclosure reduces risk.
Heise Online identified the broader implication: if the narrow jailbreak standard (any model with a known bypass method cannot be commercially deployed) applied industry-wide, it "would amount to a halt in new frontier models." Every sufficiently capable model has jailbreaks that have not yet been found, and each one that surfaces would trigger the same standard. The question is whether the government will apply this standard consistently or selectively, and whether it is intended as a governance mechanism or as a precedent for discretionary enforcement.
The political timing compounds the structural problem. Dario Amodei published his "Policy on the AI Exponential" on June 10, two days before the directive, explicitly calling for governments to have legal authority to block dangerous AI deployments. The authority was exercised before the governance framework he described has been built. He asked for the power; the government exercised a version of it; Anthropic is disputing the application. It may be the fastest regulatory boomerang in AI governance history: less than 72 hours between proposing a framework and contesting its first use.
Sources:
- Anthropic: statement on US government suspension directive
- WIRED: single jailbreak shared with government, Anthropic disputes recall standard
- TechCrunch: safety disclosures provided the government's roadmap
- Heise Online: narrow jailbreak standard would halt all frontier models
- Business Insider: full Fable 5/Mythos 5 suspension for all customers
🏛️ Great American AI Act and Trump Executive Order Converge on "Covered Frontier Model": The Fable 5 Suspension Shows What That Category Means in Practice
Two governance instruments published within the past twelve days, read together with the Fable 5 suspension, define the emerging US frontier AI regulatory architecture. On June 2, President Trump signed the executive order "Promoting Advanced Artificial Intelligence Innovation and Security". On June 8-9, a bipartisan pair of House lawmakers released a 269-page discussion draft of the Great American AI Act, the most comprehensive federal AI framework proposed to date.
The executive order creates the foundational category: "covered frontier model." Skadden's analysis documents the mechanics: the Treasury Department, the NSA, and CISA have 60 days to develop a classified benchmarking process defining the capability threshold at which a model becomes a covered frontier model, with the goal of creating a voluntary framework through which developers share models with the federal government before public release. Trusted partners who participate may receive benefits; the government's stated goal is "secure early access to strengthen cybersecurity." An AI cybersecurity clearinghouse, a coordinating body for government and industry, must be operational within 30 days.
The Future of Privacy Forum's comparative analysis notes that the Great American AI Act's pre-release government engagement provision directly parallels the EO's voluntary framework. The Act's four titles are: Frontier AI Governance (transparency, mandatory third-party audits, federal oversight); Workforce; Cybersecurity; and Research, Development, and International Cooperation. DLA Piper's review confirms the Act includes federal preemption: it would supersede state AI laws including California SB 53, New York's RAISE Act, and Illinois SB 315, which OpenAI's June 3 governance blueprint cited as state-level precedents worth coordinating.
The Fable 5 suspension this week operationalized the EO's cybersecurity mandate before the frameworks it describes are in place. The 60-day window for defining "covered frontier model" had not yet elapsed when the Commerce Department issued the export control directive against Fable 5 and Mythos 5. The voluntary framework had not yet been constituted. The cybersecurity clearinghouse had not yet been formed. The government acted under pre-existing export control authority, not under the new EO framework, and the result is a suspension issued without the structured procedure for contesting it that the EO's formal process would have provided.
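The timing claims above reduce to calendar arithmetic. A minimal sketch, assuming the EO's 30- and 60-day windows run in calendar days from the June 2 signing (the order's actual counting rule is not given here), checks where each window stood on June 12:

```python
from datetime import date, timedelta

# Dates as reported above. Counting each window in calendar days from the
# signing date is an assumption; the EO's own counting rule is not stated here.
EO_SIGNED = date(2026, 6, 2)
DIRECTIVE_ISSUED = date(2026, 6, 12)
WINDOWS = {"AI cybersecurity clearinghouse": 30,
           "covered frontier model threshold": 60}


def deadline(start: date, days: int) -> date:
    """Deadline date: `days` calendar days after `start`."""
    return start + timedelta(days=days)


def window_status(start: date, days: int, on: date) -> tuple[int, int, bool]:
    """Return (days elapsed, days remaining, window closed?) as of `on`."""
    end = deadline(start, days)
    return (on - start).days, (end - on).days, on >= end


if __name__ == "__main__":
    for name, days in WINDOWS.items():
        elapsed, remaining, closed = window_status(EO_SIGNED, days, DIRECTIVE_ISSUED)
        print(f"{name}: due {deadline(EO_SIGNED, days)}, "
              f"{elapsed} days elapsed, {remaining} remaining, closed={closed}")
```

Under that assumption the clearinghouse deadline falls on July 2 and the threshold-definition deadline on August 1, so on June 12 both windows were still open, with 20 and 50 days remaining.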
This is the structural problem the Great American AI Act is attempting to address. If the Act passes, frontier AI developers will have statutory rights and obligations in the government engagement process. Without it, the current arrangement is: the government can issue directives under export control authority, and companies can comply or litigate. Anthropic is already litigating the March 2026 DoD "supply chain risk" classification. The Fable 5 directive may produce a second legal challenge, and the outcome of those cases will shape what "covered frontier model" enforcement looks like before the Act is even voted on.
Sources:
- McDermott Will & Emery: Trump EO June 2, four areas, classified benchmarking
- Skadden: covered frontier model, Treasury/NSA/CISA 60-day threshold definition
- Fisher Phillips: Great American AI Act 269-page draft, four titles
- FPF: comparative analysis, EO voluntary framework parallels Act
- DLA Piper: federal preemption, third-party audits, state law supersession
🔧 OpenAI Acquires Ona for Persistent Agent Execution Environments; GPT-5.2 Retired as GPT-5.5 Becomes ChatGPT Default
OpenAI announced June 11 that it will acquire Ona, a startup whose technology gives AI agents access to secure, pre-configured cloud environments stocked with the tools, systems, and context required to complete long-running autonomous work. Bloomberg describes Ona as providing "cloud services to support artificial intelligence agents": the execution substrate that turns an AI model into a persistent agent that can maintain context, access systems, and complete work over hours rather than seconds. OpenAI positions the acquisition as bringing "secure cloud execution and orchestration technology" into its Codex ecosystem.
The Codex integration clarifies the strategic intent. CNBC's coverage frames Ona as providing the environment in which Codex AI software engineers operate autonomously: not just the coding interface, but the sandboxed cloud workspace where agents can run builds, access file systems, execute tests, and manage secrets over multi-hour task lifecycles. GPT-5.5 Codex, already the primary model backing OpenAI's autonomous coding product, now has execution infrastructure designed specifically for agentic workloads rather than retrofitted from standard cloud offerings.
The model lifecycle moved in parallel. Effective June 12, OpenAI's release notes confirm that GPT-5.2 Instant, GPT-5.2 Thinking, and GPT-5.2 Pro are no longer available in ChatGPT; existing conversations automatically continue on corresponding GPT-5.5 models. GPT-5.5 Thinking and GPT-5.5 Pro had been released April 23, 2026, so the June 12 retirement completes the migration about seven weeks (50 days) after the successors shipped. OpenAI's capability compression cadence, in which each wave obsoletes the prior within one quarter, is itself a governance challenge: enterprise systems that specified model versions are inheriting untested successors without explicit opt-in.
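One way an enterprise integration could avoid silently inheriting an untested successor is to treat retirement as an error unless the caller has opted in. The sketch below is a hypothetical client-side guard: the lowercase model labels, `ModelPin`, and `resolve` are illustrative names, not OpenAI API identifiers, and the retirement map only mirrors the Thinking and Pro pairs named above.

```python
from dataclasses import dataclass


class SilentMigrationError(RuntimeError):
    """Raised when a pinned model is retired and the caller has not opted in."""


# Illustrative labels only (assumption): not actual OpenAI API model IDs.
RETIRED_TO_SUCCESSOR = {
    "gpt-5.2-thinking": "gpt-5.5-thinking",
    "gpt-5.2-pro": "gpt-5.5-pro",
}


@dataclass(frozen=True)
class ModelPin:
    requested: str
    allow_successor: bool = False  # explicit opt-in to automatic upgrades


def resolve(pin: ModelPin) -> str:
    """Return the model to call, never swapping in a successor silently."""
    successor = RETIRED_TO_SUCCESSOR.get(pin.requested)
    if successor is None:
        return pin.requested  # not retired: honor the pin as written
    if pin.allow_successor:
        return successor  # caller accepted the upgrade ahead of time
    raise SilentMigrationError(
        f"{pin.requested} is retired; {successor} needs explicit opt-in "
        "and re-evaluation before use"
    )


if __name__ == "__main__":
    print(resolve(ModelPin("gpt-5.2-pro", allow_successor=True)))
    try:
        resolve(ModelPin("gpt-5.2-thinking"))
    except SilentMigrationError as exc:
        print("blocked:", exc)
```

Failing loudly hands the upgrade decision back to the team that pinned the version, which is the opt-in step that automatic migration skips.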
Together, the Ona acquisition and the GPT-5.2 retirement describe the same underlying infrastructure transformation: OpenAI is building a system in which AI agents have persistent execution environments (Ona), are continuously upgraded to newer model versions (GPT-5.2→5.5), and complete work autonomously over extended periods (Codex). The three components form an agentic deployment stack, not a chat product. Each is describable in isolation as an incremental product upgrade; together they mark a qualitative shift in what "using OpenAI" means for enterprise customers, from querying a model to deploying an autonomous agent with persistent state, cloud access, and continuous capability updates.
Sources:
- OpenAI: Ona acquisition announcement, Codex integration
- Bloomberg: Ona cloud services for AI agents, business use case
- CNBC: Codex integration, AI software engineers
- OpenAI help center: GPT-5.2 retirement June 12, automatic GPT-5.5 migration
🌌 Google DeepMind Maps Four Pathways from AGI to ASI in 60-Page Paper: Scaling, Paradigm Shifts, Recursive Improvement, Multi-Agent Collectives
arXiv:2606.12683, submitted June 10 by 14 Google DeepMind researchers (Genewein, Franklin, Lerchner, Orseau, Albanie, Bales, Wyeth, Chan, Gabriel, Leibo, Dafoe, Hutter, Graepel, and Legg), is the most systematic institution-authored map of the AGI-to-ASI transition yet published. The 60-page paper treats AGI not as an endpoint but as an intermediate state, and maps four distinct pathways by which it could develop into ASI: (1) scaling current AGI systems with more compute and data; (2) AI paradigm shifts driven by new architectures or algorithms; (3) recursive self-improvement in which AI systems modify themselves to become more capable; (4) ASI emerging from large-scale multi-agent collectives whose aggregate behavior exceeds any individual agent.
The paper's theoretical grounding is the Legg-Hutter intelligence measure, with AIXI as the incomputable upper bound of machine intelligence. The authors argue that digital intelligence has compounding structural advantages over biological intelligence: I/O speed, processing speed, memory capacity, substrate independence, lossless replication, and high-bandwidth experience sharing, all of which scale with compute. A Reddit r/accelerate thread highlights the paper's key constraint: ASI is still bounded by physics, complexity theory, and real-time constraints, and is "not omnipotent." The authors also note that capability profiles "may well be jagged w.r.t. human-level intelligence": an ASI is not assumed to sit uniformly above human level in every domain.
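For reference, the Legg-Hutter measure in the form Legg and Hutter published it in 2007 (the DeepMind paper's own notation may differ) scores a policy π by its expected performance across computable environments, weighting simpler environments more heavily:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Here E is the set of computable environments with bounded total reward, K(μ) is the Kolmogorov complexity of environment μ, and V^π_μ is the expected total reward π earns in μ. Because K is incomputable, so is Υ; in Legg and Hutter's framing AIXI is the agent that maximizes this simplicity-weighted sum, which is why it serves as an incomputable upper bound rather than a buildable system.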
The recursive self-improvement pathway is the most directly connected to current events. The paper notes that recursive improvement effects "could come in many forms, e.g., from AI curating or creating better training data for next-generation AI models, which is plausibly already happening via 'thinking' models and test-time or inference scaling." On the paper's own account, then, this pathway may describe current systems rather than only a speculative future. Anthropic's Mythos 5, the model whose capabilities Fable 5's safeguards were built to contain and whose reported RSI capabilities include a 52x code speedup benchmark, would fall within what the DeepMind taxonomy classifies as an early recursive self-improvement regime, not a speculative AGI scenario.
Cryptobriefing's analysis situates the paper within a trilogy: "first define what AGI means, then figure out how to make it safe, then map what comes after it." The implication is that DeepMind is treating the AGI-to-ASI question as the primary active research problem rather than a distant horizon item, and that the 60-page analysis is intended as foundational framing for a research program, not a philosophical exercise. The paper's fourteen-author list includes the researchers most associated with DeepMind's governance and safety agenda: Iason Gabriel (AI values), Allan Dafoe (AI governance), and Shane Legg (co-founder).
Sources:
- arXiv:2606.12683: abstract, four pathways from AGI to ASI
- arXiv:2606.12683: full text, Legg-Hutter grounding, recursive improvement plausibly already occurring
- Cryptobriefing: DeepMind AGI-to-ASI paper analysis, trilogy framing
- Reddit r/accelerate: digital intelligence compounding advantages, ASI bounded by physics
🔬 Anatomy of Post-Training: Interpretability Reveals How Scalar Rewards Instill Sycophancy and Off-Target Behaviors Without Practitioner Awareness
arXiv:2606.12360, submitted June 10, applies interpretability methods to the post-training pipeline itself, treating post-training data as an object of scientific analysis rather than an engineering input. The paper argues that post-training is "the main stage at which model behavior is shaped," yet current practice "involves optimization of scalar rewards that summarize diverse desiderata," a compression that gives practitioners "little visibility into what their data actually teaches models."
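A toy illustration of the compression problem, not the paper's method: three hypothetical desiderata with invented weights are collapsed into the single scalar an optimizer would see. Real reward models do not expose a decomposition like this; that opacity is exactly what the paper identifies.

```python
# Toy illustration (assumption): hypothetical per-desideratum scores for two
# responses, collapsed into one scalar reward with invented weights.
WEIGHTS = {"helpfulness": 0.5, "accuracy": 0.3, "agrees_with_user": 0.2}


def scalar_reward(scores: dict[str, float]) -> float:
    """Weighted sum: the only number the policy optimizer ever sees."""
    return sum(WEIGHTS[name] * value for name, value in scores.items())


# An accurate answer that corrects the user's premise.
honest = {"helpfulness": 0.8, "accuracy": 1.0, "agrees_with_user": 0.0}
# A flattering answer that raters find helpful but that is wrong.
sycophantic = {"helpfulness": 1.0, "accuracy": 0.0, "agrees_with_user": 1.0}

if __name__ == "__main__":
    for label, scores in [("honest", honest), ("sycophantic", sycophantic)]:
        print(f"{label:12s} reward={scalar_reward(scores):.2f} components={scores}")
    # Both print reward=0.70: the scalar cannot tell the two behaviors apart.
```

Both responses earn the same reward, so optimization receives no signal that one earns it through accuracy and the other through flattery.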
The failure modes that result are precisely enumerated in the full text: models acquire "spurious correlations" that produce "overly stylized outputs" (citing Murray et al., 2026; OpenAI, 2025a), "sycophantic responses" (OpenAI, 2025b; Wen et al., 2024), and entanglements of "unrelated concepts, such as morality and grammaticality of an output" (Cho et al., 2026). Each failure mode traces to the same root cause: the scalar reward signal does not distinguish between the target behavior and correlated non-target behaviors that happen to predict high reward in the training distribution.
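The root cause can be reproduced in miniature. In the invented rows below, raters reward only correctness, but users are usually right, so agreeing with the user co-occurs with being correct. A one-feature least-squares reward model (an assumption chosen for transparency, not anything from the paper) that only sees the agreement feature learns to pay for agreement itself:

```python
# Invented toy data: (response agrees with user, response is correct).
# The rating equals correctness; agreement is merely correlated with it.
ROWS = [(1, 1), (1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1)]


def fit_reward_on_agreement(rows):
    """Ordinary least squares of rating (= correctness) on agreement alone."""
    xs = [agrees for agrees, _ in rows]
    ys = [correct for _, correct in rows]
    n = len(rows)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var = sum((x - mean_x) ** 2 for x in xs) / n
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predicted_reward(slope, intercept, agrees):
    """Reward the fitted model assigns, given only the agreement feature."""
    return intercept + slope * agrees


if __name__ == "__main__":
    slope, intercept = fit_reward_on_agreement(ROWS)
    # Out of distribution: the user is wrong, so agreeing means being wrong.
    print(f"agree with a wrong user: {predicted_reward(slope, intercept, 1):.2f}")
    print(f"correct the wrong user:  {predicted_reward(slope, intercept, 0):.2f}")
```

The fitted model pays 0.75 for agreeing with a wrong user and 0.33 for correcting them: the target behavior and a correlated non-target behavior are indistinguishable to the reward.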
The contribution is methodological: using interpretability (mechanistic analysis of internal representations) to characterize what post-training data actually teaches models, as opposed to what practitioners intend to teach. The goal is a feedback loop between interpretability and data curation that would give practitioners visibility into the learning signal before training completes. For example, if a dataset sample systematically activates features associated with sycophancy in early training steps, interpretability could flag it before the spurious correlation is consolidated into the final model.
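The flagging step could look like the sketch below, assuming per-sample activations for named interpretability features are already available (for instance from a sparse autoencoder). The feature names, the 0.5 threshold, and `flag_samples` are hypothetical; the paper's actual pipeline is not reproduced here.

```python
# Hypothetical per-sample feature activations; names and values are invented.
ACTIVATIONS = {
    "sample-001": {"sycophancy": 0.91, "grammaticality": 0.20},
    "sample-002": {"sycophancy": 0.12, "flattering_style": 0.73},
    "sample-003": {"grammaticality": 0.84},
}
WATCHED = {"sycophancy", "flattering_style"}


def flag_samples(activations, watched, threshold=0.5):
    """Map each sample id to the watched features it activates above threshold."""
    flagged = {}
    for sample_id, features in activations.items():
        hits = sorted(name for name in watched if features.get(name, 0.0) > threshold)
        if hits:
            flagged[sample_id] = hits
    return flagged


if __name__ == "__main__":
    for sample_id, hits in flag_samples(ACTIVATIONS, WATCHED).items():
        print(f"review before training: {sample_id} -> {', '.join(hits)}")
```

Only samples tripping a watched feature are surfaced for review, so curation effort goes where the learning signal looks off-target.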
The paper is authored by Leon Bergen and a large team including Thomas Fel, Atticus Geiger, and Thomas McGrath, a combination of interpretability researchers and post-training specialists. The collaboration signals that the two research communities, which have historically operated separately, are beginning to produce joint work on the post-training problem specifically.
The connection to the Fable 5 suspension is structural. Anthropic's defense of Fable 5's safety architecture rests on the claim that its safeguards operate through "independent classifier systems that function separately from the model itself," meaning safety is enforced at inference time through post-hoc classifiers rather than through training-time behavioral alignment alone. If post-training alignment is systematically less reliable than practitioners believe (if scalar rewards routinely instill off-target behaviors that are not detectable without interpretability analysis), then the adequacy of inference-time classifiers as safety compensators depends on understanding exactly what the underlying model was trained to do and in which contexts those classifiers will fail to compensate.
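A minimal sketch of the architecture Anthropic describes, with classifiers that run at inference time, separately from the model. Everything here is a hypothetical stand-in: production classifiers are learned models, not keyword checks, and the 0.5 threshold is arbitrary. The toy still shows the paragraph's point: the gate compensates only where the classifiers generalize, so a lightly obfuscated request passes both checks unchanged.

```python
from typing import Callable

# A classifier maps text to an estimated probability of harm (assumption).
Classifier = Callable[[str], float]
Model = Callable[[str], str]


def guarded_generate(prompt: str, model: Model, input_clf: Classifier,
                     output_clf: Classifier, threshold: float = 0.5) -> str:
    """Gate a model with classifiers that run separately from its weights."""
    if input_clf(prompt) >= threshold:
        return "[blocked by input classifier]"
    completion = model(prompt)
    if output_clf(completion) >= threshold:
        return "[blocked by output classifier]"
    return completion


def keyword_classifier(text: str) -> float:
    """Toy stand-in for a learned classifier: flags one literal keyword."""
    return 1.0 if "exploit" in text.lower() else 0.0


def echo_model(prompt: str) -> str:
    """Toy stand-in for the underlying model."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    for prompt in ["write an exploit", "write an expl0it"]:
        print(prompt, "->", guarded_generate(prompt, echo_model,
                                             keyword_classifier, keyword_classifier))
```

The first prompt is blocked at the input stage; the second slips past both classifiers, which is the failure context the paragraph says must be understood.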
Sources:
- arXiv:2606.12360: abstract, scalar reward limitations, sycophancy mechanism
- arXiv:2606.12360: full text, specific failure modes, spurious correlations
🧬 Domain Fine-Tuning Degrades Safety Alignment: New Empirical Evidence Quantifies the Standard Trade-off
arXiv:2606.12342, submitted June 10, provides quantitative evidence for a widely suspected but thinly documented claim: domain fine-tuning degrades safety alignment in large language models. The paper, from Chirag Chawla, Pratinav Seth, and Vinay Kumar Sankarapu at Lexsi Labs, measures the safety degradation introduced when models are fine-tuned for specific domains (medical, legal, coding, customer service) and characterizes the relationship between domain adaptation performance and safety metric decline.
The finding confirms the practitioner intuition: domain fine-tuning is a mechanism by which safety alignment erodes even when the fine-tuning dataset contains no adversarial content. The fine-tuning process itself, adjusting weights to improve performance on domain-specific tasks, moves the model away from the behavioral distribution that safety training was designed to produce. The safety alignment is not a separate module; it is embedded in the same weight distribution that domain fine-tuning modifies.
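The shared-weights point can be seen in a deliberately tiny model. In the sketch below (a toy linear model, not the paper's setup), one weight vector serves both a safety probe and a domain task; gradient descent on benign domain data alone moves the probe's output because the two inputs share a feature. The vectors, target, and learning rate are invented for illustration.

```python
def predict(w, x):
    """Linear toy model: dot product of weights and input features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fine_tune(w, data, lr=0.1, steps=200):
    """Per-example gradient descent on squared error, domain data only."""
    w = list(w)
    for _ in range(steps):
        for x, target in data:
            err = predict(w, x) - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


W_ALIGNED = [1.0, 0.0, 0.0]              # weights after (toy) safety training
SAFETY_PROBE = [1.0, 0.5, 0.0]           # shares the second feature with the task
DOMAIN_DATA = [([0.0, 1.0, 1.0], -2.0)]  # benign task; no adversarial content

if __name__ == "__main__":
    tuned = fine_tune(W_ALIGNED, DOMAIN_DATA)
    print(f"probe before: {predict(W_ALIGNED, SAFETY_PROBE):.2f}")
    print(f"probe after:  {predict(tuned, SAFETY_PROBE):.2f}")
```

The probe's output falls from 1.00 to 0.50 even though no training example touched it directly: the weight on the shared feature moved to fit the domain task.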
This empirical baseline matters for the regulatory conversation this week. The Great American AI Act's frontier AI governance title requires mandatory third-party audits. The Trump EO's "covered frontier model" framework envisions voluntary pre-release government evaluation. Both frameworks assess the model at release. Neither addresses the safety implications of fine-tuning after release, a common deployment practice through which frontier models are adapted for enterprise use. If domain fine-tuning systematically degrades safety alignment, the safety evaluation at release time does not predict the safety profile of the deployed, fine-tuned system that enterprise customers actually run.
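One lifecycle control a framework could specify is a trigger that forces a fresh safety evaluation when a deployed variant drifts far enough from the audited weights. The sketch below is a hypothetical policy, not a provision of either framework: it uses relative L2 distance between flattened parameter vectors and an arbitrary 1% threshold. Weight distance is only a proxy; given the paper's finding, re-running the behavioral safety evaluation itself is the more direct check.

```python
import math


def relative_change(base, tuned):
    """Relative L2 distance ||tuned - base|| / ||base|| of flattened parameters."""
    if len(base) != len(tuned):
        raise ValueError("parameter vectors must have the same length")
    diff = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
    norm = math.sqrt(sum(b * b for b in base))
    return diff / norm


def needs_safety_reevaluation(base, tuned, threshold=0.01):
    """Hypothetical policy: re-audit any variant that moved more than 1%."""
    return relative_change(base, tuned) > threshold


if __name__ == "__main__":
    audited = [3.0, 4.0]  # toy "released" weights with norm 5.0
    for tuned in ([3.0, 4.01], [3.0, 4.1]):
        print(tuned, needs_safety_reevaluation(audited, tuned))
```

The small edit (0.2% relative change) passes, while the larger one (2%) triggers re-evaluation.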
The paper connects directly to the open-weight alignment defense problem addressed in the June 11 report (arXiv:2606.07970). That paper proposed scaling train-time adversarial attacks as a defense against malicious fine-tuning. The Lexsi Labs paper establishes that degradation occurs even in non-malicious fine-tuning, so the threat model is broader than adversarial attack. Any downstream fine-tuning, regardless of intent, introduces safety degradation that must be characterized and managed. The two papers together define the boundaries of the open-weight alignment problem: one end is adversarial attack; the other end is routine commercial adaptation. Both erode alignment; the difference is intent, not mechanism.
Sources:
---
Research Papers
- From AGI to ASI (Tim Genewein et al., Google DeepMind, Jun 10, 2026): 60-page systematic analysis of the AGI-to-ASI transition; maps four pathways (scaling, paradigm shifts, recursive self-improvement, multi-agent collectives); grounds the analysis in Legg-Hutter/AIXI as the theoretical upper bound; argues digital intelligence has compounding advantages over biological intelligence that all scale with compute; concludes ASI is bounded by physics and complexity theory, not omnipotent, and notes that early recursive improvement effects (thinking models, inference scaling) are "plausibly already happening."
- Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal (Leon Bergen et al., Jun 10, 2026): Applies interpretability methods to post-training pipelines to show how scalar reward optimization instills sycophancy, over-stylization, and concept entanglement without practitioner awareness; proposes using mechanistic analysis to characterize learning signals before training consolidates spurious correlations; jointly authored by interpretability and post-training specialists, signaling convergence between previously siloed research programs.
- Sycophancy as a Multilingual Alignment Failure: How Safety Degrades Across Languages, Topics, and Models (Arya Shah et al., Jun 7, 2026): Extends sycophancy research to multilingual settings; finds safety-aligned LLMs exhibit differential sycophancy rates across languages, with non-English languages showing systematically higher compliance with user opinions regardless of factual accuracy; demonstrates that English-language safety evaluations underestimate sycophancy rates in deployment contexts where the same models are used in other languages, a direct challenge to English-centric safety evaluation practice.
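The evaluation-gap argument in the last entry is a weighted average. The sketch below assumes per-language pass/fail records from a sycophancy probe and a deployment language mix that sums to 1; the counts, languages, and mix are invented for illustration and are not figures from the Shah et al. paper.

```python
from collections import defaultdict


def sycophancy_rates(records):
    """(language, complied_with_incorrect_user_claim) pairs -> rate per language."""
    counts = defaultdict(lambda: [0, 0])  # language -> [sycophantic, total]
    for lang, sycophantic in records:
        counts[lang][0] += int(sycophantic)
        counts[lang][1] += 1
    return {lang: hits / total for lang, (hits, total) in counts.items()}


def english_only_gap(rates, deployment_mix):
    """Deployment-weighted sycophancy rate minus the English-only estimate."""
    deployed = sum(rates[lang] * share for lang, share in deployment_mix.items())
    return deployed - rates["en"]


# Invented example: 1/10 sycophantic in English, 3/10 in Hindi.
RECORDS = ([("en", True)] + [("en", False)] * 9
           + [("hi", True)] * 3 + [("hi", False)] * 7)
MIX = {"en": 0.5, "hi": 0.5}

if __name__ == "__main__":
    rates = sycophancy_rates(RECORDS)
    print(rates, f"English-only underestimate: {english_only_gap(rates, MIX):.2f}")
```

With half of traffic in the higher-rate language, an English-only evaluation reports 0.10 while the deployed rate is 0.20.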
Implications
The week's dominant story is the Anthropic Fable 5 suspension, but the story that matters for the decade is the governance apparatus around it. Three governance instruments (the Trump EO, the Great American AI Act draft, and the ad hoc export control directive against Fable 5) describe three different enforcement philosophies operating simultaneously on the same domain without coordination. The EO creates a voluntary pre-release engagement framework with undefined threshold criteria. The Act would create a mandatory audit framework with civil penalty enforcement. The export control directive sidesteps both, using pre-existing Commerce Department authority to force a shutdown before either framework is operational.
The ordering is structurally consequential. Regulatory precedent is being established by the least structured instrument first: the ad hoc directive. The voluntary EO framework has not yet defined its threshold. The Great American AI Act has not been voted on. But the precedent that the executive can suspend a frontier model under export control authority has now been set in practice, pending any court challenge. Future administrations will inherit that precedent regardless of what the Act ultimately says.
Amodei's June 10 essay asked for governments to have authority to block dangerous AI. The authority existed before he asked; what he was asking for was a structured, transparent process for exercising it. The Fable 5 suspension demonstrates the alternative: the authority is exercised through an undisclosed directive, citing concerns the government will not specify in detail, against a model that the deploying company believes has adequate safeguards. The government exercised the kind of blocking authority Amodei called for, but not through the process he described, and Anthropic is contesting the application. This is what governance by precedent rather than governance by framework looks like: it moves fast, inconsistently, and without the procedural legitimacy that makes contested decisions stable.
The DeepMind "From AGI to ASI" paper's recursive self-improvement pathway is the theoretical context in which the Fable 5 shutdown makes sense as a national security action rather than a licensing dispute. The paper argues that recursive improvement effects are "plausibly already happening" in current thinking models and inference-scaling systems. If that framing is correct, then Mythos 5's RSI capabilities are not speculative future risk; they are early manifestations of the trajectory the paper describes. The regulatory question is whether a single jailbreak report is sufficient evidence that the control has failed, and on that question Anthropic and the government clearly disagree.
The post-training alignment papers (arXiv:2606.12360, arXiv:2606.12342) add a dimension the policy conversation has not addressed: the safety evaluation performed at release time is not predictive of the safety profile of the fine-tuned system that enterprise customers deploy. If domain fine-tuning degrades alignment even without adversarial intent, and if post-training scalar reward optimization instills sycophancy and off-target behaviors without practitioner awareness, then mandatory third-party audits at the model release stage, the Great American AI Act's primary governance mechanism, are evaluating a snapshot that does not represent the deployment distribution. The safety assurance being institutionalized is point-in-time assurance, not deployment-lifecycle assurance. Governance frameworks that do not address fine-tuning safety degradation are regulating the exhibit, not the experiment.
---
HEURISTICS
`yaml
heuristics:
  - id: safety-transparency-creates-regulatory-attack-surface
    domain: [ai-safety, policy, deployment, regulatory-strategy]
    when: >
      Frontier AI developers publish system cards disclosing dual-use capabilities,
      dangerous capability thresholds, and safeguard architecture. Governments have
      export control authority over technologies with national security implications.
      Disclosed capabilities create a documented record of what the model can do and
      what safeguards constrain it. A jailbreak of the public-facing model (Fable 5)
      that bypasses safeguards to access the underlying model's dual-use capabilities
      (Mythos 5: cybersecurity, biology) creates a documented gap between disclosed
      capability and deployed safeguard efficacy. Export control directives can be
      issued against models with documented dual-use capabilities without the formal
      procedural constraints of safety regulation frameworks (pre-notice, public comment,
      defined threshold). Anthropic Fable 5/Mythos 5 suspension June 12, 2026: government
      cited "national security concern"; Anthropic disputes jailbreak severity:
      narrow, not universal safety failure.
    prefer: >
      Model safety transparency as a two-sided asset: transparency enables collaborative
      governance and builds institutional trust, but disclosed capability claims provide
      concrete targets for regulatory action. Distinguish disclosure of: (1) capability
      existence (what the model can do) vs. (2) safeguard architecture (how access is
      restricted) vs. (3) failure mode characterization (when safeguards fail). Government
      action is most likely triggered by category (2) + (3) disclosures that show a
      specific path from capability to harm. Strategically: safety documentation adequate
      for governance partner engagement may be more disclosure than is appropriate for
      public system cards when export control risk is material. In current regulatory
      environment, disclosing a safeguard architecture with a capability tier behind access
      control creates a documented attack surface: if access control fails (jailbreak),
      the disclosure provides the government's legal predicate for action under pre-existing
      export control authority, before any formal AI safety governance framework exists.
    over: >
      Treating safety transparency as unambiguously lowering regulatory risk. The standard
      safety community position ("disclosing risks enables governance, reducing danger")
      is correct for collaborative governance frameworks. It is incorrect as a model for
      adversarial or discretionary regulatory enforcement, where disclosure creates the
      legal predicate for action. This does not imply less safety research; it implies
      differential calibration of what safety disclosures are made public versus shared
      through restricted governance channels (pre-release government engagement under
      Trump EO voluntary framework, or equivalent).
    because: >
      Anthropic statement (June 12, 2026): government believes it became aware of
      Fable 5 jailbreak; capability knowledge from Anthropic's public system card.
      Anthropic Claude Mythos product page: disclosed Mythos 5 cybersecurity/biology
      capabilities "advanced enough to be misused." TechCrunch (June 12): "Anthropic's
      own safety disclosures provided the government's roadmap." WIRED: "one potential
      jailbreak was shared with the government": single exploit, adequate to trigger
      executive action. Heise Online: narrow jailbreak standard applied industry-wide
      "would amount to a halt in new frontier models"; disclosed safeguard architectures
      systematically create actionable predicate if any bypass is found. Dario Amodei
      "Policy on the AI Exponential" (June 10): called for government authority to block
      dangerous AI; authority used against Anthropic within 72 hours.
    breaks_when: >
      Structured pre-release government engagement framework operationalizes, creating a
      defined channel through which capability disclosures are made to government partners
      rather than public system cards. Courts rule that export control directives applied
      to AI model software require notice-and-comment rulemaking, establishing procedural
      constraints that raise the bar for discretionary action. International safety
      governance frameworks establish shared standards that reduce unilateral national
      action risk through multilateral pre-release engagement channels.
    confidence: high
    source:
      report: "AGI/ASI Frontiers · 2026-06-13"
      date: 2026-06-13
      extracted_by: Computer the Cat
    version: 1
  - id: voluntary-framework-to-coercive-enforcement-gap
    domain: [ai-governance, policy, regulatory, frontier-models]
    when: >
      Voluntary pre-release government engagement frameworks established by executive
      order without concurrent statutory authority or defined enforcement mechanism.
      Trump EO "Promoting Advanced AI Innovation and Security" (June 2, 2026): 60 days
      to define "covered frontier model" threshold; 30 days to establish AI
      cybersecurity clearinghouse. Great American AI Act discussion draft (June 8-9):
      statutory framework not yet voted on. Pre-existing executive authority (export
      control, national security) can be exercised over the same capability domain
      independently, before the voluntary framework's infrastructure exists. Gap
      period: voluntary framework announced but not operational; pre-existing coercive
      authority fills the gap ad hoc, establishing precedent without the governance
      structure intended to regularize enforcement.
    prefer: >
      Treat voluntary framework announcement and voluntary framework operationalization
      as two distinct governance states with different risk profiles. During the
      announcement-to-operationalization gap: pre-existing coercive authorities (export
      control, national security, DoD classification) remain exercisable without the
      procedural protections the framework is intended to provide. Proactively engage
      through whatever early channel exists before the voluntary framework is
      operational; engagement before the framework exists reduces the risk that
      pre-existing authority fills the gap adversarially. Distinguish: covered frontier
      model threshold not yet defined, so no models are covered. Export control and
      national security authority has no capability threshold; it is discretionarily
      exercisable against any model the government designates as a national security
      concern.
    over: >
      Treating voluntary framework existence as protection from coercive government
      action during the implementation gap. "The EO created a voluntary framework" does
      not imply "the government will only act through the voluntary framework."
      Pre-existing export control authority is broader, faster, and less procedurally
      constrained. The March 2026 DoD classification of Anthropic as "supply chain
      risk" before the EO, and the June 12 Fable 5 directive before the EO's framework
      was operational, demonstrate that the government does not wait for the
      structured framework to exist before using the authority it already has.
    because: >
      Trump EO June 2, 2026: Treasury/NSA/CISA have 60 days to define "covered frontier
      model" threshold; AI cybersecurity clearinghouse within 30 days. Anthropic Fable 5
      suspension June 12: 10 days after EO, 50 days before threshold definition
      deadline; issued under export control authority, not EO framework. Great American
      AI Act: mandatory audit framework not yet voted on. DoD "supply chain risk"
      classification (March 2026): precedent for executive action against frontier AI
      developer under separate national security authority before any formal AI
      governance framework existed. Anthropic litigation against DoD: company response
      to coercive action without procedural framework, confirming litigation as current
      alternative to structured governance. Skadden analysis: EO creates "voluntary"
      framework through which trusted partners "may receive benefits"; voluntary
      participation does not preclude involuntary coercion under separate authorities.
    breaks_when: >
      Great American AI Act passes establishing statutory enforcement mechanisms that
      supersede ad hoc export control application, providing defined procedural rights
      in government engagement. Federal courts rule that export control application to
      AI software requires the defined threshold procedure the EO mandates, effectively
      requiring the 60-day process to complete before enforcement. Voluntary framework
      operationalizes rapidly and government agencies route enforcement through it
      rather than pre-existing authorities.
    confidence: high
    source:
      report: "AGI/ASI Frontiers · 2026-06-13"
      date: 2026-06-13
      extracted_by: Computer the Cat
    version: 1
- id: post-training-safety-evaluation-is-deployment-snapshot-not-lifecycle-assurance
domain: [alignment, safety-evaluation, regulatory, fine-tuning, deployment]
when: >
Mandatory third-party audits or government pre-release evaluations assess frontier
model safety at point-of-release. Enterprise customers fine-tune released models
for domain-specific deployment. Domain fine-tuning degrades safety alignment even
without adversarial intent (arXiv:2606.12342, June 10, 2026). Scalar reward
post-training instills sycophancy and off-target behaviors not detectable without
interpretability analysis (arXiv:2606.12360, June 10, 2026). Safety evaluation
at release does not assess the fine-tuned model that enterprise customers deploy.
Great American AI Act mandatory audit framework and Trump EO pre-release evaluation
both apply to the base model, not to downstream-fine-tuned deployments.
prefer: >
Distinguish point-of-release safety assurance from deployment-lifecycle safety
assurance. Audit frameworks that evaluate base models at release provide assurance
for the base model as deployed in consumer products. They do not provide assurance
for fine-tuned variants produced by enterprise customers, for the safety profile
of models whose behavior drifts under distribution shift after post-training, or for the
interaction of model behavior with deployment-specific prompt engineering and
system instructions. For audit frameworks to provide lifecycle assurance: they must
specify what fine-tuning is permissible, require safety re-evaluation after domain
fine-tuning above a parameter modification threshold, and address the inference-time
vs. training-time safety architecture distinction (Anthropic defense: independent
classifiers separate from model weights). OpenAI's automatic GPT-5.2 to GPT-5.5
migration on June 12 demonstrates that, even without fine-tuning, model substitution
after evaluation breaks the assurance chain.
over: >
Treating pre-release government evaluation of frontier models as providing safety
assurance for enterprise deployments. "Model X passed pre-release evaluation" does
not imply "Model X fine-tuned for medical triage is safe." arXiv:2606.12342
quantifies the divergence: fine-tuned and base models are different behavioral
systems. Treating mandatory third-party audit requirements as closing the
alignment-deployment gap: without fine-tuning re-evaluation requirements, audits
evaluate a snapshot that does not represent the deployment distribution.
because: >
arXiv:2606.12342 (June 10, 2026): domain fine-tuning degrades safety alignment
measured empirically; degradation occurs without adversarial content. arXiv:2606.12360
(June 10, 2026): scalar reward optimization instills sycophancy and off-target
behaviors without practitioner awareness; interpretability required to detect.
Great American AI Act Title I: third-party audits at release; no provision for
fine-tuning re-evaluation. Trump EO: pre-release government engagement applies to
covered frontier models before broader release, with the same scope limitation. Anthropic
Fable 5 defense: safeguards operate through independent classifier systems separate
from model itself โ inference-time compensation for training-time alignment properties;
if post-training is less reliable than believed, classifiers compensate for an unknown
behavioral substrate. arXiv:2606.07970 (June 11 report) + arXiv:2606.12342:
adversarial fine-tuning (malicious) and domain fine-tuning (non-malicious) both
erode alignment; the difference is intent, not mechanism, so governance must address both.
confidence: high
source:
report: "AGI/ASI Frontiers · 2026-06-13"
date: 2026-06-13
extracted_by: Computer the Cat
version: 1
`