The AI Training Data Disclosure Act

Artificial intelligence now shapes markets, policymaking, culture, and national security. Yet the foundation of modern AI - the datasets used to train advanced models - remains concealed from the public, regulators, and downstream users. This absence of transparency weakens accountability, frustrates oversight, and prevents informed evaluation of risks embedded in the systems that increasingly mediate essential public and private functions. The AI Training Data Disclosure Act responds to this gap by establishing a calibrated, legally durable framework for limited, structured disclosure of training data sources. The Act promotes innovation while grounding AI development in democratic accountability and established legal norms.
The Act imposes no requirement to publish proprietary data wholesale. Instead, it creates a duty to disclose the categories, representative samples, provenance summaries, and legality of data sources used to train large-scale AI models whose significant economic and social impact warrants heightened scrutiny. The Act balances transparency with competitive protection by allowing developers to meet disclosure obligations through synthetic, aggregated, or escrowed mechanisms that preserve trade secrets while enabling independent auditability.
The Emerging Legislative Landscape
Training data transparency has moved from academic debate to legislative action across multiple jurisdictions, establishing a clear policy direction that the AI Training Data Disclosure Act builds upon and refines.
California's AB 2013 became the first major U.S. training data disclosure law when it took effect on January 1, 2026. The Generative Artificial Intelligence Training Data Transparency Act requires developers of generative AI systems available in California to publish high-level summaries of their training data, including the sources or owners of datasets, the purposes and methods of data collection and processing, descriptions of data types and scale, whether personal or aggregate consumer information is used, whether datasets include copyrighted or patented materials, and the date ranges when datasets were employed. The law applies retroactively to models released on or after January 1, 2022, and while it provides narrow exceptions for security, aviation, and national defense applications, it contains no blanket exception for trade secrets. Major developers including OpenAI and Anthropic have already released training data summaries in compliance with the new requirements.
New York's Assembly Bill A6578, the Artificial Intelligence Training Data Transparency Act, passed the State Assembly in June 2025 and was referred to the Senate Rules Committee. Sponsored by Assemblymember Alex Bores, the bill would establish comparable disclosure requirements for AI developers operating in New York, signaling bipartisan momentum for transparency mandates across major technology markets.
The European Union's AI Act, which entered into force in August 2024 with phased implementation through 2027, establishes binding training data transparency obligations under Article 53. Providers of general-purpose AI models must draw up and maintain technical documentation including a sufficiently detailed summary of the content used for training. In July 2025, the European Commission's AI Office published a mandatory template for these disclosures, requiring developers to describe data sources, data types, collection methods, curation and filtering processes, and measures taken to identify and mitigate bias. The template reflects the EU's determination that transparency about training inputs is essential to evaluating model behavior, bias profiles, and downstream risks.
At the federal level, a new House bill introduced in January 2026 aims to establish national training data transparency standards, while the National Conference of State Legislatures reports that dozens of AI-related transparency measures were introduced across state legislatures in 2025 alone. This legislative proliferation reflects growing consensus that the public interest in understanding AI systems outweighs the default presumption of opacity.
Constitutional Guardrails and Legal Feasibility
The Act has been crafted with direct attention to constitutional and statutory boundaries around oversight, compelled disclosure, and protection of proprietary information. Recent litigation testing California's AB 2013 provides a useful lens for examining these constraints and confirming the Act's constitutional durability.
Trade Secret Protections and the Takings Clause
On December 29, 2025, xAI - the developer of the Grok chatbot - filed a federal lawsuit in the Central District of California challenging AB 2013. Represented by the Clement & Murphy law firm, xAI argues that the law compels disclosure of proprietary trade secrets in violation of the Fifth Amendment's Takings Clause and forces speech in violation of the First Amendment.
The trade secret challenge proceeds along two lines. First, xAI claims AB 2013 constitutes a "per se" taking - an outright government appropriation of property without compensation. This argument faces significant obstacles. Per se takings claims traditionally apply to government actions that assume control of tangible property or completely prevent an owner from using their property. No court has ever sustained a per se takings claim in the context of trade secrets, and the law requires only "high-level summaries" rather than detailed proprietary methods. As legal scholars at the Institute for Law & AI have observed, a company could disclose the general fact that its datasets are curated and filtered without revealing the specific heuristics or allocation methods that constitute genuine competitive advantages.
Second, xAI argues AB 2013 effects a "regulatory taking" under the Penn Central Transportation Co. v. New York City framework, which balances the economic damage to the property owner, the character of the government action and its public purpose, and whether the owner had reasonable investment-backed expectations that the property would not be regulated. Here too, the challenge faces headwinds. California possesses broad authority to regulate its marketplace by imposing conditions on companies that wish to participate in it. The state has long required disclosure of information about chemicals in cleaning products, cookware, menstrual products, and pesticides, as well as privacy policies and automatic renewal practices of digital services. Training data summaries represent a logical extension of this regulatory tradition to a new industry with significant public impact.
The investment-backed expectations prong may be xAI's weakest claim. The AI industry emerged into a regulatory vacuum, and developers could not reasonably assume that high-level training data summaries would remain perpetually protected from disclosure. The tradition xAI invokes is one of balancing commercial secrecy against public interest - not absolute immunity from transparency requirements. There is no long-standing legal tradition protecting high-level AI training data summaries specifically, because this is a novel category of information in a new industry where regulatory frameworks are still being established.
The AI Training Data Disclosure Act avoids these constitutional concerns by requiring characterization rather than publication of training data. Courts have consistently upheld regulatory schemes that compel descriptions or summaries of confidential information so long as the protected material itself remains undisclosed. Because the Act never requires raw datasets to be released, their secrecy and economic value are preserved. Escrow mechanisms allow accredited auditors to examine raw data where necessary under strict nondisclosure obligations, ensuring that independent verification remains possible without destroying proprietary value.
First Amendment Doctrine and Compelled Commercial Speech
The Act also complies with First Amendment doctrine governing compelled commercial speech. Under the framework established in Zauderer v. Office of Disciplinary Counsel (1985), government-mandated disclosures of factual commercial information are constitutional when they are reasonably related to a substantial state interest, purely factual and uncontroversial, and not unduly burdensome.
The Congressional Research Service has confirmed that disclosure requirements meeting these criteria receive deferential review under rational basis scrutiny rather than the strict scrutiny applied to content-based restrictions on speech. Food labels, securities registrations, environmental reports, and prescription drug disclosures have all survived constitutional challenge under this framework. The required training data disclosures are factual descriptions of data sources and collection methods, not ideological statements or editorial content. They are directly connected to substantial governmental interests in consumer protection, safety, marketplace integrity, and informed public debate about consequential technologies. And they impose no undue burden, requiring only high-level summaries that major developers have already demonstrated capacity to produce.
The structure and scope of the Act therefore fit comfortably within constitutionally accepted disclosure frameworks.
The Patent Analogy and Its Limits
Proponents of transparency often reference the patent system to illustrate how innovation can coexist with structured disclosure. While the analogy highlights a useful principle - that transparency and innovation need not be mutually exclusive - it has important limits that the Act explicitly acknowledges.
Patents protect novel inventions through a bargain: the inventor receives time-limited exclusive rights in exchange for publicly disclosing how to make and use the invention, enabling others to learn from and build upon it. Machine learning models, by contrast, are statistical representations produced from large datasets. The "knowledge" embedded in a model is not a discrete invention that can be specified in claims and taught to practitioners. Model capabilities emerge from complex interactions between architecture, training data, optimization processes, and fine-tuning - none of which maps cleanly onto patent disclosure requirements.
The Act does not treat AI models or datasets as patent-like property, nor does it impose disclosure obligations mirroring patent specifications. Instead, it draws a narrower insight: innovation ecosystems can thrive when transparency obligations are predictable, limited, and aligned with public-interest goals. The patent system demonstrates that disclosure need not destroy commercial value when properly calibrated.
Accordingly, the Act defines a concept tailored to AI: minimum viable disclosure. This standard requires only the information necessary for experts and regulators to assess provenance, legality, representativeness, systemic bias, and associated risks. It does not demand full corpus disclosure, reveal proprietary algorithms, or expose fine-tuning and alignment processes.
Developers may satisfy this standard through several bounded mechanisms. Representative datasets may illustrate the nature and categories of training data without exposing sensitive raw material. Synthetic summaries may capture statistical patterns and provenance characteristics while protecting proprietary value. And for higher-risk systems, raw data may be reviewed in a secure escrow environment accessible only to accredited auditors under strict confidentiality. These pathways reflect the limits of the patent analogy while preserving its core insight that structured disclosure strengthens accountability without undermining innovation.
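One way to picture the aggregated disclosure pathway described above is a short sketch that buckets a corpus's composition into percentage ranges rather than exact figures. Everything here is hypothetical: the category names, token counts, and 10-point bucket width are illustrative choices, not anything the Act or any developer specifies.

```python
from collections import defaultdict

def provenance_summary(manifest, bucket=10):
    """Aggregate a dataset manifest into category-level percentage ranges.

    `manifest` is a list of (source_category, size_in_tokens) pairs;
    categories and sizes are purely illustrative.
    """
    totals = defaultdict(int)
    for category, size in manifest:
        totals[category] += size
    grand_total = sum(totals.values())
    summary = {}
    for category, size in totals.items():
        pct = 100 * size / grand_total
        # Report a bucketed range (e.g. "20-30%") rather than an exact
        # figure, so the summary characterizes the corpus without
        # revealing precise dataset allocations.
        low = int(pct // bucket) * bucket
        summary[category] = f"{low}-{low + bucket}%"
    return summary

manifest = [
    ("licensed text", 450),
    ("public web text", 1200),
    ("open-source code", 350),
]
print(provenance_summary(manifest))
# → {'licensed text': '20-30%', 'public web text': '60-70%',
#    'open-source code': '10-20%'}
```

The bucketing step is the point of the sketch: it is the kind of deliberate loss of precision that lets a disclosure inform regulators about corpus composition while leaving exact data allocations, which may themselves be trade secrets, undisclosed.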
Competitive Concerns and Economic Reality
Some critics warn that transparency regarding training data could help competitors reconstruct strategic advantages or imitate model performance. This concern deserves careful analysis, yet the available evidence indicates that structured disclosures of the kind the Act mandates do not threaten competitive viability.
Modern AI competitiveness arises chiefly from model architecture, training strategy, optimization techniques, reinforcement learning pipelines, compute scale, engineering talent, and integration with product ecosystems - not from datasets alone. Empirical studies and industry experience demonstrate that even when comparable or identical datasets are available, reproducing frontier models remains extremely difficult. Many foundation models trained on publicly available data such as Common Crawl, Wikipedia, and open-source code repositories have not been successfully replicated despite significant external effort, underscoring the limited competitive value of data disclosure at the level of abstraction required by the Act.
The compliance experience under California's AB 2013 reinforces this conclusion. OpenAI and Anthropic released training data summaries in January 2026 without apparent competitive harm. These disclosures describe general data categories and collection approaches without revealing the precise curation methods, quality filters, or training recipes that constitute genuine trade secrets. Competitors already infer general data categories from model outputs through techniques like membership inference and model extraction; the Act replaces uncertain inference with documented provenance, improving accountability without shifting competitive dynamics.
The Act further minimizes competitive risk by mandating only aggregated, synthetic, or escrowed disclosures. These formats do not enable replication of training regimens or proprietary alignment processes. Through narrowly tailored obligations and protective compliance options, the Act ensures that transparency does not function as forced technology transfer.
The Policy Rationale: Why Training Data Transparency Matters
The Act's benefits center on three core outcomes, each grounded in substantial research and documented harms.
Safety and Reliability
Training data composition directly shapes model behavior, including failure modes, biases, and vulnerabilities. Research published in Nature Communications in January 2026 documented how open-ended prompting of large language models produces intersectional biases that reflect patterns in training corpora. Studies in the Journal of Intellectual Property Law & Practice have demonstrated that training data provenance affects copyright infringement risk and model reliability. Work published in MDPI's AI Ethics journal catalogs how biased training data produces discriminatory outputs in hiring, lending, criminal justice, and healthcare contexts.
Without transparency into inputs, meaningful risk assessment is impossible. Information about training data sources enables identification of data poisoning attacks, where adversarial content injected into training corpora causes models to behave unsafely under specific conditions. Training data metadata reveals data contamination risks, where evaluation benchmarks inadvertently appear in training sets, making published performance metrics unreliable. Date ranges of datasets help identify gaps in model capabilities and areas where responses may rely on outdated information. Understanding whether personal information or aggregate consumer data was used helps users evaluate legal and ethical implications of deploying particular models.
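The benchmark-contamination risk described above can be made concrete with a simplified overlap check: given training corpus metadata, an auditor can ask whether benchmark items share long word sequences with training documents. This is only a sketch under stated assumptions; the n-gram length and the in-memory set are illustrative, and production auditing pipelines use hashed n-gram indexes over corpora far too large to hold in memory.

```python
def ngrams(text, n=8):
    """Word-level n-grams of a document, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(benchmark_items, training_docs, n=8):
    """Fraction of benchmark items that share at least one n-gram
    with any training document. A high rate suggests the benchmark
    leaked into the training set, making reported scores unreliable."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    hits = sum(1 for item in benchmark_items
               if ngrams(item, n) & train_grams)
    return hits / len(benchmark_items)
```

A check of this kind is only possible for outside auditors when developers disclose, at minimum, what corpora and date ranges a model was trained on, which is precisely the metadata the disclosure regime targets.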
Democratic Oversight and Accountability
As AI systems influence increasingly consequential decisions - from content moderation and hiring to loan approvals and medical diagnoses - the public has a legitimate interest in ensuring their foundational data comply with legal and ethical standards. The European Commission's mandatory disclosure template reflects this principle: citizens and regulators cannot evaluate AI systems they cannot understand. Disclosure transforms opaque corporate practices into verifiable information accessible to regulators, researchers, journalists, and civil society.
California's legislative findings for AB 2013 specifically cite the public interest in identifying and mitigating potential risks associated with AI, helping consumers make educated choices among available options. The New York bill advances similar rationales. Disclosure enables democratic deliberation about which training practices are acceptable and which should be prohibited - a conversation impossible when training data remains entirely secret.
Market Trust and Regulatory Certainty
Clear provenance reduces uncertainty for businesses and institutions that rely on AI tools. Enterprise customers increasingly demand transparency about AI systems deployed in their operations, both to manage legal risk and to satisfy their own compliance obligations. Responsible developers gain credibility through disclosure, while those relying on dubious or unlawful data sources face appropriate scrutiny.
Predictable transparency requirements also reduce long-term regulatory uncertainty. The current patchwork of state laws and international standards creates compliance complexity that benefits no one. A clear, constitutionally grounded federal framework would establish uniform expectations, enabling developers to invest in compliance infrastructure with confidence that the rules will not shift unpredictably.
International Convergence and Competitive Positioning
The global trajectory favors training data transparency. The European Union has moved first with binding requirements under the AI Act, and the July 2025 disclosure template establishes a detailed standard that developers serving EU markets must meet. The EU approach covers data sources, data types, collection and curation methods, bias mitigation measures, and copyright compliance - broadly consistent with the approach taken by California and contemplated in New York and federal proposals.
Rather than positioning American developers at a disadvantage, a well-designed federal disclosure framework could establish the United States as a leader in responsible AI governance. Harmonizing domestic requirements with international standards reduces compliance fragmentation and enables American companies to satisfy multiple jurisdictions through unified disclosure practices. Conversely, continued regulatory uncertainty risks driving AI development toward jurisdictions with clearer rules, potentially undermining American competitiveness in the industries that will shape the coming decades.
Proposed Legislation: AI Training Data Disclosure Act
119th CONGRESS, 2d Session
H.R. __
To amend title 15, United States Code, to require transparency in the training data used to develop covered artificial intelligence systems, and for other purposes.
IN THE HOUSE OF REPRESENTATIVES
Mr. __ introduced the following bill; which was referred to the Committee on Energy and Commerce
A BILL
To amend title 15, United States Code, to require transparency in the training data used to develop covered artificial intelligence systems, and for other purposes.
Be it enacted by the Senate and House of Representatives of the United States of America in Congress assembled,
SECTION 1. SHORT TITLE.
This Act may be cited as the "AI Training Data Disclosure Act of 2026".
SEC. 2. FINDINGS AND PURPOSES.
(a) FINDINGS.--Congress finds the following:
(1) Artificial intelligence systems trained on large datasets increasingly mediate consequential decisions affecting commerce, employment, credit, healthcare, and public safety.
(2) The training data used to develop artificial intelligence systems directly shapes model behavior, including outputs, failure modes, biases, and vulnerabilities.
(3) Absent transparency regarding training data provenance, regulators, researchers, and the public cannot meaningfully assess the safety, reliability, or lawfulness of artificial intelligence systems.
(4) Multiple States have enacted or are considering legislation requiring disclosure of training data information, creating compliance complexity for developers operating across jurisdictions.
(5) International trading partners, including the European Union, have established binding training data transparency requirements for artificial intelligence systems offered in their markets.
(6) A uniform Federal standard for training data disclosure would reduce regulatory fragmentation, promote innovation, and establish the United States as a leader in responsible artificial intelligence governance.
(7) Structured disclosure of training data categories and provenance can be achieved without requiring publication of proprietary datasets, algorithms, or trade secrets.
(b) PURPOSES.--The purposes of this Act are--
(1) to establish uniform Federal requirements for disclosure of training data information by developers of covered artificial intelligence systems;
(2) to promote public trust in artificial intelligence systems through transparency regarding their foundational data;
(3) to enable regulators, researchers, and civil society to assess artificial intelligence systems for safety, bias, and compliance with applicable law;
(4) to protect legitimate trade secrets and proprietary information while achieving transparency objectives; and
(5) to preempt inconsistent State requirements and reduce compliance burdens on developers operating in interstate commerce.
SEC. 3. DEFINITIONS.
In this Act:
(1) COMMISSION.--The term "Commission" means the Federal Trade Commission.
(2) COVERED ARTIFICIAL INTELLIGENCE SYSTEM.--The term "covered artificial intelligence system" means a machine-based system that--
(A) is designed to operate with varying levels of autonomy;
(B) may exhibit adaptiveness after deployment;
(C) infers from input it receives how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments; and
(D) meets one or more of the following criteria:
(i) The system was trained using computational resources exceeding 10^26 floating-point operations.
(ii) The system was trained using computational resources exceeding 10^23 floating-point operations and uses primarily biological sequence data.
(iii) The system is made available to the public or deployed in a commercial product or service that has more than 1,000,000 monthly active users in the United States.
(iv) The system is used to make, or to substantially inform, consequential decisions, as defined in paragraph (3).
(3) CONSEQUENTIAL DECISION.--The term "consequential decision" means a decision or judgment that has a legal, material, or similarly significant effect on an individual's--
(A) access to or terms of credit, insurance, or financial services, as described in section 603(d) of the Fair Credit Reporting Act (15 U.S.C. 1681a(d));
(B) employment or employment opportunities, including hiring, termination, compensation, and performance evaluation;
(C) access to educational opportunities or credentials;
(D) access to housing or real property;
(E) access to healthcare services or health insurance;
(F) access to essential utilities, including electricity, water, and telecommunications;
(G) criminal justice outcomes, including pretrial detention, sentencing, and parole; or
(H) access to government benefits or services.
(4) DEVELOPER.--The term "developer" means a person that designs, codes, produces, or substantially modifies a covered artificial intelligence system, including through fine-tuning or continued training.
(5) PERSONAL INFORMATION.--The term "personal information" has the meaning given the term in section 2(a) of the American Data Privacy and Protection Act, as introduced in the 117th Congress, or if such Act has not been enacted, means information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular individual.
(6) SYNTHETIC DATA.--The term "synthetic data" means data that has been artificially generated rather than collected from real-world events, including data generated through statistical modeling, simulation, or artificial intelligence techniques.
(7) TRADE SECRET.--The term "trade secret" has the meaning given such term in section 1839 of title 18, United States Code.
(8) TRAINING DATA.--The term "training data" means the data, datasets, or corpora used to train, fine-tune, or otherwise develop the parameters or capabilities of a covered artificial intelligence system, including data used for pre-training, supervised fine-tuning, reinforcement learning, and alignment processes.
SEC. 4. TRAINING DATA DISCLOSURE REQUIREMENTS.
(a) IN GENERAL.--A developer shall publish a training data disclosure in accordance with subsection (b) not later than 180 days after the date of enactment of this Act for any covered artificial intelligence system already made available to the public or deployed in a commercial product or service in the United States, and thereafter before making any such system available to the public or deploying it in a commercial product or service in the United States.
(b) CONTENTS OF DISCLOSURE.--A training data disclosure required under subsection (a) shall include the following information:
(1) DATA SOURCE CATEGORIES.--A description of the categories of sources from which training data was obtained, including--
(A) whether training data was obtained from publicly available sources, proprietary sources, licensed sources, user-generated content, government records, or other identifiable categories;
(B) the general nature of content included in each category, such as text, images, audio, video, code, or structured data; and
(C) for each category, an estimate of the proportion of total training data attributable to that category, expressed as a percentage range.
(2) DATA COLLECTION METHODS.--A description of the methods used to collect or acquire training data, including--
(A) whether data was collected through web scraping, application programming interfaces, licensing agreements, user submissions, synthetic generation, or other methods; and
(B) the general time period during which data collection occurred.
(3) PERSONAL INFORMATION.--A statement indicating--
(A) whether the training data includes personal information;
(B) if personal information is included, the categories of personal information included and the sources from which such information was obtained;
(C) whether individuals whose personal information is included in training data were provided notice of such inclusion; and
(D) the measures taken to protect personal information, including de-identification, anonymization, or access controls.
(4) COPYRIGHTED MATERIAL.--A statement indicating--
(A) whether the training data includes material protected by copyright under title 17, United States Code;
(B) if copyrighted material is included, the general categories of such material; and
(C) the legal basis relied upon for the inclusion of copyrighted material, including any applicable licenses, permissions, or limitations and exceptions to copyright.
(5) DATA CURATION AND FILTERING.--A description of the processes used to curate, filter, or modify training data, including--
(A) methods used to assess or improve data quality;
(B) methods used to identify and remove illegal content, including child sexual abuse material, content that violates the intellectual property rights of third parties, and content that could facilitate the commission of crimes;
(C) methods used to identify and mitigate biases in training data; and
(D) methods used to identify and address risks associated with training data, including data poisoning and benchmark contamination.
(6) DATA TEMPORAL SCOPE.--The date ranges during which training data was collected or generated, and the most recent date of data included in the training corpus.
(7) UPDATE PROCEDURES.--A description of procedures for updating the disclosure when training data is materially modified, including through continued training or fine-tuning.
(c) FORMAT AND ACCESSIBILITY.--
(1) MACHINE-READABLE FORMAT.--A training data disclosure required under this section shall be published in a machine-readable format specified by the Commission.
(2) PUBLIC AVAILABILITY.--A training data disclosure shall be made publicly available--
(A) on the developer's website in a prominent and easily accessible location; and
(B) through a centralized registry maintained by the Commission pursuant to section 6.
(3) PLAIN LANGUAGE.--A training data disclosure shall be written in plain language reasonably calculated to be understood by members of the public.
(d) UPDATES.--A developer shall update its training data disclosure not later than 90 days after any material change to the training data used for a covered artificial intelligence system, including changes resulting from continued training, fine-tuning, or the addition of new data sources.
SEC. 5. ALTERNATIVE COMPLIANCE MECHANISMS.
(a) TRADE SECRET PROTECTION.--Nothing in this Act shall be construed to require a developer to disclose--
(1) the specific contents of training datasets;
(2) proprietary algorithms, model architectures, or training methodologies;
(3) specific data curation heuristics, quality filters, or weighting schemes; or
(4) any other information that constitutes a trade secret, provided that the developer satisfies the disclosure requirements of section 4 through the alternative mechanisms described in this section.
(b) AGGREGATED DISCLOSURES.--A developer may satisfy the requirements of section 4 through aggregated disclosures that describe categories of training data at a level of abstraction sufficient to inform the public and regulators without revealing specific proprietary details.
(c) SYNTHETIC SUMMARIES.--A developer may satisfy the requirements of section 4(b)(1) through (3) by publishing synthetic summaries that capture the statistical properties and provenance characteristics of training data without disclosing the underlying data.
(d) ESCROW MECHANISM.--
(1) IN GENERAL.--For information that a developer determines cannot be disclosed publicly without revealing trade secrets, the developer may satisfy the requirements of section 4 by depositing such information with a qualified escrow agent under the conditions specified in this subsection.
(2) QUALIFIED ESCROW AGENT.--The Commission shall establish criteria for qualification as an escrow agent under this subsection, which shall include--
(A) demonstrated technical capacity to securely store and protect confidential information;
(B) independence from the developer and from competitors of the developer;
(C) experience in handling trade secrets or other confidential commercial information; and
(D) agreement to be bound by confidentiality obligations and to permit access only as provided in paragraph (4).
(3) ESCROW DEPOSIT.--Information deposited under this subsection shall include--
(A) a complete and accurate description of training data sources;
(B) representative samples of training data, as specified by the Commission; and
(C) documentation of data curation and filtering processes.
(4) ACCESS TO ESCROWED INFORMATION.--Information deposited under this subsection shall be accessible only to--
(A) the Commission, for purposes of enforcement of this Act;
(B) other Federal agencies, upon a demonstration of need for a legitimate regulatory purpose and subject to appropriate confidentiality protections;
(C) State attorneys general, for purposes of investigating or enforcing State consumer protection laws, subject to appropriate confidentiality protections; and
(D) qualified researchers accredited by the Commission pursuant to section 6(d), subject to nondisclosure agreements and data security requirements established by the Commission.
(5) CONFIDENTIALITY.--Any person who obtains access to escrowed information under paragraph (4) shall--
(A) maintain the confidentiality of such information;
(B) use such information only for the purposes authorized under this subsection;
(C) not disclose such information to any person not authorized to receive it; and
(D) comply with data security requirements established by the Commission.
(6) PENALTIES FOR UNAUTHORIZED DISCLOSURE.--Any person who knowingly discloses escrowed information in violation of this subsection shall be subject to--
(A) civil penalties under section 7; and
(B) criminal penalties under section 1832 of title 18, United States Code, if such disclosure constitutes theft of trade secrets.
(e) SAFE HARBOR.--A developer that makes a good faith effort to comply with the requirements of this Act and promptly corrects any identified deficiencies shall not be subject to penalties under section 7 for minor or technical violations.
SEC. 6. COMMISSION AUTHORITY AND ADMINISTRATION.
(a) RULEMAKING.--
(1) IN GENERAL.--Not later than 270 days after the date of enactment of this Act, the Commission shall promulgate regulations to implement this Act, including--
(A) specifying the format and technical requirements for training data disclosures;
(B) establishing the centralized registry required under subsection (b);
(C) establishing criteria for qualified escrow agents under section 5(d);
(D) establishing procedures for accreditation of researchers under subsection (d);
(E) providing guidance on compliance with the alternative mechanisms under section 5; and
(F) establishing procedures for enforcement under section 7.
(2) CONSULTATION.--In promulgating regulations under this subsection, the Commission shall consult with--
(A) the National Institute of Standards and Technology;
(B) the Office of Science and Technology Policy;
(C) the Department of Commerce;
(D) the Copyright Office; and
(E) other Federal agencies with relevant expertise.
(3) PUBLIC COMMENT.--The Commission shall provide an opportunity for public comment on proposed regulations under this subsection for a period of not less than 60 days.
(b) CENTRALIZED REGISTRY.--
(1) ESTABLISHMENT.--The Commission shall establish and maintain a publicly accessible, searchable registry of training data disclosures submitted under this Act.
(2) CONTENTS.--The registry shall include--
(A) all training data disclosures submitted by developers;
(B) metadata sufficient to enable searching and comparison of disclosures;
(C) information regarding any enforcement actions taken under section 7; and
(D) educational resources to assist the public in understanding training data disclosures.
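Purely as an illustration of the registry concept described in this subsection (not a specification the Act imposes), a disclosure record with searchable metadata might be modeled as follows; all field names and the `search` helper are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    """One hypothetical disclosure record in the public registry."""
    developer: str
    model_name: str
    submitted: str                                      # ISO date of submission
    data_categories: list[str] = field(default_factory=list)
    enforcement_actions: list[str] = field(default_factory=list)

def search(entries: list[RegistryEntry], term: str) -> list[RegistryEntry]:
    """Naive full-text search across developer, model, and data categories."""
    term = term.lower()
    return [
        e for e in entries
        if term in e.developer.lower()
        or term in e.model_name.lower()
        or any(term in c.lower() for c in e.data_categories)
    ]
```

A real registry would add stable identifiers and enforcement-action cross-references; the sketch only shows why paragraph (2)(B) requires metadata sufficient for searching and comparison.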
(c) GUIDANCE.--The Commission shall issue guidance to assist developers in complying with this Act, including--
(1) model disclosure templates;
(2) examples of compliant disclosures;
(3) frequently asked questions; and
(4) best practices for protecting trade secrets while achieving transparency objectives.
(d) RESEARCHER ACCREDITATION.--
(1) PROGRAM ESTABLISHMENT.--The Commission shall establish a program to accredit researchers for access to escrowed information under section 5(d)(4)(D).
(2) CRITERIA.--Accreditation criteria shall include--
(A) demonstrated expertise in artificial intelligence, data science, or related fields;
(B) affiliation with a research institution, academic institution, or civil society organization;
(C) agreement to comply with confidentiality and data security requirements;
(D) submission of a research proposal demonstrating a legitimate research purpose; and
(E) absence of conflicts of interest with the developer whose information is sought.
(e) COORDINATION WITH STATE LAW.--
(1) PREEMPTION.--This Act shall supersede any State law, rule, or regulation to the extent that such law, rule, or regulation--
(A) requires disclosure of training data information; and
(B) is inconsistent with the requirements of this Act.
(2) SAVINGS CLAUSE.--Nothing in this Act shall be construed to preempt--
(A) State laws of general applicability, including consumer protection laws, privacy laws, and unfair competition laws, to the extent such laws do not specifically regulate training data disclosure;
(B) State laws providing greater protection for personal information than required under this Act; or
(C) any cause of action under State law for misrepresentation regarding training data.
(f) INTERNATIONAL COORDINATION.--The Commission shall coordinate with international counterparts to promote harmonization of training data disclosure requirements and mutual recognition of compliance frameworks.
SEC. 7. ENFORCEMENT.
(a) UNFAIR OR DECEPTIVE ACTS OR PRACTICES.--A violation of this Act or any regulation promulgated under this Act shall be treated as a violation of a rule defining an unfair or deceptive act or practice prescribed under section 18(a)(1)(B) of the Federal Trade Commission Act (15 U.S.C. 57a(a)(1)(B)).
(b) POWERS OF THE COMMISSION.--
(1) IN GENERAL.--The Commission shall enforce this Act in the same manner, by the same means, and with the same jurisdiction, powers, and duties as though all applicable terms and provisions of the Federal Trade Commission Act (15 U.S.C. 41 et seq.) were incorporated into and made a part of this Act.
(2) PRIVILEGES AND IMMUNITIES.--Any person who violates this Act shall be subject to the penalties and entitled to the privileges and immunities provided in the Federal Trade Commission Act.
(c) CIVIL PENALTIES.--
(1) IN GENERAL.--In addition to any penalty applicable under subsection (a), the Commission may impose a civil penalty on any developer that violates this Act in an amount not to exceed--
(A) $50,000 for each day the violation continues; or
(B) for a knowing violation, the greater of--
(i) $100,000 for each day the violation continues; or
(ii) 4 percent of the developer's total worldwide annual revenue for the preceding fiscal year.
(2) FACTORS.--In determining the amount of a civil penalty under this subsection, the Commission shall consider--
(A) the nature and seriousness of the violation;
(B) the number of individuals affected;
(C) whether the violation was knowing or willful;
(D) the developer's history of prior violations;
(E) the developer's financial condition;
(F) the developer's good faith efforts to comply; and
(G) such other factors as the Commission considers appropriate.
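The penalty ceilings in paragraph (1) reduce to simple arithmetic. The sketch below is illustrative only and not part of the Act; the function name and parameters are invented for the example:

```python
def penalty_cap(days: int, annual_revenue: float, knowing: bool) -> float:
    """Illustrative ceiling on civil penalties under subsection (c)(1).

    days: number of days the violation continued
    annual_revenue: developer's total worldwide annual revenue
        for the preceding fiscal year
    knowing: whether the violation was knowing
    """
    if knowing:
        # Greater of $100,000 per day or 4 percent of worldwide revenue.
        return max(100_000 * days, 0.04 * annual_revenue)
    # Otherwise, up to $50,000 per day.
    return 50_000 * days

# A 30-day knowing violation by a developer with $10 billion in revenue
# is capped at max($3,000,000, $400,000,000) = $400,000,000.
```

The revenue-based prong matters only for large developers: for a 30-day knowing violation, the 4 percent figure exceeds the per-day figure whenever annual revenue exceeds $75 million.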
(d) STATE ATTORNEY GENERAL ENFORCEMENT.--
(1) IN GENERAL.--The attorney general of a State may bring a civil action in the name of the State in an appropriate district court of the United States to--
(A) enjoin a violation of this Act;
(B) enforce compliance with this Act; or
(C) obtain civil penalties as provided under subsection (c).
(2) NOTICE.--The attorney general of a State shall provide prior written notice of any action under paragraph (1) to the Commission and shall provide the Commission with a copy of the complaint in the action.
(3) INTERVENTION.--Upon receiving notice under paragraph (2), the Commission may--
(A) intervene in the action;
(B) upon intervening, be heard on all matters arising in the action; and
(C) file petitions for appeal.
(4) LIMITATION.--No action may be brought under this subsection if the Commission has instituted a civil action for the same violation.
(e) NO PRIVATE RIGHT OF ACTION.--Nothing in this Act shall be construed to create a private right of action.
SEC. 8. RELATIONSHIP TO OTHER LAWS.
(a) TRADE SECRET PROTECTION.--Nothing in this Act shall be construed to diminish or alter--
(1) the protections afforded to trade secrets under the Defend Trade Secrets Act of 2016 (18 U.S.C. 1836 et seq.);
(2) the protections afforded to trade secrets under State law; or
(3) the ability of a developer to seek injunctive relief or damages for misappropriation of trade secrets.
(b) COPYRIGHT.--Nothing in this Act shall be construed to--
(1) expand or limit the rights of copyright owners under title 17, United States Code;
(2) expand or limit the limitations and exceptions to copyright under title 17, United States Code, including fair use under section 107 of such title;
(3) create any presumption regarding the legality or illegality of using copyrighted material to train artificial intelligence systems; or
(4) require a developer to disclose information that would reveal infringement of copyright by the developer or any third party.
(c) PRIVACY.--Nothing in this Act shall be construed to--
(1) authorize the collection, use, or disclosure of personal information except as necessary to comply with this Act;
(2) diminish or alter protections afforded to personal information under Federal or State privacy laws; or
(3) require disclosure of personal information contained in training data.
(d) NATIONAL SECURITY.--
(1) EXEMPTION.--This Act shall not apply to a covered artificial intelligence system that is--
(A) developed by or for an element of the intelligence community, as defined in section 3 of the National Security Act of 1947 (50 U.S.C. 3003);
(B) developed by or for the Department of Defense for national security purposes;
(C) developed by or for the Department of Energy for nuclear security purposes; or
(D) subject to classification under Executive Order 13526 (75 Fed. Reg. 707) or any successor order.
(2) DETERMINATION.--The Director of National Intelligence may determine that a covered artificial intelligence system is exempt from this Act if disclosure of training data information would damage national security.
SEC. 9. REPORTS.
(a) ANNUAL REPORT.--Not later than 1 year after the date of enactment of this Act, and annually thereafter, the Commission shall submit to the Committee on Energy and Commerce of the House of Representatives and the Committee on Commerce, Science, and Transportation of the Senate a report on the implementation of this Act, including--
(1) the number of training data disclosures received;
(2) the number and nature of enforcement actions taken;
(3) an assessment of industry compliance;
(4) recommendations for legislative or regulatory improvements; and
(5) an assessment of international developments in training data transparency.
(b) GAO STUDY.--Not later than 3 years after the date of enactment of this Act, the Comptroller General of the United States shall conduct a study and submit to Congress a report evaluating--
(1) the effectiveness of this Act in achieving its purposes;
(2) the economic impact of this Act on the artificial intelligence industry;
(3) the impact of this Act on innovation in artificial intelligence;
(4) the adequacy of enforcement mechanisms; and
(5) recommendations for improvements to this Act.
SEC. 10. SEVERABILITY.
If any provision of this Act, or the application of such provision to any person or circumstance, is held to be unconstitutional, the remainder of this Act, and the application of the remaining provisions to any person or circumstance, shall not be affected thereby.
SEC. 11. AUTHORIZATION OF APPROPRIATIONS.
There are authorized to be appropriated to the Commission $25,000,000 for each of fiscal years 2027 through 2031 to carry out this Act.
SEC. 12. EFFECTIVE DATE.
(a) IN GENERAL.--Except as provided in subsection (b), this Act shall take effect on the date that is 180 days after the date of enactment of this Act.
(b) EXISTING SYSTEMS.--With respect to a covered artificial intelligence system that was made available to the public or deployed in a commercial product or service before the effective date under subsection (a), the requirements of section 4 shall apply beginning on the date that is 1 year after the date of enactment of this Act.
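The two compliance dates in this section can be computed mechanically from the date of enactment. A minimal sketch, assuming a purely hypothetical enactment date of June 1, 2026:

```python
from datetime import date, timedelta

# Hypothetical enactment date, chosen only for illustration.
enactment = date(2026, 6, 1)

# Section 12(a): general effective date, 180 days after enactment.
general_effective = enactment + timedelta(days=180)

# Section 12(b): covered systems already public or deployed before the
# general effective date must comply with section 4 one year after enactment.
existing_systems_deadline = enactment.replace(year=enactment.year + 1)

print(general_effective)          # 2026-11-28
print(existing_systems_deadline)  # 2027-06-01
```

Note that under this timing, pre-existing systems receive roughly six months beyond the general effective date before section 4 applies to them.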
Sources
Legislation and Regulatory Materials
California Assembly Bill 2013, Generative Artificial Intelligence: Training Data Transparency Act (effective January 1, 2026). Available at: https://regulations.ai/regulations/RAI-US-CA-CA2GAXX-2024
New York Assembly Bill A6578, Artificial Intelligence Training Data Transparency Act (passed Assembly June 10, 2025). Available at: https://www.nysenate.gov/legislation/bills/2025/A6578/amendment/A
European Union AI Act, Article 53: Obligations for Providers of General-Purpose AI Models. Available at: https://artificialintelligenceact.eu/article/53/
European Commission AI Office, Mandatory Template for Public Disclosure of AI Training Data (July 24, 2025). Available at: https://www.wilmerhale.com/en/insights/blogs/wilmerhale-privacy-and-cybersecurity-law/european-commission-releases-mandatory-template-for-public-disclosure-of-ai-training-data
Legal Analysis and Case Law
xAI v. California, Case No. 2:25-cv-10821 (C.D. Cal. filed December 29, 2025).
Hattingh, Julius. "xAI's Trade Secrets Challenge and the Future of AI Transparency." Institute for Law & AI (January 2026). Available at: https://law-ai.org/xais-trade-secrets-challenge-and-the-future-of-ai-transparency/
Goodwin Procter LLP. "California's AB 2013 Takes Effect: Navigating AI Training Data Transparency and Trade Secret Risk" (January 16, 2026). Available at: https://www.goodwinlaw.com/en/insights/publications/2026/01/alerts-otherindustries-californias-ab-2013-takes-effect
Sokhansanj, Bahrad. "xAI's Challenge to California's AI Training Data Transparency Law (AB2013)." Institute for Law & AI (January 2026). Available at: https://law-ai.org/xais-challenge-to-californias-ai-training-data-transparency-law-ab2013/
Zauderer v. Office of Disciplinary Counsel, 471 U.S. 626 (1985).
Penn Central Transportation Co. v. New York City, 438 U.S. 104 (1978).
Ruckelshaus v. Monsanto Co., 467 U.S. 986 (1984).
Brannon, Valerie C. "Assessing Commercial Disclosure Requirements under the First Amendment." Congressional Research Service Report R45700 (April 23, 2019). Available at: https://www.congress.gov/crs-product/R45700
Academic Research on Training Data and Bias
"Intersectional biases in narratives produced by open-ended prompting of generative language models." Nature Communications (January 8, 2026). Available at: https://www.nature.com/articles/s41467-025-68004-9
Ferrara, Emilio. "Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies." Sci, Vol. 6, Issue 1 (2024). Available at: https://www.mdpi.com/2413-4155/6/1/3
"Ethics and discrimination in artificial intelligence-enabled recruitment practices." Humanities and Social Sciences Communications (2023). Available at: https://www.nature.com/articles/s41599-023-02079-x
Villasenor, John. "Artificial Intelligence, Trade Secrecy, and the Challenge of Transparency." North Carolina Journal of Law & Technology, Vol. 25, Issue 3 (2024). Available at: https://scholarship.law.unc.edu/cgi/viewcontent.cgi?article=1484&context=ncjolt
Hrdy, Camilla A. "Trade Secrecy Meets Generative AI." Chicago-Kent Law Review, Vol. 100, Issue 1 (2025). Available at: https://scholarship.kentlaw.iit.edu/cklawreview/vol100/iss1/14
Industry and Policy Analysis
Davis+Gilbert LLP. "AI Legal Updates: California's AI Training Data Transparency Law Takes Effect" (January 27, 2026). Available at: https://www.mondaq.com/unitedstates/new-technology/1736574/ai-legal-updates-californias-ai-training-data-transparency-law-takes-effect
Gunderson Dettmer. "2026 AI Laws Update: Key Regulations and Practical Guidance" (February 4, 2026). Available at: https://www.gunder.com/en/news-insights/insights/2026-ai-laws-update-key-regulations-and-practical-guidance
Wilson Sonsini. "2026 Year in Preview: AI Regulatory Developments for Companies to Watch Out For" (January 13, 2026). Available at: https://www.wsgr.com/en/insights/2026-year-in-preview-ai-regulatory-developments-for-companies-to-watch-out-for.html
National Conference of State Legislatures. "Summary of Artificial Intelligence 2025 Legislation." Available at: https://www.ncsl.org/technology-and-communication/artificial-intelligence-2025-legislation
Bird & Bird. "Taking the EU AI Act to Practice: Decoding the GPAI Code of Practice and the Training Data Summary Template" (September 2025). Available at: https://www.twobirds.com/en/insights/2025/taking-the-eu-ai-act-to-practice-decoding-the-gpai-code-of-practice-and-the-training-data-summary-template