Sociopathic Agents

Human beings tend to assume that markets possess a kind of rough morality. Individual actors may misbehave, but competition disciplines excess, risk punishes recklessness, and law deters conspiracy. The invisible hand, while imperfect, is thought to be self-correcting.
That reassuring view rests on an implicit premise: that market participants are human. Humans can be deterred. They can be sanctioned. They can feel fear, shame, or reputational concern. And the legal system--particularly antitrust and securities law--has been constructed around those features of human psychology.
Increasingly, however, some of the most consequential actors in financial markets are not human at all. They are reinforcement-learning systems--algorithms that iteratively learn policies to maximize a reward signal in complex, competitive environments. These systems do not reason about legality. They do not experience guilt. They do not form agreements in the ordinary sense. They simply learn what works.
The central question is not whether such systems are "evil." It is whether our legal architecture, built on concepts like agreement and intent, is equipped to address the kinds of harms that autonomous optimization can produce.
Emergent Coordination Without Agreement
Modern trading algorithms differ sharply from the rules-based systems of earlier decades. Reinforcement-learning (RL) agents explore vast action spaces and receive rewards tied to measurable economic outcomes--profit, spread capture, execution quality, inventory control. Over time, they converge on strategies that maximize those rewards.
The July 2025 NBER paper by Dou, Goldstein, and Ji (Working Paper No. w34054) offers an important empirical finding: independent RL trading agents, trained via Q-learning, can converge on collusive equilibria in repeated market environments without communication, explicit coordination, or shared parameters. Supra-competitive outcomes can arise through parallel learning alone.
Earlier work by Calvano et al. reached a similar conclusion in stylized repeated games, showing that Q-learning algorithms may sustain collusive pricing even under imperfect monitoring conditions. These findings do not prove that live markets will inevitably experience algorithmic cartels. They do suggest that when reward structures are aligned in certain ways, convergent coordination is not anomalous--it is predictable.
It is important to emphasize the limits of these studies. Most rely on simplified environments, stylized repeated games, and controlled simulations. Real financial markets are more complex, multi-asset, multi-venue, and subject to regulatory intervention. The translation from laboratory equilibrium to systemic harm is probabilistic rather than automatic. Still, the empirical pattern warrants attention.
If human traders achieved supra-competitive profits through sustained parallel conduct, regulators would ask whether an agreement existed. But RL agents do not "meet." They converge.
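To make the mechanism concrete, consider a minimal sketch in the spirit of these studies, though far simpler than either paper's environment: two independent Q-learners repeatedly set prices in a stylized Bertrand duopoly. They share no parameters and never communicate; each observes only last round's prices and its own profit. Every parameter below is arbitrary and illustrative.

```python
import numpy as np

PRICES = np.linspace(1.0, 2.0, 5)   # price grid: 1.0 = marginal cost, 2.0 = near-monopoly
N = len(PRICES)
ALPHA, GAMMA = 0.15, 0.95           # learning rate, discount factor
ROUNDS = 200_000

def profit(p_own, p_rival):
    # Toy demand: the cheaper firm takes the whole market; ties split it.
    share = 1.0 if p_own < p_rival else (0.5 if p_own == p_rival else 0.0)
    return (p_own - 1.0) * share    # marginal cost fixed at 1.0

rng = np.random.default_rng(0)
# State = last round's joint price pair; one independent Q-table per agent.
Q = [np.zeros((N, N, N)) for _ in range(2)]
state = (0, 0)
for t in range(ROUNDS):
    eps = max(0.02, float(np.exp(-t / 40_000)))     # decaying exploration
    acts = [int(rng.integers(N)) if rng.random() < eps
            else int(np.argmax(Q[i][state])) for i in range(2)]
    rewards = [profit(PRICES[acts[0]], PRICES[acts[1]]),
               profit(PRICES[acts[1]], PRICES[acts[0]])]
    nxt = (acts[0], acts[1])
    for i in range(2):
        td_target = rewards[i] + GAMMA * Q[i][nxt].max()
        Q[i][state][acts[i]] += ALPHA * (td_target - Q[i][state][acts[i]])
    state = nxt

print("final prices:", PRICES[acts[0]], PRICES[acts[1]])
```

Depending on the seed, the price grid, and the exploration schedule, the learned prices often settle above the competitive level. Nothing resembling an agreement appears anywhere in the code; the supra-competitive outcome, when it arises, is a property of parallel optimization.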
The Agreement Problem in Antitrust
Under Sherman Act Section 1, liability requires a "contract, combination, or conspiracy." Courts have consistently held that parallel conduct, standing alone, is insufficient absent evidence of agreement. The doctrine reflects a sensible concern: firms often act similarly because they face similar incentives.
The difficulty arises when independent learning systems repeatedly generate outcomes that resemble cartel behavior. There may be parallel conduct. There may be higher prices. There may be consumer harm. Yet there may be no communication, no shared intent, and no agreement in any conventional sense.
The Uber pricing-algorithm case, Meyer v. Kalanick, illustrates the doctrinal tension. Plaintiffs alleged that Uber's surge-pricing algorithm facilitated horizontal price coordination among drivers. The district court initially permitted the claim to proceed, but the dispute was ultimately compelled to arbitration by the Second Circuit. As a result, the core question--whether algorithmic coordination can satisfy Section 1--was never resolved on the merits in open court. The legal uncertainty remains.
Antitrust doctrine was designed to detect conspiracies among persons. It is less clear how it should respond to emergent coordination among machines.
The Intent Problem in Securities Law
A similar structural issue arises under securities law. Liability under Section 10(b) of the Securities Exchange Act and Rule 10b-5 generally requires scienter--an intent to deceive, manipulate, or defraud.
In some contexts, courts have adopted effects-based reasoning. In SEC v. Masri and related decisions, courts recognized that certain trading patterns can be manipulative even if traditional motive evidence is thin. Yet even here, the doctrinal vocabulary presumes a human actor with a mental state.
Reinforcement-learning systems complicate this premise. An RL agent might discover that certain order-placement strategies distort price discovery in ways that increase expected returns. It does not "intend" to manipulate; it responds to reward gradients. The harm, if any, emerges from optimization.
The question becomes whether manipulation doctrine should pivot from intent to structural effect when autonomous systems are involved.
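The structural point can be stated compactly. Below is the standard temporal-difference update at the core of Q-learning-style systems; the names are generic, and the snippet is illustrative rather than drawn from any deployed system.

```python
def q_update(q, reward, best_next_q, alpha=0.1, gamma=0.95):
    # One standard temporal-difference step. The scalar `reward` is the
    # agent's only "motive"; there is no term for legality, deception,
    # or intent. If a price-distorting order pattern raises realized
    # reward, this rule reinforces it exactly as it would reinforce any
    # benign strategy.
    return q + alpha * (reward + gamma * best_next_q - q)
```

Scienter doctrine asks what the trader meant. The update rule contains no variable to which that question could attach.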
Illustrative Scenarios
Consider three stylized possibilities.
First, market-making agents operating during thin trading periods might independently learn that slightly wider spreads increase profitability without materially reducing order flow. If all major participants converge on similar policies, spreads widen without explicit coordination.
Second, during volatility spikes, agents may learn to withdraw liquidity aggressively to avoid adverse selection. If such strategies become widespread, order books may thin dramatically at precisely the moments when resilience is most needed.
Third, in fragmented markets, agents could treat cross-venue order flow as a signaling mechanism, adjusting strategies in response to patterns that no human explicitly orchestrated.
These scenarios are not predictions. They are illustrations of how decentralized optimization can generate outcomes that resemble coordinated conduct.
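A minimal sketch of the second scenario makes the dynamic visible; all parameters are arbitrary. Several independent agents each learn, per volatility regime, whether to keep quoting or to withdraw. Quoting captures the spread in calm markets but suffers adverse selection in volatile ones, so every agent independently learns to pull quotes at the same moments.

```python
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, ROUNDS = 5, 20_000
REGIMES = {0: "calm", 1: "volatile"}
ACTIONS = {0: "quote", 1: "withdraw"}

def reward(regime, action):
    if action == 1:                    # withdrawing earns nothing, risks nothing
        return 0.0
    # Quoting: small positive spread capture when calm; expected loss
    # from adverse selection when volatile. (Arbitrary toy numbers.)
    return rng.normal(0.1, 0.05) if regime == 0 else rng.normal(-0.5, 0.3)

# One independent action-value table per agent: values[i][regime][action].
values = [np.zeros((2, 2)) for _ in range(N_AGENTS)]
for t in range(ROUNDS):
    regime = int(rng.integers(2))
    for v in values:
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmax(v[regime]))
        v[regime, a] += 0.05 * (reward(regime, a) - v[regime, a])

for i, v in enumerate(values):
    policy = {REGIMES[s]: ACTIONS[int(np.argmax(v[s]))] for s in (0, 1)}
    print(f"agent {i}: {policy}")
```

Each agent typically converges on the same policy: quote when calm, withdraw when volatile. The simultaneous evaporation of liquidity looks coordinated; it is merely convergent.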
The Regulatory Response
Regulators have not ignored these developments. In December 2024, the CFTC issued a staff advisory emphasizing that existing prohibitions on manipulation and fraud apply to automated trading systems, including reinforcement-learning agents. Firms remain responsible for the tools they deploy.
That position is sensible. But it also underscores a deeper issue: applying old categories to new phenomena may not always suffice. When harmful outcomes arise without agreement and without intent, enforcement frameworks tied tightly to those concepts may struggle.
Competition and Its Limits
One might respond that competition will correct these risks. If coordination is profitable, a rival will defect. If spreads widen excessively, new entrants will undercut incumbents.
In human markets, such dynamics often operate. In algorithmic markets, the picture may differ. RL systems can converge rapidly on stable equilibria. They can iterate millions of times before human supervisors detect anomalies. And they can adapt strategies in response to monitoring in ways that complicate after-the-fact reconstruction.
None of this implies that markets are doomed. It does suggest that the standard faith in spontaneous correction deserves reexamination when the primary actors are adaptive systems rather than persons.
Accountability and Reform
If autonomous systems can generate systemic harm without traditional agreement or intent, legal reform may need to focus less on mental states and more on institutional responsibility.
One possibility is an algorithmic accountability standard: clarifying that when independent systems predictably converge on supra-competitive equilibria, deploying firms may face liability even absent explicit coordination.
Another is an expansion of effects-based manipulation doctrine in securities law, allowing persistent, statistically demonstrable distortions attributable to autonomous systems to satisfy manipulation standards without conventional scienter.
A third approach would require firms deploying high-frequency or reinforcement-learning systems to implement auditable governance controls--reward-function constraints, monitoring systems, and explainability logs--designed to mitigate foreseeable risks; a sketch of such controls follows below.
These reforms would not prohibit innovation. They would adjust incentives so that firms internalize the social costs of deploying powerful optimization systems in critical market infrastructure.
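As a concrete illustration of the third approach, the sketch below shows two governance controls wrapped around a trading agent: a hard constraint in the reward channel and an append-only audit log. The interfaces here (base_reward, spread_bps, the policy bound) are hypothetical, not drawn from any existing system or rulebook.

```python
import json
import time

MAX_SPREAD_BPS = 25   # firm-set policy bound; hypothetical, not a market constant
PENALTY = 10.0        # penalty large enough to dominate any gain from a breach

def governed_reward(base_reward: float, spread_bps: float) -> float:
    """Reward-function constraint: quoting beyond the firm's policy bound
    is penalized so the learner cannot find the breach attractive."""
    if spread_bps > MAX_SPREAD_BPS:
        return base_reward - PENALTY
    return base_reward

def audit_log(path: str, state, action, reward: float) -> None:
    """Explainability log: an append-only record of what the agent saw,
    did, and was paid -- enough to reconstruct a decision after the fact."""
    record = {"ts": time.time(), "state": state,
              "action": action, "reward": reward}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

The design choice is deliberate: the constraint lives in the reward channel, where the optimizer actually looks, and the control is auditable after the fact, which is where enforcement actually operates.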
Conclusion: Institutional Design in the Age of Optimization
Financial law evolved in response to human misconduct. Its core categories--agreement, conspiracy, intent--reflect the psychology of persons. As optimization systems assume larger roles in markets, those categories may require refinement.
The challenge is not to attribute moral blame to algorithms. It is to ensure that institutional design keeps pace with technological change. If autonomous systems can produce coordinated or manipulative outcomes through distributed learning, accountability mechanisms must adapt accordingly.
Markets depend on trust in price discovery and competitive integrity. Preserving that trust in an era of machine learning will require careful empirical study, doctrinal humility, and, where appropriate, legislative adjustment. The future does not ask permission before it arrives. The question is whether our institutions will be prepared when it does.