Technology

AI models unmask pseudonymous users at scale

Researchers match Reddit and Hacker News accounts to real identities with high precision; cheap correlation turns old posts into a permanent identifier.


Source: english.elpais.com

A research team tested how quickly large language models can pierce the “practical obscurity” that has long protected pseudonymous internet users. Using thousands of posts from platforms such as Reddit and Hacker News, they asked models including Gemini and ChatGPT to match anonymous accounts to real identities; the system identified 68% of users at 90% precision, according to the paper cited by El País.
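To make the headline figure concrete: "68% of users at 90% precision" typically means the system only asserts a match when its confidence clears a threshold; among asserted matches, about 90% are correct, and those assertions cover about 68% of all accounts. The sketch below (my own illustration, not the paper's pipeline, with made-up scores) shows how precision and coverage are computed from thresholded candidate matches.

```python
# Illustrative sketch of the precision/coverage trade-off behind a claim
# like "68% of users identified at 90% precision". Not the study's code;
# all scores below are invented for demonstration.

def precision_and_coverage(matches, threshold):
    """matches: list of (score, is_correct) pairs, one per account.
    A match is accepted only if its confidence score >= threshold."""
    accepted = [correct for score, correct in matches if score >= threshold]
    if not accepted:
        return 0.0, 0.0
    precision = sum(accepted) / len(accepted)   # correct among accepted
    coverage = len(accepted) / len(matches)     # accepted among all accounts
    return precision, coverage

# Toy data: 10 anonymous accounts, each with a best-match confidence score.
toy = [(0.95, True), (0.92, True), (0.90, True), (0.88, True),
       (0.85, True), (0.80, True), (0.75, False), (0.60, True),
       (0.55, False), (0.30, False)]

p, c = precision_and_coverage(toy, threshold=0.7)
print(f"precision={p:.2f} coverage={c:.2f}")  # → precision=0.86 coverage=0.70
```

Raising the threshold buys precision at the cost of coverage; the study's numbers describe one operating point on that curve.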

The result is less about a new hacking trick than a change in the unit economics of identification. What previously required time, patience, and a motivated investigator can now be done in minutes with commodity tools, because language models are good at turning scattered clues—job hints, location references, hobby details, writing tics—into a ranked shortlist. The researchers’ own example relied on Hacker News profiles that could be linked to LinkedIn, but the broader point is that most people leak enough “soft identifiers” across years of posting for a model to connect the dots. Once the cost of correlation collapses, the threat is not only doxxing; it is behavior change under the assumption that someone, somewhere, can cheaply reconstruct a profile.

This intersects with how online platforms already operate. Advertising and fraud prevention reward account linkage, device fingerprinting, and cross-service identity resolution; moderation systems benefit when a ban can follow a person rather than a username. The user, meanwhile, bears the downside risk: a pseudonym that once felt safe for whistleblowing, workplace complaints, or political dissent becomes a thin interface layer over a long-lived writing sample.

El País notes that Anthropic has cited this dynamic in its objections to government use of AI for de-anonymization. In a statement to the US Department of Defense referenced in the article, the company argued that under current law the government can buy detailed data—movements, browsing, associations—from commercial sources without a warrant, and that powerful AI makes it possible to assemble “individually innocuous data” into a comprehensive picture of a person’s life at scale. Even without new surveillance authorities, models turn existing data markets into a more effective monitoring system.

The researchers say they limited their dataset for ethical reasons, but the same constraint does not apply to private investigators, employers, or state agencies with procurement budgets. In practice, the question becomes less “is anonymity possible” and more “how much anonymity can you afford,” in time, opsec discipline, and reduced participation.

The study’s headline number—68% identification at 90% precision—comes from posts that were already publicly available and written years before anyone asked a model to connect them.