Maryline

Stylometry in search of Q, the mysterious Internet user behind the QAnon conspiracy movement.



Jean-Baptiste Camps & Florian Cafiero

He calls himself "Q".

First appearing on the 4chan forum in 2017, then posting his messages on 8chan, this mysterious figure is at the origin of QAnon, the American far-right conspiracy movement ("Anon" for "anonymous").

This movement promotes delusional theories involving Satanism, pedophile networks within the Democratic Party, the CIA, and the return of John F. Kennedy Jr., who died in a plane crash in 1999. It was involved in the attack on the Capitol on January 6, 2021. Q is supposedly an insider within the state apparatus, privy to explosive secrets. In reality, no one knows who he is, and several hypotheses about his identity exist. To see more clearly, the New York Times asked two teams of stylometry experts to analyze the writings of likely "candidates" and compare them with those of Q. The method combines linguistic analysis and artificial intelligence algorithms to build author profiles. Both teams pointed to the same two QAnon "followers": Ron Watkins and Paul Furber. Florian Cafiero and Jean-Baptiste Camps are French specialists in the field at the École nationale des chartes and formed one of the two teams solicited by the New York daily. They explain the method.


Sciences et Avenir: You used stylometry to establish that Corneille did not write Molière's plays. OrphAnalytics used it in the Grégory Villemin affair to try to identify the "corbeau", the anonymous letter writer. What exactly is analyzed?

| Jean-Baptiste Camps: It's about extracting unconscious properties of language from what we call grammatical morphemes. These are small units of language, such as function words and prepositions, but also suffixes and verb endings. In this case, we worked on trigrams, i.e. sequences of three characters.
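To make "trigrams" concrete, here is a minimal sketch of how a text can be turned into a profile of overlapping three-character sequences and their relative frequencies. The function name and the choice to lowercase the text are assumptions for illustration. This is not the team's actual code.

```python
from collections import Counter


def char_ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Relative frequency of every overlapping character n-gram in `text`."""
    text = text.lower()  # assumption: case is ignored
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


profile = char_ngram_profile("the cat and the hat")
# Spaces are kept, so trigrams such as "he " and " th" record word
# boundaries and endings, not just letters.
for gram, freq in sorted(profile.items(), key=lambda kv: -kv[1])[:3]:
    print(repr(gram), round(freq, 3))
```

Because spaces count as characters, a single trigram can capture a suffix, a verb ending or a short function word. This is how such profiles reach the small grammatical units Camps describes without any parsing.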


| Florian Cafiero: There is no a priori theoretical reason for this choice of three characters. We tested different lengths and three worked best.
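"We tested different lengths" can be pictured as follows. For each candidate length n, every text is attributed to the author of its closest other text, and the share of correct attributions is recorded. This is a minimal sketch assuming leave-one-out nearest-neighbour attribution with cosine distance. The interview does not say which evaluation the team used, and the four sample sentences are invented toy data.

```python
from collections import Counter
import math


def profile(text, n):
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}


def cosine_distance(p, q):
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def loo_accuracy(samples, n):
    """Leave-one-out: attribute each text to the author of its nearest other text."""
    correct = 0
    for i, (author, text) in enumerate(samples):
        target = profile(text, n)
        others = [(a, profile(t, n)) for j, (a, t) in enumerate(samples) if j != i]
        predicted = min(others, key=lambda o: cosine_distance(target, o[1]))[0]
        correct += predicted == author
    return correct / len(samples)


# Hypothetical toy samples; a real study needs thousands of words per author.
samples = [
    ("A", "Trust the plan. Where is the proof? Think for yourself."),
    ("A", "Where are the files? Trust the plan and watch the news."),
    ("B", "honestly i reckon the board is compromised lol, mods pls fix"),
    ("B", "lol the mods reckon its fine but honestly the board is dead"),
]

accuracy = {n: loo_accuracy(samples, n) for n in range(1, 6)}
print(accuracy, "best n:", max(accuracy, key=accuracy.get))
```

With four toy sentences the scores mean nothing. The point is the procedure: n is chosen empirically, not from theory.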


QAnon is a very American subject. How did the New York Times end up contacting you?

| Florian Cafiero: The Swiss start-up OrphAnalytics had already carried out a small study of Q's writings, known as "Q drops", and concluded that there were two styles, and therefore two different authors. AFP had asked us for comment, and the dispatch reached, via a journalist, the New York Times, which was investigating QAnon. They had the idea of putting two teams on the subject in parallel. We started last June (2021, editor's note).

(Agence France-Presse (AFP) is a global, general-interest news agency of French origin. It collects, verifies, cross-checks and distributes news in a neutral, reliable form that can be used directly by all types of media (radio, television, print, websites) as well as by large companies and public administrations.)


| Jean-Baptiste Camps: OrphAnalytics had tried to compare its results with suspected authors, but that didn't work out because they had relatively few texts written by these "candidates".


What writings, exactly, did you gather to compare with Q's?


| Florian Cafiero: The QAnon case poses a real problem in this respect. Most Q suspects had their Twitter accounts deleted after the January 6 attack on the Capitol. We had to go looking for archives of deleted posts on 8chan, 4chan, Discord, Telegram… It took a lot of time. In addition, in a reflex fairly typical of conspiracy communities, some people share absolutely anything and everything: they quote each other, quote Trump, quote Q. So if I collect 40 pages of tweets, 38 will be retweets and 2 will contain a few catchphrases like "MAGA" (Make America Great Again, Donald Trump's slogan, editor's note). That tells us nothing. This corpus had to be cleaned.

We also contacted Frederick Brennan, the founder and developer of the 8chan platform (now 8kun, editor's note). Short of money, he had his service hosted by the entrepreneur Jim Watkins, Ron's father. But according to Brennan, the Watkinses hijacked his project. They turned it into a platform for child pornography, neo-Nazis, and the perpetrators of the Christchurch (New Zealand) and El Paso (Texas) shootings, among others. He became the Watkinses' sworn enemy and tracked down their posts for our study! In the end, our corpus has 280,000 words, compared with the 100,000 words of the Q drops.
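The cleaning Cafiero describes involves dropping retweets, quoted material and bare slogans so that only the account holder's own prose remains. A minimal sketch could look like this. The regular expressions, the five-word threshold and the sample posts are hypothetical, since the interview does not state the actual rules.

```python
import re

# Hypothetical heuristics: the interview does not say which rules were used.
RETWEET = re.compile(r"^\s*RT @\w+:")
NOISE = re.compile(r"https?://\S+|@\w+|#\w+")  # links, mentions, hashtags


def clean_posts(posts, min_words=5):
    """Keep only text that plausibly reflects the account holder's own style."""
    kept = []
    for post in posts:
        if RETWEET.match(post):
            continue  # a retweet is someone else's prose
        # Drop quoted lines (">" on imageboards), then links, mentions, hashtags.
        lines = [line for line in post.splitlines() if not line.lstrip().startswith(">")]
        text = " ".join(NOISE.sub(" ", " ".join(lines)).split())
        if len(text.split()) < min_words:
            continue  # bare slogans ("MAGA") carry no stylistic signal
        kept.append(text)
    return kept


posts = [
    "RT @someone: Trust the plan!",
    "MAGA!!",
    ">>12345 you said the storm is coming\nI have been reading the drops all night https://example.com",
]
print(clean_posts(posts))
```

Only one of the three toy posts survives. This mirrors, on a tiny scale, the "40 pages, 38 of retweets" problem.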


How robust are the results, yours and those of OrphAnalytics?

| Jean-Baptiste Camps: The approach consists of comparing the profile of the author of the Q drops with profiles built for a certain number of other people. But the real author may very well be outside this selection. That said, the stronger the match between Q and one of the suspects on our list, the less likely it is that someone outside the list is the real Q. Conversely, if no candidate on the list is truly convincing, or if all the candidates score at about the same level, the right person is probably someone else.
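Camps' reasoning can be expressed as a simple rule. Accept the closest candidate only if it clearly beats the runner-up, and otherwise treat the true author as possibly outside the list. This sketch assumes we already have a distance from Q's profile to each candidate (lower means closer). The relative-gap threshold of 0.2 and the candidate distances are invented for illustration and are not the teams' figures.

```python
def attribution_verdict(distances, min_gap=0.2):
    """distances: candidate -> distance to Q's profile (lower = closer).

    Returns (best_candidate, gap) when the best candidate clearly beats the
    runner-up, or (None, gap) when the margin is too thin, i.e. when the
    real author may not be on the list. min_gap is an arbitrary example value.
    """
    if len(distances) < 2:
        raise ValueError("need at least two candidates to measure a margin")
    ranked = sorted(distances.items(), key=lambda kv: kv[1])
    (best, d1), (_, d2) = ranked[0], ranked[1]
    gap = (d2 - d1) / d2 if d2 else 0.0
    return (best, gap) if gap >= min_gap else (None, gap)


# Hypothetical distances, not results from the study.
print(attribution_verdict({"Candidate A": 0.31, "Candidate B": 0.52, "Candidate C": 0.55}))
print(attribution_verdict({"Candidate A": 0.50, "Candidate B": 0.52, "Candidate C": 0.53}))
```

In the first case one candidate stands out. In the second all three are "about the same level", which is exactly the situation where someone else is likely the author.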


| Florian Cafiero: Stylometry is a method that works well even when you have no particular hunch to start from. And the Q drops lent themselves well to it. These texts are written in a very elliptical, cryptic style, meant to sound mysterious, in which the usual properties of language are somewhat distorted. We find formulas like: "Why Hillary Clinton…?", "Why Podesta…?" "Where is Podesta…?", "Podestas linked to the POTUS?", followed by bogus computer code and then "Trust the plan". Signed "Q". The author uses interrogative pronouns heavily and very artificially, so relying on them would have been a mistake. The same goes for grammar: had we looked only at that aspect, we would have obtained a model of Q as an author who uses the interrogative form all the time. None of the others do, so that would not have helped us much. The approach is instead to extract as much information as possible in order to recognize a signature beneath the surface form. Then you have to compare like with like: Q drops are compared with blog posts and with social media posts in general.


Didn't you also use more formally written texts for some of them?

| Florian Cafiero: We used a mix of the two and had the AI algorithms analyze everything. Not that all the authors we looked at are literary greats, but some write their tweets in a style very different from their most careful writing.


| Jean-Baptiste Camps: Among the suspects, Paul Furber is a special case. We had a lot of his private communications, in which he lets loose, with lots of racist slurs. But Q does not express himself that way. We also had a book by him (published in 2021, editor's note), written as a commentary on Q. This provided a counterpoint to his personal communications.


| Florian Cafiero: The goal was to have the same mix for everyone. The algorithms extract each person's specific features but also weight them. They can determine that one variable is useless, that another must be kept, and so on. This is what sets the approach apart from a simple statistical calculation.


| Jean-Baptiste Camps: Schematically, in the "Donald Trump" model, the word "fake", which he uses all the time, and "he", which he uses significantly less than the others, will be given more weight than "the", which is considered less distinctive.
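A simple way to see why "fake" and "he" can matter more than "the" is to standardize each word's frequency against all authors (a z-score, as in Burrows's Delta). A word the author uses far more or far less than the group gets a large absolute score, while a word everyone uses at similar rates scores near zero. The interview does not specify the team's algorithm. This z-score weighting is a swapped-in illustration, and the per-1,000-word frequencies below are invented.

```python
from statistics import mean, pstdev


def zscores(author_freqs, corpus_freqs):
    """author_freqs: word -> relative frequency for one author.
    corpus_freqs: word -> list of that word's frequencies across all authors.
    Returns word -> z-score; a large |z| marks a word distinctive for this author.
    """
    scores = {}
    for word, values in corpus_freqs.items():
        sd = pstdev(values)
        if sd == 0:
            continue  # identical for everyone: no authorial signal
        scores[word] = (author_freqs.get(word, 0.0) - mean(values)) / sd
    return scores


# Invented frequencies per 1,000 words for four authors (first = "author X").
corpus = {
    "the": [52.0, 50.0, 55.0, 51.0],
    "fake": [4.0, 0.2, 0.1, 0.3],
    "he": [3.0, 9.0, 10.0, 8.5],
}
author_x = {word: values[0] for word, values in corpus.items()}
print(zscores(author_x, corpus))
```

Here "the" sits exactly at the group mean (z = 0). "fake" is well above the mean and "he" well below it, so both would weigh more in a distance computed on these scores.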


So it isn't about analyzing the meaning of the sentences?

| Florian Cafiero: All the candidates talk about exactly the same thing. They are Q commentators, moderators of the forum where Q posted, YouTubers who contributed to Q's popularity, public figures from whom Q draws inspiration... Analyzing what all these people talk about would have been useless.


[pause] I see that Ron Watkins has just reacted on Telegram to the New York Times article. He posted an imitation of a Shakespearean sonnet with the message: "Look, I've managed to write like Shakespeare, that does not prove that I am Shakespeare".


You reached the same conclusions as OrphAnalytics. So Q would be Ron Watkins and Paul Furber?

| Florian Cafiero: It is even more precise than that. In his book, Paul Furber identifies a post that he says was the last authentic one by Q, after which a "usurper" supposedly took over. At exactly that point in the timeline of Q's writings, our "Ron Watkins" signal jumps and Paul Furber's fades.


This abrupt transition tends to confirm that Paul Furber began writing under the name Q and that Ron Watkins took over. Then again, one can always object that there may be an author with an even closer signature whom five years of investigations have failed to identify. For now, Ron Watkins remains the most likely author.
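A signal that "jumps" at a point in the timeline suggests a rolling analysis. The chronologically ordered drops are cut into windows, each window is scored against every candidate's reference profile, and the analyst looks for where the best match changes. This is a minimal sketch assuming character-trigram profiles and cosine similarity, which the interview does not confirm. The candidate names and texts are toy placeholders, not real suspects' writing.

```python
from collections import Counter
import math


def profile(text, n=3):
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}


def cosine(p, q):
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def rolling_attribution(drops, candidates, window=3):
    """drops: chronologically ordered texts; candidates: name -> reference text.
    Yields (start_index, best_candidate, similarities) for each window."""
    refs = {name: profile(text) for name, text in candidates.items()}
    for start in range(len(drops) - window + 1):
        chunk = profile(" ".join(drops[start:start + window]))
        sims = {name: cosine(chunk, ref) for name, ref in refs.items()}
        yield start, max(sims, key=sims.get), sims


def first_switch(results):
    """Index of the first window whose best candidate differs from the previous one."""
    labels = [best for _, best, _ in results]
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            return i
    return None


# Toy placeholders standing in for two writers and four drops.
candidates = {"Writer 1": "aaaa bbbb aaaa bbbb", "Writer 2": "xxxx yyyy xxxx yyyy"}
drops = ["aaaa bbbb", "aaaa bbbb", "xxxx yyyy", "xxxx yyyy"]
results = list(rolling_attribution(drops, candidates, window=1))
print([best for _, best, _ in results], "switch at window", first_switch(results))
```

On real data, windows must be long enough (thousands of characters) for the profiles to be stable. A single window flipping is noise, but a sustained change is the kind of "jump" described above.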


In the New York Times, Paul Furber does not deny that his writing resembles Q's. He attributes this to having absorbed Q's style by reading so much of it, to the point of reproducing it without realizing. Is that plausible?


| Jean-Baptiste Camps: The value of stylometry lies precisely in focusing on characteristics that are hard to counterfeit, shaped by many biographical factors: education, social class, gender, etc. Pretty deep stuff. It is fairly easy to imitate someone's choice of vocabulary, the properties of language that make up perceived style. Using determiners and pronouns at the same frequencies as someone else is much harder.



 


These two clouds (Q1 and Q2) correspond to the two forums where the messages were published. The first is 4chan (light gray). The second is 8chan, which became 8kun (both dark gray) after 8chan was taken offline for three months in 2019 following the publication of mass shooters' manifestos on the site.

QAnon is two different people, machine learning analysis shows
Press release, December 15, 2020

An algorithm-based stylometric approach provides new evidence to identify the authors of QAnon conspiracy theories.

QAnon has spread conspiracy theories to an unprecedentedly large audience. Its thousands of online messages have popularized narratives such as the existence of a child-trafficking deep state. Recently, it inspired a series of violent attacks and was listed as a terrorist threat by the FBI. The Swiss company OrphAnalytics has just published an analysis of all messages posted by Q. Its patented technology aims to identify the authors of written documents. It has found two distinct individual signals within the corpus of Q messages. This investigation helps reveal the origins of, and the people behind, one of the most impactful conspiracy theories in recent times.

“Our results very strongly suggest the existence of two different authors behind Q,” says Claude Alain Roten, OrphAnalytics' CEO and co-founder. “Moreover, these distinct signatures clearly correspond to separate periods in time and different online forums.”

Roten is a former geneticist trained at Harvard and the University of Lausanne, and he derived his text analysis approach from computational genomics. Conventional stylometry relies on the interpretation of words, content, or syntax, whereas OrphAnalytics technology is based entirely on algorithmic analysis. It compares frequencies of character patterns to bring out individual signatures, regardless of the meaning of the text. Experts at the company have provided compelling evidence in several legal cases in Europe and are collaborating with the School of Criminal Justice at the University of Lausanne.

OrphAnalytics analysts went through the entire corpus of Q posts, known as “Q drops”. They cleaned the 4,952 messages of any content lacking individual syntax: lists, greetings, quotes from public figures, and messages shorter than 50 characters. They then fed the resulting text to their software.


Multivariate statistical analysis (three-character patterns / units of 7500 concatenated characters) by OrphAnalytics.
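The preprocessing summarized above and in the caption can be sketched as follows. Messages shorter than 50 characters are dropped, the rest are concatenated, the stream is cut into 7,500-character units, and three-character patterns are counted in each unit. OrphAnalytics' technology is patented and not public. This is only a generic illustration, and discarding the trailing remainder shorter than one unit is an assumption.

```python
from collections import Counter


def make_units(messages, min_len=50, unit_len=7500):
    """Drop short messages, concatenate the rest in order, and cut the stream
    into fixed-length units (assumption: a final partial unit is discarded)."""
    stream = " ".join(m.strip() for m in messages if len(m.strip()) >= min_len)
    return [stream[i:i + unit_len] for i in range(0, len(stream) - unit_len + 1, unit_len)]


def trigram_counts(unit):
    """Counts of overlapping three-character patterns in one unit."""
    return Counter(unit[i:i + 3] for i in range(len(unit) - 2))


# Toy input: one message too short to keep, then 160 long ones.
messages = ["short"] + ["x" * 100] * 160
units = make_units(messages)
profiles = [trigram_counts(u) for u in units]
print(len(units), [len(u) for u in units])
```

Fixed-length units give every data point the same amount of text, so differences between points reflect style rather than message length. This matters when individual drops range from a few words to long posts.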

The analysis shows that the first period of Q messages bears an individual signature clearly distinct from the rest. These seminal messages appeared on the 4chan web forum from October 28 to December 1, 2017. After that, another author took over QAnon on another forum, 8chan. The difference in signal is strong enough to leave very little doubt about this change of author.

The second and longest period (from December 1, 2017 to November 13, 2020) shows a single signature that evolves slightly over time. It is not impossible that a few other people mixed their voices into these more than 4,700 messages, but the signal is very consistent overall and points to a single author, says Roten.

“The next step is to help put a name on QAnon by comparing these signatures with those of the usual suspects,” says Roten. “To do that, we are gathering and curating written material from these people to compare it with Q's messages.” Recent investigations point to a handful of potential authors behind Q's messages, most notably Jim Watkins, the owner of the 8chan forum. Tracing back the history of QAnon matters. It could help explain how and why a baseless and outlandish theory, initially aimed at a few isolated hackers, ended up having such a broad social and political impact.



 
