How to Start

Start by asking a question. A good starting point is "Make America what again?"

Description

@unrealDonaldTrump is nothing but an algorithm trained on the latest few thousand tweets from @realDonaldTrump, which is, as we all know, the personal Twitter account of the 45th president of the United States of America.

Each human being has a unique speech identity. This identity is characterized by the following features:

  1. Lexical
  2. Grammatical
  3. Stylistic
  4. Topographical

The current algorithm can reproduce the first three features in the list above. These features are the foundation of written speech analysis in criminalistics, which addresses questions such as: how can you prove that a given message was or was not sent by a suspect?

Why Donald Trump

Donald Trump is a fitting subject for this kind of project. He is a well-known figure, and a big part of his fame comes from his sometimes ridiculous tweets. @unrealDonaldTrump is of great scientific interest for speech analysis and processing. Therefore, the main motivation for choosing this particular Twitter account is purely scientific.

Disclaimer

The algorithm may generate offensive, discriminatory, racist, and otherwise inappropriate sentences. Keep in mind that this algorithm is a random process. It takes parts of tweets and combines them in the best possible way. While the result may look great to a machine, it may be really unpleasant for a human. That said, the developer takes no responsibility for the generated content.

Technical Details

The generation algorithm is based on a Markov process. The chain for the process is assembled in three passes: 3-gram, 2-gram, 1-gram. For example, "Very, very nasty weather" yields entries such as (very, very, nasty) -> weather for 3-grams, (very, very) -> nasty for 2-grams, and (very) -> very for 1-grams. Generation starts from a user-defined n-gram and proceeds if that n-gram is in the chain. When a particular n-gram is missing from the chain, the algorithm falls back to the (n - 1)-gram, repeating until n reaches 1.
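
The chain assembly and fallback described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (tokenize, build_chain, next_word, generate), the tokenization rules, and the uniform random choice among followers are all assumptions made for the example.

```python
import random
from collections import defaultdict

MAX_N = 3  # the text describes 3-gram, 2-gram and 1-gram passes


def tokenize(text):
    # Hypothetical tokenizer: lowercase and strip surrounding punctuation.
    words = (w.strip(".,!?;:\"'") for w in text.lower().split())
    return [w for w in words if w]


def build_chain(tweets, max_n=MAX_N):
    # Map each n-gram (a tuple of words) to the list of words seen after it.
    chain = defaultdict(list)
    for tweet in tweets:
        words = tokenize(tweet)
        for n in range(max_n, 0, -1):  # one pass per n-gram size
            for i in range(len(words) - n):
                chain[tuple(words[i:i + n])].append(words[i + n])
    return chain


def next_word(chain, context, rng, max_n=MAX_N):
    # Try the longest available n-gram first, then fall back to (n - 1).
    for n in range(min(max_n, len(context)), 0, -1):
        followers = chain.get(tuple(context[-n:]))
        if followers:
            return rng.choice(followers)  # assumption: uniform choice
    return None  # even the 1-gram is unknown


def generate(chain, seed, max_words=20, rng=None):
    rng = rng or random.Random()
    words = tokenize(seed)
    while len(words) < max_words:
        word = next_word(chain, words, rng)
        if word is None:
            break
        words.append(word)
    return " ".join(words)


chain = build_chain(["Very, very nasty weather"])
print(generate(chain, "very very nasty"))
```

With a single training sentence the chain has only one possible continuation for that seed, so the sketch simply completes the sentence. With thousands of tweets each n-gram usually has several followers, which is where the randomness comes from.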

Source Code

talk-to-president on GitHub, by Pavel Bazin.