Bitcoin and public figures' quotes: An extensive analysis of the potential underlying link

A brief introduction to Bitcoin

Bitcoin is a digital currency built on blockchain technology, introduced in 2008 by an anonymous individual or group using the name Satoshi Nakamoto. After first attracting the attention of tech-savvy users, the digital coin gained significantly in popularity, and in valuation, to become an investment asset often compared to gold.

Bitcoin price evolution between 2016 and 2020

Over the last five years, the cryptocurrency market has gained significant visibility. It has attracted not only institutional interest but also retail demand. In fact, 2017 was unprecedented in the history of Bitcoin: over 12 months, the price rose about 20-fold before losing 65% of its value a few months later. The media have taken advantage of the traction the asset gained among individuals to cover it extensively, soliciting public figures including well-known technologists, politicians and investors.

Introduction to Quotebank

Quotebank is an open corpus of 178 million quotations extracted from 162 million English news articles published between 2008 and 2020. This Web-scale corpus relies on Quobert, a minimally supervised framework that exploits the redundancy of the corpus by bootstrapping from a single seed pattern to extract training data for fine-tuning a BERT-based model. The model, as introduced by Vaucher et al., correctly attributed 86.9% of quotations.

How many people mentioned Bitcoin in their speeches?

Using the Quotebank dataset, we can visualize how the number of quotes containing the keyword “Bitcoin” is distributed over time. There is a clear peak centered around 2018. Moreover, this peak seems to have created momentum, reflected in a higher number of quotes per month after 2018 than before.
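As a rough illustration of the counting step, here is a minimal sketch that tallies quotes mentioning a keyword per month. The function name `monthly_keyword_counts`, the `(date, quotation)` input shape and the sample records are assumptions made for this example; they are not taken from Quotebank or from our pipeline.

```python
from collections import Counter
from datetime import date


def monthly_keyword_counts(quotes, keyword="bitcoin"):
    """Count quotes mentioning `keyword` (case-insensitive) per (year, month)."""
    needle = keyword.lower()
    counts = Counter()
    for day, text in quotes:
        if needle in text.lower():
            counts[(day.year, day.month)] += 1
    # Sort by (year, month) so the result reads as a time series.
    return dict(sorted(counts.items()))


# Hypothetical (date, quotation) pairs, invented for illustration only.
sample = [
    (date(2017, 12, 4), "I think Bitcoin will keep rising."),
    (date(2017, 12, 19), "I would not buy bitcoin at this price."),
    (date(2018, 1, 8), "Interest rates will stay low."),
    (date(2018, 1, 9), "Regulators are watching Bitcoin closely."),
]
print(monthly_keyword_counts(sample))
```

A plain substring match like this one also catches words such as “bitcoins”; a stricter tokenizer may be preferable depending on the question asked.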

Have you noticed something? Look closely!

Voilà! It seems that the two distributions follow the same pattern. Maybe it is just a coincidence, maybe not. Let’s temporarily set this result aside and explore other relevant aspects of the Quotebank dataset.

It is all about Sentiments

The previous observations suggest that there is an interesting relationship between Bitcoin’s fluctuations and people’s quotes about Bitcoin.

To investigate this relationship further, we will analyse the semantics of the quotes in our dataset. One way to do so is to extract the sentiment of each quote and classify it as positive or negative.

Sentiment Results from our baseline model

A similar concept, known as the “Fear and Greed Index”, is used in the cryptocurrency market to measure market sentiment based on crowdsourced data from social media, Google Trends and survey analysis.

This index provides a score between 0 and 100 that shows how greedy people are with respect to their assets. A score close to 0 means the dominant sentiment in the market is extreme fear; a score close to 100 means extreme greed.

Again, you could not miss the pattern, could you?

As we can see above, once we focus on the common period, standardize the values and smooth the curves to reduce noise, the variations are analogous. Combined with the pattern we observed between the number of quotations and the Bitcoin price, this leads us to strongly suspect a correlation, and potentially a causation. While we cannot claim either yet, we can definitely address this question with an in-depth statistical analysis.
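The standardizing and smoothing steps mentioned above can be sketched as follows. This is a minimal illustration, assuming z-score standardization and a trailing moving average; the window size and the sample values are hypothetical, and the smoothing actually used for the plots may differ.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a series to zero mean and unit standard deviation (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def moving_average(values, window=3):
    """Trailing moving average; the first points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical series on very different scales.
fear_greed = [20, 35, 50, 65, 80]         # index between 0 and 100
sentiment = [0.20, 0.35, 0.50, 0.65, 0.80]  # score between 0 and 1

# After standardization the two series become directly comparable.
z_index = moving_average(standardize(fear_greed))
z_sentiment = moving_average(standardize(sentiment))
```

Standardizing first matters here because the index lives on a 0 to 100 scale while the sentiment score does not; without it, the curves could not be overlaid meaningfully. Note that `standardize` would divide by zero on a constant series.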

But wait, how does this sentiment thing work?

We used a natural language processing model that measures the similarity between an input statement and a certain hypothesis and returns a score that reflects how much the input is similar to the provided hypothesis.
For our baseline model, we chose “The sentiment is positive about bitcoin.” and “The sentiment is negative about bitcoin.” as hypotheses, so that each quote is mapped to two scores measuring how positive and how negative it is. For example:

“bitcoin wastes too much energy” (‘negativity about bitcoin’, 0.9936).
“Bitcoin is our digital gold.” (‘positivity about bitcoin’, 0.9588).

Or even longer statements:

“So I think this is actually a really bullish development for bitcoin. I think it’s really bad for stablecoins and anyone who’s been trying to do decentralized finance.” (‘positivity about bitcoin’, 0.9962).
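The hypothesis-scoring step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the Hugging Face `transformers` zero-shot classification pipeline, and the model name `facebook/bart-large-mnli` is a placeholder rather than the model behind our results. The helper names `build_nli_scorer` and `dominant_label` are hypothetical.

```python
# Hypotheses of the baseline model, mapped to the labels used in the examples.
BASELINE_HYPOTHESES = {
    "The sentiment is positive about bitcoin.": "positivity about bitcoin",
    "The sentiment is negative about bitcoin.": "negativity about bitcoin",
}


def build_nli_scorer(model_name="facebook/bart-large-mnli"):
    """Return scorer(quote, hypotheses) -> {hypothesis: score}.

    Assumption: the model name is a placeholder, not the article's model.
    """
    # Imported lazily: needs the transformers package and a model download.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)

    def scorer(quote, hypotheses):
        # hypothesis_template="{}" passes each hypothesis through unchanged;
        # multi_label=True scores every hypothesis independently.
        result = classifier(
            quote,
            candidate_labels=list(hypotheses),
            hypothesis_template="{}",
            multi_label=True,
        )
        return dict(zip(result["labels"], result["scores"]))

    return scorer


def dominant_label(quote, scorer, hypotheses=BASELINE_HYPOTHESES):
    """Return the (label, score) pair of the best-scoring hypothesis."""
    scores = scorer(quote, hypotheses)
    best = max(hypotheses, key=lambda h: scores[h])
    return hypotheses[best], scores[best]
```

With a real scorer, `dominant_label("Bitcoin is our digital gold.", build_nli_scorer())` would return a pair in the same shape as the examples above; the exact score depends on the model chosen.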

To obtain a more robust sentiment model, we compute the sentiment of the Bitcoin statements in the Quotebank dataset by aggregating the 8 outputs of a BERT model that measures the similarity between a given quotation and 8 hypothesis statements:

  • In conclusion, we should (buy/use/encourage/keep/promote) bitcoin.
  • In conclusion, we should not (buy/use/encourage/keep/promote) bitcoin. 
  • There are some concerns about bitcoin. 
  • There are no concerns about bitcoin.
  • In conclusion, bitcoin has some drawbacks.
  • In conclusion, bitcoin has some benefits.
  • In conclusion, bitcoin is reliable.
  • In conclusion, bitcoin is problematic.
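One possible way to aggregate the 8 outputs into a single signed score is sketched below. The aggregation rule (mean of the positive-hypothesis scores minus mean of the negative-hypothesis scores) and the polarity assigned to each hypothesis are assumptions made for this example, not necessarily the rule behind our plots. The hypothesis strings are copied as written in the list, including the "(buy/use/encourage/keep/promote)" placeholder.

```python
# Assumed polarity of each hypothesis: +1 favourable, -1 unfavourable.
POLARITY = {
    "In conclusion, we should (buy/use/encourage/keep/promote) bitcoin.": +1,
    "In conclusion, we should not (buy/use/encourage/keep/promote) bitcoin.": -1,
    "There are some concerns about bitcoin.": -1,
    "There are no concerns about bitcoin.": +1,
    "In conclusion, bitcoin has some drawbacks.": -1,
    "In conclusion, bitcoin has some benefits.": +1,
    "In conclusion, bitcoin is reliable.": +1,
    "In conclusion, bitcoin is problematic.": -1,
}


def aggregate_sentiment(scores, polarity=POLARITY):
    """Mean favourable score minus mean unfavourable score.

    `scores` maps each hypothesis to a similarity score in [0, 1],
    so the result lies in [-1, 1].
    """
    positive = [scores[h] for h, sign in polarity.items() if sign > 0]
    negative = [scores[h] for h, sign in polarity.items() if sign < 0]
    return sum(positive) / len(positive) - sum(negative) / len(negative)
```

Averaging over several paired hypotheses is meant to dampen the effect of any single badly worded hypothesis, which is the motivation for going beyond the two-hypothesis baseline.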

Here are our final fear and sentiment scores, smoothed and aggregated:

Fear and Sentiment scores by Date

Causality or Correlation: Granger Causality Test to the rescue

To confirm or refute causality, we performed a Granger causality test between the greed index and the positive sentiment score. At a given significance level, this test indicates whether past values of one signal help predict another and, if so, at what time lag.
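To make the idea behind the test concrete, here is a minimal, dependency-free sketch of its F statistic: it compares a regression of a series on its own past with a regression that also includes the past of the candidate cause. The function names and the synthetic data are assumptions for illustration; in practice a library routine such as `grangercausalitytests` from statsmodels, which also reports p-values, is the more usual choice.

```python
import random


def ols_ssr(rows, targets):
    """Sum of squared residuals of a least-squares fit (normal equations)."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_stat(cause, effect, lag=1):
    """F statistic for the hypothesis 'cause Granger-causes effect'."""
    n = len(effect)
    targets = effect[lag:]
    # Restricted model: intercept + the effect's own past values.
    restricted = [[1.0] + [effect[t - i] for i in range(1, lag + 1)]
                  for t in range(lag, n)]
    # Unrestricted model: additionally uses the past values of the cause.
    unrestricted = [row + [cause[t - i] for i in range(1, lag + 1)]
                    for row, t in zip(restricted, range(lag, n))]
    ssr_r = ols_ssr(restricted, targets)
    ssr_u = ols_ssr(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((ssr_r - ssr_u) / lag) / (ssr_u / dof)


# Synthetic data: y follows x with a one-step delay (illustration only).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]
print(granger_f_stat(x, y), granger_f_stat(y, x))
```

A large F statistic in one direction and a small one in the other is the pattern expected when one series leads the other. Keep in mind that Granger causality is a statement about predictive power, not about causation in the everyday sense.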

Even though the test above indicates that the fear index and the sentiment score do not cause each other, we clearly observe the similarity between their smoothed time series.
Let’s check the possible correlation between them.
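Such a check can be sketched as a Pearson correlation computed at several time lags. The function names and the toy series below are assumptions for illustration, and both series are assumed to have the same length and sampling frequency.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlation(leader, follower, max_lag=3):
    """Correlation between leader[t] and follower[t + lag], lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` points of the leader and the first `lag`
        # points of the follower so that the pairs line up.
        result[lag] = pearson(leader[: len(leader) - lag], follower[lag:])
    return result


# Toy series: the follower repeats the leader one step later.
leader = [1, 3, 2, 5, 4, 6, 8, 7]
follower = [0] + leader[:-1]
print(lagged_correlation(leader, follower, max_lag=2))
```

The lag with the highest correlation hints at how far one series trails the other, but, unlike the Granger test, it says nothing about whether the relationship adds predictive power beyond the series' own history.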

There is indeed a correlation at several time lags, but no causation according to the Granger causality test… Now let’s check how the number of Bitcoin quotes influences the greed/fear index:

The charts above show that the number of quotes about Bitcoin does indeed Granger-cause fear and greed in the cryptocurrency market, with a time lag of one day.
In fact, by definition, the fear index is based not on sentiment but on the number of social media posts, including on Twitter. This definition semantically matches our variable num Occurrences… now it makes sense!

Who said what?

When we talk about a subject as serious as Bitcoin, and crypto in general, the identity of the speaker is as important as the quote itself, so it is only natural to look at how the data are distributed over speakers.
In the following, we shed light on the most frequent individual speakers and investigate potential biases in their speeches.
First, we plot the distribution of Bitcoin quotes over the top 20 speakers. Then, we analyse their sentiments over time, taking the model’s confidence into account (gray area around the curve).

Top 20 speakers by number of occurrences

Evolution of each speaker's sentiment over time

Many tendencies appear among the top 20 speakers. However, it is important to keep in mind the context in which the sentiments were computed. This becomes particularly important when we shift our focus to specific well-known names. In fact, the media essentially rely on handpicking relevant quotes from stakeholders or from individuals with recognized expertise. Thus, the sentiment of each public speaker, as depicted above, should be interpreted as the sentiment of their quotes as conveyed by the media.

Diving deep into the graphs above, we can roughly group the speakers into 4 clusters: always positive, always negative, always neutral and, more interestingly, variable sentiment. The sentiment is a subjective value that depends on several factors, including the background of the person concerned, the socio-economic context at the time their quotes were reported, and our model itself; taken at a single point in time, it does not carry exploitable value for our research. Its evolution over time, however, lets us shed more light on a particularly important cluster: those who seem to change their minds over time.
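The four clusters can be expressed as a simple rule on a speaker's sequence of sentiment scores. The sketch below is an illustration under assumptions: scores are signed values in [-1, 1], the width of the neutral band is arbitrary, and the speaker trajectories are invented. Our own grouping was done by reading the graphs, not with this rule.

```python
def trajectory_cluster(scores, neutral_band=0.1):
    """Assign a sequence of signed sentiment scores to one of four clusters."""
    has_positive = any(s > neutral_band for s in scores)
    has_negative = any(s < -neutral_band for s in scores)
    if has_positive and has_negative:
        return "variable"
    if has_positive:
        return "always positive"
    if has_negative:
        return "always negative"
    return "always neutral"


# Hypothetical yearly sentiment scores per speaker (not real data).
trajectories = {
    "speaker_a": [0.4, 0.6, 0.3],
    "speaker_b": [-0.5, -0.2, -0.4],
    "speaker_c": [0.05, -0.02, 0.0],
    "speaker_d": [-0.4, 0.5, -0.3],
}
clusters = {name: trajectory_cluster(s) for name, s in trajectories.items()}
print(clusters)
```

A trajectory that starts negative, turns positive and then drops again lands in the "variable" cluster under this rule, which is the cluster of interest here.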

As an illustration, take the case of John McAfee. In the graph above, his quotes seem to convey a negative sentiment from 2015 to the end of 2016. From then until 2019, the sentiment was positive, before dropping again during 2019.

While we cannot eliminate the variance and the bias contained in such data, we can definitely tackle part of it. In particular, we observed that many of the top 20 speakers are related in some way to the Business and Administration industry. For that reason, we decided to split the data by profession to investigate whether the tendencies hold for the whole profession group or only for individuals.
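Splitting by profession amounts to a group-by over (profession, year) followed by an average. Here is a minimal sketch; the record layout `(profession, year, score)` and the sample values are assumptions for illustration, not rows from our dataset.

```python
from collections import defaultdict


def mean_sentiment_by_group(records):
    """Average sentiment score per (profession, year) pair."""
    buckets = defaultdict(list)
    for profession, year, score in records:
        buckets[(profession, year)].append(score)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


# Hypothetical records: (profession, year, signed sentiment score).
records = [
    ("Business", 2016, 0.4),
    ("Business", 2016, 0.2),
    ("Business", 2018, -0.3),
    ("Education", 2016, -0.5),
]
by_group = mean_sentiment_by_group(records)
print(by_group)
```

Averaging within a group reduces the influence of any single prolific speaker only if quotes are first averaged per speaker; this sketch weights every quote equally, which is a design choice worth revisiting for skewed data.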

Sentiment evolution by field

Again, many tendencies appear. Some fields, like Arts, Communications and Sport, remained positive throughout. On the other hand, only Education was negative throughout. For all the remaining ones, namely business people, government officials and lawmakers, the sentiment conveyed by their speeches switched from positive to negative. Notice that they all switched around 2017, which is consistent with the attention Bitcoin received at that time and the controversy it provoked worldwide.

Extra: We took care of your favourite celebrities

In 2017, Bill Gates said: "Bitcoin is exciting because it shows how cheap it can be. Bitcoin is better than currency in that you don't have to be physically present in the same place." In 2018, however, he said: "Bitcoin can no longer be used as a payment instrument on your account, As an asset class, you're not producing anything and so you shouldn't expect it to go up. [ Bitcoin and cryptocurrencies ] are kind of a pure 'greater fool theory' type of investment."

In 2018 and 2019 respectively, Elon Musk said: "Bitcoin has shown great resilience over the past decade. It managed to ride the wave of volatility and has started to address questions regarding scalability" and "Bitcoin's structure is quite brilliant". In 2020, however, he said: "Bitcoin is * not * my safe word" and also "I'm neither here nor there on Bitcoin".

All of this is to be taken with a grain of salt. There is a reason why you should not base your investments on what is reported in the media, let alone on who said it. This is not financial advice.

Conclusion

Diving deep into the Quotebank dataset, we found that the number of quotes Granger-causes variations in the fear and greed index with a time lag of one day.

We also clustered professions based on their positions on Bitcoin over time. One possible interpretation of the resulting clusters is that the Education field is known for its rigidity and attachment to proven fundamentals, and is thus skeptical of new ideas such as Bitcoin. People in Arts, Communications and Sport probably share their ideas openly and mirror the public positivity. Conversely, Business and Law professionals switch their positions as a natural reaction to the market, which is in line with the flexibility those jobs require.

Given the complexity of the problem, one should constantly keep in mind the different covariates that affect the fluctuation of the market on the one hand, and the sentiment extracted by our model on the other.

Meet the DataKillers

Trung Dung Hoang

Saad El Moutaouakil

Taha Zakariya