Are you interested in old news? Here’s one: The volume and weight of the topics we discuss every day are largely determined by the pre-selection made by mainstream media. Their selection method, as far as we can tell, is mostly driven by competitive desires. Some necessity and some good will may also play a role.

Here’s the old news, containing some popular words of the day:

The plot shows the number of article headlines per 2-hour interval containing the words corona, war, storm, olympia or gasoline prices, collected from the German online papers, magazines and news channels bild.de, compact-online.de, faz.net, fr.de, gmx.net, heise.de, spiegel.de, spiegeldaily.de, sueddeutsche.de, t-online.de, volksstimme.de, web.de, welt.de, zeit.de and zeitfuerdieschule.de. Scraping was done by this repository, and every index page as well as several sub-categories are analyzed. This plot only considers the first 10 headlines found on each page.
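The counting behind such a plot can be sketched in a few lines. This is only a minimal illustration, not the code used for the plot: the function names, the sample headlines and the German search terms are assumptions, and matching is a plain case-insensitive substring test.

```python
from collections import Counter
from datetime import datetime


def two_hour_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 2-hour interval."""
    return ts.replace(hour=ts.hour - ts.hour % 2, minute=0, second=0, microsecond=0)


def count_term_headlines(headlines, terms):
    """Count headlines per (2-hour bucket, term).

    `headlines` is an iterable of (timestamp, headline text) pairs.
    """
    counts = Counter()
    for ts, text in headlines:
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                counts[(two_hour_bucket(ts), term)] += 1
    return counts


# Hypothetical sample data, not taken from the archive.
sample = [
    (datetime(2022, 2, 24, 7, 15), "Krieg in der Ukraine"),
    (datetime(2022, 2, 24, 7, 50), "Corona: neue Zahlen"),
    (datetime(2022, 2, 24, 9, 5), "Ukraine-Krieg: Reaktionen"),
]
counts = count_term_headlines(sample, ["corona", "krieg"])
```

The first two sample headlines fall into the 06:00 bucket and the third into the 08:00 bucket.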

The Winter Olympics were held between 4th and 20th February, there was a heavy storm around 15th February, and the invasion of Ukraine began on the 24th. The evergreen topic of facts and stories about the pandemic was largely diminished in popularity by the latter event.

Counting the number of pages (categories) containing headlines with the above terms leads to a similar graphic:

Using the significant terms aggregation to extract words whose frequency during one week is peculiar compared to all other weeks leads to the following list (including only words that appear in at least 3 weeks):

angriff, bundesliga, corona, februar, flucht, flüchtlinge, formel, fußball, gas, geflüchtete, gesundheit, impfpflicht, invasion, kiew, konflikt, krieg, kriminalität, league, mariupol, märz, olympia, peking, putin, putins, russen, russische, russischen, russland, russlands, sanktionen, selenskyj, sturmtief, ukraine, ukrainekrieg, ukrainische, unfälle, winterspiele
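For readers who want to reproduce a list like this, here is a rough sketch of the two steps. The query uses Elasticsearch's `date_histogram` and `significant_terms` aggregations. The field names `timestamp` and `title`, the aggregation names and the helper functions are assumptions, not the actual index layout. Depending on the field mapping, `significant_text` may be the better fit for analyzed text fields.

```python
from collections import Counter


def weekly_significant_terms_query(date_field="timestamp", text_field="title", size=20):
    """Build a search body: one bucket per week, significant terms inside each bucket.

    The field names are hypothetical; adjust them to the real index mapping.
    """
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            "per_week": {
                "date_histogram": {"field": date_field, "calendar_interval": "week"},
                "aggs": {
                    "peculiar_words": {
                        "significant_terms": {"field": text_field, "size": size}
                    }
                },
            }
        },
    }


def terms_in_at_least(weekly_terms, min_weeks=3):
    """Keep only terms that show up as significant in at least `min_weeks` weeks."""
    weeks_per_term = Counter()
    for terms in weekly_terms.values():
        weeks_per_term.update(set(terms))
    return sorted(t for t, n in weeks_per_term.items() if n >= min_weeks)


# Hypothetical per-week results, as they might be extracted from the response buckets.
weekly = {
    "2022-W08": ["krieg", "ukraine", "sturmtief"],
    "2022-W09": ["krieg", "ukraine"],
    "2022-W10": ["krieg", "ukraine", "sanktionen"],
}
frequent = terms_in_at_least(weekly)
```

With the sample data, only the two terms present in all three weeks survive the filter.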

Just by looking at the statistics, not much else has really happened. People are fleeing death and destruction, and other public sports events are rolling in and out.

The above plot is hard to read but we can plot the correlation matrix of all the terms to see which ones belong together. In this case, the correlation between two terms means how much they increase or decrease in frequency at the same time.
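As a small illustration of what such a matrix contains, the sketch below computes the Pearson correlation between per-interval counts of each term. Using Pearson on raw counts is an assumption about the method, and the numbers are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series):
    """Return a nested dict: matrix[a][b] is the correlation of terms a and b."""
    terms = sorted(series)
    return {a: {b: pearson(series[a], series[b]) for b in terms} for a in terms}


# Hypothetical headline counts per time interval for three terms.
series = {
    "krieg": [1, 2, 8, 9],
    "ukraine": [2, 4, 16, 18],
    "corona": [9, 8, 2, 1],
}
matrix = correlation_matrix(series)
```

In this toy data `krieg` and `ukraine` rise together (correlation close to 1), while `corona` falls as `krieg` rises (correlation close to -1), which is the kind of replacement pattern a negative cell indicates.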

For the period considered, corona correlates highly with the Olympic Games, the word february and mandatory vaccination, whereas war correlates highly with ukraine, russia, putin, refugees, invasion and gas. The red areas show negative correlations, which indicate words that suppress or, at least, replace each other in usage.

Below is the same plot for the number of top-10 headlines:

So these are the major topics that are provided for our notice and consumption.

Just out of curiosity, let us look at the second level! This repeats the significant terms aggregation per week, but only on headlines that do not contain any of the words above. Again, only words that are present during at least 3 weeks are displayed:

Most topics seem to be long running and do not appear or vanish within the whole time period.

There is not much inter-word correlation anymore and the topics are more all over the place. If you are interested in these things, I encourage you to do your own analysis. I’m just giving a broad overview of the dataset, but there is a lot of detailed information to be found.

For example:

This shows the number of different headlines per day and channel. zeit.de seems to be really prominent. Even more than the ever-stupid manipulation platform bild.de. You know, there is sooo much important stuff to post and update each day about corona. How can the other channels dare to not list 800 different articles per day, eh? This is irresponsible!

Let’s examine the most responsible channel then:

The news ticker seems to be the most responsible category here. But they do list up to 4,700 different articles there so looking at the percentage of headlines that contain “corona” might make more sense:
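Normalizing by category size is simple enough to sketch. The function name, the category labels and the sample headlines below are assumptions for illustration only.

```python
def term_share_per_category(headlines, term="corona"):
    """Return the percentage of headlines per category that contain `term`.

    `headlines` is an iterable of (category, headline text) pairs.
    """
    totals = {}
    hits = {}
    for category, text in headlines:
        totals[category] = totals.get(category, 0) + 1
        if term in text.lower():
            hits[category] = hits.get(category, 0) + 1
    return {c: 100.0 * hits.get(c, 0) / totals[c] for c in totals}


# Hypothetical sample: a large ticker category and a small health category.
sample = [
    ("news", "Corona: Inzidenz sinkt"),
    ("news", "Bundesliga am Wochenende"),
    ("news", "Sturmtief zieht auf"),
    ("news", "Benzinpreise steigen"),
    ("gesundheit", "Corona-Impfung: was jetzt gilt"),
    ("gesundheit", "Besser schlafen"),
]
shares = term_share_per_category(sample)
```

Both sample categories contain exactly one matching headline, but the share is 25% for the larger one and 50% for the smaller one, which is why the percentage is more telling than the raw count.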

The health category contains the most “corona” articles with percentages between 30% and 60%! Below is the ‘layout’ of this category over time. Each block means that there is at least one headline with the term “corona” and the y-axis shows the position on the web-page (top positions shown starting at the bottom):

The news category probably contains most or all headlines of all other categories. If we plot its rank histogram as above, the stream of new articles becomes visible. Every day! An endless stream of most relevant and informative corona brainwash. If you zoom in on time (dragging the mouse horizontally) you can see that, especially between 6 AM and 6 PM, new headlines shift the previous headlines down the page:

Enough now! I actually stopped reading news some years ago. This is kind of the first contact again. From a meta perspective. Thanks for reading this post, and if you are interested in experimenting yourself, please look at the teletext-archive repository as well. It contains an Elasticsearch export script which can be edited to export the frontpage-archive data as well. The Giterator class allows easy access to all committed states.

Slack off! and praise “Bob!”