One year of corona also means one year of recorded parking lot occupancy time series across Germany, because that’s what i worked on for fun, in late March 2020, when we were all supposed to stay home and be calm. I do not drive and i’m generally more annoyed by than interested in cars, but for some reason it felt right to record the data and make it public. In my opinion it is stuff of social relevance, somehow, and time series of it are recorded, but they are not freely available.

There were ideas about a website where one could analyze and download the data but it seemed a bit too time consuming to set up. A friend suggested to just push it to github and be done. And rightly so. In the meantime i discovered elasticsearch, learned stuff about kubernetes, grafana and kibana, fought once again with understanding pandas and a couple of other statistical tools. Actually, all these areas are so overwhelming that i’m happy with just publishing CSV files and letting everybody handle it their own way.

The data comes from scattered websites spread across Germany. After a few days i thought i’d found most of them. With the usual beautiful-soup chopping, each website is scraped and its numbers saved to a json file. This is done every couple of minutes by a faithful little server and after a long day exported into a CSV and pushed to github.
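A typical scraper boils down to very little code. This is just a sketch: the html snippet and the css classes are made up, since every real city page has its own markup, but the beautiful-soup chopping looks roughly like this:

```python
# Minimal scraper sketch. The html structure and class names below are
# hypothetical - each real city website needs its own selectors.
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="parking-lot"><span class="name">Rathaus</span>
  <span class="free">123</span></div>
<div class="parking-lot"><span class="name">Theater</span>
  <span class="free">45</span></div>
"""

def scrape_free_spaces(html: str) -> dict:
    """Return {garage name: number of free spaces} from one page snapshot."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for lot in soup.select("div.parking-lot"):
        name = lot.select_one(".name").get_text(strip=True)
        free = int(lot.select_one(".free").get_text(strip=True))
        result[name] = free
    return result

print(scrape_free_spaces(SAMPLE_HTML))
```

The per-snapshot dict is what ends up in the json files, one snapshot every couple of minutes.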

The rows in those CSVs only contain changed numbers and are blank otherwise to save space. The files are currently 180 MB on disk and contain about 15,200,000 numbers (from March 23rd 2020 to March 24th 2021).

During the year at least 35 cities with at least 500 car parks were sampled each week. The website scraping mostly fails if the html changes a bit but that does not seem to happen often. Actually most of the websites look like they are hundreds of years old already. Still there is a bit of fluctuation:

So before showing any real occupancy i’ll try to clean the data first. Jump to the cleaned data if you’re more interested in that.

These are the cities which happened to have no data during at least one week.

city          missing weeks
Paderborn     46
Berlin        45
Lübeck        20
Jena          19
Potsdam       15
Hanau         10
Bielefeld      2
Köln           1
Reutlingen     1
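The missing-week count can be computed by resampling each city's records to weekly sums and counting the empty weeks. A sketch with toy data (my actual code may differ):

```python
# Count weeks without any records per city. Toy data: city "A" has
# records every day, "B" has a one-week gap in the middle.
import pandas as pd

idx = pd.date_range("2021-01-04", periods=21, freq="D")
df = pd.DataFrame({
    "A": 1,
    "B": [1] * 7 + [0] * 7 + [1] * 7,  # nothing in the second week
}, index=idx)

weekly = df.resample("W").sum()       # records per city per week
missing_weeks = (weekly == 0).sum()   # weeks with zero records
print(missing_weeks)
```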

Let’s see… Berlin really just appeared recently on some website. A year ago, the only source i found was https://www.parkopedia.de/parken/berlin/ and this is exactly one of those services where you immediately realize by just looking at the page that they know exactly what scraping is and that they don’t allow it. So i did not, to keep the parking-data repo nice and green.

Paderborn uses this nice old-school website to disseminate their parking allocation and it’s probably not functional most of the time. In fact it just seems to have started working again after a year ;)

Paderborn records per week
2020-03-23     560
2020-03-30     536
2020-04-06     743
2020-04-27     256
2021-03-08    1649
2021-03-15    4467
2021-03-22    2214

Lübeck seems to have changed its website a bit last November. It was actually scraped with a text-search inside its inline javascript but that does not seem to work anymore and this post turns a little into a todo list.

Jena (website) is where i live and they just started a parking system last summer when i was on vacation. I had to interrupt the holidays for a couple of days, though, and so came in touch with laptop and internet and added the new website to the parking scraper some late night. Next day i was joyously way back in nature while the website developers were adjusting a few css classes and deploying them before lunch break. And so there were no numbers for another two weeks.

Jena records per hour
2020-07-27 22:00:00     7
2020-07-27 23:00:00     8
2020-07-28 00:00:00     9
2020-07-28 01:00:00     2
2020-07-28 02:00:00     5
2020-07-28 03:00:00     9
2020-07-28 04:00:00    31
2020-07-28 05:00:00    45
2020-07-28 06:00:00    52
2020-07-28 07:00:00    65
2020-07-28 08:00:00    60
2020-07-28 09:00:00    46
2020-07-28 10:00:00    60
2020-07-28 11:00:00    53
2020-07-28 12:00:00    17

Somehow this made me sad when i realized it. I don’t know… sometimes, when going past one of the digital displays of free parking spaces in Jena, i check my clock to see if it’s a full minute and the scraper has collected the number.

Potsdam (website) also stopped working in December so i should really revisit the scraper soon.

Actually… don’t worry! Writing this post took a bit more time than anticipated, so in the meantime i fixed a couple of the website scrapers.

Apart from cities, there are far more individual places that are missing large proportions of data. Sometimes the website belongs to the owner of a couple of parking garages, so they’ll try to make things look good and all. But most of the time the page is just some conglomerate of marketing and tourist interests and general showing-off by city agencies, with data provided by some civil engineering office, and once a TYPO3 entrepreneur has finished the web integration it’s never looked at again. So they might list a lot of parking garages but only a handful are connected to actual live data.

All in all, there are 348 parking lots which provided data every week and they are in exactly 30 different cities:

city           records
Dresden         898208
Wiesbaden       784066
Düsseldorf      701026
Osnabrück       657465
Münster         584723
Mannheim        574514
Aachen          475733
Karlsruhe       422823
Bremen          420642
Oldenburg       395993
Regensburg      358268
Braunschweig    349091
Trier           346084
Baden-Baden     313413
Frankfurt       312753
Konstanz        310666
Esslingen       279197
Bonn            269895
Nürnberg        268917
Ingolstadt      268606
Kassel          249458
Dortmund        236050
Bad-Homburg     216310
Ulm             206414
Kiel            174187
Heilbronn       166091
Limburg         157079
Bochum          104237
Datteln          52947
Dülmen           33560

Though, i remember that the numbers on the Dresden website were frozen for about a month, so i’ll check for these cases too by counting the weeks without a change in the number of free spaces. An unchanged week probably means something is broken, because week averages are non-integer numbers and very unlikely to be hit twice if there is actual traffic.

place                                             weeks without change
bonn-bcp-parken-karstadt                          52
oldenburg-service-parken-CCO-Parkdeck-2           52
swt-trier-parken-Rat1                             52
oldenburg-service-parken-CCO-Parkdeck-1           52
esslingen-parken-Karstadt                         52
...                                               ...
wiesbaden-parken-Galeria-Kaufhof                   1
wiesbaden-parken-Luisenplatz                       1
wiesbaden-parken-Markt                             1
wiesbaden-parken-PH-Liliencarree                   1
wiesbaden-parken-RheinMain-CongressCenter-RMCC     1

129 rows × 1 columns

So well… a couple of them just never really worked. 129 places have at least one frozen week. When looking at the percentiles it doesn’t seem too terrible, though:

stats    weeks without change
mean      9.689922
std      14.881195
min       1.000000
25%       3.000000
50%       3.000000
75%       7.000000
max      52.000000

Half of them only have 3 frozen weeks. I’ll allow a bit of leeway and remove all places that have more than 4 weeks without change.
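The frozen-week filter can be sketched like this: average the free spaces per week, count the weeks where the average did not change from the previous week, and drop places with more than 4 such weeks. The lot names and data here are synthetic:

```python
# Sketch of the frozen-week filter: one lot with real (random) traffic,
# one stuck at a constant number. Names and data are made up.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2020-03-23", periods=70, freq="D")
df = pd.DataFrame({
    "lively-lot": rng.integers(0, 200, len(idx)),  # actual traffic
    "frozen-lot": 42,                              # stuck number
}, index=idx)

weekly = df.resample("W").mean()                   # week averages
frozen_weeks = (weekly.diff() == 0).sum()          # unchanged weeks per lot
keep = frozen_weeks[frozen_weeks <= 4].index       # allow a bit of leeway
print(frozen_weeks)
print(list(keep))
```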

place                                             records
apag-parken-Aachen-Parkhaus-Adalbertsteinweg      53328
apag-parken-Aachen-Parkhaus-Couvenstrasse         54175
apag-parken-Aachen-Parkhaus-Eurogress             41103
apag-parken-Aachen-Parkhaus-Galeria-Kaufhof-City  57891
apag-parken-Aachen-Parkhaus-Hauptbahnhof          52203
...                                               ...
wiesbaden-parken-Markt                            47602
wiesbaden-parken-PH-Liliencarree                  36554
wiesbaden-parken-RheinMain-CongressCenter-RMCC    40871
wiesbaden-parken-TG-Liliencarree                  45506
wiesbaden-parken-Theater                          34612

297 rows × 1 columns

This leaves 297 places to look at.

The cleaned data

No, the data is still pretty messy. For example, some parking lots turn off their numbers after closing times or transmit a fixed value like zero or maximum capacity. But i do not have an actual statistical goal in mind so i might live with that.

Note that this shows free spaces, so higher means less car activity. The spikes are the weekends. During christmas / new year the spikes are less pronounced, which looks similar to the corona lock-down in March/April 2020. Alas, since the lock-down brought me to collect this data in the first place, i do not have time series from before to compare with.

The maximum capacity could be scraped from the website for 224 garages. That’s 75% of the already filtered set and roughly 40% of the whole. For those we can actually calculate a percentage, independent of garage size. This also helps a bit more with data cleaning because we can filter within the range [0, 100] to remove offending numbers (which actually do exist).
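The percentage step and the [0, 100] filter look roughly like this. The garage names and numbers are made up, including one of those offending out-of-range values:

```python
# Free-space percentage per garage, with out-of-range values dropped.
# Garage names, capacities and the broken value are hypothetical.
import pandas as pd

free = pd.DataFrame({
    "garage-a": [80, 40, 0],
    "garage-b": [120, 130, 260],  # 260 > capacity: an offending number
})
capacity = pd.Series({"garage-a": 100, "garage-b": 250})

percent_free = free / capacity * 100
# drop values outside [0, 100] instead of clipping them
percent_free = percent_free.where((percent_free >= 0) & (percent_free <= 100))
print(percent_free)
```

Division of the DataFrame by the capacity Series aligns on the column names, so each garage is scaled by its own capacity.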

I will also flip the following graphs upside-down, so high value means more traffic volume which i personally find more intuitive to interpret.

And here is just an average day on a parking lot, or rather the average percentage of free spaces per hour, averaged for each weekday.
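That "average day" boils down to a groupby over weekday and hour. A sketch with a synthetic free-percentage series (one value per hour over four weeks):

```python
# Average free-percentage per (weekday, hour), sketched on synthetic
# data: a daily cosine wave standing in for the real traffic pattern.
import numpy as np
import pandas as pd

idx = pd.date_range("2020-03-23", periods=24 * 28, freq="h")
free_percent = pd.Series(50 + 30 * np.cos(idx.hour / 24 * 2 * np.pi),
                         index=idx)

avg_day = free_percent.groupby(
    [free_percent.index.dayofweek, free_percent.index.hour]
).mean()
avg_day.index.names = ["weekday", "hour"]
print(avg_day.loc[0])  # Monday's 24-hour profile
```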

I think this is pretty terrible to look at. There are people regularly driving to work at 3 o’clock in the morning! Currently i wake up when parking lots start emptying already.

Also note the fine nuances. On Fridays, people leave work a bit earlier but stay out longer. Still a bit longer on Saturdays. Though, this was a pretty fucked-up year without any public events that i know of, except demonstrations maybe.

Let’s just split this into months. You may use your own memory to paint some meaning into these graphs but basically it starts with omg-corona-fucking-stay-at-home-and-stop-shopping, gradually glides into well-lets-open-the-bars-and-amusement-parks-again, while omg-fucking-corona-mutations-creep-over-us and please-stay-at-home-after-work-and-shopping concludes the one year roundtrip.

If you happen to live in one of the cities that survived the rough data cleaning you can reflect below and compare with the graphs of other cities. Click or double-click a city in the right legend to change visibility.

So that was a little tour through this yet-another-dataset. If you are interested check out

github.com/defgsus/parking-data/

to play with it, or open a pull request for another parking website at

github.com/defgsus/parking-scraper/

All the data is free, free as in free parking.