… just, that new scraper is mine. There are people who have been doing this for years longer! So let me instead introduce parkendd.de by Offenes Dresden. It’s all open and the archive is available for download.

This post is similar to some of my other data investigation posts in the sense that i simply start coding and see what comes out of it. For a change, all code is included so you do not need to check the jupyter notebook.

Currently (end of 2021) data from 2015 to 2020 is packaged into a big tar.xz file, which i will convert to tar.gz because it’s easier to read in python.

wget https://parkendd.de/dumps/Archive.tar.xz
xz -dc Archive.tar.xz | gzip -cf9 > parkapi-2020.tar.gz

The xz compression actually seems to be a good choice because the file size grows from 200 to 500 megabytes with gz. Not a problematic number, though. However, the uncompressed tar file is about 3.6 gigabytes. I want to use pandas, and experience tells me that a gigabyte-sized CSV will usually not fit into memory once loaded. Even if it does, all operations that copy data will eventually kill the python kernel.

So, i’ll iterate through all files in the archive - each representing one parking lot per year - resample them to averaged 1-hour buckets and gradually merge them into a single DataFrame. I want to look at the years 2016 to 2020, so that’s about 44,000 hour steps for 100+ parking lots, which should fit into anyone’s memory.
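
A quick back-of-the-envelope calculation (the ~120 lots and 8-byte floats are just my assumptions at this point) shows how small the resampled table will be compared to the raw archive:

# rough estimate: 44,000 hourly buckets for ~120 lots, stored as 8-byte floats
print(f"{44_000 * 120 * 8 / 2**20:.0f} MiB")  # roughly 40 MiB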

from pathlib import Path
import tarfile
import codecs
import re
from typing import Generator, Tuple, Union, Optional, Callable

from tqdm import tqdm
import requests
import pandas as pd
import numpy as np
import plotly
import plotly.express as px
from plotly.subplots import make_subplots

pd.options.display.max_columns = 30
pd.options.plotting.backend = "plotly"
plotly.io.templates.default = "plotly_dark"
def iter_archive_dataframes(
    filename: Union[str, Path],
    resampling: str = "1h",
) -> Generator[Tuple[str, pd.DataFrame], None, None]:
    
    # tarfile does handle the gzip automatically
    with tarfile.open(filename) as tfp:
        
        # build map of lot_id to available csv filenames 
        #   i ignore 2015 since it's incomplete
        lot_id_filenames = dict()
        for name in sorted(tfp.getnames()):
            if "backup" not in name:
                match = re.match(r"(.*)-(20\d\d)\.csv", name)
                if match:
                    lot_id, year = match.groups()
                    if year != "2015":
                        lot_id_filenames.setdefault(lot_id, []).append(name)
        
        # for each lot
        for lot_id, filenames in lot_id_filenames.items():
            # if we have years 2016 - 2020
            if len(filenames) == 5:
                # build one DataFrame, resampled to 1 hour
                dfs = []
                for filename in filenames:
                    fp = tfp.extractfile(filename)
                    dfs.append(pd.read_csv(
                        codecs.getreader("utf-8")(fp), 
                        names=["date", "free"]
                    ))
                df = pd.concat(dfs, axis=0)
                df["date"] = pd.to_datetime(df["date"])
                try:
                    df = df.set_index("date").resample(resampling).mean()
                    yield lot_id, df
                except Exception:
                    # skip lots whose data cannot be resampled
                    pass
archive_file = Path("~/prog/data/parking/parkapi-2020.tar.gz").expanduser()
table_file = Path("~/prog/data/parking/parkapi-2020-1h.csv").expanduser()

if not table_file.exists():
    big_df = None
    for lot_id, df in tqdm(iter_archive_dataframes(archive_file)):
        df["lot_id"] = lot_id
        df = df.reset_index().set_index(["date", "lot_id"])
        if big_df is None:
            big_df = df
        else:
            # append rows and sort by date
            big_df = pd.concat([big_df, df]).sort_index()
    
    # x = lot_id, y = date
    big_df = big_df.unstack("lot_id")
    # drop the "free" label from columns, just keep lot_id
    big_df.columns = big_df.columns.droplevel()
    # store
    big_df.to_csv(table_file)

else:
    # read the file if it was already created
    big_df = pd.read_csv(table_file)
    big_df["date"] = pd.to_datetime(big_df["date"])
    big_df.set_index("date", inplace=True)
    big_df.columns.name = "lot_id"
    
big_df
lot_id aalborgcwobel aalborgfriis aalborgføtex aalborggåsepigen aalborgkennedyarkaden aalborgkongrescenter aalborgmusikkenshus aalborgpalads aalborgsalling aalborgsauersplads aalborgsømandshjemmet aarhusbruunsgalleri aarhusbusgadehuset aarhuskalkværksvej aarhusmagasin ... luebeckpferdemarkt luebeckradissonhotel muensterbusparkplatz oldenburgccoparkdeck1 oldenburgccoparkdeck2 oldenburgcity oldenburggaleriakaufhof oldenburghbfzob oldenburgheiligengeisthoefe oldenburgpferdemarkt oldenburgschlosshoefe oldenburgtheatergarage oldenburgtheaterwall oldenburgwaffenplatz zuerichparkgarageamcentral
date
2016-01-01 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 383.083333 269.0 126.666667 268.0 384.916667 309.250000 53.750000 64.833333 650.000000 0.000000
2016-01-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 129.750000 269.0 388.750000 316.833333 58.750000 67.083333 650.000000 0.000000
2016-01-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 133.250000 269.0 391.500000 321.083333 60.166667 79.250000 650.000000 0.000000
2016-01-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 133.833333 269.0 400.083333 323.416667 61.583333 77.500000 650.000000 0.000000
2016-01-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 133.000000 269.0 401.000000 324.000000 61.000000 78.083333 650.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-12-31 19:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 63.000000 178.0 350.000000 ... 46.000000 64.875 0.0 0.0 0.0 0.000000 0.0 111.000000 154.0 399.666667 428.000000 90.000000 69.416667 502.416667 38.000000
2020-12-31 20:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 63.000000 178.0 350.000000 ... 46.000000 64.750 0.0 0.0 0.0 0.000000 0.0 111.000000 154.0 399.500000 428.000000 90.000000 67.416667 503.000000 38.000000
2020-12-31 21:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 61.666667 178.0 349.916667 ... 7.666667 10.750 0.0 0.0 0.0 0.000000 0.0 111.750000 154.0 399.416667 428.000000 90.000000 66.833333 503.000000 38.000000
2020-12-31 22:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 57.750000 178.0 350.000000 ... 0.000000 0.000 0.0 0.0 0.0 0.000000 0.0 113.000000 154.0 400.416667 428.000000 90.000000 67.166667 502.083333 12.666667
2020-12-31 23:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 56.000000 178.0 350.000000 ... 0.000000 0.000 0.0 0.0 0.0 0.000000 0.0 113.000000 154.0 395.333333 428.000000 90.000000 67.250000 0.000000 0.000000

43848 rows × 127 columns

Don’t mind all the NaNs; the city of Aalborg was not scraped throughout the whole period. But Oldenburg looks good. Without further number crunching, let’s do a quick interactive plot, resampled to 1-week buckets:

(big_df
 .resample("1w").mean()
 .round()  # the round saves about 300 Kb of javascript code 
 .plot(
     title=f"average number of free spaces per week ({big_df.shape[1]} lots)", 
     labels={"value": "number of free spaces", "date": "week"}
 )
)

As usual, you can drag and zoom, and hide the individual lots on the right side (double-click to hide all except one).

Now that i’m actually able to look at parking data predating this stupid covid pandemic, i’ll pose two simple research questions:

  • Is the lockdown in Germany at the beginning of 2020 visible in the parking lot occupation data?
  • Has anything in the parking behaviour significantly changed compared to before?

First of all, looking at the plots above, a few cities have big chunks of missing data, Aalborg for example. It’s a shame but i’ll exclude them. Moreover, there are smaller gaps. Sometimes the number of free spaces listed on a website gets stuck, or is not listed at all, while other lots on the same site work fine. I’ll count the number of times that the daily average value does not change over three consecutive days, specifically since 2018:

df = (
    big_df[(big_df.index >= "2018-01-01")]
    .resample("1d").mean()
    .replace(np.nan, 0)  # treat missing values as zero
)
num_equal_days = ((df == df.shift(1)) & (df == df.shift(2))).astype(int).sum()
num_equal_days.sort_values().plot.bar(
    title="Number of times that 3 consecutive days have unchanged number of free spaces",
    height=600,
)

By visual inspection and comparison with the plot on top, i decide to cut everything above 100 and also remove the Zurich lot because it is missing data exactly in the period of interest:

big_df = big_df.loc[:, (num_equal_days <= 100) & (big_df.columns != "zuerichparkgarageamcentral")]
big_df.shape
(43848, 53)

Okay, 53 lots remain. Now it would be great to normalize each lot using the total capacity.

big_df.max()
lot_id
aarhusbusgadehuset                97.166667
aarhussalling                    700.000000
dresdenaltmarkt                  439.000000
dresdenaltmarktgalerie          9868.000000
dresdenanderfrauenkirche         140.000000
dresdencentrumgalerie           3771.416667
dresdenfrauenkircheneumarkt      296.000000
dresdenkaditz                    377.000000
dresdenkongresszentrum         26245.000000
dresdenparkhausmitte             432.333333
dresdenpirnaischerplatz          145.000000
dresdenprohlis                   192.250000
dresdenreitbahnstrasse           409.916667
dresdensarrasanistrasse         1360.166667
dresdenschiessgasse              999.000000
dresdenterrassenufer             244.000000
dresdentheresienstrasse          159.000000
dresdenwiesentorstrasse          185.333333
dresdenwoehrlflorentinum         323.583333
dresdenworldtradecenter          314.416667
freiburgambahnhof                242.000000
freiburgbahnhofsgarage           224.000000
freiburgkarlsbau                 977.000000
freiburgkonzerthaus              453.000000
freiburgmartinstor               142.000000
freiburgrotteck                  312.000000
freiburgschlossberg              440.000000
freiburgschwarzwaldcity          436.250000
freiburgzaehringertor            100.000000
ingolstadtcongressgarage         453.000000
ingolstadthallenbad              661.666667
ingolstadthauptbahnhofost        240.000000
ingolstadtmuenster               750.000000
ingolstadtnordbahnhof            231.083333
ingolstadtreduittilly            356.000000
ingolstadttheaterost             595.000000
ingolstadttheaterwest            514.333333
luebeckbackbord                  135.000000
luebeckfalkenstrasse             150.000000
luebeckhaerdercenter             212.000000
luebeckhafenbahnhof              108.833333
luebeckkanalstrasse2             216.000000
luebeckkanalstrasse3             197.000000
luebeckkanalstrasse4             284.000000
luebeckkanalstrasse5              45.000000
luebecklastadiep3                 34.000000
luebecklastadiep4                 17.000000
luebecklastadiep5                253.916667
luebeckleuchtenfeld              750.000000
luebecklindenarcaden             400.000000
luebeckmitte                     420.000000
luebeckmuk                       367.000000
luebeckradissonhotel              73.000000
dtype: float64

Ah, well, the congress center in Dresden probably does not have 26 thousand spaces. I’ll first clamp the dataframe to, let’s say, 2000, just to remove the most obvious outliers

big_df = big_df.clip(0, 2000)

and then ask the ParkAPI for more precise values. The endpoint is https://api.parkendd.de/<City> which returns static and live data for each lot per city:

CITIES = ["Aarhus", "Dresden", "Freiburg", "Ingolstadt", "Luebeck"]
lot_infos = dict()
for city in CITIES:
    response = requests.get(f"https://api.parkendd.de/{city}")
    for lot in response.json()["lots"]:
        lot["city"] = city
        lot_infos[lot["id"]] = lot

lot_infos["dresdenkongresszentrum"]
{'address': 'Ostra-Ufer 2',
 'coords': {'lat': 51.05922, 'lng': 13.7305},
 'forecast': False,
 'free': 234,
 'id': 'dresdenkongresszentrum',
 'lot_type': 'Tiefgarage',
 'name': 'Kongresszentrum',
 'region': 'Ring West',
 'state': 'open',
 'total': 250,
 'city': 'Dresden'}

Well, 26,000 was only two orders of magnitude above the truth.

lot_infos["luebeckbackbord"]
{'coords': {'lat': 53.970161, 'lng': 10.880241},
 'forecast': False,
 'free': 0,
 'id': 'luebeckbackbord',
 'lot_type': 'Parkplatz',
 'name': 'Backbord',
 'region': 'Parkplätze Lübeck',
 'state': 'open',
 'total': 0,
 'city': 'Luebeck'}

Lübeck does not provide a total value. The website that is scraped can be determined from the geojson file of the Lübeck scraper (or from https://api.parkendd.de/). It actually seems to be offline right now. So i’ll use the official numbers where present and the maximum recorded free value otherwise:

official_capacity = pd.Series(
    big_df.columns.map(lambda c: lot_infos[c]["total"] or None), 
    index=big_df.columns
).dropna()

capacity = big_df.max()
capacity[official_capacity.index] = official_capacity

# lot occupation in range [0, 1]
occupied = 1. - (big_df / capacity).clip(0, 1)

(occupied
 .groupby(lambda c: lot_infos[c]["city"], axis=1).mean()
 .resample("1m").mean() * 100.
).plot(
    title="Average lot occupation per month and city",
    labels={"value": "occupation percentage", "date": "month"}
)

Alright. There it is. A pretty obvious dent! With the least occupation during April 2020. That’s how i remember it. Kids skating on empty parking lots, no planes in the sky, no stupid shops selling useless things.

For the interested, here’s the same plot for each lot:

(occupied.resample("1m").mean() * 100.).round().plot(
    title="Average lot occupation per month and lot",
    labels={"value": "occupation percentage", "date": "month"}
)

There are more ways of looking at the occupation data. Instead of calculating the average for each week we can build a histogram of the occupation values. This shows all levels of occupation during each week:

def plot_histogram(
        df: pd.DataFrame, 
        resample: str = "1w", 
        bins: int = 48, 
        range: Optional[Tuple[float, float]] = None,
        clip: Optional[Tuple[float, float]] = None,
        title: Optional[str] = None,
        labels: Optional[dict] = None,
):
    if range is None:
        df_n = df.replace(np.nan, 0)
        range = (np.amin(df_n.values), np.amax(df_n.values))
    df = pd.concat(
        (pd.Series(np.histogram(group, bins=bins, range=range)[0], name=key)
        for key, group in df.resample(resample, level="date")),
        axis=1
    ).replace(0, np.nan)
    df.index = np.linspace(*range, bins)
    if clip is not None:
        df = df.clip(*clip)
    return px.imshow(
        df, origin="lower",
        title=title or "Weekly histogram of occupation per lot",
        labels=labels or {"y": "occupation percentage", "x": "week"},
        color_continuous_scale=["#005", "#08f", "#8ff", "#fff", "#fff"]
    )

# ignore values that are exactly zero or one
#   as they are usually *bad data* (see below)
plot_histogram(occupied.replace({0: np.nan, 1: np.nan}) * 100.)

So, starting at the end of March 2020, the most frequently reported lot occupation is between 0 and 15%. The situation kind of normalizes in June, and the dent kind of returns in November.
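
Here is a small cross-check one could run (just a sketch; the 15% threshold comes from the observation above, the per-month averaging is my choice). It plots, per month, the share of all reported samples that lies below 15% occupation:

# share of (lot, hour) samples below 15% occupation per month,
# again ignoring exact zeros and ones as they are usually bad data
occ = occupied.replace({0: np.nan, 1: np.nan})
share_below_15 = (occ < .15).sum(axis=1) / occ.notna().sum(axis=1)
(share_below_15.resample("1m").mean() * 100).rename("below 15%").plot(
    title="Share of samples below 15% occupation per month",
)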

What are these small horizontal stripes you ask? And what happened in the beginning of 2018?

The short 2018 outage is probably some internal server problem. You know, disk full, provider problems. There is no indication in the commit history.

To investigate the stripes, i’ll spend a few more megabytes of generated javascript and look at a few lots in particular:

def plot_lot_data(lot_id: str, filter: Optional[Callable] = None):
    fig = make_subplots(
        rows=2, cols=1,
        vertical_spacing=0.1,
        shared_xaxes=True,
        subplot_titles=["weekly occupation histogram", "number of free spaces per hour"],
    )
    filter = filter or (lambda df: df)
    df = filter(occupied[lot_id])
    histo = plot_histogram(df * 100)
    fig.add_trace(histo.data[0], row=1, col=1)
    fig.add_trace(
        filter(big_df[lot_id]).round().plot().data[0].update(showlegend=False), 
        row=2, col=1,
    )
    return fig.update_layout(
        coloraxis=histo.layout.coloraxis, 
        title=f"{lot_id} (capacity: {lot_infos[lot_id]['total']})", height=700
    )

plot_lot_data("dresdenparkhausmitte")

Obviously, a horizontal stripe means that the free-spaces counter stood still somehow. Except for the stripes at 0% occupation starting at the end of 2019: they are caused by the reported number of free spaces being larger than the reported lot capacity, which is (at the time of writing this article) 280. This garage must have decreased its capacity in the meantime. It would be helpful if the recorded capacity were published in the archive as well. Otherwise we must trust the maximum value, which is 432 for this recording. However, if you zoom in on Oct 1st to 4th 2016, when this maximum was reached, you’ll notice a completely unrealistic-looking period of 200+ free spaces. Also note that the little daily peaks of free spaces around that period appear upside-down within it! It may still be possible that some real-life event caused that, but i find it more likely to be some digital mess-up.
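
To put a number on that capacity mismatch, here is a tiny check one could run (a sketch; it relies on the total of 280 that the API reports at the time of writing):

# count the hours where the recorded free spaces exceed the capacity
# currently reported by the API
free = big_df["dresdenparkhausmitte"]
api_total = lot_infos["dresdenparkhausmitte"]["total"]  # 280 at the time of writing
print("max recorded free spaces:", round(free.max(), 1))  # 432.3 in this archive
print("hours above the API capacity:", int((free > api_total).sum()))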

plot_lot_data("freiburgambahnhof")

At first glance, the parking lot in Freiburg looks much more lively compared to the one above. But please zoom in on the flat-line in winter 2016/17. There is obviously no real car activity, but still the number of reported free spaces changes between zero and 62 each day in a super regular pattern reminiscent of opening hours. They only publish free spaces during opening hours. You know, that might make sense for drivers but it just makes interpreting the data harder. Since the outage in April 2018 they seem to be open 24/7 and data is published continuously. Still, looking closely at some points it becomes hard to determine, for myself at least, whether this is real car-in car-out activity. The patterns are so regular at times, e.g. from one weekend to the next, that i find it either creepy or not completely trustworthy.
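
By the way, the filter parameter of plot_lot_data can be used to zoom in programmatically instead of dragging around in the interactive plot; the date window below is just my pick for that winter 2016/17 flat-line:

# restrict the lot data to the winter 2016/17 flat-line before plotting
plot_lot_data(
    "freiburgambahnhof",
    filter=lambda df: df[(df.index >= "2016-12-01") & (df.index < "2017-03-01")],
)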

plot_lot_data("ingolstadtreduittilly")

This one’s interesting. The number of cars in Ingolstadt seems to be growing. Although, once again, zooming in on the data reveals some strange jumps in the number of occupied spaces during the night from one week to the next, which do not look like a reflection of real-world events. Or could this actually be gradual steps back towards working life after the first lockdown?

Changes to the capacity, whether real or digital, do affect the number of free spaces. And i start to realize that it’s actually hard work to infer true car activity just from the published number of free spaces.

Gradients do have the same problem. I thought: let’s just look at the difference to the previous day or something like that. This will at least mitigate the opening-hours problem and some other automatic or purely digital changes that the free-spaces counter might be subject to. Example:

df = big_df["freiburgambahnhof"]
df = df[(df.index >= "2020-01-01") & (df.index < "2020-06-01")]

fig = make_subplots(
    rows=4, cols=1,
    vertical_spacing=0.02,
    shared_xaxes=True,
    subplot_titles=[
        "free spaces per hour", "difference to previous hour", 
        "difference to previous day", "difference to previous week"
    ],
)
fig.add_trace(df.plot().data[0], row=1, col=1)
fig.add_trace(df.diff(1).plot().data[0], row=2, col=1)
fig.add_trace(df.diff(24).plot().data[0], row=3, col=1)
fig.add_trace(df.diff(24*7).plot().data[0], row=4, col=1)
fig.update_layout(
    height=1300, showlegend=False, 
    title="'freiburgambahnhof' free spaces and gradients (2020/01 - 2020/05)"
)

One can see things; still, it is hard to interpret this data automatically.

Fine. I’m not a paid scientist, not even a scientist, but i want to scrutinize question #2 a bit: has anything in the parking behaviour significantly changed compared to before? I mean, apart from the fact that there is less parking anyway. So i’ll try to look at the occupation per hour of day. In my previous parking post i found that there are some hints as to whether occupation is driven by work & shopping activity or by more leisurely demands.

But first i need to check the opening-hours problem. If a lot lists zero free spaces at some point, that translates to 1.0 in the occupied DataFrame, so i’ll simply count the number of times that a lot has full occupation for each hour of the day:

zero_df = pd.concat([
    (occupied[occupied.index.hour == hour] == 1).astype(int).sum()
    for hour in range(24)
], axis=1)
zero_df.columns.rename("hour of day", inplace=True)
zero_df
hour of day 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
lot_id
aarhusbusgadehuset 6 5 5 5 5 6 6 6 6 5 4 4 5 5 5 5 5 5 5 5 5 5 5 5
aarhussalling 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dresdenaltmarkt 15 15 15 15 10 9 8 7 9 17 58 151 182 168 160 145 138 149 136 66 26 18 15 15
dresdenaltmarktgalerie 8 8 8 8 6 5 5 4 5 15 97 210 219 182 156 121 93 63 22 8 8 8 8 8
dresdenanderfrauenkirche 27 27 26 26 26 29 30 28 29 33 41 36 29 24 27 26 25 29 23 22 24 25 26 28
dresdencentrumgalerie 30 30 30 30 26 23 21 18 20 26 56 168 193 154 115 77 54 38 30 30 30 30 30 30
dresdenfrauenkircheneumarkt 42 42 42 42 36 27 26 22 27 28 40 97 115 107 88 88 94 137 156 135 84 50 44 42
dresdenkaditz 1 0 0 0 0 1 2 2 1 1 2 0 0 2 1 2 2 0 0 3 1 1 1 1
dresdenkongresszentrum 1 0 0 0 0 0 2 17 45 56 59 53 51 41 26 19 15 24 31 28 24 17 12 6
dresdenparkhausmitte 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 2 3 3 2 1 1 0
dresdenpirnaischerplatz 14 6 6 7 8 9 6 2 9 55 223 386 385 327 306 311 369 514 523 332 113 45 26 18
dresdenprohlis 17 13 3 4 3 1 2 1 5 8 14 17 15 17 15 15 19 19 18 16 16 17 17 17
dresdenreitbahnstrasse 33 31 31 31 23 15 16 13 48 210 407 504 497 446 428 425 419 346 183 78 49 40 37 35
dresdensarrasanistrasse 53 52 54 55 38 16 16 17 57 105 119 123 112 109 105 122 124 132 191 221 168 96 58 54
dresdenschiessgasse 57 54 54 54 49 39 56 86 188 500 864 1003 849 626 479 454 521 694 725 509 219 92 68 63
dresdenterrassenufer 27 27 27 27 24 16 14 19 92 249 435 559 525 369 265 205 184 264 294 224 95 47 35 30
dresdentheresienstrasse 1 1 1 1 1 1 1 2 4 1 1 1 2 1 1 0 1 5 10 3 1 1 1 1
dresdenwiesentorstrasse 92 91 91 92 86 79 76 63 59 59 72 82 96 98 95 104 121 121 138 156 146 107 96 93
dresdenwoehrlflorentinum 1 1 1 1 0 0 0 0 0 0 20 40 31 23 6 3 2 1 1 1 1 1 1 1
dresdenworldtradecenter 0 0 0 0 0 0 0 0 1 9 9 9 5 8 9 9 3 1 4 15 7 0 0 0
freiburgambahnhof 491 600 603 606 599 254 8 8 7 6 7 10 10 7 6 7 7 7 7 7 7 7 7 190
freiburgbahnhofsgarage 49 49 49 49 49 49 49 54 67 71 70 84 80 70 65 61 50 50 50 51 49 49 49 49
freiburgkarlsbau 7 7 7 7 7 7 6 5 5 5 7 6 8 8 8 8 7 7 7 7 7 7 7 7
freiburgkonzerthaus 3 3 3 3 3 3 4 4 4 4 7 9 7 7 3 3 3 3 5 3 3 3 3 3
freiburgmartinstor 553 550 551 551 543 285 57 38 47 58 69 79 72 66 58 53 48 45 42 39 37 34 207 466
freiburgrotteck 1 1 1 1 1 1 0 0 0 0 7 8 11 5 2 3 1 1 1 1 1 1 1 1
freiburgschlossberg 9 9 9 9 9 9 8 8 8 10 30 37 29 21 20 16 12 10 10 9 9 9 9 9
freiburgschwarzwaldcity 1543 1538 1544 1544 1543 838 302 302 301 299 288 289 289 289 291 295 295 295 295 295 423 1132 1545 1545
freiburgzaehringertor 603 600 603 603 594 249 4 5 4 6 5 5 6 4 2 2 2 2 334 604 604 605 605 604
ingolstadtcongressgarage 0 0 0 0 0 0 1 17 13 16 7 5 7 2 0 0 0 0 0 0 0 0 0 0
ingolstadthallenbad 0 0 0 0 0 0 0 5 8 3 0 0 1 1 0 1 0 0 1 0 0 0 0 0
ingolstadthauptbahnhofost 0 0 0 0 1 7 6 32 29 29 27 7 3 1 3 1 0 0 0 0 0 0 0 0
ingolstadtmuenster 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ingolstadtnordbahnhof 0 0 0 0 0 0 45 53 67 20 5 1 0 0 0 0 0 1 1 1 0 0 0 0
ingolstadtreduittilly 0 0 0 0 0 0 0 1 2 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ingolstadttheaterost 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
ingolstadttheaterwest 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
luebeckbackbord 1779 1774 1779 1779 1779 732 11 12 12 15 19 20 25 30 29 24 18 18 23 21 72 992 1770 1775
luebeckfalkenstrasse 1779 1774 1779 1779 1779 895 293 293 291 288 290 293 297 293 300 305 303 302 304 311 387 1220 1776 1776
luebeckhaerdercenter 1780 1775 1780 1780 1780 732 20 25 30 73 113 104 94 91 83 62 47 40 36 34 100 1101 1775 1776
luebeckhafenbahnhof 1780 1775 1780 1780 1780 733 17 18 17 20 29 26 23 24 23 23 21 21 24 23 72 995 1770 1776
luebeckkanalstrasse2 1537 1532 1537 1537 1537 653 19 21 20 20 20 17 17 18 21 18 20 22 30 19 61 916 1534 1535
luebeckkanalstrasse3 1537 1532 1537 1537 1537 706 89 104 153 279 328 278 232 203 202 222 286 349 351 274 242 997 1534 1535
luebeckkanalstrasse4 1546 1541 1547 1547 1547 677 47 66 97 132 116 93 85 84 95 102 129 162 177 155 178 976 1543 1544
luebeckkanalstrasse5 1549 1544 1549 1549 1549 676 45 134 307 366 286 203 157 150 118 127 124 129 146 98 110 942 1547 1548
luebecklastadiep3 1781 1775 1780 1780 1780 736 22 33 39 52 73 99 111 115 106 101 96 136 186 146 194 1112 1774 1776
luebecklastadiep4 1785 1779 1784 1784 1784 1132 413 288 233 237 284 341 360 363 374 403 432 486 547 627 686 1255 1783 1782
luebecklastadiep5 1781 1775 1780 1780 1780 779 56 59 61 68 73 84 90 94 96 95 124 173 197 129 168 1086 1774 1776
luebeckleuchtenfeld 1779 1774 1779 1779 1779 750 36 37 38 49 70 93 110 104 89 71 69 67 67 66 116 1005 1770 1775
luebecklindenarcaden 1776 1771 1776 1776 1776 739 24 27 29 33 30 28 24 23 21 19 19 16 17 16 78 953 1686 1772
luebeckmitte 1779 1774 1779 1779 1779 881 292 292 292 289 276 272 276 273 281 279 279 276 282 288 345 1204 1773 1776
luebeckmuk 1780 1775 1780 1780 1780 754 47 52 57 58 56 56 58 64 72 66 66 77 113 125 161 1111 1772 1775
luebeckradissonhotel 1780 1775 1780 1780 1780 730 13 18 20 19 17 17 17 17 17 13 12 13 15 15 82 1080 1773 1775

Let’s see. Some Dresden lots seem to be particularly busy during the day, but that could also be because the assumed capacity is too small during some periods. All the Lübeck lots and one in Freiburg obviously publish zero free spaces when closed, so those are the ones to be careful about when calculating the occupation per hour. Though we have also seen previously that freiburgambahnhof did the same until 2018, and freiburgmartinstor and freiburgzaehringertor look similar.

Plotting the occupation data of the Lübeck lots hints at another problem:

df = occupied.loc[:, occupied.columns.map(lambda c: c.startswith("luebeck"))]
(df[(df.index >= "2018-03-01") & (df.index < "2018-03-08")] * 100).round().plot(
    title="Occupation in Lübeck lots (March 2018)",
    labels={"value": "occupation %"}
)

At first it looks like the lots open at 6:00 and close at 22:00, but there are these little edges at the corners. It’s more likely they open at 6:30 and close at 20:30 or 21:30, but the exact value is lost in the 1-hour average bucketing done in the beginning. Well, if they are closed, their data does not contribute to the leisure activity anyway, so for all lots that have a zero-count of more than 400 at midnight i’ll simply cut off everything outside conservative opening hours that are safe to assume: before 7:00 and after 20:00.

occupied_open = occupied.copy()
for lot_id in zero_df[zero_df[0] > 400].index:
    df = occupied_open.loc[:, lot_id]
    occupied_open.loc[:, lot_id] = df[(df.index.hour >= 7) & (df.index.hour <= 20)]

Just to make sure, i plot the same date range again for all lots:

df = occupied_open
(df[(df.index >= "2018-03-01") & (df.index < "2018-03-08")] * 100).round().plot(
    title="Occupation during opening times (March 2018)",
    labels={"value": "occupation %"}
)

As far as i can determine, there are no regular hard edges any more. So then gimme that occupation per hour-of-day plot, individually for every year:

def hours_year_group(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["year"] = df.index.year
    df["hour"] = df.index.hour
    return (
        df.reset_index().set_index(["date", "year", "hour"])
        .unstack("year")
        .groupby(level="hour").mean()
        .groupby(level="year", axis=1).mean()
    )
    
(hours_year_group(occupied_open) * 100).plot(
    title="mean occupation per hour of day",
    labels={"value": "occupation %"},
    color_discrete_sequence=["#aa4", "#4a4", "#4aa", "#48a", "#f00"]
)

Amazing, isn’t it? No, not really. And the bump at 20:00 does not make much sense. Let’s plot the mean for each city individually:

def per_city_plot(occupied_open: pd.DataFrame, title: Optional[str] = None):
    fig = make_subplots(
        rows=len(CITIES), cols=1,
        vertical_spacing=0.02,
        shared_xaxes=True,
        subplot_titles=CITIES,
    )
    for i, city in enumerate(CITIES):
        df = occupied_open.loc[:, occupied_open.columns.map(lambda c: c.startswith(city.lower()))]
        for trace in (hours_year_group(df) * 100).round().plot(
            labels={"value": "occupation %"},
            color_discrete_sequence=["#aa4", "#4a4", "#4aa", "#48a", "#f00"],
        ).data:
            if i != 0:
                trace.showlegend = False
            fig.add_trace(trace, row=i+1, col=1)
    fig.update_layout(
        title=title or "mean occupation per hour of day", height=1000,
    ).show()
    
per_city_plot(occupied_open)

Obviously, Lübeck has parking lots that closed even before 20:00 at some point. Apart from that, the Lübeck plot actually shows something i am looking for: during working hours the 2020 occupation rate is similar to the years before, while the evenings are certainly less occupied.
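
To back that up a little, here is a rough check (a sketch; the hour windows are arbitrary picks of mine): the mean Lübeck occupation per year, once during working hours and once in the evening.

# mean occupation of the Lübeck lots per year,
# for working hours and evening hours separately
luebeck = occupied_open.loc[:, occupied_open.columns.str.startswith("luebeck")]
for label, hours in [("working hours (9-16)", range(9, 17)), ("evening (18-20)", range(18, 21))]:
    sub = luebeck[luebeck.index.hour.isin(hours)]
    print(label)
    print((sub.groupby(sub.index.year).mean().mean(axis=1) * 100).round(1))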

Freiburg also shows this little peak at 20:00, which is most likely caused by the closing-hours problem and not by party-goers.

Dresden shows a different picture. Seems like in 2020 more cars are simply left standing in the garage during the night. Dresden is quite a nice town with a lot of cool places to visit during the night–if there is no emergency decree, that is.

And as seen previously, Ingolstadt’s number of parked cars is growing over the years. In 2016 people stayed out longer compared to the other years.

Okay, well, please be aware! These are all just my assumptions. To prove anything, each parking lot would have to be inspected individually. That is not what i want to do in this post; it already has a couple of megabytes of javascript in it. I’ll stick with these average statistics, but remember: if the river is half a meter deep on average, that does not mean the cow is not going to drown when crossing it.

Finally, i’ll just repeat the above plot but for two particular weekdays: Wednesday and Sunday.

per_city_plot(
    occupied_open[occupied_open.index.map(lambda d: d.weekday() == 2)],
    title="mean occupation per hour of day on Wednesdays",
)
per_city_plot(
    occupied_open[occupied_open.index.map(lambda d: d.weekday() == 6)],
    title="mean occupation per hour of day on Sundays",
)

Thanks for reading!

Some applause to the parkenDD people and, really, don’t drink and drive!