… just, that new scraper is mine. There are people who have been doing this for years longer! So let me instead introduce parkendd.de by Offenes Dresden. It’s all open and the archive is available for download.

This post is similar to some of my other data investigation posts in the sense that i simply start coding and see what comes out of it. For a change, all code is included so you do not need to check the jupyter notebook.

Currently (end of 2021) data from 2015 to 2020 is packaged into a big tar.xz file, which i will convert to tar.gz because it’s easier to read in python.

wget https://parkendd.de/dumps/Archive.tar.xz
xz -dc Archive.tar.xz | gzip -cf9 > parkapi-2020.tar.gz

The xz compression actually seems to be a good choice because the file size grows from 200 to 500 megabytes with gz. Not a problematic number, though. However, the uncompressed tar file is about 3.6 gigabytes. I want to use pandas, and experience tells me that a gigabyte-sized CSV will usually not fit into memory once loaded. Even if it does, all operations that copy data will eventually kill the python kernel.

So, i’ll iterate through all files in the archive - each representing one parking lot per year - resample them to averaged 1-hour buckets and gradually merge them into a single DataFrame. I want to look at the years 2016 to 2020, so that’s about 44,000 hour steps for 100+ parking lots, which should fit into anyone’s memory.
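
A quick back-of-the-envelope calculation (the ~120 lots and 8-byte floats are just my assumptions at this point) shows how small the resampled table will be compared to the raw archive:

# rough estimate: 44,000 hourly buckets for ~120 lots, stored as 8-byte floats
print(f"{44_000 * 120 * 8 / 2**20:.0f} MiB")  # roughly 40 MiB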

from pathlib import Path
import tarfile
import codecs
import re
from typing import Generator, Tuple, Union, Optional, Callable

from tqdm import tqdm
import requests
import pandas as pd
import numpy as np
import plotly
import plotly.express as px
from plotly.subplots import make_subplots

pd.options.display.max_columns = 30
pd.options.plotting.backend = "plotly"
plotly.io.templates.default = "plotly_dark"
def iter_archive_dataframes(
    filename: Union[str, Path],
    resampling: str = "1h",
) -> Generator[Tuple[str, pd.DataFrame], None, None]:
    
    # tarfile does handle the gzip automatically
    with tarfile.open(filename) as tfp:
        
        # build map of lot_id to available csv filenames 
        #   i ignore 2015 since it's incomplete
        lot_id_filenames = dict()
        for name in sorted(tfp.getnames()):
            if "backup" not in name:
                match = re.match(r"(.*)-(20\d\d)\.csv", name)
                if match:
                    lot_id, year = match.groups()
                    if year != "2015":
                        lot_id_filenames.setdefault(lot_id, []).append(name)
        
        # for each lot
        for lot_id, filenames in lot_id_filenames.items():
            # if we have years 2016 - 2020
            if len(filenames) == 5:
                # build one DataFrame, resampled to 1 hour
                dfs = []
                for filename in filenames:
                    fp = tfp.extractfile(filename)
                    dfs.append(pd.read_csv(
                        codecs.getreader("utf-8")(fp), 
                        names=["date", "free"]
                    ))
                df = pd.concat(dfs, axis=0)
                df["date"] = pd.to_datetime(df["date"])
                try:
                    df = df.set_index("date").resample(resampling).mean()
                    yield lot_id, df
                except Exception:
                    # skip lots whose data cannot be resampled
                    pass
archive_file = Path("~/prog/data/parking/parkapi-2020.tar.gz").expanduser()
table_file = Path("~/prog/data/parking/parkapi-2020-1h.csv").expanduser()

if not table_file.exists():
    big_df = None
    for lot_id, df in tqdm(iter_archive_dataframes(archive_file)):
        df["lot_id"] = lot_id
        df = df.reset_index().set_index(["date", "lot_id"])
        if big_df is None:
            big_df = df
        else:
            # append rows and sort by date
            big_df = pd.concat([big_df, df]).sort_index()
    
    # x = lot_id, y = date
    big_df = big_df.unstack("lot_id")
    # drop the "free" label from columns, just keep lot_id
    big_df.columns = big_df.columns.droplevel()
    # store
    big_df.to_csv(table_file)

else:
    # read the file if it was already created
    big_df = pd.read_csv(table_file)
    big_df["date"] = pd.to_datetime(big_df["date"])
    big_df.set_index("date", inplace=True)
    big_df.columns.name = "lot_id"
    
big_df
lot_id aalborgcwobel aalborgfriis aalborgføtex aalborggåsepigen aalborgkennedyarkaden aalborgkongrescenter aalborgmusikkenshus aalborgpalads aalborgsalling aalborgsauersplads aalborgsømandshjemmet aarhusbruunsgalleri aarhusbusgadehuset aarhuskalkværksvej aarhusmagasin ... luebeckpferdemarkt luebeckradissonhotel muensterbusparkplatz oldenburgccoparkdeck1 oldenburgccoparkdeck2 oldenburgcity oldenburggaleriakaufhof oldenburghbfzob oldenburgheiligengeisthoefe oldenburgpferdemarkt oldenburgschlosshoefe oldenburgtheatergarage oldenburgtheaterwall oldenburgwaffenplatz zuerichparkgarageamcentral
date
2016-01-01 00:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 383.083333 269.0 126.666667 268.0 384.916667 309.250000 53.750000 64.833333 650.000000 0.000000
2016-01-01 01:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 129.750000 269.0 388.750000 316.833333 58.750000 67.083333 650.000000 0.000000
2016-01-01 02:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 133.250000 269.0 391.500000 321.083333 60.166667 79.250000 650.000000 0.000000
2016-01-01 03:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 133.833333 269.0 400.083333 323.416667 61.583333 77.500000 650.000000 0.000000
2016-01-01 04:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.000000 0.000 0.0 200.0 200.0 384.000000 269.0 133.000000 269.0 401.000000 324.000000 61.000000 78.083333 650.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-12-31 19:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 63.000000 178.0 350.000000 ... 46.000000 64.875 0.0 0.0 0.0 0.000000 0.0 111.000000 154.0 399.666667 428.000000 90.000000 69.416667 502.416667 38.000000
2020-12-31 20:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 63.000000 178.0 350.000000 ... 46.000000 64.750 0.0 0.0 0.0 0.000000 0.0 111.000000 154.0 399.500000 428.000000 90.000000 67.416667 503.000000 38.000000
2020-12-31 21:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 61.666667 178.0 349.916667 ... 7.666667 10.750 0.0 0.0 0.0 0.000000 0.0 111.750000 154.0 399.416667 428.000000 90.000000 66.833333 503.000000 38.000000
2020-12-31 22:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 57.750000 178.0 350.000000 ... 0.000000 0.000 0.0 0.0 0.0 0.000000 0.0 113.000000 154.0 400.416667 428.000000 90.000000 67.166667 502.083333 12.666667
2020-12-31 23:00:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 56.000000 178.0 350.000000 ... 0.000000 0.000 0.0 0.0 0.0 0.000000 0.0 113.000000 154.0 395.333333 428.000000 90.000000 67.250000 0.000000 0.000000

43848 rows × 127 columns

Don’t mind all the NaNs; the city of Aalborg was not scraped throughout the whole period. But Oldenburg looks good. Without further number crunching, let’s do a quick interactive plot, resampled to 1-week buckets:

(big_df
 .resample("1w").mean()
 .round()  # the round saves about 300 Kb of javascript code 
 .plot(
     title=f"average number of free spaces per week ({big_df.shape[1]} lots)", 
     labels={"value": "number of free spaces", "date": "week"}
 )
)

As usual, you can drag and zoom, and hide the individual lots on the right side (double-click to hide all except one).

Now that i’m actually able to look at parking data predating this stupid covid pandemic, i’ll pose two simple research questions:

  • Is the lockdown in Germany at the beginning of 2020 visible in the parking lot occupation data?
  • Has anything in the parking behaviour significantly changed compared to before?

First of all, looking at the plots above, a few cities have big chunks of missing data, Aalborg for example. It’s a shame but i’ll exclude them. Moreover, there are smaller gaps. Sometimes the number of free spaces listed on a website gets stuck, or is not listed at all, while other lots on the same site work fine. I’ll count the number of times that the daily average value does not change over three consecutive days, specifically since 2018:

df = (
    big_df[(big_df.index >= "2018-01-01")]
    .resample("1d").mean()
    .replace(np.nan, 0)  # treat missing values as zero
)
num_equal_days = ((df == df.shift(1)) & (df == df.shift(2))).astype(int).sum()
num_equal_days.sort_values().plot.bar(
    title="Number of times that 3 consecutive days have unchanged number of free spaces",
    height=600,
)

By visual inspection and comparison with the plot on top, i decide to cut everything above 100 and also remove the Zurich lot because it is missing data exactly in the period of interest:

big_df = big_df.loc[:, (num_equal_days <= 100) & (big_df.columns != "zuerichparkgarageamcentral")]
big_df.shape
(43848, 53)

Okay, 53 lots remain. Now it would be great to normalize each lot using the total capacity.

big_df.max()
lot_id
aarhusbusgadehuset                97.166667
aarhussalling                    700.000000
dresdenaltmarkt                  439.000000
dresdenaltmarktgalerie          9868.000000
dresdenanderfrauenkirche         140.000000
dresdencentrumgalerie           3771.416667
dresdenfrauenkircheneumarkt      296.000000
dresdenkaditz                    377.000000
dresdenkongresszentrum         26245.000000
dresdenparkhausmitte             432.333333
dresdenpirnaischerplatz          145.000000
dresdenprohlis                   192.250000
dresdenreitbahnstrasse           409.916667
dresdensarrasanistrasse         1360.166667
dresdenschiessgasse              999.000000
dresdenterrassenufer             244.000000
dresdentheresienstrasse          159.000000
dresdenwiesentorstrasse          185.333333
dresdenwoehrlflorentinum         323.583333
dresdenworldtradecenter          314.416667
freiburgambahnhof                242.000000
freiburgbahnhofsgarage           224.000000
freiburgkarlsbau                 977.000000
freiburgkonzerthaus              453.000000
freiburgmartinstor               142.000000
freiburgrotteck                  312.000000
freiburgschlossberg              440.000000
freiburgschwarzwaldcity          436.250000
freiburgzaehringertor            100.000000
ingolstadtcongressgarage         453.000000
ingolstadthallenbad              661.666667
ingolstadthauptbahnhofost        240.000000
ingolstadtmuenster               750.000000
ingolstadtnordbahnhof            231.083333
ingolstadtreduittilly            356.000000
ingolstadttheaterost             595.000000
ingolstadttheaterwest            514.333333
luebeckbackbord                  135.000000
luebeckfalkenstrasse             150.000000
luebeckhaerdercenter             212.000000
luebeckhafenbahnhof              108.833333
luebeckkanalstrasse2             216.000000
luebeckkanalstrasse3             197.000000
luebeckkanalstrasse4             284.000000
luebeckkanalstrasse5              45.000000
luebecklastadiep3                 34.000000
luebecklastadiep4                 17.000000
luebecklastadiep5                253.916667
luebeckleuchtenfeld              750.000000
luebecklindenarcaden             400.000000
luebeckmitte                     420.000000
luebeckmuk                       367.000000
luebeckradissonhotel              73.000000
dtype: float64

Ah, well, the congress center in Dresden probably does not have 26 thousand spaces. I’ll first clamp the dataframe to, let’s say, 2000, just to remove the most obvious outliers

big_df = big_df.clip(0, 2000)

and then ask the ParkAPI for more precise values. The endpoint is https://api.parkendd.de/<City> which returns static and live data for each lot per city:

CITIES = ["Aarhus", "Dresden", "Freiburg", "Ingolstadt", "Luebeck"]
lot_infos = dict()
for city in CITIES:
    response = requests.get(f"https://api.parkendd.de/{city}")
    for lot in response.json()["lots"]:
        lot["city"] = city
        lot_infos[lot["id"]] = lot

lot_infos["dresdenkongresszentrum"]
{'address': 'Ostra-Ufer 2',
 'coords': {'lat': 51.05922, 'lng': 13.7305},
 'forecast': False,
 'free': 234,
 'id': 'dresdenkongresszentrum',
 'lot_type': 'Tiefgarage',
 'name': 'Kongresszentrum',
 'region': 'Ring West',
 'state': 'open',
 'total': 250,
 'city': 'Dresden'}

Well, 26,000 was only two orders of magnitude above the truth.

lot_infos["luebeckbackbord"]
{'coords': {'lat': 53.970161, 'lng': 10.880241},
 'forecast': False,
 'free': 0,
 'id': 'luebeckbackbord',
 'lot_type': 'Parkplatz',
 'name': 'Backbord',
 'region': 'Parkplätze Lübeck',
 'state': 'open',
 'total': 0,
 'city': 'Luebeck'}

Lübeck does not provide a total value. The website that is scraped can be determined from the geojson file of the Lübeck scraper (or from https://api.parkendd.de/). It actually seems to be offline right now. So i’ll use the official numbers where present and the maximum recorded free value otherwise:

official_capacity = pd.Series(
    big_df.columns.map(lambda c: lot_infos[c]["total"] or None), 
    index=big_df.columns
).dropna()

capacity = big_df.max()
capacity[official_capacity.index] = official_capacity

# lot occupation in range [0, 1]
occupied = 1. - (big_df / capacity).clip(0, 1)

(occupied
 .groupby(lambda c: lot_infos[c]["city"], axis=1).mean()
 .resample("1m").mean() * 100.
).plot(
    title="Average lot occupation per month and city",
    labels={"value": "occupation percentage", "date": "month"}
)

Alright. There it is. A pretty obvious dent! With the least occupation during April 2020. That’s how i remember it. Kids skating on empty parking lots, no planes in the sky, no stupid shops selling useless things.

For the interested, here’s the same plot for each lot:

(occupied.resample("1m").mean() * 100.).round().plot(
    title="Average lot occupation per month and lot",
    labels={"value": "occupation percentage", "date": "month"}
)

There are more ways of looking at the occupation data. Instead of calculating the average for each week we can build a histogram of the occupation values. This shows all levels of occupation during each week:

def plot_histogram(
        df: pd.DataFrame, 
        resample: str = "1w", 
        bins: int = 48, 
        range: Optional[Tuple[float, float]] = None,
        clip: Optional[Tuple[float, float]] = None,
        title: Optional[str] = None,
        labels: Optional[dict] = None,
):
    if range is None:
        df_n = df.replace(np.nan, 0)
        range = (np.amin(df_n.values), np.amax(df_n.values))
    df = pd.concat(
        (pd.Series(np.histogram(group, bins=bins, range=range)[0], name=key)
        for key, group in df.resample(resample, level="date")),
        axis=1
    ).replace(0, np.nan)
    df.index = np.linspace(*range, bins)
    if clip is not None:
        df = df.clip(*clip)
    return px.imshow(
        df, origin="lower",
        title=title or "Weekly histogram of occupation per lot",
        labels=labels or {"y": "occupation percentage", "x": "week"},
        color_continuous_scale=["#005", "#08f", "#8ff", "#fff", "#fff"]
    )

# ignore values that are exactly zero or one
#   as they are usually *bad data* (see below)
plot_histogram(occupied.replace({0: np.nan, 1: np.nan}) * 100.)

So, starting at the end of March 2020, the most frequently reported lot occupation is between 0 and 15%. The situation kind of normalizes in June, and the dent kind of returns in November.
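
Here is a small cross-check one could run (just a sketch; the 15% threshold comes from the observation above, the per-month averaging is my choice). It plots, per month, the share of all reported samples that lies below 15% occupation:

# share of (lot, hour) samples below 15% occupation per month,
# again ignoring exact zeros and ones as they are usually bad data
occ = occupied.replace({0: np.nan, 1: np.nan})
share_below_15 = (occ < .15).sum(axis=1) / occ.notna().sum(axis=1)
(share_below_15.resample("1m").mean() * 100).rename("below 15%").plot(
    title="Share of samples below 15% occupation per month",
)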

What are these small horizontal stripes you ask? And what happened in the beginning of 2018?

The short 2018 outage is probably some internal server problem. You know, disk full, provider problems. There is no indication in the commit history.

To investigate the stripes, i’ll spend a few more megabytes of generated javascript and look at a few lots in particular:

def plot_lot_data(lot_id: str, filter: Optional[Callable] = None):
    fig = make_subplots(
        rows=2, cols=1,
        vertical_spacing=0.1,
        shared_xaxes=True,
        subplot_titles=["weekly occupation histogram", "number of free spaces per hour"],
    )
    filter = filter or (lambda df: df)
    df = filter(occupied[lot_id])
    histo = plot_histogram(df * 100)
    fig.add_trace(histo.data[0], row=1, col=1)
    fig.add_trace(
        filter(big_df[lot_id]).round().plot().data[0].update(showlegend=False), 
        row=2, col=1,
    )
    return fig.update_layout(
        coloraxis=histo.layout.coloraxis, 
        title=f"{lot_id} (capacity: {lot_infos[lot_id]['total']})", height=700
    )

plot_lot_data("dresdenparkhausmitte")

Obviously, a horizontal stripe means that the free-spaces counter stood still somehow. Except for the stripes at 0% occupation starting at the end of 2019: they are caused by the reported number of free spaces being larger than the reported lot capacity, which is (at the time of writing this article) 280. This garage must have decreased its capacity in the meantime. It would be helpful if the recorded capacity were published in the archive as well. Otherwise we must trust the maximum value, which is 432 for this recording. However, if you zoom in on Oct 1st to 4th 2016, when this maximum was reached, you’ll notice a completely unrealistic-looking period of 200+ free spaces. Also note that the little daily peaks of free spaces around that period appear upside-down within it! It may still be possible that some real-life event caused that, but i find it more likely to be some digital mess-up.
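
To put a number on that capacity mismatch, here is a tiny check one could run (a sketch; it relies on the total of 280 that the API reports at the time of writing):

# count the hours where the recorded free spaces exceed the capacity
# currently reported by the API
free = big_df["dresdenparkhausmitte"]
api_total = lot_infos["dresdenparkhausmitte"]["total"]  # 280 at the time of writing
print("max recorded free spaces:", round(free.max(), 1))  # 432.3 in this archive
print("hours above the API capacity:", int((free > api_total).sum()))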

plot_lot_data("freiburgambahnhof")

At first glance, the parking lot in Freiburg looks much more lively compared to the one above. But please zoom in on the flat-line in winter 2016/17. There is obviously no real car activity, but still the number of reported free spaces changes between zero and 62 each day in a super regular pattern reminiscent of opening hours. They only publish free spaces during opening hours. You know, that might make sense for drivers but it just makes interpreting the data harder. Since the outage in April 2018 they seem to be open 24/7 and data is published continuously. Still, looking closely at some points it becomes hard to determine, for myself at least, whether this is real car-in car-out activity. The patterns are so regular at times, e.g. from one weekend to the next, that i find it either creepy or not completely trustworthy.
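
By the way, the filter parameter of plot_lot_data can be used to zoom in programmatically instead of dragging around in the interactive plot; the date window below is just my pick for that winter 2016/17 flat-line:

# restrict the lot data to the winter 2016/17 flat-line before plotting
plot_lot_data(
    "freiburgambahnhof",
    filter=lambda df: df[(df.index >= "2016-12-01") & (df.index < "2017-03-01")],
)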

plot_lot_data("ingolstadtreduittilly")

This one’s interesting. The number of cars in Ingolstadt seems to be growing. Although, once again, zooming in on the data reveals some strange jumps in the number of occupied spaces during the night from one week to the next, which do not look like a reflection of real-world events. Or could this actually be gradual steps back towards working life after the first lockdown?

Changes to the capacity, whether real or digital, do affect the number of free spaces. And i start to realize that it’s actually hard work to infer true car activity just from the published number of free spaces.

Gradients do have the same problem. I thought: let’s just look at the difference to the previous day or something like that. This will at least mitigate the opening-hours problem and some other automatic or purely digital changes that the free-spaces counter might be subject to. Example:

df = big_df["freiburgambahnhof"]
df = df[(df.index >= "2020-01-01") & (df.index < "2020-06-01")]

fig = make_subplots(
    rows=4, cols=1,
    vertical_spacing=0.02,
    shared_xaxes=True,
    subplot_titles=[
        "free spaces per hour", "difference to previous hour", 
        "difference to previous day", "difference to previous week"
    ],
)
fig.add_trace(df.plot().data[0], row=1, col=1)
fig.add_trace(df.diff(1).plot().data[0], row=2, col=1)
fig.add_trace(df.diff(24).plot().data[0], row=3, col=1)
fig.add_trace(df.diff(24*7).plot().data[0], row=4, col=1)
fig.update_layout(
    height=1300, showlegend=False, 
    title="'freiburgambahnhof' free spaces and gradients (2020/01 - 2020/05)"
)

One can see things; still, it is hard to interpret this data automatically.

Fine. I’m not a paid scientist, not even a scientist, but i want to scrutinize question #2 a bit: has anything in the parking behaviour significantly changed compared to before? I mean, apart from the fact that there is less parking anyway. So i’ll try to look at the occupation per hour of day. In my previous parking post i found that there are some hints as to whether occupation is driven by work & shopping activity or by more leisurely demands.

But first i need to check the opening-hours problem. If a lot lists zero free spaces at some point, that translates to 1.0 in the occupied DataFrame, so i’ll simply count the number of times that a lot has full occupation for each hour of the day:

zero_df = pd.concat([
    (occupied[occupied.index.hour == hour] == 1).astype(int).sum()
    for hour in range(24)
], axis=1)
zero_df.columns.rename("hour of day", inplace=True)
zero_df
hour of day 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
lot_id
aarhusbusgadehuset 6 5 5 5 5 6 6 6 6 5 4 4 5 5 5 5 5 5 5 5 5 5 5 5
aarhussalling 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dresdenaltmarkt 15 15 15 15 10 9 8 7 9 17 58 151 182 168 160 145 138 149 136 66 26 18 15 15
dresdenaltmarktgalerie 8 8 8 8 6 5 5 4 5 15 97 210 219 182 156 121 93 63 22 8 8 8 8 8
dresdenanderfrauenkirche 27 27 26 26 26 29 30 28 29 33 41 36 29 24 27 26 25 29 23 22 24 25 26 28
dresdencentrumgalerie 30 30 30 30 26 23 21 18 20 26 56 168 193 154 115 77 54 38 30 30 30 30 30 30
dresdenfrauenkircheneumarkt 42 42 42 42 36 27 26 22 27 28 40 97 115 107 88 88 94 137 156 135 84 50 44 42
dresdenkaditz 1 0 0 0 0 1 2 2 1 1 2 0 0 2 1 2 2 0 0 3 1 1 1 1
dresdenkongresszentrum 1 0 0 0 0 0 2 17 45 56 59 53 51 41 26 19 15 24 31 28 24 17 12 6
dresdenparkhausmitte 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 2 3 3 2 1 1 0
dresdenpirnaischerplatz 14 6 6 7 8 9 6 2 9 55 223 386 385 327 306 311 369 514 523 332 113 45 26 18
dresdenprohlis 17 13 3 4 3 1 2 1 5 8 14 17 15 17 15 15 19 19 18 16 16 17 17 17
dresdenreitbahnstrasse 33 31 31 31 23 15 16 13 48 210 407 504 497 446 428 425 419 346 183 78 49 40 37 35
dresdensarrasanistrasse 53 52 54 55 38 16 16 17 57 105 119 123 112 109 105 122 124 132 191 221 168 96 58 54
dresdenschiessgasse 57 54 54 54 49 39 56 86 188 500 864 1003 849 626 479 454 521 694 725 509 219 92 68 63
dresdenterrassenufer 27 27 27 27 24 16 14 19 92 249 435 559 525 369 265 205 184 264 294 224 95 47 35 30
dresdentheresienstrasse 1 1 1 1 1 1 1 2 4 1 1 1 2 1 1 0 1 5 10 3 1 1 1 1
dresdenwiesentorstrasse 92 91 91 92 86 79 76 63 59 59 72 82 96 98 95 104 121 121 138 156 146 107 96 93
dresdenwoehrlflorentinum 1 1 1 1 0 0 0 0 0 0 20 40 31 23 6 3 2 1 1 1 1 1 1 1
dresdenworldtradecenter 0 0 0 0 0 0 0 0 1 9 9 9 5 8 9 9 3 1 4 15 7 0 0 0
freiburgambahnhof 491 600 603 606 599 254 8 8 7 6 7 10 10 7 6 7 7 7 7 7 7 7 7 190
freiburgbahnhofsgarage 49 49 49 49 49 49 49 54 67 71 70 84 80 70 65 61 50 50 50 51 49 49 49 49
freiburgkarlsbau 7 7 7 7 7 7 6 5 5 5 7 6 8 8 8 8 7 7 7 7 7 7 7 7
freiburgkonzerthaus 3 3 3 3 3 3 4 4 4 4 7 9 7 7 3 3 3 3 5 3 3 3 3 3
freiburgmartinstor 553 550 551 551 543 285 57 38 47 58 69 79 72 66 58 53 48 45 42 39 37 34 207 466
freiburgrotteck 1 1 1 1 1 1 0 0 0 0 7 8 11 5 2 3 1 1 1 1 1 1 1 1
freiburgschlossberg 9 9 9 9 9 9 8 8 8 10 30 37 29 21 20 16 12 10 10 9 9 9 9 9
freiburgschwarzwaldcity 1543 1538 1544 1544 1543 838 302 302 301 299 288 289 289 289 291 295 295 295 295 295 423 1132 1545 1545
freiburgzaehringertor 603 600 603 603 594 249 4 5 4 6 5 5 6 4 2 2 2 2 334 604 604 605 605 604
ingolstadtcongressgarage 0 0 0 0 0 0 1 17 13 16 7 5 7 2 0 0 0 0 0 0 0 0 0 0
ingolstadthallenbad 0 0 0 0 0 0 0 5 8 3 0 0 1 1 0 1 0 0 1 0 0 0 0 0
ingolstadthauptbahnhofost 0 0 0 0 1 7 6 32 29 29 27 7 3 1 3 1 0 0 0 0 0 0 0 0
ingolstadtmuenster 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ingolstadtnordbahnhof 0 0 0 0 0 0 45 53 67 20 5 1 0 0 0 0 0 1 1 1 0 0 0 0
ingolstadtreduittilly 0 0 0 0 0 0 0 1 2 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ingolstadttheaterost 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
ingolstadttheaterwest 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
luebeckbackbord 1779 1774 1779 1779 1779 732 11 12 12 15 19 20 25 30 29 24 18 18 23 21 72 992 1770 1775
luebeckfalkenstrasse 1779 1774 1779 1779 1779 895 293 293 291 288 290 293 297 293 300 305 303 302 304 311 387 1220 1776 1776
luebeckhaerdercenter 1780 1775 1780 1780 1780 732 20 25 30 73 113 104 94 91 83 62 47 40 36 34 100 1101 1775 1776
luebeckhafenbahnhof 1780 1775 1780 1780 1780 733 17 18 17 20 29 26 23 24 23 23 21 21 24 23 72 995 1770 1776
luebeckkanalstrasse2 1537 1532 1537 1537 1537 653 19 21 20 20 20 17 17 18 21 18 20 22 30 19 61 916 1534 1535
luebeckkanalstrasse3 1537 1532 1537 1537 1537 706 89 104 153 279 328 278 232 203 202 222 286 349 351 274 242 997 1534 1535
luebeckkanalstrasse4 1546 1541 1547 1547 1547 677 47 66 97 132 116 93 85 84 95 102 129 162 177 155 178 976 1543 1544
luebeckkanalstrasse5 1549 1544 1549 1549 1549 676 45 134 307 366 286 203 157 150 118 127 124 129 146 98 110 942 1547 1548
luebecklastadiep3 1781 1775 1780 1780 1780 736 22 33 39 52 73 99 111 115 106 101 96 136 186 146 194 1112 1774 1776
luebecklastadiep4 1785 1779 1784 1784 1784 1132 413 288 233 237 284 341 360 363 374 403 432 486 547 627 686 1255 1783 1782
luebecklastadiep5 1781 1775 1780 1780 1780 779 56 59 61 68 73 84 90 94 96 95 124 173 197 129 168 1086 1774 1776
luebeckleuchtenfeld 1779 1774 1779 1779 1779 750 36 37 38 49 70 93 110 104 89 71 69 67 67 66 116 1005 1770 1775
luebecklindenarcaden 1776 1771 1776 1776 1776 739 24 27 29 33 30 28 24 23 21 19 19 16 17 16 78 953 1686 1772
luebeckmitte 1779 1774 1779 1779 1779 881 292 292 292 289 276 272 276 273 281 279 279 276 282 288 345 1204 1773 1776
luebeckmuk 1780 1775 1780 1780 1780 754 47 52 57 58 56 56 58 64 72 66 66 77 113 125 161 1111 1772 1775
luebeckradissonhotel 1780 1775 1780 1780 1780 730 13 18 20 19 17 17 17 17 17 13 12 13 15 15 82 1080 1773 1775

Let’s see. Some Dresden lots seem to be particularly busy during the day, but that could also be because the assumed capacity is too small during some periods. All the Lübeck lots and one in Freiburg obviously publish zero free spaces when closed, so those are the ones to be careful about when calculating the occupation per hour. Though we have also seen previously that freiburgambahnhof did the same until 2018, and freiburgmartinstor and freiburgzaehringertor look similar.

Plotting the occupation data of the Lübeck lots hints at another problem:

df = occupied.loc[:, occupied.columns.map(lambda c: c.startswith("luebeck"))]
(df[(df.index >= "2018-03-01") & (df.index < "2018-03-08")] * 100).round().plot(
    title="Occupation in Lübeck lots (March 2018)",
    labels={"value": "occupation %"}
)

At first it looks like the lots open at 6:00 and close at 22:00, but there are these little edges at the corners. It’s more likely they open at 6:30 and close at 20:30 or 21:30, but the exact value is lost in the 1-hour average bucketing done in the beginning. Well, if they are closed, their data does not contribute to the leisure activity anyway, so for all lots that have a zero-count of more than 400 at midnight i’ll simply cut off everything outside conservative opening hours that are safe to assume: before 7:00 and after 20:00.

occupied_open = occupied.copy()
for lot_id in zero_df[zero_df[0] > 400].index:
    df = occupied_open.loc[:, lot_id]
    occupied_open.loc[:, lot_id] = df[(df.index.hour >= 7) & (df.index.hour <= 20)]

Just to make sure, i plot the same date range again for all lots:

df = occupied_open
(df[(df.index >= "2018-03-01") & (df.index < "2018-03-08")] * 100).round().plot(
    title="Occupation during opening times (March 2018)",
    labels={"value": "occupation %"}
)

As far as i can determine, there are no regular hard edges any more. So then gimme that occupation per hour-of-day plot, individually for every year:

def hours_year_group(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["year"] = df.index.year
    df["hour"] = df.index.hour
    return (
        df.reset_index().set_index(["date", "year", "hour"])
        .unstack("year")
        .groupby(level="hour").mean()
        .groupby(level="year", axis=1).mean()
    )
    
(hours_year_group(occupied_open) * 100).plot(
    title="mean occupation per hour of day",
    labels={"value": "occupation %"},
    color_discrete_sequence=["#aa4", "#4a4", "#4aa", "#48a", "#f00"]
)

Amazing, isn’t it? No, not really. And the bump at 20:00 does not make much sense. Let’s plot the mean for each city individually:

def per_city_plot(occupied_open: pd.DataFrame, title: Optional[str] = None):
    fig = make_subplots(
        rows=len(CITIES), cols=1,
        vertical_spacing=0.02,
        shared_xaxes=True,
        subplot_titles=CITIES,
    )
    for i, city in enumerate(CITIES):
        df = occupied_open.loc[:, occupied_open.columns.map(lambda c: c.startswith(city.lower()))]
        for trace in (hours_year_group(df) * 100).round().plot(
            labels={"value": "occupation %"},
            color_discrete_sequence=["#aa4", "#4a4", "#4aa", "#48a", "#f00"],
        ).data:
            if i != 0:
                trace.showlegend = False
            fig.add_trace(trace, row=i+1, col=1)
    fig.update_layout(
        title=title or "mean occupation per hour of day", height=1000,
    ).show()
    
per_city_plot(occupied_open)

Obviously, Lübeck has parking lots that closed even before 20:00 at some point. Apart from that, the Lübeck plot actually shows something i am looking for: during working hours the 2020 occupation rate is similar to the years before, while the evenings are certainly less occupied.
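
To back that up a little, here is a rough check (a sketch; the hour windows are arbitrary picks of mine): the mean Lübeck occupation per year, once during working hours and once in the evening.

# mean occupation of the Lübeck lots per year,
# for working hours and evening hours separately
luebeck = occupied_open.loc[:, occupied_open.columns.str.startswith("luebeck")]
for label, hours in [("working hours (9-16)", range(9, 17)), ("evening (18-20)", range(18, 21))]:
    sub = luebeck[luebeck.index.hour.isin(hours)]
    print(label)
    print((sub.groupby(sub.index.year).mean().mean(axis=1) * 100).round(1))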

Freiburg also shows this little peak at 20:00, which is most likely caused by the closing-hours problem and not by party-goers.

Dresden shows a different picture. Seems like in 2020 more cars are simply left standing in the garage during the night. Dresden is quite a nice town with a lot of cool places to visit during the night–if there is no emergency decree, that is.

And as seen previously, Ingolstadt’s number of parked cars is growing over the years. In 2016 people stayed out longer compared to the other years.

Okay, well, please be aware! These are all just my assumptions. To prove anything, each parking lot would have to be inspected individually. That is not what i want to do in this post; it already has a couple of megabytes of javascript in it. I’ll stick with these average statistics, but remember: if the river is half a meter deep on average, that does not mean the cow is not going to drown when crossing it.

Finally, i’ll just repeat the above plot but for two particular weekdays: Wednesday and Sunday.

per_city_plot(
    occupied_open[occupied_open.index.map(lambda d: d.weekday() == 2)],
    title="mean occupation per hour of day on Wednesdays",
)
per_city_plot(
    occupied_open[occupied_open.index.map(lambda d: d.weekday() == 6)],
    title="mean occupation per hour of day on Sundays",
)

Thanks for reading!

Some applause to the parkenDD people and, really, don’t drink and drive!