The DWD (German Weather Service) provides precipitation data with 1-minute resolution at the URL that we will use as the base URL below.

To get information about the parameters in the dataset, download the metadata archive. Inside this zip archive you will find a file called Metadaten_Parameter_... .html with the relevant information.

In this tutorial we will use the following packages:

library(dplyr) # for a tidy workflow
library(rvest) # to scrape web page data
library(readr) # to read csv files
library(stringr) # for string conversions
library(lubridate) # to do datetime tasks

To download the data we first set the base url:

baseurl <- "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/1_minute/precipitation/historical/"
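
The metadata archive mentioned at the top can be fetched in much the same way. The following is only a sketch: it assumes the archives live in a meta_data/ folder next to historical/ and that the file names contain the station ID; check the server listing if nothing matches.

# Sketch: fetch the metadata archive for station 00880.
# Assumptions: a meta_data/ folder exists next to historical/ and
# the archive names contain the station ID.
meta_url <- sub("historical/$", "meta_data/", baseurl)
meta_file <- read_html(meta_url) %>%
  html_elements("a") %>%
  html_attr("href") %>%
  str_subset("00880") %>%   # our station ID
  head(1)
download.file(paste0(meta_url, meta_file), destfile = meta_file)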

Get folder structure

Files are stored in folders named after the year. To scrape the available years we use the rvest package:

years <- read_html(baseurl) %>%
  html_elements("a") %>%
  html_attr("href")

This queries all the a elements (links) of the page and extracts the href attribute.

We get a vector with the following content:

years
##  [1] "../"   "1993/" "1994/" "1995/" "1996/" "1997/" "1998/" "1999/" "2000/"
## [10] "2001/" "2002/" "2003/" "2004/" "2005/" "2006/" "2007/" "2008/" "2009/"
## [19] "2010/" "2011/" "2012/" "2013/" "2014/" "2015/" "2016/" "2017/" "2018/"
## [28] "2019/" "2020/" "2021/" "2022/"

Since the first element is a link to the parent folder, we exclude it:

years <- years[-1]
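
Alternatively, instead of dropping the first element by position, we can keep only entries that look like a year folder. A small sketch using stringr, which also guards against any other non-year links in the listing:

# keep only links of the form "1993/", "1994/", ... (four digits plus slash)
years <- str_subset(years, "^\\d{4}/$")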

To list the monthly files inside a year folder we use a similar approach. Let's first go through it without a loop:

station <- "00880"
year <- years[1]
months <- read_html(paste0(baseurl, year)) %>%
    html_elements("a") %>%
    html_attr("href") %>%
    str_subset(paste0("1minutenwerte_nieder_", station))

With paste0(baseurl, year) we append the year to the base URL. With str_subset we keep only those file names that belong to the station we want to download.

The station ID can be looked up in the station description file provided by the DWD.
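
If you do not know the station ID yet, one option is to scrape the file names of a single year folder and pull out the IDs. This is a sketch; it assumes the IDs are the five digits following 1minutenwerte_nieder_ in the file names, as used above:

# list all station IDs that have data in 2020 (any year folder works)
stations <- read_html(paste0(baseurl, "2020/")) %>%
  html_elements("a") %>%
  html_attr("href") %>%
  str_subset("1minutenwerte_nieder_") %>%
  str_extract("(?<=1minutenwerte_nieder_)\\d{5}") %>%
  unique()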

Downloading all data

To start downloading data we first have to create a download folder:

data_folder <- "DWD_data_1minute"
dir.create(data_folder)

With a nested loop we download the data for every month of every year.

for(year in years){
  # list all monthly zip files for our station in this year's folder
  months <- read_html(paste0(baseurl, year)) %>%
    html_elements("a") %>%
    html_attr("href") %>%
    str_subset(paste0("1minutenwerte_nieder_", station))

  # download every monthly file into the data folder
  for(month in months){
    url <- paste0(baseurl, year, month)
    download.file(url, destfile = file.path(data_folder, basename(url)))
  }
}
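
Depending on the connection this can take a while and may occasionally fail. As a sketch, a slightly more defensive variant of the inner loop skips files that are already present (so the loop can simply be rerun) and pauses briefly between requests:

for(month in months){
  url  <- paste0(baseurl, year, month)
  dest <- file.path(data_folder, basename(url))
  if(!file.exists(dest)){                              # skip files we already have
    download.file(url, destfile = dest, mode = "wb")   # "wb" keeps zip files intact on Windows
    Sys.sleep(0.5)                                     # pause briefly between requests
  }
}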

Read data

To read the downloaded data we use the readr package. The function read_delim offers two features that come in handy here:

  • read several files at once and combine the data into one data frame.
  • read directly from zip files without the need to extract them first.

files <- list.files(data_folder, full.names = TRUE)
data <- read_delim(files, delim = ";", trim_ws = TRUE, na = "-999", col_types = "dccddddd-") %>%
  mutate(MESS_DATUM_BEGINN = as.POSIXct(MESS_DATUM_BEGINN, format="%Y%m%d%H%M")) %>%
  mutate(MESS_DATUM_ENDE = as.POSIXct(MESS_DATUM_ENDE, format="%Y%m%d%H%M")) 

The options we use here are:

  • delim: set the delimiter to ;
  • na: treat the value -999 as NA
  • trim_ws: trim whitespace around the data values
  • col_types: define the column types as double (d) or character (c); the final - skips the last column

We read the datetime columns as character, because readr's datetime column type (T) cannot parse the compact YYYYMMDDHHMM format used here. We then convert them with as.POSIXct() and a custom format string.
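
Since lubridate is already loaded, an equivalent alternative is to parse these columns with ymd_hm(), which understands the compact YYYYMMDDHHMM format directly (note that it returns times in UTC unless a tz is given):

# alternative to the as.POSIXct() calls above, using lubridate
data <- read_delim(files, delim = ";", trim_ws = TRUE, na = "-999", col_types = "dccddddd-") %>%
  mutate(MESS_DATUM_BEGINN = ymd_hm(MESS_DATUM_BEGINN),
         MESS_DATUM_ENDE   = ymd_hm(MESS_DATUM_ENDE))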

Save data

We can save the data with the following command:

save(data, file = "DWD_1_minute_rainfall.RData")

This way we can load it directly into our next data analysis project.
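
Loading it back is a single call; load() restores the object under its original name data. If you prefer to pick the name at load time, saveRDS()/readRDS() is a common alternative:

load("DWD_1_minute_rainfall.RData")               # restores the object `data`

# alternative: store a single object and choose the name when reading it back
saveRDS(data, "DWD_1_minute_rainfall.rds")
rainfall <- readRDS("DWD_1_minute_rainfall.rds")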