It is not uncommon for datasets to contain a change in resolution when measurement devices are upgraded. If you want to handle data with finer and coarser resolution differently, you might want to find the exact time of the device upgrade. This is what I will be showing in this post.
We will be using the following packages:
example 1: wind data
Download wind data:
Read wind data:
To find the min resolution we use a combination of unique, diff and min:
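As a minimal sketch of this idea in plain Python (the function name `min_resolution`, the rounding parameter and the sample values are illustrative assumptions, not taken from the dataset): sort the unique values, take the differences between neighbours and keep the smallest one.

```python
def min_resolution(values, ndigits=6):
    """Smallest gap between distinct sorted values.

    Gaps are rounded to `ndigits` decimals so that floating point
    noise (e.g. 2.6 - 2.5 = 0.10000000000000009) does not leak through.
    Returns None if there are fewer than two distinct values.
    """
    distinct = sorted(set(values))                      # "unique"
    gaps = [round(b - a, ndigits)                       # "diff"
            for a, b in zip(distinct, distinct[1:])]
    return min(gaps) if gaps else None                  # "min"


# Made-up wind speeds mixing integer and one-decimal readings
speeds = [3.0, 4.0, 4.0, 7.0, 2.5, 2.6, 5.1]
print(min_resolution(speeds))
```

The rounding step is a design choice: without it, the smallest difference of decimal readings is often slightly above or below the true step size.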
With the min resolution as bin width we can plot a histogram of the wind speed data:
From the spikes in the histogram we see that the dataset contains data with coarser resolution. This might invalidate statistics we want to derive from the dataset. Hence it is important to handle the coarser and finer data differently.
A scatterplot can show us when the switch to finer resolution took place:
To find the exact timing of the switch we query for the first date with non-integer values:
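Such a query could look roughly like the following sketch, assuming the data is available as `(date, value)` pairs. The helper name `first_non_integer_date` and the toy records are hypothetical and only serve to illustrate the logic.

```python
from datetime import date


def first_non_integer_date(records):
    """Earliest date whose value has a fractional part.

    `records` is an iterable of (date, value) pairs; missing values
    given as None are skipped. Returns None if every value is whole.
    """
    dates = [d for d, v in records
             if v is not None and not float(v).is_integer()]
    return min(dates) if dates else None


# Toy records, not the wind dataset
records = [
    (date(2000, 1, 1), 4.0),
    (date(2000, 1, 2), 6.0),
    (date(2000, 1, 3), 5.3),   # first fractional reading
    (date(2000, 1, 4), 7.0),   # whole numbers still occur afterwards
    (date(2000, 1, 5), 2.8),
]
print(first_non_integer_date(records))
```

Note that whole-number readings still occur after the switch, so the query looks for the first fractional value rather than the last integer one.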
To confirm the date we use it to color the scatterplot:
Here is a zoomed-in view:
Using that date we can divide the histogram plot into an early and a late dataset:
Plotting the histogram and density of both subsets (with a binwidth/bandwidth of 1, matching the coarser data) we see that the distributions are different:
Another approach would be to define a function to find the min or median resolution:
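A possible shape for such a function, sketched in plain Python: the names `resolution` and `resolution_by_year`, the `stat` parameter and the sample records are assumptions made for illustration. The statistic (min or median) is passed in as a function and applied to the gaps between sorted unique values; a small helper then applies it per year.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def resolution(values, stat=min, ndigits=6):
    """Apply `stat` (e.g. min or median) to the gaps between distinct values."""
    distinct = sorted(set(values))
    gaps = [round(b - a, ndigits) for a, b in zip(distinct, distinct[1:])]
    return stat(gaps) if gaps else None


def resolution_by_year(records, stat=median):
    """Group (date, value) pairs by year and compute the resolution per group."""
    by_year = defaultdict(list)
    for d, v in records:
        by_year[d.year].append(v)
    return {year: resolution(vals, stat) for year, vals in sorted(by_year.items())}


# Toy records: whole numbers in the first year, one decimal in the second
records = [
    (date(2000, 3, 1), 1.0), (date(2000, 6, 1), 2.0),
    (date(2000, 9, 1), 3.0), (date(2000, 12, 1), 5.0),
    (date(2001, 3, 1), 1.1), (date(2001, 6, 1), 1.2),
    (date(2001, 9, 1), 1.3), (date(2001, 12, 1), 1.5),
]
result = resolution_by_year(records)
print(result)
```

The median is less sensitive than the min to a single stray fine-grained value, but it needs enough distinct values per group to be meaningful.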
We can use that to find the resolution switch (by year):
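| year | speed_resolution | direction_resolution |
|------|------------------|----------------------|
| 1983 | 1.0 | 10 |
| 1984 | 1.0 | 10 |
| 1985 | 1.0 | 10 |
| 1986 | 1.0 | 10 |
| 1987 | 1.0 | 10 |
| 1988 | 1.0 | 10 |
| 1989 | 1.0 | 10 |
| 1990 | 1.0 | 10 |
| 1991 | 0.1 | 10 |
| 1992 | 0.1 | 10 |
| 1993 | 0.1 | 10 |
| 1994 | 0.1 | 10 |
| 1995 | 0.1 | 10 |
| 1996 | 0.1 | 10 |
| 1997 | 0.1 | 10 |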
This approach is not suitable for finding the exact date of the switch, since the median or min statistics need a lot of data. Data from a single day or a single hour might not be enough.
example 2: radiation data
The data can be downloaded and read in the same way as the wind data above.
Again we find the min resolution:
And plot a histogram.
The histogram shows that the dataset contains data in multiple different resolutions.
To find the resolution changes we again plot a scatterplot:
Here we see that there are multiple earlier phases with coarser resolution. The resolution seems to be around 4, but there seem to be three phases with different alignment. There also seem to be data with finer resolution between 0 and 4.
To find the date of the change to resolution 1 we select from the graph some values that are only in the later finer resolution data and not in the coarser resolution data: values 5, 6 or 7:
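The selection can be sketched as follows, again assuming `(date, value)` pairs. The helper `first_date_with_values` and the toy rows are hypothetical and are not the radiation dataset; the marker values 5, 6 and 7 are the ones read off the scatterplot.

```python
from datetime import date


def first_date_with_values(records, marker_values):
    """Earliest date on which any of `marker_values` is observed."""
    dates = [d for d, v in records if v in marker_values]
    return min(dates) if dates else None


# Toy rows: a coarse phase on a grid of 4, followed by a phase with step 1
toy = [
    (date(2000, 1, 1), 0), (date(2000, 1, 2), 4),
    (date(2000, 1, 3), 8), (date(2000, 1, 4), 4),
    (date(2000, 1, 5), 6),   # only possible at the finer resolution
    (date(2000, 1, 6), 8), (date(2000, 1, 7), 5),
]
switch = first_date_with_values(toy, {5, 6, 7})
print(switch)
```

Using several marker values instead of one makes it less likely that the detected date lags the true switch just because one particular value did not occur for a while.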
This gives us the start of the finer resolution measurements on January 1, 1980:
Again we can confirm that by coloring the scatterplot:
And look at the zoomed-in plot:
Again we can use the date we found to divide our histogram plot:
Again we can use the second approach to find the median resolution with our function:
But this approach gives rather unclear results if we switch to weekly statistics. There is not enough data in a week to give a consistent median resolution: