Franchu's blog

United Kingdom's Met Office dataset preliminary analysis

The United Kingdom's Met Office recently released temperature data for about 1700 weather stations across the globe from 1701 to 2009.

Since I have lately been trying to learn R and Processing, I decided to try them out on this dataset.

The first thing I tried to do was understand how the stations in the dataset are distributed geographically.

Figure: Stations distribution

Even though the projection is not very friendly, it is possible to recognise the main land masses on Earth. The density of stations is not uniform: some areas are over-represented and others are under-represented. This might affect the stability and validity of global averages over time.

Next I was interested in seeing how much data was available for each year, so I did a quick plot.

Figure: Number of measurements per year

The number of measurements increased dramatically in the mid-20th century. But what caused that sharp increase in the amount of data?

Checking the evolution of the number of stations over time, we get our answer.

Figure: Number of stations per year

The number of stations has increased over time in every region.
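The two counts discussed above (measurements per year and stations per year) can be sketched as follows. This is a minimal Python illustration, not the R script used for the plots; the record layout `(station_id, year, month, temperature)` and the sample values are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical records: (station_id, year, month, temperature_celsius).
# The real dataset layout may differ; these values are invented.
records = [
    ("A", 1900, 1, -2.0),
    ("A", 1900, 7, 15.5),
    ("A", 1950, 1, -1.5),
    ("B", 1950, 1, 4.0),
    ("B", 1950, 7, 21.0),
    ("C", 1950, 7, 18.2),
]


def counts_per_year(rows):
    """Return {year: (number_of_measurements, number_of_distinct_stations)}."""
    measurements = defaultdict(int)
    stations = defaultdict(set)
    for station_id, year, _month, _temperature in rows:
        measurements[year] += 1
        stations[year].add(station_id)
    return {y: (measurements[y], len(stations[y])) for y in sorted(measurements)}


for year, (n_measurements, n_stations) in counts_per_year(records).items():
    print(year, n_measurements, n_stations)
```

Counting distinct stations separately from raw measurements is what lets the two plots be compared: a jump in measurements that coincides with a jump in stations points to new stations rather than more frequent readings.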

Having understood the evolution of the number of stations a bit better, I wanted to see if I could find any meaningful pattern in the temperature data. So I first did an exploratory plot of the average monthly temperature for each region.

Figure: Temperature evolution per region

The first thing that caught my attention was the seasonal variation of the temperatures. Displaying it in a scatter plot hides the fact that the right end of the plot connects to the left end of the same plot (December to January), so I decided to try polar coordinates instead.

Figure: Temperature evolution (North region) - polar plot

Temperatures are represented radially, the angle corresponds to the month of the calendar year, and the colors represent the years. The fact that the ellipse is not centered shows the seasonality of the data.
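To make the polar mapping concrete, here is a minimal Python sketch (again, not the script behind the plot). The function names, the `radial_offset` used to keep radii positive for sub-zero temperatures, and the two synthetic temperature series are all assumptions for illustration.

```python
import math


def month_to_angle(month):
    """Map calendar month 1..12 to an angle in radians; January sits at 0."""
    return 2.0 * math.pi * (month - 1) / 12.0


def to_xy(month, temperature, radial_offset=50.0):
    """Place one monthly value on the polar plot and return Cartesian (x, y).

    radial_offset is a hypothetical shift that keeps the radius positive
    when temperatures are below zero.
    """
    radius = temperature + radial_offset
    angle = month_to_angle(month)
    return radius * math.cos(angle), radius * math.sin(angle)


def centroid(monthly_temps):
    """Centre of the 12 plotted points for one year (January first)."""
    points = [to_xy(m, t) for m, t in enumerate(monthly_temps, start=1)]
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return cx, cy


# Synthetic data: a year with no seasons, and a year that is coldest in
# January (0 degrees) and warmest in July (20 degrees).
flat = [10.0] * 12
seasonal = [10.0 - 10.0 * math.cos(month_to_angle(m)) for m in range(1, 13)]

print(centroid(flat))
print(centroid(seasonal))
```

With the flat series the centre of the curve lands on the origin, while the seasonal series pulls it toward the July side of the plot. December and January also end up 30 degrees apart, like any other pair of consecutive months, which is the continuity the scatter plot was missing.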

Then I decided to go one step further and show the temporal evolution of this data in an animation. With my first Processing script ever, I created the following one.

Global average temperature evolution 1701 - 2009 from Miguel Eduardo Gil Biraud on Vimeo.

To finish the preliminary analysis, I decided to check the evolution of the temperature averages over time, using the north region for the analysis.

Figure: Temperature yearly average (North region)

This graph shows a clear increase in the yearly temperature averages over the last 50 years, much like the visualization done by EagerEyes does. But is it the real story?

Figure: Temperature yearly average (North region) with latitude information

This is the same plot as before, but with the mean latitude of the measurements added as a color-coded scale. Because the number of stations in the dataset changes over time, the mean latitude of the stations moves south by almost 10 degrees. Therefore, the temperatures do not all share the same reference level. All in all, the chart is an apples-to-oranges comparison and it tells a misleading story. If we explicitly plot the variation of the mean latitude and the variation of the mean temperature, we can see that the two follow each other.

Figure: Temperature yearly average (North region) with latitude information
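The effect described above can be reproduced with a toy example. The following Python sketch is not taken from the analysis; the two stations, their latitudes, their temperatures and the years are invented, and each station is assumed to report exactly the same temperature every year.

```python
# Hypothetical stations: name -> (latitude_degrees, constant_temperature_celsius).
# Neither station ever warms or cools in this toy example.
stations = {
    "north_site": (60.0, 0.0),
    "south_site": (40.0, 10.0),
}

# Hypothetical list of which stations report in each year.
reporting = {
    1900: ["north_site"],
    2000: ["north_site", "south_site"],
}


def yearly_means(station_table, reporting_by_year):
    """Return {year: (mean_latitude, mean_temperature)} over reporting stations."""
    result = {}
    for year, names in sorted(reporting_by_year.items()):
        latitudes = [station_table[name][0] for name in names]
        temperatures = [station_table[name][1] for name in names]
        result[year] = (
            sum(latitudes) / len(latitudes),
            sum(temperatures) / len(temperatures),
        )
    return result


for year, (mean_lat, mean_temp) in yearly_means(stations, reporting).items():
    print(year, mean_lat, mean_temp)
```

The mean temperature rises from 0 to 5 degrees between the two years even though no station changed at all. The only thing that moved is the mix of stations, and with it the mean latitude, from 60 to 50 degrees.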

The relationship between the two quantities can be measured by the correlation coefficient, which in this case is -0.7115842. Even though correlation does not imply causation, this should be a clear signal for anyone to pay extra attention to the way in which they manipulate and present information, as it is very easy to produce visualisations that support a given idea even if the data says something different.
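For reference, this kind of coefficient can be computed as in the minimal Python sketch below. It assumes the figure quoted above is a Pearson correlation, which the post does not state explicitly. The two series are invented and will not reproduce that value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ValueError on mismatched or too-short input. A constant series
    has zero spread and would cause a division by zero.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented yearly series: the mean latitude drifts south while the mean
# temperature rises, as in the north-region plot.
mean_latitude = [60.0, 58.0, 55.0, 52.0, 50.0]
mean_temperature = [1.0, 1.5, 2.4, 3.1, 3.9]

print(pearson(mean_latitude, mean_temperature))
```

A value close to -1 means the two series move in opposite directions almost in lockstep, and a value close to 0 means no linear relationship.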

For the rest of the regions, the correlations between the mean temperature and mean latitude values are:


The difference in sign between the North and the South is due to the fact that their latitudes have opposite signs. While in the north lower latitudes bring the position closer to the tropics (higher temperatures), in the south this effect is achieved with higher latitudes.

In the coming days I will post the R scripts I used to analyse the data, as well as the Processing program, so that you can reproduce this analysis.

Climate change cannot be seen in these graphs; this post is meant as an exercise to illustrate how easy it is to produce misleading plots. Unfortunately, as of today, I lack the skills to reproduce the analyses that have been published in peer-reviewed papers with this dataset, but if you know how to do it, please go ahead and show us! I am eager to learn :)

As usual, I will be very grateful for any comments you have about how to improve the visualisations and the analysis.