Performing Analysis Of Meteorological Data

To start with, I’d like to thank Suven Consultants for giving me this wonderful opportunity to learn about this project idea and to work on it as a beginner.

So, to get started, let’s look at our objective and the dataset we were given.

Main objective of this project:

  • Perform data cleaning,

Null hypothesis given (H0): “Do the apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming?”

To evaluate H0, we need to find out whether the average apparent temperature for a given month, say April, from 2006 to 2016, and the average humidity for the same period have increased or not. This monthly analysis has to be done for all 12 months over the 10-year period. In other words, we resample the data from hourly to monthly, then compare the same month across the 10-year period.

About the Dataset:

Weather data is one of the easier types of data to find online. Many sites provide historical data on meteorological parameters such as pressure, temperature, humidity, wind speed and visibility. The SuvenML team downloaded one such weather dataset from Kaggle.

The dataset has hourly readings, including temperature and humidity, for the last 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe. You can download the dataset from the Google Drive link.

Implementation of the Project:

Importing Libraries and Dataset:

To make use of the functions in a module, you’ll need to import the module with an import statement. An import statement is made up of the import keyword along with the name of the module.

Syntax: import module_name

Fig. 1: Importing libraries and the dataset.
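
Fig. 1 is a screenshot, so here is a minimal text sketch of the same step. It assumes pandas is used for loading a CSV file; the file name `weatherHistory.csv` and the helper `load_weather` are hypothetical, and a two-row in-memory CSV with made-up values stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical file name; point this at wherever you saved the dataset.
DATA_PATH = "weatherHistory.csv"


def load_weather(source=DATA_PATH):
    """Read the hourly weather CSV into a DataFrame.

    `source` may be a file path or any file-like object that pd.read_csv accepts.
    """
    return pd.read_csv(source)


# Tiny in-memory stand-in for the real file (made-up values, illustration only).
sample_csv = io.StringIO(
    "Formatted Date,Precip Type,Apparent Temperature (C),Humidity\n"
    "2006-04-01 00:00:00.000 +0200,rain,7.39,0.89\n"
    "2006-04-01 01:00:00.000 +0200,,7.23,0.86\n"
)

df = load_weather(sample_csv)
print(df.head())
```

With the real file on disk, `df = load_weather()` reads it from `DATA_PATH` instead.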

Cleaning of Dataset:

(a) Finding the missing values: We need to know how many values are missing from the dataset so that they don’t cause complications when we work with it further.

Fig. 2: List of values missing from the dataset.

Here we can see that the ‘Precip Type’ column has 517 null values. To avoid further complications, we replace these null values with NaN (Not a Number). NaN is a special numeric value that represents something undefined or unrepresentable; here it signals that we simply don’t know which precipitation type belongs in those rows.

Fig. 3: List of all null values after assigning NaN values.
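
A minimal pandas sketch of this step, assuming the data is already in a DataFrame `df`; the small frame below is made up for illustration. Note that `pd.read_csv` already loads empty cells as NaN, so the explicit assignment mostly matters when missing entries arrive in some other form, such as `None`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real dataset: two rows lack a precipitation type.
df = pd.DataFrame({
    "Precip Type": ["rain", None, "snow", None],
    "Humidity": [0.89, 0.86, 0.89, 0.83],
})

# Count the missing entries in each column (the kind of summary in Fig. 2).
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Explicitly store np.nan wherever 'Precip Type' is missing.
df.loc[df["Precip Type"].isnull(), "Precip Type"] = np.nan
print(df["Precip Type"].isna().sum())
```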

Change the format of data for better analysis

We convert the ‘Formatted Date’ column to the standard Python datetime format for easier analysis.

Fig. 4: Changing Format of Date
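
A minimal sketch of the conversion with `pd.to_datetime`, using two made-up timestamps in the dataset’s format. Passing `utc=True` is my choice rather than something stated above: it turns the `+0200` offsets into timezone-aware UTC timestamps, and if the file mixes offsets (for example because of daylight saving time), it keeps the whole column in a single datetime dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-04-01 01:00:00.000 +0200",
    ]
})

# Parse the offset-bearing strings and express every timestamp in UTC.
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)

# Midnight at +0200 is 22:00 UTC on the previous day.
print(df["Formatted Date"].iloc[0])
```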

Resample data from hourly to monthly

The dataset contains hourly values, so we resample the entire dataset to monthly averages to meet our analysis requirements.

Fig. 5: Formatted Date
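
A minimal sketch of the resampling step, assuming ‘Formatted Date’ has already been parsed and set as the index (for example with `df.set_index("Formatted Date")`). The column names ‘Apparent Temperature (C)’ and ‘Humidity’ are assumed from the Kaggle file, and the hourly values are synthetic. Only the two numeric columns are kept before averaging, because recent pandas versions can raise an error when `.mean()` meets text columns such as ‘Precip Type’.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data: April-June 2006 (91 days), indexed by UTC timestamps.
index = pd.date_range("2006-04-01", periods=24 * 91, freq=pd.offsets.Hour(), tz="UTC")
hourly = pd.DataFrame(
    {
        "Apparent Temperature (C)": np.linspace(5.0, 20.0, len(index)),
        "Humidity": np.full(len(index), 0.75),
    },
    index=index,
)

# Average every hourly reading that falls in the same calendar month;
# "MS" (month start) labels each bucket with the first day of its month.
monthly = hourly[["Apparent Temperature (C)", "Humidity"]].resample("MS").mean()
print(monthly)
```

Because the index is in UTC, month boundaries sit a couple of hours away from local midnight; with roughly 720 hourly readings per month, this barely moves the averages.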

Analysis plots of temperature & humidity over the range of years in the dataset

Variation in apparent temperature & humidity with time (in years)

Fig. 6: Variation in Apparent Temperature and Humidity with Time.
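
One way to draw a figure like Fig. 6 with matplotlib, sketched on synthetic monthly values rather than the real data. The twin y-axes are an assumption on my part: humidity is a 0 to 1 fraction while apparent temperature is in °C, so sharing a single axis would flatten the humidity curve.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly means standing in for the resampled dataset (2 years).
index = pd.date_range("2006-04-01", periods=24, freq="MS")
months = np.arange(24)
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": 10 + 12 * np.sin(months * 2 * np.pi / 12),
        "Humidity": 0.75 + 0.1 * np.cos(months * 2 * np.pi / 12),
    },
    index=index,
)

# Temperature on the left axis.
fig, ax_temp = plt.subplots(figsize=(10, 4))
ax_temp.plot(monthly.index, monthly["Apparent Temperature (C)"],
             color="tab:red", label="Apparent temperature")
ax_temp.set_xlabel("Time")
ax_temp.set_ylabel("Apparent temperature (°C)")

# Humidity on a second y-axis that shares the same time axis.
ax_hum = ax_temp.twinx()
ax_hum.plot(monthly.index, monthly["Humidity"], color="tab:blue", label="Humidity")
ax_hum.set_ylabel("Humidity (fraction)")

fig.legend(loc="upper right")
fig.tight_layout()
```

Call `plt.show()` in an interactive session, or `fig.savefig("some_name.png")` to write the plot to a file.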

Monthly analysis for all 12 months over the 10-year period

Plots of each month across the 10 years.

Fig. 7: Variation in Apparent Temperature and Humidity month-wise over 10 years
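
To compare the same month across years, the monthly means can be rearranged into a month-by-year table. This is a minimal pandas sketch on synthetic data, not necessarily how the original figure was produced; `temp_by_month` is a hypothetical name. Each row of the table is one calendar month and each column is one year.

```python
import numpy as np
import pandas as pd

# Synthetic monthly means for 2006-2008, indexed by month-start dates.
index = pd.date_range("2006-01-01", periods=36, freq="MS")
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": np.arange(36, dtype=float),
        "Humidity": np.full(36, 0.8),
    },
    index=index,
)

# Group by (calendar month, year), then pivot the years into columns.
temp_by_month = (
    monthly["Apparent Temperature (C)"]
    .groupby([monthly.index.month, monthly.index.year])
    .mean()
    .unstack()
    .rename_axis(index="month", columns="year")
)

# Row 4 holds April of every year, ready to plot or compare.
april = temp_by_month.loc[4]
print(april)
```

For example, `temp_by_month.loc[4].plot(marker="o")` draws April across the years; looping over months 1 to 12 and repeating the same steps for ‘Humidity’ gives the full set of month-wise comparisons.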

Conclusions Drawn:

  • No observable change in average humidity.