To begin, I’d like to thank Suven Consultants for giving me this wonderful opportunity to learn about this project idea and to work on it as a beginner.
Let’s start with a look at our objective and the dataset we were given.
Main objectives of this project:
- Perform data cleaning,
- Perform analysis to test the given Null Hypothesis (H0), and
- Derive conclusions from the analysis.
Null Hypothesis given (H0): “Has the apparent temperature and humidity, compared monthly across 10 years of the data, indicated an increase due to global warming?”
H0 means we need to find out whether the average apparent temperature for a given month, say April, has increased from 2006 to 2016, and whether the average humidity has increased over the same period. This monthly analysis has to be done for all 12 months over the 10-year period. So we are basically resampling our data from hourly to monthly, then comparing the same month across the years.
About the Dataset:
One type of data that’s easy to find on the net is weather data. Many sites provide historical data on meteorological parameters such as pressure, temperature, humidity, wind speed, visibility, etc. The SuvenML team has downloaded one such weather dataset from Kaggle.
The dataset has hourly temperature readings recorded over roughly 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe. You can download the dataset from the Google Drive link.
Implementation of the Project:
Importing Libraries and Dataset:
To make use of the functions in a module, you’ll need to import the module with an import statement. An import statement is made up of the import keyword along with the name of the module.
Syntax: import (module name)
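As a minimal sketch of this step, the snippet below imports pandas and loads a CSV into a DataFrame. The three rows are made-up sample values standing in for the real file, and the column names ‘Apparent Temperature (C)’ and ‘Humidity’ are assumptions about the dataset’s header; only ‘Formatted Date’ and ‘Precip Type’ are named in this article.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the real weather CSV (assumed header names)
sample_csv = """Formatted Date,Precip Type,Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,rain,7.4,0.89
2006-04-01 01:00:00.000 +0200,rain,7.2,0.86
2006-04-01 02:00:00.000 +0200,,9.4,0.89
"""

# With the real file you would pass its path to read_csv instead of a StringIO
df = pd.read_csv(io.StringIO(sample_csv))
print(df.head())
```

With the downloaded dataset, the only change would be replacing the in-memory text with the path to the CSV file.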
Cleaning of Dataset:
(a) Finding the missing values: We need to know how many values are missing from the dataset so that they don’t complicate the later steps.
Here we can see that the ‘Precip Type’ column has 517 null values. To avoid further complications, we treat these null values as NaN values. NaN, standing for Not a Number, is a member of a numeric data type that represents a value that is undefined or unrepresentable, which fits here because we really don't know what the precipitation type was.
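One common way to count missing values per column is `isnull().sum()`. The sketch below uses a tiny made-up DataFrame rather than the real data, so the counts differ from the 517 reported above; the ‘Humidity’ column name is an assumption.

```python
import pandas as pd

# Tiny made-up frame; None marks a missing precipitation type
df = pd.DataFrame({
    "Precip Type": ["rain", None, "snow", None],
    "Humidity": [0.89, 0.86, 0.80, 0.75],
})

# Count the missing entries in every column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Boolean mask of the rows where the precipitation type is unknown
unknown_precip = df["Precip Type"].isna()
print(df[unknown_precip])
```

pandas already reports `None` in such a column as missing, so the cells can be left as they are and simply excluded from any calculation that needs the precipitation type.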
Change the format of data for better analysis
We convert the ‘Formatted Date’ column to the standard Python datetime format for easier analysis.
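A minimal sketch of the conversion, assuming `pd.to_datetime` is used with `utc=True` (the article does not say which options were chosen). The two timestamps follow the format shown in the dataset description; the humidity values are made up.

```python
import pandas as pd

# Two made-up rows using the timestamp format described for the dataset
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-04-01 01:00:00.000 +0200",
    ],
    "Humidity": [0.89, 0.86],
})

# Parse the strings and normalise every timestamp to UTC
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)

# A datetime index is what time-based resampling works on
df = df.set_index("Formatted Date")
print(df.index)
```

Passing `utc=True` is a design choice here: if the UTC offset varies between rows (for example across daylight saving changes), normalising to UTC keeps the column as a single datetime type instead of a mix of offsets.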
Resample data from hourly to monthly
The dataset contains hourly values, so we resample the entire dataset to monthly values to meet our analysis requirements.
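The resampling step could look roughly like the sketch below, which averages hourly rows into one row per month. The hourly values are synthetic (a constant per month, so the monthly means are easy to check), and the column name is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic hourly index covering April and May 2006 (61 days)
idx = pd.date_range("2006-04-01", periods=24 * 61, freq="h", tz="UTC")

# Constant value per month so the expected monthly mean is obvious
hourly = pd.DataFrame(
    {"Apparent Temperature (C)": np.where(idx.month == 4, 10.0, 20.0)},
    index=idx,
)

# "MS" groups rows by calendar month, labelled with the month's first day
monthly = hourly.resample("MS").mean()
print(monthly)
```

Taking the mean is an assumption that matches the article's talk of average temperature and humidity; other aggregations such as the median could be swapped in the same way.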
Analysis plots of temperature & humidity over the range of years in the dataset
Variation in apparent temperature & humidity with time (in years)
Monthly analysis for all 12 months over the 10 year period
Plots of all the months spanning over 10 years.
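To compare the same month across the years, the monthly data can be filtered by calendar month. The sketch below does this for April on synthetic monthly values (a simple rising sequence, not real measurements), with assumed column names.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data from January 2006 to December 2016
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS")
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": np.arange(len(idx), dtype=float),
        "Humidity": np.linspace(0.6, 0.8, len(idx)),
    },
    index=idx,
)

# Keep only the April row of every year
april = monthly[monthly.index.month == 4]

# Difference between the last and the first April in the range
temp = april["Apparent Temperature (C)"]
change = temp.iloc[-1] - temp.iloc[0]
print(april)
print("Change from first to last April:", change)
```

Repeating the filter for month numbers 1 to 12 gives one series per month, and calling `april.plot()` (which requires matplotlib) would draw that month's values across the years.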
- No change in average humidity is observable.
- The average apparent temperature rose in 2009, dropped in 2010, rose slightly in 2011, dropped significantly in 2015, and rose again in 2016.
- The claim in the Null Hypothesis (H0), that both apparent temperature and humidity increase due to global warming, is not supported here, and thus the null hypothesis is rejected.