TLDR: This post is a reprint of my article for Tech Rajan G, our education platform where science and technology are interwoven. The article describes a detailed climate change dataset that we will play with as we navigate the data science and engineering track one step at a time.
We will introduce you to the Climate Change Dataset from Kaggle, one of the datasets we will use to get familiar with data analysis and wrangling. In our follow-up posts, we will walk with you through the forest of data cleaning, analysis, and engineering, all the way to building a final project. The project is designed to be relevant to the real world and to demonstrate the solid data science and engineering skills that you will pick up with us.
Despite (or in addition to, perhaps) the pandemic's toll on the world, climate change continues to be an extremely serious threat to human existence. Continuously increasing average temperatures, intense fires, and other disasters have now established the dangerous effects of rising surface temperatures beyond any reasonable doubt. Despite the science and data, however, there is a perception that the idea is a fad that will fade into oblivion over time. Up to 6% of people across countries believe that no climate change is taking place, with skepticism and doubt, of course, more common in some countries than others.
Among the various datasets Kaggle has featured, climate change data requires extensive cleaning and maintenance before long-term trends can be analyzed adequately. From the early days of mercury thermometers to today's electronic thermometers, many technological and contextual changes have accompanied the measurement of climate; there are, therefore, many complicated biasing factors in these measurements. The Kaggle dataset we will take up in our first few tutorials is compiled by Berkeley Earth. Finding reason in the arguments posed by critics, Richard and Elizabeth Muller conceived Berkeley Earth in early 2010.
The dataset includes several files; Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv) contains the most comprehensive information on land as well as ocean-and-land temperatures. The data starts in 1750 for average land temperature and in 1850 for max and min land temperatures and global ocean and land temperatures. Each measurement is presented with its associated uncertainty. Please check the details of all the fields here. We will also be using the land temperatures of major cities in our data analysis. In our subsequent exercises, we will try to answer various questions about climate change (through data wrangling methods), and then ramp up our data engineering and data science skills until we can build a weather app to settle the question posed in the title once and for all.
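To make the different start dates concrete, here is a minimal sketch that finds the first date on which each field has a value. It uses only Python's standard library and a tiny in-memory sample instead of the real file. The column names (`dt`, `LandAverageTemperature`, and so on) are assumptions about how GlobalTemperatures.csv is laid out, the numbers are invented for illustration, and the helper `first_available_dates` is a hypothetical name of our own.

```python
import csv
import io

# Hypothetical sample in the assumed layout of GlobalTemperatures.csv.
# The values are made up for illustration; they are NOT real measurements.
SAMPLE_CSV = """dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandAndOceanAverageTemperature
1750-01-01,3.0,3.5,,
1750-02-01,3.1,3.7,,
1850-01-01,0.7,1.1,8.2,12.8
1850-02-01,3.0,1.2,9.9,13.5
"""


def first_available_dates(csv_file, date_column="dt"):
    """Return {column: first date with a non-empty value}.

    Assumes the rows are already sorted in chronological order.
    """
    first_seen = {}
    for row in csv.DictReader(csv_file):
        for column, value in row.items():
            if column == date_column or column in first_seen:
                continue
            # Empty cells mean the measurement is missing for that date.
            if value is not None and value.strip() != "":
                first_seen[column] = row[date_column]
    return first_seen


first_dates = first_available_dates(io.StringIO(SAMPLE_CSV))
for column, date in first_dates.items():
    print(f"{column}: first value on {date}")
```

To try this on the real data, you could pass an opened file instead, for example `open("GlobalTemperatures.csv", newline="")`, assuming the file sits in your working directory and its date column is indeed named `dt`.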
Stay tuned and happy coding with us! Bonus points to you if you can start thinking of data questions that we can answer by analyzing the dataset in detail.