1. Introduction

Climate change has been an urgent topic this century. New regulations and directives are regularly introduced to reduce CO2 emissions in the hope of slowing the rise in land temperatures. With that in mind, the researchers analysed two datasets to uncover any relationships that may exist between them.

1.1 Research Questions

The researchers propose two research questions to direct the analysis of these large datasets:

RQ1: What is the correlation between global CO2 emissions and global land warming from 1913 to 2012?

The researchers hypothesise that there is a positive correlation between CO2 emissions and global land warming, that is, higher CO2 emissions are accompanied by higher land temperatures.

The null hypothesis is that there is no correlation between CO2 emissions and land temperature and that the two are independent of each other.

RQ2: Who are the evil emitters and who are the unlucky receivers?

RQ2.1: Which countries produce high CO2 emissions but are less affected by global land warming? (evil emitters)

RQ2.2: Which countries produce low CO2 emissions but are highly affected by global land warming? (unlucky receivers)

2. Datasets and exploratory analysis

2.1 Global land temperature data by country

The researchers use a dataset of global land temperature by country. Source: https://data.world/data-society/global-climate-change-data

## Rows: 577,462
## Columns: 4
## $ dt                            <date> 1743-11-01, 1743-12-01, 1744-01-01, 174…
## $ AverageTemperature            <dbl> 4.384, NA, NA, NA, NA, 1.530, 6.702, 11.…
## $ AverageTemperatureUncertainty <dbl> 2.294, NA, NA, NA, NA, 4.680, 1.789, 1.5…
## $ Country                       <chr> "Åland", "Åland", "Åland", "Åland", "Åla…

Country is a categorical variable with 243 unique values, but these also include continents: Africa, Antarctica, Asia, Europe, North America, Oceania, South America.

Interestingly, some countries are represented by two different values: Denmark (Europe) and Denmark, France (Europe) and France, Netherlands (Europe) and Netherlands, and United Kingdom (Europe) and United Kingdom. The two values in each pair are almost the same, except for Denmark.

dt is a discrete numerical variable representing the date. The dataset contains one record for the 1st of every month. The 10 latest record dates are:

##  [1] "2013-09-01" "2013-08-01" "2013-07-01" "2013-06-01" "2013-05-01"
##  [6] "2013-04-01" "2013-03-01" "2013-02-01" "2013-01-01" "2012-12-01"

AverageTemperature is a continuous numerical variable representing each country’s average land temperature for the month. The values rise and fall in a seasonal pattern.

AverageTemperature contains many NAs because data for the early years are available for only some countries.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -37.66   10.03   20.90   17.19   25.81   38.84   32651

AverageTemperatureUncertainty is also provided, but we do not use it.

2.2 Global CO2 emission by country

Source: https://ourworldindata.org/co2-dataset-sources

## Rows: 66,984
## Columns: 4
## $ Entity                 <chr> "Afghanistan", "Afghanistan", "Afghanistan", "A…
## $ Code                   <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG"…
## $ Year                   <dbl> 1750, 1751, 1752, 1753, 1754, 1755, 1756, 1757,…
## $ `Annual CO2 emissions` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…

Entity is a categorical variable with 247 unique values. Besides countries, these include:

  • Continents: Africa, Antarctica, Asia, Europe, North America, Oceania, South America
  • Continents with exclusions: Asia (excl. China and India), Europe (excl. EU-27), Europe (excl. EU-28), European Union (27), European Union (28), North America (excl. USA)
  • Income groups: Low-income countries, High-income countries, Upper-middle-income countries, etc.
  • GCP areas: French Equatorial Africa (GCP), French West Africa (GCP), etc.
  • Non-country entities: International transport

Year is a discrete numerical variable. The dataset contains annual data. The 10 latest record years are:

##  [1] 2021 2020 2019 2018 2017 2016 2015 2014 2013 2012

Annual CO2 emissions is a continuous numerical variable representing CO2 emissions in tons. There are no NAs, and the values range from 0 to 3.712e+10.

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 1.230e+08 1.205e+06 3.712e+10

Code is the country code; we do not plan to use it.

3. Data cleaning and transformation

3.1 Annual temperature by country

First, we remove the continent rows and the redundant country entries from the Country column, as discussed in the previous section. This leaves 232 valid unique countries.

Then we calculate the annual average temperature of each country from 1913 to 2012 and drop the rows with NAs:
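The name filtering described above can be sketched in a few lines. This is a minimal Python illustration, not the code used for the report: the helper names are hypothetical, the sample temperatures are made up, and the choice to drop the "(Europe)" variant of each duplicated pair is an assumption, since the text does not say which variant was kept.

```python
# Names to exclude, taken from the exploratory analysis in section 2.1.
CONTINENTS = {
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
}
# ASSUMPTION: the "(Europe)" variant of each duplicated country is the one dropped.
DUPLICATE_VARIANTS = {
    "Denmark (Europe)", "France (Europe)",
    "Netherlands (Europe)", "United Kingdom (Europe)",
}


def is_valid_country(name):
    """Return True if the name is neither a continent nor a duplicate variant."""
    return name not in CONTINENTS and name not in DUPLICATE_VARIANTS


def clean_rows(rows):
    """Keep only the rows (dicts with a 'Country' key) for valid countries."""
    return [row for row in rows if is_valid_country(row["Country"])]


# Illustrative rows only; the temperatures are not taken from the dataset.
sample = [
    {"Country": "Denmark", "AverageTemperature": 1.0},
    {"Country": "Denmark (Europe)", "AverageTemperature": 8.0},
    {"Country": "Asia", "AverageTemperature": 7.0},
    {"Country": "Åland", "AverageTemperature": 5.0},
]
cleaned = clean_rows(sample)
print([row["Country"] for row in cleaned])
```

Removing 7 continents and 4 duplicate variants from the 243 original values gives 232, which is consistent with the count reported above.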

## # A tibble: 23,130 × 4
##    Country      Year AverageTemperature AverageTemperatureUncertainty
##    <chr>       <int>              <dbl>                         <dbl>
##  1 Afghanistan  1913               13.9                         0.580
##  2 Afghanistan  1914               14.3                         0.757
##  3 Afghanistan  1915               14.9                         0.687
##  4 Afghanistan  1916               13.3                         0.717
##  5 Afghanistan  1917               13.9                         0.716
##  6 Afghanistan  1918               13.5                         0.722
##  7 Afghanistan  1919               13.9                         0.767
##  8 Afghanistan  1920               13.0                         0.808
##  9 Afghanistan  1921               14.2                         0.608
## 10 Afghanistan  1922               14.3                         0.539
## # … with 23,120 more rows
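The aggregation from monthly to annual values can be sketched as follows. This is a hedged Python illustration rather than the report's own code: the function name and the tuple layout are hypothetical, the sample values are made up, and it assumes that missing months are skipped before averaging and that a country-year with no usable month is dropped.

```python
from collections import defaultdict
from statistics import fmean


def annual_means(records, first_year=1913, last_year=2012):
    """Average monthly temperatures per (country, year).

    `records` holds (date_string, temperature_or_None, country) tuples with
    dates formatted as 'YYYY-MM-DD', like the dt column.
    ASSUMPTION: missing months are ignored, and a country-year without any
    usable month does not appear in the result.
    """
    buckets = defaultdict(list)
    for dt, temperature, country in records:
        year = int(dt[:4])
        if first_year <= year <= last_year and temperature is not None:
            buckets[(country, year)].append(temperature)
    return {key: fmean(values) for key, values in sorted(buckets.items())}


# Illustrative values only, not taken from the dataset.
monthly = [
    ("1913-01-01", 2.0, "Exampleland"),
    ("1913-02-01", None, "Exampleland"),  # missing month is skipped
    ("1913-03-01", 4.0, "Exampleland"),
    ("1912-12-01", 9.0, "Exampleland"),   # outside the study window
    ("1914-01-01", None, "Exampleland"),  # year with no data is dropped
]
print(annual_means(monthly))
```

A stricter alternative would mark a whole year as missing when any of its months is missing; the text does not say which rule was applied.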

3.2 CO2 emission by country

First, we remove the rows whose Entity is not a country, as discussed in the previous section. This leaves 221 unique countries.

Then we keep only the records from 1913 to 2012:

## # A tibble: 22,100 × 4
##    Entity      Code   Year `Annual CO2 emissions`
##    <chr>       <chr> <dbl>                  <dbl>
##  1 Afghanistan AFG    1913                      0
##  2 Afghanistan AFG    1914                      0
##  3 Afghanistan AFG    1915                      0
##  4 Afghanistan AFG    1916                      0
##  5 Afghanistan AFG    1917                      0
##  6 Afghanistan AFG    1918                      0
##  7 Afghanistan AFG    1919                      0
##  8 Afghanistan AFG    1920                      0
##  9 Afghanistan AFG    1921                      0
## 10 Afghanistan AFG    1922                      0
## # … with 22,090 more rows
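A quick consistency check on the two cleaned tables is to compare their row counts with the size of a complete country-by-year grid. The sketch below is a small Python illustration with a hypothetical helper name; the country counts and row counts are the ones reported in sections 3.1 and 3.2.

```python
def expected_rows(n_countries, first_year, last_year):
    """Rows in a complete country-by-year table (inclusive year range)."""
    return n_countries * (last_year - first_year + 1)


co2_expected = expected_rows(221, 1913, 2012)
temperature_expected = expected_rows(232, 1913, 2012)

# Difference between the complete grid and the reported row counts.
co2_missing = co2_expected - 22_100
temperature_missing = temperature_expected - 23_130
print(co2_expected, co2_missing)
print(temperature_expected, temperature_missing)
```

The CO2 table matches a complete grid, while the temperature table is 70 rows short of one, which is consistent with the NA rows dropped in section 3.1.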

4. Research Question 1: CO2 emissions and land temperature correlation

4.1 Correlation Analysis

To answer the first research question, the researchers conducted a correlation analysis between CO2 emissions and land temperature for the following countries: Estonia, Thailand, South Africa, Australia, United States, China, and India.

## `geom_smooth()` using formula = 'y ~ x'

4.2 Results

For this report, the researchers looked at the CO2 emissions and land temperatures of Estonia, Thailand, South Africa, Australia, United States, China and India. All of these countries show a positive correlation between CO2 emissions and land temperature, although the coefficient is higher for some than for others.

Estonia is the only country whose correlation is not significant (p = 0.21), and it also has a smaller coefficient (r(98) = 0.13) than the other countries. All other countries had highly significant correlations (p < .0001). Australia showed the strongest correlation between CO2 emissions and land temperature, with r(98) = 0.69. After Australia, in descending order, come China (r(98) = 0.68), India (r(98) = 0.66), South Africa (r(98) = 0.66), Thailand (r(98) = 0.55), and United States (r(98) = 0.47).

5. Research Question 2: The evil emitters and unlucky receivers

Following the EDA, we revise the two sub-questions as follows:

5.1 Temperature change estimation with Linear regression

Because annual temperatures fluctuate from year to year, we estimate the temperature change of each country between 1913 and 2012 by fitting a linear regression model to the land temperature dataset.

From the model, the temperature change of each country from 1913 to 2012 is as follows:

## # A tibble: 232 × 2
##    Country             `Temperature change`
##    <chr>                              <dbl>
##  1 Afghanistan                        1.51 
##  2 Åland                              1.03 
##  3 Albania                            0.663
##  4 Algeria                            1.20 
##  5 American Samoa                     0.888
##  6 Andorra                            1.20 
##  7 Angola                             0.859
##  8 Anguilla                           1.15 
##  9 Antigua And Barbuda                1.16 
## 10 Argentina                          0.892
## # … with 222 more rows
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4897  0.8779  1.0009  1.0182  1.1577  1.8429

The distribution of temperature changes is very close to a normal distribution, which is surprising.

5.2 Total CO2 emission

Next, we compute the total CO2 emissions of each country from 1913 to 2012 from the CO2 emission dataset.

## # A tibble: 221 × 2
##    Country             `CO2 emissions (MT)`
##    <chr>                              <dbl>
##  1 Afghanistan                       125.  
##  2 Albania                           248.  
##  3 Algeria                          3383.  
##  4 Andorra                            11.3 
##  5 Angola                            444.  
##  6 Anguilla                            2.31
##  7 Antigua and Barbuda                17.7 
##  8 Argentina                        6880.  
##  9 Armenia                           327.  
## 10 Aruba                              69.0 
## # … with 211 more rows
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##       0.2      31.1     246.4   11943.9    2327.1 1336251.6

The distribution is extremely skewed to the right: most countries emit relatively little CO2, while a few countries, which can be considered outliers, emit a great deal.
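The totals above can be reproduced in outline by summing the annual values per country and converting tons to million tons. The following Python sketch is an illustration only: the function name and tuple layout are hypothetical, and the sample figures are made up.

```python
from collections import defaultdict

TONS_PER_MILLION_TONS = 1_000_000


def total_emissions_mt(records, first_year=1913, last_year=2012):
    """Sum annual emissions per entity and convert tons to million tons.

    `records` holds (entity, year, annual_emissions_in_tons) tuples.
    """
    totals = defaultdict(float)
    for entity, year, tons in records:
        if first_year <= year <= last_year:
            totals[entity] += tons
    return {
        entity: tons / TONS_PER_MILLION_TONS
        for entity, tons in totals.items()
    }


# Illustrative values only, not taken from the dataset.
annual = [
    ("Exampleland", 1912, 5_000_000.0),  # outside the study window
    ("Exampleland", 1913, 1_500_000.0),
    ("Exampleland", 1914, 2_500_000.0),
    ("Otherland", 2012, 400_000.0),
]
totals_mt = total_emissions_mt(annual)
print(totals_mt)
```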

5.3 Temperature changes per CO2 emissions

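Then we join the two tibbles, the estimated temperature changes and the total CO2 emissions of each country from 1913 to 2012. Only countries whose names match in both tables are kept, which appears to be why the result has 186 rows rather than 221 or 232. For example, "Antigua And Barbuda" in the temperature table and "Antigua and Barbuda" in the emissions table differ in capitalisation, and the country is absent from the joined table. The joined table has two additional columns:

  • Ratio is the temperature change per unit of CO2 emissions (million tons)
  • Group is “Evil emitter” (CO2 emissions in the top quartile, Q4, and temperature change in the bottom quartile, Q1), “Unlucky receiver” (CO2 emissions in Q1 and temperature change in Q4) or “Neutral” (otherwise)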
## # A tibble: 186 × 5
##    Country     `Temperature change` `CO2 emissions (MT)`     Ratio Group        
##    <chr>                      <dbl>                <dbl>     <dbl> <chr>        
##  1 Afghanistan                1.51                125.   0.0120    Neutral      
##  2 Albania                    0.663               248.   0.00267   Neutral      
##  3 Algeria                    1.20               3383.   0.000354  Neutral      
##  4 Andorra                    1.20                 11.3  0.107     Unlucky rece…
##  5 Angola                     0.859               444.   0.00194   Neutral      
##  6 Anguilla                   1.15                  2.31 0.498     Neutral      
##  7 Argentina                  0.892              6880.   0.000130  Neutral      
##  8 Armenia                    1.38                327.   0.00422   Neutral      
##  9 Aruba                      1.04                 69.0  0.0150    Neutral      
## 10 Australia                  0.954             15053.   0.0000634 Neutral      
## # … with 176 more rows
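The join and the two derived columns can be sketched as follows. This Python illustration rests on several assumptions that the text does not confirm: the quartile cut-offs are computed on the joined table, values lying exactly on a cut-off count as inside the quartile, and the join is an exact match on country name. The function name, the country labels A to E, and all values are hypothetical.

```python
from statistics import quantiles


def classify(temp_change, co2_mt):
    """Join two country -> value dicts and label each country.

    ASSUMPTIONS: quartiles are computed on the joined countries, and a value
    equal to a cut-off counts as inside the quartile. Emissions are assumed
    to be non-zero so that the ratio is defined.
    """
    common = sorted(temp_change.keys() & co2_mt.keys())  # exact-name inner join
    t_q1, _, t_q3 = quantiles(
        [temp_change[c] for c in common], n=4, method="inclusive"
    )
    e_q1, _, e_q3 = quantiles(
        [co2_mt[c] for c in common], n=4, method="inclusive"
    )
    result = {}
    for country in common:
        change, emitted = temp_change[country], co2_mt[country]
        if emitted >= e_q3 and change <= t_q1:
            group = "Evil emitter"
        elif emitted <= e_q1 and change >= t_q3:
            group = "Unlucky receiver"
        else:
            group = "Neutral"
        result[country] = {"ratio": change / emitted, "group": group}
    return result


# Hypothetical countries and values, not taken from the datasets.
temp_change = {"A": 0.5, "B": 1.1, "C": 1.05, "D": 1.0, "E": 1.8, "F": 1.0}
co2_mt = {"A": 30000.0, "B": 500.0, "C": 200.0, "D": 100.0, "E": 2.0, "G": 10.0}
table = classify(temp_change, co2_mt)
for country, row in table.items():
    print(country, row["group"])
```

Countries F and G each appear in only one input and are therefore absent from the result, in the same way that unmatched names are lost in the joined tibble.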

5.4 Results

Here are the countries in each group, with their temperature changes and CO2 emissions from 1913 to 2012:

The above chart is difficult to read. Since many countries have relatively low emissions, and we are not interested in neutral countries, here is a plot that contains only the evil emitters and unlucky receivers:

The evil emitters, ranked by the smallest ratio of temperature change to CO2 emissions, are:

## # A tibble: 9 × 4
##   Country  `Temperature change` `CO2 emissions (MT)`     Ratio
##   <chr>                   <dbl>                <dbl>     <dbl>
## 1 India                   0.854               35051. 0.0000244
## 2 Mexico                  0.650               16390. 0.0000397
## 3 Turkey                  0.793                7696. 0.000103 
## 4 Thailand                0.760                5208. 0.000146 
## 5 Greece                  0.558                3492. 0.000160 
## 6 Nigeria                 0.553                2931. 0.000189 
## 7 Egypt                   0.869                4520. 0.000192 
## 8 Denmark                 0.720                3629. 0.000198 
## 9 Bulgaria                0.800                3472. 0.000230

The unlucky receivers, ranked by the highest ratio of temperature change to CO2 emissions, are:

## # A tibble: 6 × 4
##   Country       `Temperature change` `CO2 emissions (MT)` Ratio
##   <chr>                        <dbl>                <dbl> <dbl>
## 1 Montserrat                    1.16                 1.31 0.887
## 2 Dominica                      1.16                 3.49 0.333
## 3 Liechtenstein                 1.27                 4.86 0.261
## 4 Grenada                       1.16                 5.98 0.194
## 5 Saint Lucia                   1.16                10.5  0.111
## 6 Andorra                       1.20                11.3  0.107

However, judging CO2 emissions by whole-country totals might not be fair, since smaller countries are likely to produce less CO2. CO2 emissions per capita could be used in a future study.

6. Discussion

The data cleaning process in this report has given considerable insight into how land temperature has been rising year on year in our chosen countries: Australia, China, Estonia, India, South Africa, Thailand, and United States. Similarly, CO2 emissions have been increasing in all of these countries, although much more steeply in some than in others. China, the United States and India have seen large rises in CO2 emissions, although the United States appears to be doing better at slowing its rise, with a sudden drop in emissions in the last decade.

Looking at the correlation between CO2 emissions and annual land temperature, there is a clear positive correlation between the two variables. However, the correlation is not strong enough to support any clear inference that CO2 emissions affect land temperature. Rather, it seems probable that other variables should also be included in such an analysis. There is an opportunity to build on the current research by integrating further datasets.

Even though climate change is a global disaster, the degree of its effect differs between countries. From the analysis, we found some countries that have high CO2 emissions but small land temperature changes and, on the other hand, some countries that have low CO2 emissions but large land temperature changes. A limitation of our methodology is that we used whole-country CO2 emissions without taking population size into account.

7. Conclusion

By combining countries’ historical CO2 emission and land temperature datasets, we can uncover relationships between them. The study shows that there is a clear positive relationship between CO2 emissions and land temperature when we focus on individual countries, but the picture does not seem fair when we compare the ratio of temperature change to CO2 emissions across countries around the world. These insights emphasise that climate change is still a global issue that requires serious action from every country.