Project 3: Population density and life expectancy¶

by RJ, November 2nd 2015

This is the project notebook for Week 3 of The Open University's Learn to code for Data Analysis course.

Does a high population density have an impact on life expectancy? The following analysis checks whether there is any correlation between the population density of a country in 2013 and the life expectancy of people born in that country in 2013.

Getting the data¶

Two World Bank datasets are considered. One dataset, available at http://data.worldbank.org/indicator/EN.POP.DNST, lists the population density of the world's countries for various years. The other dataset, available at http://data.worldbank.org/indicator/SP.DYN.LE00.IN, lists the life expectancy of the world's countries.

The datasets are downloaded directly, using the unique indicator name given in the URL.

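from pandas import *
# pandas.io.wb has been removed from pandas; the same download() function
# is provided by the separate pandas-datareader package.
from pandas_datareader.wb import download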

YEAR = 2013
POPDENS_INDICATOR = 'EN.POP.DNST'
popdens = download(indicator=POPDENS_INDICATOR, country='all', start=YEAR, end=YEAR)
LIFE_INDICATOR = 'SP.DYN.LE00.IN'
life = download(indicator=LIFE_INDICATOR, country='all', start=YEAR, end=YEAR)

Cleaning the data¶

Inspecting the data with head() and tail() shows that:

country names are the row indices, not column values;
the first 34 rows are aggregated data, for the Arab World, the Caribbean small states, and other country groups used by the World Bank;
population density and life expectancy values are missing for some countries.

The data is therefore cleaned by:

transforming the dataframe index into columns and creating a new index 0, 1, 2, etc.;
removing the first 34 rows;
removing rows with unavailable values.

popdens.head(3)

popdens = popdens.reset_index()[34:].dropna()
life = life.reset_index()[34:].dropna()
popdens.head(3)
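Slicing by position is fragile: the number of aggregate rows can differ between World Bank releases, and re-running the cell on an already cleaned table drops a further 34 rows. A minimal sketch of a name-based alternative follows. It uses a toy table with illustrative values, and AGGREGATES (a deliberately incomplete list) and remove_aggregates are hypothetical names that are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical, incomplete list of World Bank aggregate names (assumption).
AGGREGATES = {'Arab World', 'Caribbean small states',
              'Central Europe and the Baltics'}

def remove_aggregates(df, aggregates=AGGREGATES):
    """Return rows for real countries only, without missing values."""
    if 'country' not in df.columns:
        # Move the (country, year) index into ordinary columns.
        df = df.reset_index()
    keep = ~df['country'].isin(aggregates)
    return df[keep].dropna().reset_index(drop=True)

# Toy data in the shape returned by download(): a (country, year) index.
# The values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [('Arab World', '2013'), ('Canada', '2013'),
     ('Chad', '2013'), ('Kosovo', '2013')],
    names=['country', 'year'])
toy = pd.DataFrame({'EN.POP.DNST': [27.68, 3.87, 10.2, np.nan]}, index=index)

cleaned = remove_aggregates(toy)
cleaned_again = remove_aggregates(cleaned)  # a second run changes nothing
print(cleaned)
```

Because the filter depends on names rather than row positions, applying it twice gives the same table as applying it once.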

The unnecessary columns can be dropped.

COUNTRY = 'country'
POPDENS = 'Population density (people per sq. km of land area)'
popdens[POPDENS] = popdens[POPDENS_INDICATOR].apply(round)
headings = [COUNTRY, POPDENS]
popdens = popdens[headings]
popdens.head()

The World Bank reports the population density and life expectancy with several decimal places. Each value is rounded to the nearest whole number, and the original column is then discarded.

LIFE = 'Life expectancy (years)'
life[LIFE] = life[LIFE_INDICATOR].apply(round)
headings = [COUNTRY, LIFE]
life = life[headings]
life.head()
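The rounding step can also be written with the vectorised Series.round method instead of apply(round). A small sketch, using the three population density values that appear in this notebook's output; the names raw and rounded are illustrative only.

```python
import pandas as pd

raw = pd.Series([3.866307, 243.204167, 7.561524],
                index=['Canada', 'Cayman Islands', 'Central African Republic'])

# Round to the nearest whole number, then store the result as integers.
rounded = raw.round().astype(int)
print(rounded)
```

Converting to integers afterwards avoids values being displayed as 4.0, 243.0 and so on.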

Combining the data¶

The tables are combined through an inner join on the common 'country' column.

pdVsLife = merge(popdens, life, on=COUNTRY, how='inner')
pdVsLife.head()
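An inner join keeps only the countries that appear in both tables, so it can be useful to check which rows are silently left out. The sketch below uses two toy tables built from a few values shown in this notebook; the names left, right, inner, outer and unmatched are illustrative.

```python
import pandas as pd

left = pd.DataFrame({'country': ['Canada', 'Chad', 'Cayman Islands'],
                     'density': [4, 10, 243]})
right = pd.DataFrame({'country': ['Afghanistan', 'Canada', 'Chad'],
                      'life': [61, 81, 51]})

# Inner join: only countries present in both tables survive.
inner = pd.merge(left, right, on='country', how='inner')

# An outer join with indicator=True records where each row came from,
# which makes the unmatched countries easy to list.
outer = pd.merge(left, right, on='country', how='outer', indicator=True)
unmatched = outer[outer['_merge'] != 'both']

print(inner)
print(unmatched[['country', '_merge']])
```

In this toy case the Cayman Islands (density only) and Afghanistan (life expectancy only) are dropped by the inner join.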

Calculating the correlation¶

To measure whether life expectancy and population density grow together, the Spearman rank correlation coefficient is used. It is a number from -1 (perfect inverse rank correlation: if one indicator increases, the other decreases) to 1 (perfect direct rank correlation: if one indicator increases, so does the other), with 0 meaning there is no rank correlation. Even a perfect correlation doesn't imply any cause-effect relation between the two indicators. A p-value below 0.05 means the correlation is statistically significant.
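As a small illustration of what "rank correlation" means, the sketch below uses made-up numbers (not World Bank data). Because only the ordering of the values matters, any steadily increasing relationship gives a coefficient of 1, even when it is far from a straight line. The Pearson coefficient is computed only for comparison.

```python
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5]
y = [value ** 3 for value in x]   # increases with x, but not linearly
y_reversed = y[::-1]              # decreases as x increases

rho_up, _ = spearmanr(x, y)
rho_down, _ = spearmanr(x, y_reversed)
r_linear, _ = pearsonr(x, y)

print('Spearman, increasing:', rho_up)
print('Spearman, decreasing:', rho_down)
print('Pearson, increasing: ', r_linear)
```

The two Spearman values sit at the extremes of the scale, while the Pearson value falls short of 1 because the points do not lie on a straight line.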

from scipy.stats import spearmanr

pdColumn = pdVsLife[POPDENS]
lifeColumn = pdVsLife[LIFE]
(correlation, pValue) = spearmanr(pdColumn, lifeColumn)
print('The correlation is', correlation)
if pValue < 0.05:
    print('It is statistically significant.', pValue)
else:
    print('It is not statistically significant.', pValue)

('The correlation is', 0.2940620186159793)
('It is statistically significant.', 0.00012623396817285377)

The test gives a statistically significant p-value (about 0.0001) and a correlation of 0.29. That is a weak positive rank correlation: there appears to be some association between population density and life expectancy, but I am not sure what else can be concluded from it.

Showing the data¶

Measures of correlation can be misleading, so it is best to see the overall picture with a scatterplot.

%matplotlib inline
pdVsLife.plot(x=POPDENS, y=LIFE, kind='scatter', grid=True, logx=True, figsize=(10, 4))

[Figure: scatterplot of life expectancy (years) against population density (people per sq. km of land area), with a logarithmic x-axis]

All I can conclude from the scatterplot is that the countries with a high population density are also in the upper range of life expectancy, but a low population density doesn't go together with a low life expectancy.

It should also be noted that population density is averaged over a whole country. As many countries contain large uninhabitable areas, this number is likely, in some cases, to misrepresent the density at which the majority of the population actually lives.

# the 10 countries with lowest population density
pdVsLife.sort_values(POPDENS).head(10)

# the 10 countries with lowest life expectancy
pdVsLife.sort_values(LIFE).head(10)
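In current pandas versions a table is sorted with sort_values(), and nsmallest() is a shortcut for "the n rows with the lowest value". A minimal sketch on a toy table built from four rows of this notebook's output; the column names density and life are shortened for illustration.

```python
import pandas as pd

table = pd.DataFrame({
    'country': ['Canada', 'Central African Republic', 'Chad', 'Chile'],
    'density': [4, 8, 10, 24],
    'life': [81, 50, 51, 80]})

# Sort the whole table, then take the first rows.
lowest_life = table.sort_values('life').head(2)

# nsmallest() picks the same kind of rows without a separate sorting step.
lowest_density = table.nsmallest(2, 'density')

print(lowest_life)
print(lowest_density)
```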

Conclusions¶

Based on this analysis I would not dare to conclude that there is a relationship between country-wide population density and life expectancy. It would be interesting to repeat the analysis on a regional basis (with consistent population density and life expectancy data per region), if such data were available.

	index	country	year	EN.POP.DNST
34	68	Canada	2013	3.866307
35	69	Cayman Islands	2013	243.204167
36	70	Central African Republic	2013	7.561524

	country	Population density (people per sq. km of land area)
34	Canada	4
35	Cayman Islands	243
36	Central African Republic	8
37	Chad	10
38	Channel Islands	853

	country	Life expectancy (years)
34	Afghanistan	61
35	Albania	78
36	Algeria	71
39	Angola	52
40	Antigua and Barbuda	76

	country	Population density (people per sq. km of land area)	Life expectancy (years)
0	Canada	4	81
1	Central African Republic	8	50
2	Chad	10	51
3	Channel Islands	853	80
4	Chile	24	80

	country	Population density (people per sq. km of land area)	Life expectancy (years)
88	Mongolia	2	68
47	Iceland	3	83
93	Namibia	3	64
135	Suriname	3	71
42	Guyana	4	66
71	Libya	4	75
83	Mauritania	4	62
0	Canada	4	81
31	Gabon	6	63
58	Kazakhstan	6	70

		EN.POP.DNST
country	year
Arab World	2013	27.684115
Caribbean small states	2013	17.230626
Central Europe and the Baltics	2013	93.943347

	country	Population density (people per sq. km of land area)	Life expectancy (years)
122	Sierra Leone	86	46
69	Lesotho	69	49
136	Swaziland	73	49
1	Central African Republic	8	50
91	Mozambique	34	50
8	Congo, Dem. Rep.	32	50
11	Cote d'Ivoire	68	51
2	Chad	10	51
100	Nigeria	190	52
22	Equatorial Guinea	28	53