Susan Li

Data analyst working at an AI company.

I recently came across an R package called “unvotes” that contains the voting history of countries in the United Nations General Assembly from 1946 to 2015. The package was developed by David Robinson.

Explore the data

library(ggplot2)
library(unvotes)
library(dplyr)
library(lubridate)
library(ggthemes)
library(tidyr)
library(broom)  # provides tidy(), used for the regression summaries below

The package contains three datasets. The first records each country’s vote on every roll call, with more than 700,000 rows.

un_votes
### A tibble: 738,764 × 4
##    rcid                  country country_code   vote
##   <int>                    <chr>        <chr> <fctr>
##1      3 United States of America           US    yes
##2      3                   Canada           CA     no
##3      3                     Cuba           CU    yes
##4      3                    Haiti           HT    yes
##5      3       Dominican Republic           DO    yes
##6      3                   Mexico           MX    yes
##7      3                Guatemala           GT    yes
##8      3                 Honduras           HN    yes
##9      3              El Salvador           SV    yes
##10     3                Nicaragua           NI    yes
### ... with 738,754 more rows

The second dataset contains information about each roll call vote, including the date, description, and relevant resolution that was voted on.

un_roll_calls
### A tibble: 5,429 × 9
##    rcid session importantvote       date   unres amend  para
##   <int>   <dbl>         <dbl>     <date>   <chr> <dbl> <dbl>
##1      3       1             0 1946-01-01  R/1/66     1     0
##2      4       1             0 1946-01-02  R/1/79     0     0
##3      5       1             0 1946-01-04  R/1/98     0     0
##4      6       1             0 1946-01-04 R/1/107     0     0
##5      7       1             0 1946-01-02 R/1/295     1     0
##6      8       1             0 1946-01-05 R/1/297     1     0
##7      9       1             0 1946-02-05 R/1/329     0     0
##8     10       1             0 1946-02-05 R/1/361     1     1
##9     11       1             0 1946-02-05 R/1/376     0     0
##10    12       1             0 1946-02-06 R/1/394     1     1
### ... with 5,419 more rows, and 2 more variables: short <chr>, descr <chr>

The last dataset links roll call votes to six issues: “Palestinian conflict”, “Nuclear weapons and nuclear material”, “Arms control and disarmament”, “Human rights”, “Colonialism” and “Economic development”.

un_roll_call_issues
### A tibble: 5,281 × 3
##    rcid short_name                issue
##   <int>      <chr>                <chr>
##1   3372         me Palestinian conflict
##2   3658         me Palestinian conflict
##3   3692         me Palestinian conflict
##4   2901         me Palestinian conflict
##5   3020         me Palestinian conflict
##6   3217         me Palestinian conflict
##7   3298         me Palestinian conflict
##8   3429         me Palestinian conflict
##9   3558         me Palestinian conflict
##10  3625         me Palestinian conflict
### ... with 5,271 more rows
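
All three datasets share the rcid column, so they can be combined with joins. The sketch below uses made-up toy tables (the rcid values and issue assignments are illustrative, not taken from the package) to show what an inner join does when a roll call has no issue row or more than one: unmatched roll calls are dropped, and a roll call matching several rows is repeated once per match.

```r
library(dplyr)

# Toy tables; values are made up for illustration only
roll_calls <- tibble(rcid = c(1, 2, 3))
issues <- tibble(
  rcid  = c(1, 1, 3),
  issue = c("Human rights", "Colonialism", "Human rights")
)

# inner_join keeps only rcids present in both tables:
# rcid 1 appears twice (two issues), rcid 2 is dropped (no issue)
joined <- inner_join(roll_calls, issues, by = "rcid")
joined
```

This is worth keeping in mind for the issue-level summaries later in the post, which only cover roll calls that have at least one matching row in un_roll_call_issues.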

First, which issues have been voted on the most?

un_roll_call_issues %>% count(issue, sort=TRUE)
### A tibble: 6 × 2
##                                 issue     n
##                                 <chr> <int>
##1                 Palestinian conflict  1104
##2                          Colonialism   991
##3                         Human rights   986
##4         Arms control and disarmament   956
##5 Nuclear weapons and nuclear material   762
##6                 Economic development   482

How often did each country vote “yes” from 1946 to 2015?

by_country <- un_votes %>% group_by(country) %>% summarize(votes = n(),
pct_yes = mean(vote == 'yes'))
by_country
### A tibble: 200 × 3
##               country votes   pct_yes
##                 <chr> <int>     <dbl>
##1          Afghanistan  4972 0.8417136
##2              Albania  3514 0.7157086
##3              Algeria  4527 0.8981666
##4              Andorra  1564 0.6445013
##5               Angola  3075 0.9219512
##6  Antigua and Barbuda  2658 0.9194883
##7            Argentina  5361 0.7789591
##8              Armenia  1629 0.7587477
##9            Australia  5399 0.5523245
##10             Austria  4939 0.6329216
### ... with 190 more rows
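
The pct_yes column relies on the fact that the mean of a logical vector is the proportion of TRUE values. A minimal sketch with made-up votes (not taken from the data) illustrates this. Note that any vote other than “yes”, such as an abstention, counts against the proportion in the same way as a “no”.

```r
# Made-up votes for illustration only
votes <- c("yes", "no", "abstain", "yes")

# votes == "yes" is a TRUE/FALSE vector; mean() treats TRUE as 1 and FALSE as 0
pct_yes <- mean(votes == "yes")
pct_yes
```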

Countries with the highest percentage of “yes” votes from 1946 to 2015

arrange(by_country, desc(pct_yes))
### A tibble: 200 × 3
##                 country votes   pct_yes
##                   <chr> <int>     <dbl>
##1             Seychelles  1790 0.9782123
##2            Timor-Leste   837 0.9701314
##3  Sao Tome and Principe  2389 0.9673504
##4             Cabo Verde  3292 0.9599028
##5               Djibouti  3345 0.9563528
##6          Guinea Bissau  3070 0.9560261
##7                Comoros  2530 0.9450593
##8             Mozambique  3456 0.9427083
##9   United Arab Emirates  4031 0.9414537
##10              Suriname  3410 0.9410557
# ... with 190 more rows

Countries with the lowest percentage of “yes” votes from 1946 to 2015

by_country[order(by_country$pct_yes),]
### A tibble: 200 × 3
##                                                country votes   pct_yes
##                                                  <chr> <int>     <dbl>
##1                                              Zanzibar     2 0.0000000
##2                              United States of America  5390 0.2836735
##3                                                 Palau   896 0.3225446
##4                                                Israel  4944 0.3460761
##5                           Federal Republic of Germany  2067 0.3962264
##6                      Micronesia (Federated States of)  1462 0.4138167
##7  United Kingdom of Great Britain and Northern Ireland  5372 0.4285182
##8                                                France  5325 0.4336150
##9                                      Marshall Islands  1600 0.4893750
##10                                              Belgium  5391 0.4952699
### ... with 190 more rows
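
Zanzibar tops this list with only 2 recorded votes, so its 0% says little. One possible refinement is to drop countries with very few votes before ranking. The sketch below uses three rows copied from the output above; the cut-off of 100 votes is an arbitrary assumption chosen for illustration.

```r
library(dplyr)

# Three rows copied from the output above
by_country_sample <- tibble(
  country = c("Zanzibar", "United States of America", "Palau"),
  votes   = c(2L, 5390L, 896L),
  pct_yes = c(0.0000000, 0.2836735, 0.3225446)
)

min_votes <- 100  # arbitrary threshold, an assumption for this sketch

# Keep only countries with enough votes, then sort by lowest pct_yes
result <- by_country_sample %>%
  filter(votes >= min_votes) %>%
  arrange(pct_yes)
result
```

The same filter could be applied to the full by_country table before sorting.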

Country and year combinations with the highest percentage of “yes” votes

join1 <- un_votes %>% inner_join(un_roll_calls, by = 'rcid')
by_country_year <- join1 %>% group_by(country, year=year(date)) %>% summarise(votes=n(), pct_yes = mean(vote=='yes'))
arrange(by_country_year, desc(pct_yes))
##Source: local data frame [9,689 x 4]
##Groups: country [200]

##       country  year votes pct_yes
##         <chr> <dbl> <int>   <dbl>
##1  Afghanistan  2002    40       1
##2  Afghanistan  2004    58       1
##3      Albania  1990    83       1
##4       Angola  1976     4       1
##5       Angola  1977    50       1
##6   Azerbaijan  2013    56       1
##7      Bahrain  1990    86       1
##8      Bahrain  1991    73       1
##9      Bahrain  1992    70       1
##10     Bahrain  1993    60       1
# ... with 9,679 more rows

Country and year combinations with the lowest percentage of “yes” votes

by_country_year[order(by_country_year$pct_yes),]
##Source: local data frame [9,689 x 4]
##Groups: country [200]

##                            country  year votes    pct_yes
##                              <chr> <dbl> <int>      <dbl>
##1  Democratic Republic of the Congo  1998     1 0.00000000
##2                            Jordan  1955     6 0.00000000
##3                           Liberia  1998     1 0.00000000
##4                      South Africa  1974     2 0.00000000
##5                             Spain  1955     5 0.00000000
##6                         Sri Lanka  1955     5 0.00000000
##7                        Yugoslavia  1992     2 0.00000000
##8                          Zanzibar  1963     2 0.00000000
##9          United States of America  1989   115 0.08695652
##10         United States of America  1988   134 0.09701493
### ... with 9,679 more rows

Let’s look at how the percentage of “yes” votes has changed over the years for three countries: Canada, the US and the UK.

countries <- c('Canada', 'United States of America', 'United Kingdom of Great Britain and Northern Ireland')
by_country_year %>% filter(country %in% countries) %>% 
  ggplot(aes(x=year, y=pct_yes, color=country)) + geom_line() + 
  ylab("% of votes are 'Yes'") + ggtitle("Trend in percentage Yes Votes of Canada, US and UK 1946-2015") + theme_bw()

[Figure vote-1: Trend in percentage Yes Votes of Canada, US and UK 1946-2015]

Let’s look at how the three countries above voted on those six issues over time.

join1 %>% filter(country %in% countries) %>% 
  inner_join(un_roll_call_issues, by='rcid') %>% 
  group_by(year=year(date), country, issue) %>% 
  summarise(votes=n(), pct_yes=mean(vote=='yes')) %>% 
  ggplot(aes(x=year, y=pct_yes, color=country)) + 
  geom_point() + 
  geom_smooth(se=FALSE) + facet_wrap(~issue) + ylab("% of votes are 'Yes'") +
  ggtitle('Trend in Percentage Yes Votes by Issues for Canada, US and UK')

[Figure vote-2: Trend in Percentage Yes Votes by Issues for Canada, US and UK]

Among these three countries, which voted “yes” the most and the least on each issue?

join2 <- join1 %>% filter(country %in% countries) %>% 
  inner_join(un_roll_call_issues, by='rcid') %>% 
  group_by(country, issue) %>% 
  summarise(votes=n(), pct_yes=mean(vote=='yes'))

ggplot(aes(x=country, y=pct_yes, fill = issue), data = join2) + geom_bar(stat = 'identity', position = position_dodge()) + ggtitle('Canada, US, UK and the UN Issues')

[Figure vote-3: Canada, US, UK and the UN Issues]

Let’s fit a linear regression of the percentage of “yes” votes on year for each of these three countries, to test whether each country’s share of “yes” votes shows a trend over time.

us_by_year <- by_country_year %>% filter(country=='United States of America')
ca_by_year <- by_country_year %>% filter(country=='Canada')
uk_by_year <- by_country_year %>% filter(country=='United Kingdom of Great Britain and Northern Ireland')
us_model <- lm(pct_yes ~ year, data=us_by_year)
ca_model <- lm(pct_yes ~ year, data=ca_by_year)
uk_model <- lm(pct_yes ~ year, data = uk_by_year)
us_prob <- tidy(us_model) %>% filter(term=='year')
ca_prob <- tidy(ca_model) %>% filter(term=='year')
uk_prob <- tidy(uk_model) %>% filter(term=='year')
us_prob
ca_prob
uk_prob
##us_prob
##  term     estimate    std.error statistic      p.value
##1 year -0.007103352 0.0006991439 -10.16007 3.357004e-15
##ca_prob
##  term      estimate    std.error  statistic   p.value
##1 year -0.0001975947 0.0006603795 -0.2992139 0.7657031
##uk_prob
##  term   estimate    std.error statistic  p.value
##1 year 0.00103754 0.0007739133  1.340641 0.184565
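
The three model fits above repeat the same steps for each country. As a rough sketch of an alternative (not the approach used in this post), the data can be split by country and one model fitted per piece. The toy data below is made up so that the slopes are known in advance; with the real data, by_country_year filtered to the three countries would take its place.

```r
# Toy data standing in for by_country_year (values are made up for illustration)
toy <- data.frame(
  country = rep(c("A", "B"), each = 5),
  year    = rep(2001:2005, times = 2),
  pct_yes = c(0.50, 0.48, 0.46, 0.44, 0.42,   # A falls by 0.02 per year
              0.60, 0.61, 0.62, 0.63, 0.64)   # B rises by 0.01 per year
)

# Fit pct_yes ~ year separately for each country and keep the slope on year
slopes <- sapply(split(toy, toy$country), function(d) {
  coef(lm(pct_yes ~ year, data = d))[["year"]]
})
slopes
```

Here slopes is a named numeric vector with one entry per country, which makes the estimates easy to compare side by side.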

Interpretation of the results

  • For the USA, the share of “yes” votes falls by about 0.0071 (0.71 percentage points) per year on average. The p-value (3.4e-15) indicates that this downward trend is statistically significant.
  • For Canada, the estimated slope is about -0.0002 per year (0.02 percentage points), and with a p-value of 0.77 there is no statistically significant trend.
  • For the UK, the estimated slope is positive, about 0.001 per year (0.1 percentage points), but with a p-value of 0.18 this trend is not statistically significant either.
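
Because pct_yes is a proportion between 0 and 1, each slope on year is a change in proportion per year. The short sketch below converts the estimates reported in the output above into percentage points per year and per decade, which may be easier to read. The variable names are my own, and the per-decade figures simply extend the fitted straight line, so they describe the average historical trend rather than a forecast.

```r
# Slope estimates on `year`, copied from the tidy() output above
slopes <- c(us = -0.007103352, ca = -0.0001975947, uk = 0.00103754)

# pct_yes is a proportion, so multiply by 100 to get percentage points
pp_per_year   <- slopes * 100
pp_per_decade <- pp_per_year * 10

round(pp_per_decade, 2)
```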

The End

I realized that this package lets me perform several statistical analyses, including linear regression and logistic regression. I will save those for next time.