Research Results, R coding, and mistakes you can blame on your research assistant

I have just graded and returned the second lab assignment for my introductory research methods class in International Studies (IS240). The lab required the students to answer questions with the help of the R statistical program (which, you may not know, is every pirate’s favourite statistical program).

The final homework problem asked students to find a question in the World Values Survey (WVS) that tapped into homophobic sentiment and determine which of four countries under study–Canada, Egypt, Italy, Thailand–could be considered to be the most homophobic, based only on that single question.

More than a handful of you used the code below to try and determine how the respondents in each country answered question v38. First, here is a screenshot from the WVS codebook:

[Screenshot: WVS codebook entry for v38]

Students (rightfully, I think) argued that those who mentioned “Homosexuals” amongst the groups of people they would not want as neighbours can be considered to be more homophobic than those who didn’t mention homosexuals in their responses. (Of course, this may not be the case if there are different levels of social desirability bias across countries.) Moreover, students hypothesized that the higher the proportion of mentions of homosexuals, the more homophobic is that country.

But, when it came time to find these proportions some students made a mistake. Let’s assume that the student wanted to know the proportion of Canadian respondents who mentioned (and didn’t mention) homosexuals as persons they wouldn’t want to have as neighbours.

Here is the code they used (four.df is the data frame name, v38 is the variable in question, and country is the country variable):

prop.table(table(four.df$v38=="mentioned" | four.df$country=="canada"))

0.372808 0.627192

Thus, these students concluded that almost 63% of Canadian respondents mentioned homosexuals as persons they did not want to have as neighbours. That’s downright un-neighbourly of us allegedly tolerant Canadians, don’tcha think? Indeed, when compared with the other two countries (Egyptians weren’t asked this question), Canadians come off as more homophobic than either the Italians or the Thais.

prop.table(table(four.df$v38=="mentioned" | four.df$country=="italy"))

0.6106025 0.3893975

prop.table(table(four.df$v38=="mentioned" | four.df$country=="thailand"))

0.5556995 0.4443005

So, is it true that Canadians are really more homophobic than either Italians or Thais? This may be a simple homework assignment but these kinds of mistakes do happen in the real academic world, and fame (and sometimes even fortune–yes, even in academia a precious few can make a relative fortune) is often the result as these seemingly unconventional findings often cause others to notice. There is an inherent publishing bias towards results that seem to run contrary to conventional wisdom (or bias). The finding that Canadians (widely seen as amongst the most tolerant of God’s children) are really quite homophobic (I mean, close to 2/3 of us allegedly don’t want homosexuals, or any LGBT persons, as neighbours) is radical and a researcher touting these findings would be able to locate a willing publisher in no time!

But, what is really going on here? Well, the problem is a single incorrect symbol that changes the findings dramatically. Let’s go back to the code:

prop.table(table(four.df$v38=="mentioned" | four.df$country=="canada"))

The culprit is the | (“or”) character. What these students are asking R to do is to search their data and find the proportion of all responses for which the respondent either mentioned that they wouldn’t want homosexuals as neighbours OR the respondent is from Canada. Oh, oh! They should have used the & symbol instead of the | symbol to get the proportion of Canadians who mentioned homosexuals in v38.
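
To see the difference between the two operators on a small scale, here is a minimal sketch using a made-up data frame. Note that toy.df and its ten rows are hypothetical, not WVS data; the column names v2 and v38 simply mirror the ones used in this post.

```r
# Hypothetical toy data: 10 made-up respondents, NOT taken from the WVS
toy.df <- data.frame(
  v2  = c(rep("canada", 4), rep("italy", 3), rep("thailand", 3)),
  v38 = c("mentioned", "not mentioned", "not mentioned", "not mentioned",
          "mentioned", "mentioned", "not mentioned",
          "mentioned", "not mentioned", "not mentioned")
)

# | is TRUE when EITHER condition holds: every Canadian, plus everyone
# in any country who answered "mentioned"
either <- toy.df$v38 == "mentioned" | toy.df$v2 == "canada"

# & is TRUE only when BOTH conditions hold: Canadians who answered "mentioned"
both <- toy.df$v38 == "mentioned" & toy.df$v2 == "canada"

print(sum(either))  # number of rows matched by the "or" version
print(sum(both))    # number of rows matched by the "and" version
```

In this toy example the “or” version matches 7 of the 10 rows while the “and” version matches only 1, even though just one of the four made-up Canadians answered “mentioned”.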

To understand visually what’s happening let’s take a look at the following Venn diagram (see the attached video above for Ali G’s clever use of what he calls “zenn” diagrams to find the perfect target market for his “ice cream glove” idea; the code for how to create this diagram in R is at the end of this post). What we want is the intersection of the blue and red areas (the purple area). What the students’ coding has given us is the sum of (all of!) the blue and (all of!) the red areas.

To get the raw number of Canadians who answered “mentioned” to v38 we need the following code:

table(four.df$v38=="mentioned" & four.df$v2=="canada")

7457   304


But what if you then created a proportional table out of this? You still wouldn’t get the correct answer, which should be the proportion that the purple area on the venn diagram comprises of the total red area.

prop.table(table(four.df$v38=="mentioned" & four.df$v2=="canada"))

0.96082979 0.03917021

Just eyeballing the venn diagram we can be sure that the proportion of homophobic Canadians is larger than 3.9%. What we need is the proportion of Canadian respondents only(!) who mentioned homosexuals in v38. The code for that is:

prop.table(table(four.df$v38[four.df$v2=="canada"]))


mentioned not mentioned
0.1404806     0.8595194
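
The gap between 3.9% and 14% comes down to the denominator. Here is a quick arithmetic check using the counts printed above. The number of Canadian respondents is not stated in this post; it is inferred here from the 0.1404806 figure, so treat it as an assumption.

```r
# Counts taken from the table() output earlier in the post
n_false <- 7457   # rows where the & condition is FALSE
n_true  <- 304    # Canadians who answered "mentioned"

# Dividing by every row in the data frame reproduces the misleading 3.9%
share_of_all <- n_true / (n_false + n_true)

# ASSUMPTION: the Canadian denominator is inferred from the reported
# proportion 0.1404806; it is not given directly in the post
n_canada <- round(n_true / 0.1404806)

# Dividing by Canadian respondents only gives the figure we actually want
share_of_canadians <- n_true / n_canada

print(share_of_all)
print(n_canada)
print(share_of_canadians)
```

The numerator (304) is the same in both cases; only the group we divide by changes.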

So, only about 14% of Canadians can be considered to have given a homophobic response, not the 63% our students had calculated. What are the comparative results for Italy and Thailand, respectively?

prop.table(table(four.df$v38[four.df$v2=="italy"]))


mentioned not mentioned
0.235546      0.764454

prop.table(table(four.df$v38[four.df$v2=="thailand"]))


mentioned not mentioned
0.3372781     0.6627219

The moral of the story: if you mistakenly find something in your data that runs against conventional wisdom and it gets published, but someone comes along after publication and demonstrates that you’ve made a mistake, just blame it on a poorly-paid research assistant’s coding mistake.

Here’s a way to do the above using what is called a for loop:

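four <- c("canada", "egypt", "italy", "thailand")  # country names, as printed in the output below
for (i in seq_along(four)) {
  # proportion of each v38 response among respondents from country i only
  print(prop.table(table(four.df$v38[four.df$v2 == four[i]])))
  print(four[i])
}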

mentioned not mentioned
0.1404806     0.8595194
[1] "canada"

mentioned not mentioned

[1] "egypt"

mentioned not mentioned
0.235546      0.764454
[1] "italy"

mentioned not mentioned
0.3372781     0.6627219
[1] "thailand"
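
As an alternative to the loop, the same per-country proportions can probably be had in one step by cross-tabulating country against v38 and asking prop.table for row proportions (margin = 1). Here is a minimal sketch on a made-up data frame; toy.df is hypothetical and is not the WVS data.

```r
# Hypothetical toy data: 10 made-up respondents, NOT taken from the WVS
toy.df <- data.frame(
  v2  = c(rep("canada", 4), rep("italy", 3), rep("thailand", 3)),
  v38 = c("mentioned", "not mentioned", "not mentioned", "not mentioned",
          "mentioned", "mentioned", "not mentioned",
          "mentioned", "not mentioned", "not mentioned")
)

# Rows are countries, columns are the v38 responses
tab <- table(toy.df$v2, toy.df$v38)

# margin = 1 divides each cell by its ROW total, so every country's
# proportions sum to 1 and the denominator is that country's respondents
row_props <- prop.table(tab, margin = 1)

print(row_props)
```

Because each row is divided by its own total, there is no way to end up dividing by the whole data frame by accident.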

Here’s the R code to draw the venn diagram above:



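library(venneuler)  # provides venneuler(); the package must be installed first

# Sizes of the two sets and of their overlap ("A&B" names the intersection)
v1 <- venneuler(c(
  "Mentioned" = sum(four.df$v38 == "mentioned", na.rm = TRUE),
  "Canada" = sum(four.df$v2 == "canada", na.rm = TRUE),
  "Mentioned&Canada" = sum(four.df$v2 == "canada" & four.df$v38 == "mentioned", na.rm = TRUE)
))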

plot(v1,main="Venn Diagram of Canada and v38 from WVS", sub="v38='I wouldn't want to live next to a homosexual'", col=c("blue","red"))

Does Political Ideology Change as we Age?

It has long been accepted conventional wisdom that as we age we become more conservative in our political views. Remember the quote that has been (allegedly) wrongly attributed to Winston Churchill:

“If you’re not a liberal when you’re 25, you have no heart.  If you’re not a conservative by the time you’re 35, you have no brain.”

But as with many things, conventional wisdom doesn’t seem to be very wise. According to recent research, individuals do not become more conservative as they age. In fact, just the opposite may be true. From an article on the Discovery magazine website, we learn:

Ongoing research, however, fails to back up the stereotype [about age and conservatism]. While there is some evidence that today’s seniors may be more conservative than today’s youth, that’s not because older folks are more conservative than they use to be. Instead, our modern elders likely came of age at a time when the political situation favored more conservative views.

In fact, studies show that people may actually get more liberal over time when it comes to certain kinds of beliefs. That suggests that we are not pre-determined to get stodgy, set in our ways or otherwise more inflexible in our retirement years. [emphasis added]

The studies do reference data collected in the United States, but there’s no reason to think that the same phenomenon wouldn’t apply in other advanced capitalist democracies.

How do your political beliefs compare to those of your parents? What was the political climate like at the time your parents were becoming politically aware? In which country (if it wasn’t Canada) did your parents come of political age?

Two Opportunities for Summer Study Abroad

Via the polcan listserv (Canadian Political Science Association) comes word about two opportunities for study abroad in the area of (ethnic) conflict. The first is a course offered in Kenya by the University of Toronto. The course, PCS361Y–Special Topics in Peace and Conflict Studies: Conflict in Africa: Causes, Consequences, and Responses–is described as “an intensive inquiry into the causes, consequences, and especially possible responses to conflict in Africa.” The course will be taught in Nairobi, Masai Mara, and Mombasa from May 13 through June 6. For more information, go here.

The second course will be taught as part of the American University in Kosovo summer program. Here is a description of the program:

American University in Kosovo is now accepting applications for the Summer of 2011 to study Peacebuilding, Post-conflict Transformation, and Development in the fun and safe ‘living laboratory’ of the Balkans. This four-week program offers a wide selection of courses in related areas from an impressive array of global scholars, diplomats, retired military officers, ex-combatants, practitioners, and representatives of international organizations. The goal of the program is to bridge the gap between theory and practice. Last year’s program included about 60 students from over 30 countries — including 6 Canadians. About 2/3 of the students were undergraduates — the remaining graduate students. Undergraduate course credits are transferrable. Several participants from 2010 referred to their experiences in the program as ‘life transforming.’

For more about this program, go here.

Accessing online journal articles from off-campus

Hello Students:

Some of you have e-mailed inquiring about how to access the subscription-only online journals from off-campus. With the roll-out of the library’s new “fast search” feature, it’s now as easy as 1-2-3…4-5!

Here’s what you do:

1) Go to the library’s website and check that the “Fast Search” has been selected in the search area (it should be the default).

2) Type the name of the article you’re seeking in the appropriate place (see pic below) and click “Search.”

3) A new browser window will open. If you see the name of the article, click the appropriate link (inside the orange rectangle in the picture below):

Tips for Students on Writing Good Papers

Henry Farrell, who teaches political science at George Washington University, has posted an essay with tips for students writing political science papers. There are some important insights, such as “cut to the chase”, “organize, organize, organize”, and “avoid data dumps.” In my opinion, his most important tip (and this would also apply to examinations) is “read the requirements for the assignment.” If you’re unsure about the requirements, or there is something you don’t understand, seek clarification from your professor/instructor. The whole essay can be found here:


Resources for First Paper (IS 210)–Risk Assessment

Here are some data resources that may be helpful to you while researching and writing your first paper assignment. I’ll be showing you how to use/access some of these sources in class on Thursday, September 23rd.


Mock German Election Simulation–Government Formation Results

In the second part of our mock German election simulation–the government formation negotiations–we were able to get a new government voted into power by the recently elected Bundestag.  (I refer you to this post for more information about the electoral results.)

To remind you, following the election, the composition of the Bundestag was:

FDP–6 mandates (formateur party)

CDU/CSU–4 mandates

SPD–3 mandates

Greens–3 mandates

A secure governing coalition would need at least 9 mandates in this sixteen-member parliament.

The FDP were unable to convince any of the other parties to form a governing coalition with them, and the government that was voted into office, by a majority vote of 10-6, was a three-party coalition of the Greens, CDU/CSU, and the SPD.
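
For those who like to check the arithmetic, here is a minimal sketch in R of the majority threshold and the winning coalition’s seat total. The mandate counts come from the results above; the variable names are just illustrative.

```r
# Mandates from the simulation results listed above
mandates <- c(FDP = 6, "CDU/CSU" = 4, SPD = 3, Greens = 3)

# Smallest number of seats that is more than half of the chamber
threshold <- floor(sum(mandates) / 2) + 1

# Seats held by the coalition that was voted into office
coalition <- c("Greens", "CDU/CSU", "SPD")
coalition_seats <- sum(mandates[coalition])

print(threshold)
print(coalition_seats)
print(coalition_seats >= threshold)
```

The coalition’s 10 seats clear the 9-seat threshold, which matches the 10-6 vote.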

In the end, it was the personal ambition of the CDU/CSU leader–Patrick S.–that ruled the day.  He wanted to become Chancellor and this steely determination served him well as he, with his fellow party members and advisory committee, was able to effectively forge a rather unwieldy three-party governing coalition.

Why did Patrick S. want to become Chancellor so desperately?  There have been reports in some of the leading journals that it has been his dream since childhood.  But in a sit-down interview with Deutsche Welle following his ascension to the Chancellorship, Chancellor S. claimed that it was because this election was crucial to the future of the German state.  According to the Chancellor, he and his party believe that a moral crisis of epic proportions has descended upon Germany and only his party had the necessary moral acuity to set Germany back on the correct path.

The Chancellor and the six-member Cabinet are as follows:

Chancellor–Patrick S. (CDU/CSU)

Minister of Education–Becky W. (Greens)

Minister of the Interior–Erick K. (CDU/CSU)

Minister of the Environment–Zhivko I. (Greens)

Minister of Foreign Affairs–Kyle B. (CDU/CSU)

Minister of Health–Rip F. (SPD)

Minister of Labor–Andrew S. (SPD)

One of the advisers to the SPD commented that the SPD actually had refused to sign a coalition agreement offered to them by the FDP, which, in retrospect, was better for the SPD than the one they ultimately signed.  There seemed to be a consensus within the SPD that the arrogance of the FDP had created friction between the two potential coalition partners.

I look forward to reading your impressions of the simulation exercise on your blogs.