Using R to help simulate the NHL Draft Lottery

Upon discussing the NHL game results file, I mentioned to a few of you that I have used R to generate an NHL draft lottery simulator. It’s quite simple, although you do have to install the XML package, which allows us to use R to ‘scrape’ websites. We use this functionality in order to create the lottery simulator dynamically, depending on the previous evening’s (afternoon’s) game results.

Here’s the code: (remember to un-comment the install.packages(“XML”) command the first time you run the simulator). Copy and paste this code into your R console, or save it as an R script file and run it as source.

# R code to simulate the NHL Draft Lottery
# The current draft order of teams obviously changes on a
# game-to-game basis. We have to create a vector of teams in order
# from 31st to 17th place that can be updated on a game-by-game
# (or dynamic) basis.

# To do this, we can use R's ability to interrogate, scrape,
# and parse web pages.

#install.packages("XML") # NOTE: Uncomment and install this
#                                package before running this
#                                script the first time.

require(XML) # We need this for parsing of the html code

url <- ("") #retrieve the web page we are using as the data source
doc <- htmlParse(url) #parse the page to extract info we'll need.

# From investigation of the web page's source code, we see that the
# team names can be found in the element [td class="text-left"]
# and the odds of each team winning the lottery are in the
# element [td class="text-right"]. Without this
# information, we wouldn't know where to tell R to find the elements
# of data that we'd like to extract from the web page.
# Now we can use xml to extract the data values we need.

result.teams <- unlist(xpathApply(doc, "//td[contains(@class,'text-left')]",xmlValue)) #unlist used to create vector
result.odds <- unlist(xpathApply(doc, "//td[contains(@class,'text-right')]",xmlValue))

# The teams elements are returned as strings (character), which is
# appropriate. Also only non-playoff teams are included, which makes
# it easier for us. The odds elements are returned as strings as
# well (and percentages), which is problematic.
# First, we have 31 elements (the values of 16 of which--the playoff
# teams --are returned as missing). We only want 15 (the non-playoff
# teams).
# Second, in these remaining # 15 elements we have to remove the
# "%" character from each.
# Third, we have to convert the character format to numeric.
# The code below does the clean-up. 

result.odds <- result.odds[1:15]
result.odds <- as.numeric(gsub("%"," ",result.odds)) #remove % symbol
teamodds.df <- data.frame("teams"=result.teams[1:15],"odds"=result.odds, stringsAsFactors=FALSE) #Create data frame for easier display 

# Let's print a nice table of the teams, with up-to-date
# corresponding odds. 

print(teamodds.df) # odds are out of 100 

#Now, let's finally 'run' the lottery, and print the winner's name.

cat("The winner of the 2018 NHL Draft Lottery is the:", sample(teamodds.df$team,1,prob=teamodds.df$odds),sep="") 



Polity IV Democracy Scores, Participation, and the Suffragettes

We noted today in lecture that Polity IV gives countries like the United States very high scores on the ”democraticness” variable, even during periods when a majority of the adult population–African-Americans, and women–were legally not allowed to vote. While Switzerland (1971) was the last European democracy to grant universal suffrage for women, Portugal was the last European country to do so (1976)–Portugal was run by a military dictatorship during in the early years of the 1970s.

In this era of social media abuse and bullying, it’s interesting to learn about some of the abuse hurled at the Suffragettes:


The Economist Intelligence Unit’s Global “Livability” Survey Omits Cost-of-Living

Before we can say anything definitive about the concepts and ideas that we’re studying, it is imperative that we have some understanding about whether the data that we observe and collect are actually “tapping into” the concept of interest.

For example, if my desire were to collect data that are meant to represent how democratic a country is, it would probably not be beneficial to that enterprise to collect measures of annual rainfall. [Though, in some predominantly agricultural countries, that might be an instrument for economic growth.] Presumably, I would want to collect data like whether elections were regularly held, free, and fair, whether the judiciary was independent of elected leaders, etc. That seems quite obvious to most.

The Economist’s Intelligence Unit puts out an annual  “Global Livability Report” , which claims to comparatively assess “livability” in about 140 cities worldwide. The EIU uses many different indicators (across five broad categories) to arrive at a single index value that allegedly reflects the level of livability of each city in the survey.  Have a look at the indicators below. Do you notice that the cost-of-living is not include? Why might that be?