Using R to help simulate the NHL Draft Lottery

Upon discussing the NHL game results file, I mentioned to a few of you that I have used R to generate an NHL draft lottery simulator. It’s quite simple, although you do have to install the XML package, which allows us to use R to ‘scrape’ websites. We use this functionality in order to create the lottery simulator dynamically, depending on the previous evening’s (afternoon’s) game results.

Here’s the code: (remember to un-comment the install.packages(“XML”) command the first time you run the simulator). Copy and paste this code into your R console, or save it as an R script file and run it as source.

# R code to simulate the NHL Draft Lottery
# The current draft order of teams obviously changes on a
# game-to-game basis. We have to create a vector of teams in order
# from 31st to 17th place that can be updated on a game-by-game
# (or dynamic) basis.

# To do this, we can use R's ability to interrogate, scrape,
# and parse web pages.

#install.packages("XML") # NOTE: Uncomment and install this
#                                package before running this
#                                script the first time.

require(XML) # We need this for parsing of the html code

url <- ("http://nhllotterysimulator.com/") #retrieve the web page we are using as the data source
doc <- htmlParse(url) #parse the page to extract info we'll need.

# From investigation of the web page's source code, we see that the
# team names can be found in the element [td class="text-left"]
# and the odds of each team winning the lottery are in the
# element [td class="text-right"]. Without this
# information, we wouldn't know where to tell R to find the elements
# of data that we'd like to extract from the web page.
# Now we can use xml to extract the data values we need.

result.teams <- unlist(xpathApply(doc, "//td[contains(@class,'text-left')]",xmlValue)) #unlist used to create vector
result.odds <- unlist(xpathApply(doc, "//td[contains(@class,'text-right')]",xmlValue))

# The teams elements are returned as strings (character), which is
# appropriate. Also only non-playoff teams are included, which makes
# it easier for us. The odds elements are returned as strings as
# well (and percentages), which is problematic.
# First, we have 31 elements (the values of 16 of which--the playoff
# teams --are returned as missing). We only want 15 (the non-playoff
# teams).
# Second, in these remaining # 15 elements we have to remove the
# "%" character from each.
# Third, we have to convert the character format to numeric.
# The code below does the clean-up. 

result.odds <- result.odds[1:15]
result.odds <- as.numeric(gsub("%"," ",result.odds)) #remove % symbol
teamodds.df <- data.frame("teams"=result.teams[1:15],"odds"=result.odds, stringsAsFactors=FALSE) #Create data frame for easier display 

# Let's print a nice table of the teams, with up-to-date
# corresponding odds. 

print(teamodds.df) # odds are out of 100 

#Now, let's finally 'run' the lottery, and print the winner's name.

cat("The winner of the 2018 NHL Draft Lottery is the:", sample(teamodds.df$team,1,prob=teamodds.df$odds),sep="") 

 

Advertisements