While discussing the NHL game results file, I mentioned to a few of you that I have used R to build an NHL draft lottery simulator. It’s quite simple, although you do have to install the XML package, which lets R ‘scrape’ websites. We use this to build the lottery simulator dynamically, so that the odds reflect the previous evening’s (or afternoon’s) game results.
Here’s the code. Remember to un-comment the install.packages("XML") command the first time you run the simulator. Copy and paste the code into your R console, or save it as an R script file and run it with source().
# R code to simulate the NHL Draft Lottery

# The current draft order of teams obviously changes on a
# game-to-game basis. We have to create a vector of teams in order
# from 31st to 17th place that can be updated on a game-by-game
# (or dynamic) basis.

# To do this, we can use R's ability to interrogate, scrape,
# and parse web pages.

#install.packages("XML")  # NOTE: Uncomment and install this
                          # package before running this
                          # script the first time.

library(XML)  # We need this for parsing of the html code

url <- "http://nhllotterysimulator.com/"  # the web page we are using as the data source
doc <- htmlParse(url)  # retrieve and parse the page to extract info we'll need

# From investigation of the web page's source code, we see that the
# team names can be found in the element [td class="text-left"]
# and the odds of each team winning the lottery are in the
# element [td class="text-right"]. Without this
# information, we wouldn't know where to tell R to find the elements
# of data that we'd like to extract from the web page.

# Now we can use XPath queries to extract the data values we need.
# unlist() turns the returned list into a vector.
result.teams <- unlist(xpathApply(doc, "//td[contains(@class,'text-left')]", xmlValue))
result.odds <- unlist(xpathApply(doc, "//td[contains(@class,'text-right')]", xmlValue))

# The teams elements are returned as strings (character), which is
# appropriate. Also only non-playoff teams are included, which makes
# it easier for us. The odds elements are returned as strings as
# well (and percentages), which is problematic.
# First, we have 31 elements (the values of 16 of which--the playoff
# teams--are returned as missing). We only want 15 (the non-playoff
# teams).
# Second, in these remaining 15 elements we have to remove the
# "%" character from each.
# Third, we have to convert the character format to numeric.
# The code below does the clean-up.

result.odds <- result.odds[1:15]
result.odds <- as.numeric(gsub("%", "", result.odds, fixed = TRUE))  # remove % symbol

# Create data frame for easier display
teamodds.df <- data.frame("teams" = result.teams[1:15],
                          "odds" = result.odds,
                          stringsAsFactors = FALSE)

# Let's print a nice table of the teams, with up-to-date
# corresponding odds.
print(teamodds.df)  # odds are out of 100

# Now, let's finally 'run' the lottery, and print the winner's name.
# sample() treats prob as weights, so the odds need not sum to 1.
cat("The winner of the 2018 NHL Draft Lottery is the: ",
    sample(teamodds.df$teams, 1, prob = teamodds.df$odds), "\n", sep = "")
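If you would like to convince yourself that the weighted draw behaves as intended without hitting the website, here is a minimal offline sketch. It assumes a made-up set of four teams and odds (the names `teams`, `odds_raw`, `n_sims`, and the values are hypothetical, not scraped data), applies the same "%" clean-up, and repeats the lottery many times so the simulated win shares can be compared with the stated odds.

```r
# Hypothetical teams and odds strings, in the same "NN.N%" form the
# scraper returns. These values are made up for illustration only.
teams <- c("Team A", "Team B", "Team C", "Team D")
odds_raw <- c("40.0%", "30.0%", "20.0%", "10.0%")

# Strip the percent sign and convert to numeric weights.
odds <- as.numeric(sub("%", "", odds_raw, fixed = TRUE))

set.seed(2018)  # fixed seed so the run is repeatable

# Draw many independent single-winner lotteries at once.
# sample() treats prob as weights, so they need not sum to 1.
n_sims <- 10000
winners <- sample(teams, n_sims, replace = TRUE, prob = odds)

# Share of simulated lotteries won by each team, in percent.
sim_share <- as.numeric(table(factor(winners, levels = teams))) / n_sims * 100

print(data.frame(teams = teams, odds = odds, simulated = round(sim_share, 1)))
```

With enough repetitions, the simulated column should land close to the odds column. A team that wins far more or less often than its odds suggest usually points to a problem in the clean-up step.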