Installing the Statsbomb Python Library

I have started using Python more and wanted to find a way of creating Jupyter notebooks in VS Code and converting them to Markdown files for my blog, which I built with Blogdown in RStudio. It didn't take me long to find a really simple tutorial on how to do this. For this tutorial, I am going to install the Statsbomb Python library, statsbombpy, from a zip file downloaded from their GitHub page.

Before I get started: I like to work with Jupyter Notebooks within VS Code. This might not be the usual setup, but for those who are interested, you can find how to set it up on your device here. In this environment, the terminal's working directory always points to where the current file is saved, or where my VS Code project is based. This makes it easy to point to data files saved within the working directory.

Once I had downloaded the statsbombpy zip file from GitHub, I unzipped it into my current Jupyter notebook working directory. In the terminal, I changed into the extracted folder with `cd statsbombpy-master`, then ran `pip install .` to install the statsbombpy library.
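To confirm that `pip install .` put the library into the same Python environment that the notebook's kernel uses, a quick check like the one below can help. It is a minimal sketch that assumes Python 3.8 or later (for `importlib.metadata`) and that the installed distribution is named `statsbombpy`; `installed_version` is a hypothetical helper of my own, not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed here."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this inside the notebook so it checks the kernel's environment,
# not whichever Python the terminal happens to use.
print(installed_version("statsbombpy") or "statsbombpy is not installed in this environment")
```

If this prints the "not installed" message even though `pip install .` succeeded, the terminal's `pip` and the notebook kernel are probably pointing at different Python installations.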

Now that the library is installed, let's see how easy it is to pull the free competitions into our notebook.

```python
### First we must import the relevant library.
from statsbombpy import sb

### Then we can call all the free competitions
comps = sb.competitions()
comps.head(5)
```

```
credentials were not supplied. open data access only
```
| | competition_id | season_id | country_name | competition_name | competition_gender | season_name | match_updated | match_available |
|---|---|---|---|---|---|---|---|---|
| 0 | 37 | 42 | England | FA Women’s Super League | female | 2019/2020 | 2020-03-11T14:09:41.932138 | 2020-03-11T14:09:41.932138 |
| 1 | 37 | 4 | England | FA Women’s Super League | female | 2018/2019 | 2020-02-27T15:59:58.148 | 2020-02-27T15:59:58.148 |
| 2 | 43 | 3 | International | FIFA World Cup | male | 2018 | 2019-12-16T23:09:16.168756 | 2019-12-16T23:09:16.168756 |
| 3 | 11 | 4 | Spain | La Liga | male | 2018/2019 | 2020-02-27T12:19:39.458017 | 2020-02-27T12:19:39.458017 |
| 4 | 11 | 1 | Spain | La Liga | male | 2017/2018 | 2020-02-27T12:19:39.458017 | 2020-02-27T12:19:39.458017 |

We can then find the matches using the `matches` function from the statsbombpy library. Let's do this for the 2019/2020 season of the FA WSL, which has competition_id 37 and season_id 42 in the table above.
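Rather than reading the ids off the table by eye, they can also be looked up by name. The sketch below assumes the DataFrame returned by `sb.competitions()` has the `competition_id`, `season_id`, `competition_name` and `season_name` columns shown above; `find_competition_ids` is a hypothetical helper, and the small stand-in table only copies rows from that output so the example runs offline. The names you pass must match the data exactly, including the apostrophe in the league name.

```python
from typing import Tuple

import pandas as pd


def find_competition_ids(
    comps: pd.DataFrame, competition_name: str, season_name: str
) -> Tuple[int, int]:
    """Look up (competition_id, season_id) by name in a competitions DataFrame."""
    row = comps[
        (comps["competition_name"] == competition_name)
        & (comps["season_name"] == season_name)
    ]
    if row.empty:
        raise ValueError(f"No free data for {competition_name} {season_name}")
    # Convert numpy integers to plain ints before returning.
    return int(row["competition_id"].iloc[0]), int(row["season_id"].iloc[0])


# Stand-in for sb.competitions(), using rows from the output above.
comps = pd.DataFrame(
    {
        "competition_id": [37, 37, 43],
        "season_id": [42, 4, 3],
        "competition_name": [
            "FA Women's Super League",
            "FA Women's Super League",
            "FIFA World Cup",
        ],
        "season_name": ["2019/2020", "2018/2019", "2018"],
    }
)

comp, season = find_competition_ids(comps, "FA Women's Super League", "2019/2020")
print(comp, season)  # 37 42
```

In the notebook, the real table would be passed in directly, e.g. `find_competition_ids(sb.competitions(), "FA Women's Super League", "2019/2020")`.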

```python
### Find the free matches from a league in the competitions table above
## Add the competition id below
comp = 37

## Add the season id below
season = 42

### Run the matches function to pull all the matches from the competition and season
matches = sb.matches(competition_id=comp, season_id=season)
matches.head(5)
```

```
credentials were not supplied. open data access only
```
| | match_id | match_date | kick_off | competition | season | home_team | away_team | home_score | away_score | match_status | last_updated | match_week | competition_stage | stadium | referee | data_version | shot_fidelity_version | xy_fidelity_version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2275038 | 2020-02-12 | 20:30:00.000 | England - FA Women’s Super League | 2019/2020 | Reading WFC | West Ham United LFC | 2 | 0 | available | 2020-02-14T17:43:49.368 | 16 | Regular Season | Adams Park | A. Bryne | 1.1.0 | 2 | 2 |
| 1 | 2275037 | 2020-02-02 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Manchester City WFC | Arsenal WFC | 2 | 1 | available | 2020-02-04T17:25:33.263 | 14 | Regular Season | Academy Stadium | S. Pearson | 1.1.0 | 2 | 2 |
| 2 | 2275027 | 2020-02-02 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Brighton & Hove Albion WFC | Everton LFC | 1 | 0 | available | 2020-02-04T17:28:02.434 | 14 | Regular Season | NaN | A. Bryne | 1.1.0 | 2 | 2 |
| 3 | 2275030 | 2020-02-23 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Brighton & Hove Albion WFC | Tottenham Hotspur Women | 0 | 1 | available | 2020-02-26T15:02:00.122 | 17 | Regular Season | NaN | L. Saunders | 1.1.0 | 2 | 2 |
| 4 | 2275120 | 2019-09-08 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Birmingham City WFC | Everton LFC | 0 | 1 | available | 2019-12-16T23:09:16.168756 | 1 | Regular Season | SportNation.bet Stadium | E. Swallow | 1.1.0 | 2 | 2 |

As easy as that, we have every match from this competition and season that is available in the Statsbomb free data set, with details such as the date, kick-off time, teams, score, stadium and referee. If we change the comp and season values and pass them to the matches function again, we can quickly get the matches for a different competition or season.
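Because only the two ids change, pulling several seasons at once is a short loop. This is a sketch that assumes `sb.matches` keeps the `competition_id`/`season_id` keyword arguments used above; `collect_matches` is a hypothetical helper, and `fetch` is passed in as a parameter so the function can be tried without network access. The pairs (37, 42) and (37, 4) are the two FA WSL seasons listed in the competitions table.

```python
from typing import Callable, Iterable, Tuple

import pandas as pd


def collect_matches(
    pairs: Iterable[Tuple[int, int]],
    fetch: Callable[..., pd.DataFrame],
) -> pd.DataFrame:
    """Fetch the matches for each (competition_id, season_id) pair and stack them.

    `fetch` is expected to behave like sb.matches: it is called with the
    competition_id and season_id keyword arguments and returns a DataFrame.
    """
    frames = []
    for competition_id, season_id in pairs:
        frame = fetch(competition_id=competition_id, season_id=season_id)
        # Record which ids each row came from before stacking.
        frames.append(frame.assign(competition_id=competition_id, season_id=season_id))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# In the notebook (requires statsbombpy and network access):
# from statsbombpy import sb
# wsl_matches = collect_matches([(37, 42), (37, 4)], sb.matches)
```

Passing the fetching function in, rather than calling `sb.matches` directly inside the helper, is just a design choice that makes the loop easy to test offline.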

But who played for each team in these matches? We can find that too, using the `lineups` function supplied in the library.

```python
### Run the lineups function to get the lineups for each team in a given match
## Add match_id here
match = 2275038

## Run the function and assign the West Ham lineup to lineup
lineup = sb.lineups(match_id=match)['West Ham United LFC']
lineup.head(10)
```

```
credentials were not supplied. open data access only
```
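| | player_id | player_name | player_nickname | jersey_number | country |
|---|---|---|---|---|---|
| 0 | 8297 | Adriana Leon | None | 19 | Canada |
| 1 | 15421 | Kenza Dali | None | 21 | France |
| 2 | 18146 | Leanne Kiernan | None | 8 | Ireland |
| 3 | 18147 | Kate Longhurst | None | 12 | England |
| 4 | 18150 | Julia Simic | None | 10 | Germany |
| 5 | 18151 | Gilly Louise Scarlett Flaherty | Gilly Flaherty | 5 | England |
| 6 | 18153 | Alisha Lehmann | None | 7 | Switzerland |
| 7 | 22027 | Anne Moorhouse | None | 1 | England |
| 8 | 23217 | Tessel Middag | None | 23 | Netherlands |
| 9 | 31553 | Cecilie Redisch Kvamme | None | 2 | Norway |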

The output of this call is a little different: `sb.lineups` returns a dictionary keyed by team name, with a DataFrame of players for each team, so it needs to be subset before it prints nicely in Markdown. This is very easy in Python: by adding `['West Ham United LFC']` to the end of the call, we selected just the West Ham lineup data.
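If you want both teams in one table rather than one at a time, the dictionary can be flattened. A minimal sketch, assuming `sb.lineups` returns a `{team name: DataFrame}` dictionary as the subsetting above suggests; `combine_lineups` and the `team` column it adds are hypothetical names of my own, and the two example players are copied from the West Ham table above.

```python
from typing import Dict

import pandas as pd


def combine_lineups(lineups: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Stack per-team lineup DataFrames into one table with a `team` column.

    Expects at least one team in the dictionary.
    """
    return pd.concat(
        [players.assign(team=team) for team, players in lineups.items()],
        ignore_index=True,
    )


# Two players copied from the West Ham lineup above, to try the helper offline.
example = {
    "West Ham United LFC": pd.DataFrame(
        {"player_name": ["Adriana Leon", "Kenza Dali"], "jersey_number": [19, 21]}
    ),
}
print(combine_lineups(example))

# In the notebook:
# lineups = sb.lineups(match_id=match)
# print(list(lineups))  # the team names available as keys
# both_teams = combine_lineups(lineups)
```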

Lastly, and most importantly, the event data can be pulled using one of two event functions in the library, either for a single match or for an entire league.

`sb.events` returns the events for a single match when you pass it the match id, while `sb.competition_events` returns all the events from a specified league; details of its arguments can be found on the library's GitHub page.
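Once a match's events are loaded, most analysis starts by picking out one kind of event. A minimal sketch, assuming the DataFrame from `sb.events` has the `type` and `team` columns seen in the output for this match; `events_of_type` is a hypothetical helper, and the small stand-in frame only copies the team and type of the first five event rows.

```python
from typing import Optional

import pandas as pd


def events_of_type(
    events: pd.DataFrame, event_type: str, team: Optional[str] = None
) -> pd.DataFrame:
    """Return the rows with the given event `type`, optionally for one team only."""
    mask = events["type"] == event_type
    if team is not None:
        mask &= events["team"] == team
    return events[mask]


# Stand-in copying the team and type of the first five events of this match.
sample = pd.DataFrame(
    {
        "team": [
            "Reading WFC",
            "West Ham United LFC",
            "West Ham United LFC",
            "Reading WFC",
            "West Ham United LFC",
        ],
        "type": ["Starting XI", "Starting XI", "Half Start", "Half Start", "Half Start"],
    }
)
print(events_of_type(sample, "Half Start", team="West Ham United LFC"))

# In the notebook, list the event types a match contains before filtering:
# match_events["type"].value_counts()
```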

```python
### Run the function to call the events from a single match
## Run the events function using the match assigned above
match_events = sb.events(match_id=match)
match_events.head(5)
```

```
credentials were not supplied. open data access only
```
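| | 50_50 | bad_behaviour | ball_receipt | ball_recovery | block | carry | clearance | counterpress | dribble | duel | possession_team | related_events | second | shot | substitution | tactics | team | timestamp | type | under_pressure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i… | Reading WFC | 00:00:00.000 | Starting XI | NaN |
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 4231, 'lineup': [{'player': {'id… | West Ham United LFC | 00:00:00.000 | Starting XI | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Reading WFC | [035f18f5-8767-475f-b96b-b1548c2fd642] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Reading WFC | [da9c5398-dae9-4a3d-b821-fd600b54a55d] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.000 | Half Start | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Reading WFC | [f0bd2ba7-a946-4414-b04f-aeeae0928f31] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN |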

5 rows × 41 columns

As we can see, there is a lot of information in the event data, and it will need a fair amount of transformation before we can use it effectively. That will be the aim of my next Python tutorial.
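A first step towards that transformation is simply seeing which of the 41 columns actually hold data, since the preview above is mostly NaN. A minimal sketch, assuming the events come back as a pandas DataFrame; `column_coverage` is a hypothetical helper, and the stand-in frame copies three columns from the five rows shown above.

```python
import numpy as np
import pandas as pd


def column_coverage(events: pd.DataFrame) -> pd.Series:
    """Fraction of rows in which each column holds a value, emptiest columns first."""
    return events.notna().mean().sort_values()


# Show every column instead of pandas' truncated preview.
pd.set_option("display.max_columns", None)

# Stand-in with three columns from the first five events shown above.
sample = pd.DataFrame(
    {
        "shot": [np.nan] * 5,
        "related_events": [
            np.nan,
            np.nan,
            "035f18f5-8767-475f-b96b-b1548c2fd642",
            "da9c5398-dae9-4a3d-b821-fd600b54a55d",
            "f0bd2ba7-a946-4414-b04f-aeeae0928f31",
        ],
        "type": ["Starting XI", "Starting XI", "Half Start", "Half Start", "Half Start"],
    }
)
print(column_coverage(sample))

# In the notebook: column_coverage(match_events)
```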

I hope to add more Python tutorials in the next little while, but until then, stay safe out there!
