Plotting Event Data in Matplotlib

Plotting Event Data with Python

It’s been some time since I last posted a tutorial, let alone one in Python, so now seems as good a time as any to get back to it. In this tutorial I am going to run through plotting match events from StatsBomb using Python and Matplotlib. We are going to load the StatsBomb open data set using their Python package and then plot data from a few different scenarios. So let’s get started.

First we need to load the important libraries into Python.

# Read in libraries
import json
from statsbombpy import sb          # Used to obtain StatsBomb data.
import pandas as pd                 # Read and manipulate data.
import numpy as np                  # Read and manipulate data.

import matplotlib.pyplot as plt     # Plotting data
from mplsoccer.pitch import Pitch   # Draw the pitch and plot events on it.

Now that we have the libraries, we can start to call the StatsBomb library for some data. We have a few options for this, but first let’s see what competitions are available to us. From the free datasets, we have the following women’s competitions to look at.

comps = sb.competitions()
comps[comps.competition_gender == 'female']
credentials were not supplied. open data access only
 | competition_id | season_id | country_name | competition_name | competition_gender | season_name | match_updated | match_available
15 | 37 | 42 | England | FA Women’s Super League | female | 2019/2020 | 2020-08-12T11:24:04.483090 | 2020-08-12T11:24:04.483090
16 | 37 | 4 | England | FA Women’s Super League | female | 2018/2019 | 2020-07-29T05:00 | 2020-07-29T05:00
32 | 49 | 3 | United States of America | NWSL | female | 2018 | 2020-07-29T05:00 | 2020-07-29T05:00
34 | 72 | 30 | International | Women’s World Cup | female | 2019 | 2020-07-29T05:00 | 2020-07-29T05:00

So now we have this, let’s find a single match to pull data from.

matches = sb.matches(competition_id=37, season_id=42)
matches.head(5)
credentials were not supplied. open data access only
 | match_id | match_date | kick_off | competition | season | home_team | away_team | home_score | away_score | match_status | last_updated | match_week | competition_stage | stadium | referee | data_version | shot_fidelity_version | xy_fidelity_version
0 | 2275054 | 2020-01-05 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Brighton & Hove Albion WFC | Liverpool WFC | 1 | 0 | available | 2020-07-29T05:00 | 11 | Regular Season | NaN | NaN | 1.1.0 | 2 | 2
1 | 2275072 | 2020-01-05 | 13:30:00.000 | England - FA Women’s Super League | 2019/2020 | Chelsea FCW | Reading WFC | 3 | 1 | available | 2020-07-29T05:00 | 11 | Regular Season | The Cherry Red Records Stadium | S. Pearson | 1.1.0 | 2 | 2
2 | 2275085 | 2020-01-05 | 15:00:00.000 | England - FA Women’s Super League | 2019/2020 | Tottenham Hotspur Women | Manchester City WFC | 1 | 4 | available | 2020-07-29T05:00 | 11 | Regular Season | The Hive Stadium | H. Conley | 1.1.0 | 2 | 2
3 | 2275113 | 2020-01-19 | 16:00:00.000 | England - FA Women’s Super League | 2019/2020 | West Ham United LFC | Brighton & Hove Albion WFC | 2 | 1 | available | 2020-07-29T05:00 | 13 | Regular Season | The Rush Green Stadium | Ryan Atkin | 1.1.0 | 2 | 2
4 | 19800 | 2019-03-14 | 20:30:00.000 | England - FA Women’s Super League | 2019/2020 | Arsenal WFC | Bristol City WFC | 4 | 0 | available | 2020-08-12T11:24:04.483090 | 1 | Regular Season | Meadow Park | R. Whitton | 1.1.0 | None | None

We can just use the first match on the list to pull all the events from. For this tutorial, we will pull the event data as a split dataset, which separates the data into the event types we want to look at. This will allow us to create a few different visuals for this match.
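To make the idea of a split dataset concrete, here is a minimal sketch on made-up data. It is not how statsbombpy works internally: the toy `events` frame and the `split_by_type` helper are assumptions for illustration only. It groups a flat events table into a dictionary holding one dataframe per event type, which is roughly the shape of the object we index with `eventdata['shots']` further down (the real package uses its own key names).

```python
import pandas as pd

# Toy event rows (invented for illustration, not real StatsBomb data).
events = pd.DataFrame({
    "type": ["Pass", "Shot", "Pass", "Carry"],
    "minute": [1, 3, 4, 4],
})

def split_by_type(df: pd.DataFrame) -> dict:
    """Hypothetical helper: return one dataframe per event type."""
    return {name: group.reset_index(drop=True) for name, group in df.groupby("type")}

split_events = split_by_type(events)
print(sorted(split_events))        # event types found in the toy data
print(len(split_events["Pass"]))   # number of pass rows
```

Keeping each event type in its own dataframe means the shot and pass plots can each start from a small, focused table.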

Shots

The first thing we will plot is shots from a single match. We have the match from above, so now we can pull the events from this match and split out a specific type of event. First we will split the shots from our eventdata dataset to create a single shot plot.

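# Call the event API through the statsbombpy package.
# Note: newer statsbombpy releases may flatten nested columns by default. If the
# 'shot' column is missing, try passing flatten_attrs=False (if your version supports it).
eventdata = sb.events(match_id=2275054, split=True)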

# Split the shot events from the rest of the data.
shotevents = eventdata['shots']

# Split the location data into x/y values.
# Location data is provided as a list which is harder to use. 
shotevents[['location_x', 'location_y']] = shotevents['location'].apply(pd.Series)

# Define columns we want to keep further down.
shotCols = ['statsbomb_xg', 'end_location_x', 'end_location_y', 'end_location_z']

# Create a function to split specific columns into values.
# This function will split the end_location values specifically from
# the shot column. The first value ends up in end_location_x (see the
# output below), so the new columns are named in x, y, z order.
def parse_function(data) -> pd.DataFrame:
    df = pd.DataFrame(data)
    dfcolumns = df.columns
    for i in dfcolumns:
        try:
            df[[str(i) + '_x', str(i) + '_y', str(i) + '_z']] = df[i].apply(pd.Series)
            df = df.drop(i, axis = 1)
        except ValueError:
            # Columns that do not expand to exactly three values are left as they are.
            pass
    return df

# Run the data through the parse function and keep the columns above.
shot_df = parse_function(shotevents['shot'].apply(pd.Series))
shot_df = shot_df[shotCols]

# Merge the data together into one dataframe.
shotevents[shotCols] = shot_df[shotCols]
shotevents.head(5)
credentials were not supplied. open data access only
 | id | index | period | timestamp | minute | second | type | possession | possession_team | play_pattern | shot | match_id | under_pressure | out | location_x | location_y | statsbomb_xg | end_location_x | end_location_y | end_location_z
0 | 3a4692e6-631c-47f4-8d34-644531797698 | 115 | 1 | 00:03:37.333 | 3 | 37 | Shot | 10 | Liverpool WFC | From Goal Kick | {'one_on_one': True, 'statsbomb_xg': 0.1886289… | 2275054 | NaN | NaN | 108.9 | 52.3 | 0.188629 | 120.0 | 28.1 | 0.2
1 | a49554c0-8b60-4eb0-9949-526cfcb6d54e | 262 | 1 | 00:08:22.408 | 8 | 22 | Shot | 22 | Brighton & Hove Albion WFC | From Throw In | {'statsbomb_xg': 0.007219963, 'end_location': … | 2275054 | NaN | NaN | 86.5 | 56.2 | 0.007220 | 117.8 | 42.1 | 0.2
2 | de542aa0-a50e-4318-b006-c4fe6cb23b41 | 642 | 1 | 00:18:49.169 | 18 | 49 | Shot | 48 | Liverpool WFC | From Corner | {'statsbomb_xg': 0.12033855, 'end_location': [… | 2275054 | NaN | NaN | 115.7 | 39.1 | 0.120339 | 120.0 | 38.6 | 4.9
3 | 6f812987-8b59-42cc-b699-bc9337b6269a | 705 | 1 | 00:20:56.064 | 20 | 56 | Shot | 52 | Liverpool WFC | From Corner | {'statsbomb_xg': 0.37038276, 'end_location': [… | 2275054 | NaN | NaN | 113.3 | 45.4 | 0.370383 | 120.0 | 45.1 | 0.2
4 | b4bd0579-da0a-46a0-9669-776989838113 | 870 | 1 | 00:27:55.377 | 27 | 55 | Shot | 60 | Liverpool WFC | Regular Play | {'statsbomb_xg': 0.011415341, 'end_location': … | 2275054 | NaN | NaN | 93.0 | 21.3 | 0.011415 | 120.0 | 45.0 | 4.5

5 rows × 27 columns

With our dataset, we had a few steps to work through to get a clean dataframe. For example, our shot column holds a dict for each row, meaning we need to parse out these values before we can use them easily in our pitch plots below.
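As an alternative way to see what that parsing does, here is a small sketch on toy rows shaped like the shot data above. The values are invented, and this is one possible approach rather than the method used in this post: it expands the list columns with `pd.DataFrame(... .tolist())` and flattens the dict column with `pd.json_normalize`.

```python
import pandas as pd

# Toy shot rows shaped like the data above (values invented for illustration).
shots = pd.DataFrame({
    "location": [[108.9, 52.3], [86.5, 56.2]],
    "shot": [
        {"statsbomb_xg": 0.19, "end_location": [120.0, 28.1, 0.2]},
        {"statsbomb_xg": 0.01, "end_location": [117.8, 42.1, 0.2]},
    ],
})

# Expand the [x, y] list into two columns.
shots[["location_x", "location_y"]] = pd.DataFrame(
    shots["location"].tolist(), index=shots.index
)

# Flatten the nested dict column into regular columns.
shot_details = pd.json_normalize(shots["shot"].tolist())
shot_details.index = shots.index
shots["statsbomb_xg"] = shot_details["statsbomb_xg"]

# Expand the [x, y, z] end location in the same way.
end_cols = ["end_location_x", "end_location_y", "end_location_z"]
shots[end_cols] = pd.DataFrame(
    shot_details["end_location"].tolist(), index=shots.index
)

print(shots[["location_x", "location_y", "statsbomb_xg"] + end_cols])
```

Naming the target columns explicitly keeps the x, y and z order visible in one place, which makes a mislabelled axis easier to spot.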

Now we have our values, we can create our shot plot using the Matplotlib and mplsoccer libraries.

# Setup the pitch
# Note: recent mplsoccer versions take figsize and tight_layout in draw() rather than in Pitch().
figsize = (16, 8)
pitch = Pitch(goal_type='box', pitch_color='#aabb97', line_color='white', stripe_color='#c2d59d', stripe=True)
fig, ax = pitch.draw(figsize=figsize, tight_layout=False)

# Store team names
t1name = shotevents.team.iloc[0]
t2name = list(set(shotevents.team.unique()) - set([t1name]))[0]

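# Split data by team
# Take copies so the changes below do not trigger pandas' SettingWithCopyWarning.
team1 = shotevents[shotevents.team == t1name].copy()
# Mirror team 1 so the two teams shoot towards opposite ends of the pitch.
team1['location_x'] = 120 - team1['location_x']
team1['location_y'] = 80 - team1['location_y']
team1['end_location_x'] = 120 - team1['end_location_x']
team1['end_location_y'] = 80 - team1['end_location_y']
team2 = shotevents[shotevents.team == t2name].copy()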

# Plot starting locations 
t1 = pitch.scatter(team1.location_x, team1.location_y, s=team1.statsbomb_xg*500, ax=ax, color="red", edgecolors="k", label="LFC")
t2 = pitch.scatter(team2.location_x, team2.location_y, s=team2.statsbomb_xg*500, ax=ax, color="darkblue", edgecolors="k", label="BHA")

# Plot the shot directions 
lt1 = pitch.lines(team1.location_x, team1.location_y, team1.end_location_x, team1.end_location_y, ax=ax, alpha=0.2, color="red", comet=True, label="LFC Shot")
lt2 = pitch.lines(team2.location_x, team2.location_y, team2.end_location_x, team2.end_location_y, ax=ax, alpha=0.2, color="blue", comet=True, label="BHA Shot")

# Add a legend and a title to our plot
legend = ax.legend(loc='lower center', labelspacing=1, fontsize=12, ncol=4)
title = ax.set_title(f'Shots of {t1name} vs {t2name}', fontsize = 18)

Shots Plotted by Team

There we have a nice-looking shot plot, with a line for each shot and the size of each dot scaled by the xG of the shot taken. This didn’t take too much time, and the mplsoccer library really made the pitch plot look great.
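The shot code above mirrors one team’s coordinates so that the two teams shoot towards opposite ends of the same pitch. Below is a minimal sketch of that step as a reusable function. The `flip_coordinates` name is hypothetical, the 120 by 80 pitch size is taken from the subtractions used above, and the toy values are invented.

```python
import pandas as pd

PITCH_LENGTH = 120  # pitch length used in the code above
PITCH_WIDTH = 80    # pitch width used in the code above

def flip_coordinates(df: pd.DataFrame, x_cols, y_cols) -> pd.DataFrame:
    """Hypothetical helper: mirror x and y columns to the opposite end of the pitch."""
    flipped = df.copy()  # work on a copy so the original frame is untouched
    flipped[x_cols] = PITCH_LENGTH - flipped[x_cols]
    flipped[y_cols] = PITCH_WIDTH - flipped[y_cols]
    return flipped

toy = pd.DataFrame({"location_x": [100.0, 90.0], "location_y": [50.0, 20.0]})
mirrored = flip_coordinates(toy, ["location_x"], ["location_y"])
print(mirrored)
```

Passing the start and end columns together, for example `["location_x", "end_location_x"]`, would flip both ends of each shot line in one call.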

Using comet=True also adds a really nice-looking line that adds to the image well. Let’s give passes a go next, using just the lines.

Passes

This time with our pass plot, we will do something slightly different and create a subplot to stack one team on top of the other. This will stop the plot looking crowded with both teams on the same figure. First we need to get our data, so let’s do the same thing as with our shots.

# Split the pass events from the rest of the data.
passevents = eventdata['passes']

# Split the location data into x/y values.
# Location data is provided as a list which is harder to use. 
passevents[['location_x', 'location_y']] = passevents['location'].apply(pd.Series)

# Define columns we want to keep further down.
passCols = ['end_location_x', 'end_location_y', 'outcome_name']

# Create a function to split specific columns into values.
# This function will split the end_location values specifically from
# the pass column.
def pass_parse_function(data) -> pd.DataFrame:
    df = pd.DataFrame(data)
    dfcolumns = df.columns
    for i in dfcolumns:
        try:
            df[[str(i) + '_x', str(i) + '_y']] = df[i].apply(pd.Series)
        except ValueError:
            # Columns that do not expand to exactly two values are left as they are.
            pass

    return df

# Run the data through the parse function.
pass_df = pass_parse_function(passevents['pass'].apply(pd.Series))
# Pull the outcome name out of the outcome dict. Rows with no outcome stay as NaN,
# which we treat as completed passes further down.
passoutcomes = pass_df['outcome'].apply(pd.Series)
pass_df['outcome_name'] = passoutcomes['name']

# Merge the data together into one dataframe, keeping the columns above.
passevents[passCols] = pass_df[passCols]

passevents.head(5)
 | id | index | period | timestamp | minute | second | type | possession | possession_team | play_pattern | pass | match_id | under_pressure | off_camera | counterpress | location_x | location_y | end_location_x | end_location_y | outcome_name
0 | cb8110ef-c586-479d-8aaf-52d991c1a6da | 5 | 1 | 00:00:00.014 | 0 | 0 | Pass | 2 | Brighton & Hove Albion WFC | From Kick Off | {'recipient': {'id': 22337, 'name': 'Maya Le T… | 2275054 | NaN | NaN | NaN | 61.0 | 40.1 | 37.0 | 42.3 | NaN
1 | 2f58f14d-8cad-4d89-be9c-aa942e9acc32 | 8 | 1 | 00:00:02.664 | 0 | 2 | Pass | 2 | Brighton & Hove Albion WFC | From Kick Off | {'recipient': {'id': 16383, 'name': 'Danique K… | 2275054 | NaN | NaN | NaN | 36.2 | 39.7 | 29.6 | 56.2 | NaN
2 | e3fc9388-b818-49b4-bded-0eb34194cfa6 | 12 | 1 | 00:00:06.966 | 0 | 6 | Pass | 2 | Brighton & Hove Albion WFC | From Kick Off | {'recipient': {'id': 22337, 'name': 'Maya Le T… | 2275054 | NaN | NaN | NaN | 21.4 | 58.8 | 19.5 | 34.8 | NaN
3 | 513cb3e7-e938-4a1a-a163-b598d7f8ed76 | 16 | 1 | 00:00:09.939 | 0 | 9 | Pass | 2 | Brighton & Hove Albion WFC | From Kick Off | {'recipient': {'id': 16400, 'name': 'Kayleigh … | 2275054 | NaN | NaN | NaN | 21.2 | 34.2 | 65.5 | 75.7 | Incomplete
4 | 01634478-ec2a-4fa2-b9ec-5d9064a8e6b6 | 18 | 1 | 00:00:13.524 | 0 | 13 | Pass | 2 | Brighton & Hove Albion WFC | From Kick Off | {'recipient': {'id': 15631, 'name': 'Niamh Cha… | 2275054 | NaN | NaN | NaN | 54.6 | 4.4 | 71.1 | 0.1 | Out

5 rows × 26 columns

Now we have our data, we can create our plot. This time, we are going to build our subplot as the axis and then add our pitch to each subplot. We also need to specify our figure size within the subplot creation so we don’t get a small plot. Let’s see how this turns out.

# Setup the pitch
# The figure size is set in plt.subplots, so it is not passed to Pitch here.
figsize = (25, 16)
pitchpass = Pitch(goal_type='box', pitch_color='#aabb97', line_color='white', stripe_color='#c2d59d', stripe=True)
fig, ax = plt.subplots(nrows=2, ncols=1, figsize=figsize)
pitchpass.draw(ax=ax[0])
pitchpass.draw(ax=ax[1])

# Split data by team
passteam1 = passevents[passevents.team == t1name] 
passteam2 = passevents[passevents.team == t2name]

# Create a boolean value to filter the data below for 
# complete and incomplete passes.
compass = passteam1.outcome_name.isna()
compass2 = passteam2.outcome_name.isna()

# Plot the completed and incomplete pass lines for each team
t1 = pitchpass.lines(passteam1[compass].location_x, passteam1[compass].location_y, passteam1[compass].end_location_x, passteam1[compass].end_location_y, ax=ax[0], color="gold", label="Completed Passes", comet=True, lw=2, transparent=True)
t1incom = pitchpass.lines(passteam1[~compass].location_x, passteam1[~compass].location_y, passteam1[~compass].end_location_x, passteam1[~compass].end_location_y, ax=ax[0], color="red", label="Incomplete Passes", comet=True, lw=2, transparent=True)

t2 = pitchpass.lines(passteam2[compass2].location_x, passteam2[compass2].location_y, passteam2[compass2].end_location_x, passteam2[compass2].end_location_y, ax=ax[1], color="gold", label="Completed Passes", comet=True, lw=2, transparent=True)
t2incom = pitchpass.lines(passteam2[~compass2].location_x, passteam2[~compass2].location_y, passteam2[~compass2].end_location_x, passteam2[~compass2].end_location_y, ax=ax[1], color="red", label="Incomplete Passes", comet=True, lw=2, transparent=True)

# Add a legend and a title to our plot
legend = ax[0].legend(loc='lower center', labelspacing=1, fontsize=12, ncol=4)
title = ax[0].set_title(f'Passes of {t1name}', fontsize = 18)
# Add a legend and a title to our plot
legend = ax[1].legend(loc='lower center', labelspacing=1, fontsize=12, ncol=4)
title = ax[1].set_title(f'Passes of {t2name}', fontsize = 18)

Passes Plotted by Team

How good is this? With the comet lines we can see the start and end of each pass, while the colours make it easy to tell complete and incomplete passes apart.
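The complete/incomplete split comes from a boolean mask on the outcome column: in the pass data above, a missing outcome is treated as a completed pass. Here is a minimal sketch of that filter on invented rows (the team names and outcomes are toy values), which also shows how the same mask gives a quick completion rate.

```python
import pandas as pd

# Toy pass rows (invented). A missing outcome marks a completed pass.
passes = pd.DataFrame({
    "team": ["A", "A", "A", "B"],
    "outcome_name": [None, "Incomplete", None, "Out"],
})

team_a = passes[passes["team"] == "A"]
completed = team_a["outcome_name"].isna()   # True where the pass was completed

print(team_a[completed].shape[0])    # number of completed passes
print(team_a[~completed].shape[0])   # number of incomplete passes
print(completed.mean())              # share of completed passes
```

Because the mask is built from the same filtered frame it is applied to, the indexes always line up, which is why the plotting code builds one mask per team.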

Coming from R, coding these plots feels like it takes a lot more work, but in reality it is very similar, just without the pipe operator. Overall, I have to say I really like how these turned out.

Hope you all enjoyed this tutorial / walkthrough of creating plots using Matplotlib in Python. I had fun creating these and will be looking to use these more in the future.
