Reformatting Statsbomb Data in Python

For this tutorial, I am going to carry on from where we finished in my first tutorial, which you can find here. In that tutorial, we downloaded and installed the statsbombpy library and ran through the basic calls to download the free data released by Statsbomb.

In this tutorial, I am going to start with the basic call to get the match events from a single game, and start to parse out some of the information embedded within the data we receive from the call. It’s important to note that the basic call in the statsbombpy library parses the JSON file into a “tidy” dataframe. This means we are working with a Pandas dataframe and not with the raw JSON file. I will parse a raw JSON file in a future tutorial.

So let’s get started. First, we need to import the libraries that we are going to use.

# Read in appropriate libraries
from statsbombpy import sb # Statsbomb library to obtain data
import pandas as pd # Used to read in and manipulate data
import numpy as np # Used to help manipulate data

Once we have imported the libraries, we need to call the sb.events function to get the events from a single match. We can then have a look at the result by calling the head function, which shows the first few rows (10 in this case).

To view some of this data, you will need to scroll to the right of the tables presented below. This will apply to all tables in this blog and unfortunately was not something I could adjust.

### Run function to call the events from a single match
## Add match_id here
match = 2275038

## Run the event function using the assigned match from above
match_events = sb.events(match_id = match)
match_events.head(10)
credentials were not supplied. open data access only
  | 50_50 | bad_behaviour | ball_receipt | ball_recovery | block | carry | clearance | counterpress | dribble | duel | duration | foul_committed | foul_won | goalkeeper | half_start | id | index | injury_stoppage | interception | location | match_id | minute | miscontrol | off_camera | out | pass | period | play_pattern | player | position | possession | possession_team | related_events | second | shot | substitution | tactics | team | timestamp | type | under_pressure
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 9579b6c0-b747-4ab7-9aa4-9aff4b852827 | 1 | NaN | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i… | Reading WFC | 00:00:00.000 | Starting XI | NaN
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | a4a12e95-b01a-4042-879f-45c3d992e969 | 2 | NaN | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 4231, 'lineup': [{'player': {'id… | West Ham United LFC | 00:00:00.000 | Starting XI | NaN
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.100000 | NaN | NaN | NaN | {'late_video_start': True} | da9c5398-dae9-4a3d-b821-fd600b54a55d | 3 | NaN | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | [035f18f5-8767-475f-b96b-b1548c2fd642] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.100000 | NaN | NaN | NaN | {'late_video_start': True} | 035f18f5-8767-475f-b96b-b1548c2fd642 | 4 | NaN | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | [da9c5398-dae9-4a3d-b821-fd600b54a55d] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.000 | Half Start | NaN
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 5a4cee02-6737-48e2-918f-724080b37471 | 1524 | NaN | NaN | NaN | 2275038 | 45 | NaN | NaN | NaN | NaN | 2 | Regular Play | NaN | NaN | 96 | Reading WFC | [f0bd2ba7-a946-4414-b04f-aeeae0928f31] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN
5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | f0bd2ba7-a946-4414-b04f-aeeae0928f31 | 1525 | NaN | NaN | NaN | 2275038 | 45 | NaN | NaN | NaN | NaN | 2 | Regular Play | NaN | NaN | 96 | Reading WFC | [5a4cee02-6737-48e2-918f-724080b37471] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.000 | Half Start | NaN
6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.771676 | NaN | NaN | NaN | NaN | c0cc1e3b-af5c-448c-82cc-08e546a72f5b | 5 | NaN | NaN | [61.0, 40.1] | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 10251, 'name': 'Fara Will… | 1 | From Kick Off | Jade Moore | Center Defensive Midfield | 2 | Reading WFC | [99c4a406-f3d4-4bd0-b5b0-3a0598ae54dd] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.046 | Pass | NaN
7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.831446 | NaN | NaN | NaN | NaN | 7172fc12-eaf2-4e9c-9f82-16950a04cfa7 | 8 | NaN | NaN | [54.8, 40.5] | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 15725, 'name': 'Natasha H… | 1 | From Kick Off | Fara Williams | Center Attacking Midfield | 2 | Reading WFC | [91ca5def-0b84-4d2d-9313-68418f3e1b3a] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.897 | Pass | NaN
8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.307655 | NaN | NaN | NaN | NaN | 188815ee-705c-4deb-951b-97348bf7838f | 16 | NaN | NaN | [33.2, 2.8] | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 18147, 'name': 'Kate Long… | 1 | From Kick Off | Laura Vetterlein | Left Back | 2 | Reading WFC | [a0d2f369-a161-424e-85a2-419a4fc693da] | 8 | NaN | NaN | NaN | West Ham United LFC | 00:00:08.899 | Pass | NaN
9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.176197 | NaN | NaN | NaN | NaN | 91685ff0-0d95-497a-9a8d-af14a6851ef6 | 25 | NaN | NaN | [77.4, 74.1] | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 15725, 'name': 'Natasha H… | 1 | From Kick Off | Fara Williams | Center Attacking Midfield | 2 | Reading WFC | [dc4d1ac5-1444-438d-8d5b-572b9707048b] | 13 | NaN | NaN | NaN | Reading WFC | 00:00:13.385 | Pass | NaN

So we have 41 columns of data, but we can’t see them all, as Pandas hides some of the columns so as not to display too much information on the page. We can, however, print the column headers to see what we have to work with, or we can change a Pandas display option to print all 41 columns for us. So let’s change that option so we can also see what the data in these columns looks like.

### Change Pandas options to print max columns
pd.set_option('display.max_columns', None)

### Reprint head of data sorting by the minute column
match_events.sort_values('minute').head(10)
  | 50_50 | bad_behaviour | ball_receipt | ball_recovery | block | carry | clearance | counterpress | dribble | duel | duration | foul_committed | foul_won | goalkeeper | half_start | id | index | injury_stoppage | interception | location | match_id | minute | miscontrol | off_camera | out | pass | period | play_pattern | player | position | possession | possession_team | related_events | second | shot | substitution | tactics | team | timestamp | type | under_pressure
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 9579b6c0-b747-4ab7-9aa4-9aff4b852827 | 1 | NaN | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i… | Reading WFC | 00:00:00.000 | Starting XI | NaN
764 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | a0d2f369-a161-424e-85a2-419a4fc693da | 18 | NaN | NaN | [44.3, 3.7] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Kick Off | Kate Longhurst | Left Defensive Midfield | 2 | Reading WFC | [188815ee-705c-4deb-951b-97348bf7838f, 2103761… | 10 | NaN | NaN | NaN | West Ham United LFC | 00:00:10.207 | Ball Receipt* | True
765 | NaN | NaN | {'outcome': {'id': 9, 'name': 'Incomplete'}} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc4d1ac5-1444-438d-8d5b-572b9707048b | 26 | NaN | NaN | [108.1, 70.0] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Kick Off | Natasha Harding | Right Back | 2 | Reading WFC | [91685ff0-0d95-497a-9a8d-af14a6851ef6] | 15 | NaN | NaN | NaN | Reading WFC | 00:00:15.562 | Ball Receipt* | NaN
766 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3bcacf01-dbde-4597-8c43-408c6212da68 | 28 | NaN | NaN | [6.8, 31.6] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Free Kick | Anne Moorhouse | Goalkeeper | 3 | West Ham United LFC | [126b174d-4d51-43a5-9952-4a5657dc93b9] | 29 | NaN | NaN | NaN | West Ham United LFC | 00:00:29.562 | Ball Receipt* | NaN
767 | NaN | NaN | {'outcome': {'id': 9, 'name': 'Incomplete'}} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | f559c7a8-cd68-4256-a0ac-96232523949e | 32 | NaN | NaN | [43.0, 75.0] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Free Kick | Cecilie Redisch Kvamme | Right Back | 3 | West Ham United LFC | [0f00f803-10bc-42dd-bdde-0123bee8b0c5] | 33 | NaN | NaN | NaN | West Ham United LFC | 00:00:33.215 | Ball Receipt* | NaN
768 | NaN | NaN | {'outcome': {'id': 9, 'name': 'Incomplete'}} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 41477a37-41b3-48ef-842b-203dc3da9db6 | 35 | NaN | NaN | [71.0, 67.8] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Throw In | Leanne Kiernan | Center Attacking Midfield | 4 | West Ham United LFC | [e4045b66-f292-48b2-af77-228460807a6f] | 58 | NaN | NaN | NaN | West Ham United LFC | 00:00:58.621 | Ball Receipt* | NaN
1132 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1042ba43-e741-4487-b5c5-6977d53e64b9 | 1521 | NaN | NaN | [35.7, 44.3] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | Kristine Leine | Left Center Back | 96 | Reading WFC | [b43398f4-1953-4e94-83a0-944a3f72cb77] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.212 | Ball Receipt* | NaN
1418 | NaN | NaN | NaN | NaN | NaN | {'end_location': [84.6, 72.5]} | NaN | NaN | NaN | NaN | 0.610300 | NaN | NaN | NaN | NaN | 901ab741-eee0-4ae2-920f-24e23f4da695 | 11 | NaN | NaN | [77.7, 75.4] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Kick Off | Natasha Harding | Right Back | 2 | Reading WFC | [018f464d-055c-459e-b966-3f3dc6f60b19, 91ca5de… | 3 | NaN | NaN | NaN | Reading WFC | 00:00:03.729 | Carry | True
1419 | NaN | NaN | NaN | NaN | NaN | {'end_location': [33.2, 2.8]} | NaN | NaN | NaN | NaN | 4.470062 | NaN | NaN | NaN | NaN | 29dbf328-c64f-4974-a703-14d060348f1d | 14 | NaN | NaN | [33.2, 7.6] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Kick Off | Laura Vetterlein | Left Back | 2 | Reading WFC | [188815ee-705c-4deb-951b-97348bf7838f, 5b12bc3… | 4 | NaN | NaN | NaN | West Ham United LFC | 00:00:04.429 | Carry | True
1420 | NaN | NaN | NaN | NaN | NaN | {'end_location': [43.1, 4.3]} | NaN | NaN | NaN | NaN | 0.543583 | NaN | NaN | NaN | NaN | 4b94b9e0-f80b-44d1-ab38-4bbbdd549c51 | 19 | NaN | NaN | [44.3, 3.7] | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | From Kick Off | Kate Longhurst | Left Defensive Midfield | 2 | Reading WFC | [21037614-fe96-4eaa-af62-6d661507cc37, 3456b6d… | 10 | NaN | NaN | NaN | West Ham United LFC | 00:00:10.207 | Carry | True
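
If you would rather not change a global Pandas option, two alternatives are worth a look: printing the column names as a list, or widening the display only for a single block of code. The sketch below uses a small made-up dataframe (toy is not part of the match data) so that it runs without downloading anything.

```python
import pandas as pd

# Made-up wide dataframe standing in for match_events (30 columns, 1 row)
toy = pd.DataFrame({f"col_{i}": [i] for i in range(30)})

# Option 1: just list the column names
column_names = toy.columns.tolist()
print(column_names)

# Option 2: show every column, but only inside this "with" block
before = pd.get_option("display.max_columns")
with pd.option_context("display.max_columns", None):
    print(toy.head())

# Outside the block the display option is back to its previous value
after = pd.get_option("display.max_columns")
```

The same two calls should work unchanged on match_events, since they only rely on it being a Pandas dataframe.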

Great, now we can see all 41 columns and the data they contain. This should help us with rearranging and parsing out the data we need. The location column, for example, holds the x / y coordinates as a list within the dataframe. This will need to be separated out into its own columns before we can save this file or use it effectively.

To do this, we are going to apply pd.Series to the column. From what I have read online, this is slower than a tolist-based approach, but it handles NaN values much more easily, which is something we have to deal with in this dataset.

### First make a copy of our dataframe, so the original match_events is left untouched
match_events_split = match_events.copy()

### Split the location list into two new columns and name them
match_events_split[['location_x', 'location_y']] = match_events_split['location'].apply(pd.Series)

### Drop our location column as we don't need this anymore
match_events_split = match_events_split.drop('location', axis = 1)

### View the top of our file again to see this worked
match_events_split.head(10)
  | 50_50 | bad_behaviour | ball_receipt | ball_recovery | block | carry | clearance | counterpress | dribble | duel | duration | foul_committed | foul_won | goalkeeper | half_start | id | index | injury_stoppage | interception | match_id | minute | miscontrol | off_camera | out | pass | period | play_pattern | player | position | possession | possession_team | related_events | second | shot | substitution | tactics | team | timestamp | type | under_pressure | location_x | location_y
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 9579b6c0-b747-4ab7-9aa4-9aff4b852827 | 1 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 41212, 'lineup': [{'player': {'i… | Reading WFC | 00:00:00.000 | Starting XI | NaN | NaN | NaN
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | a4a12e95-b01a-4042-879f-45c3d992e969 | 2 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | NaN | 0 | NaN | NaN | {'formation': 4231, 'lineup': [{'player': {'id… | West Ham United LFC | 00:00:00.000 | Starting XI | NaN | NaN | NaN
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.100000 | NaN | NaN | NaN | {'late_video_start': True} | da9c5398-dae9-4a3d-b821-fd600b54a55d | 3 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | [035f18f5-8767-475f-b96b-b1548c2fd642] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN | NaN | NaN
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.100000 | NaN | NaN | NaN | {'late_video_start': True} | 035f18f5-8767-475f-b96b-b1548c2fd642 | 4 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | NaN | 1 | Regular Play | NaN | NaN | 1 | Reading WFC | [da9c5398-dae9-4a3d-b821-fd600b54a55d] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.000 | Half Start | NaN | NaN | NaN
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 5a4cee02-6737-48e2-918f-724080b37471 | 1524 | NaN | NaN | 2275038 | 45 | NaN | NaN | NaN | NaN | 2 | Regular Play | NaN | NaN | 96 | Reading WFC | [f0bd2ba7-a946-4414-b04f-aeeae0928f31] | 0 | NaN | NaN | NaN | West Ham United LFC | 00:00:00.000 | Half Start | NaN | NaN | NaN
5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | f0bd2ba7-a946-4414-b04f-aeeae0928f31 | 1525 | NaN | NaN | 2275038 | 45 | NaN | NaN | NaN | NaN | 2 | Regular Play | NaN | NaN | 96 | Reading WFC | [5a4cee02-6737-48e2-918f-724080b37471] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.000 | Half Start | NaN | NaN | NaN
6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.771676 | NaN | NaN | NaN | NaN | c0cc1e3b-af5c-448c-82cc-08e546a72f5b | 5 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 10251, 'name': 'Fara Will… | 1 | From Kick Off | Jade Moore | Center Defensive Midfield | 2 | Reading WFC | [99c4a406-f3d4-4bd0-b5b0-3a0598ae54dd] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.046 | Pass | NaN | 61.0 | 40.1
7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.831446 | NaN | NaN | NaN | NaN | 7172fc12-eaf2-4e9c-9f82-16950a04cfa7 | 8 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 15725, 'name': 'Natasha H… | 1 | From Kick Off | Fara Williams | Center Attacking Midfield | 2 | Reading WFC | [91ca5def-0b84-4d2d-9313-68418f3e1b3a] | 0 | NaN | NaN | NaN | Reading WFC | 00:00:00.897 | Pass | NaN | 54.8 | 40.5
8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.307655 | NaN | NaN | NaN | NaN | 188815ee-705c-4deb-951b-97348bf7838f | 16 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 18147, 'name': 'Kate Long… | 1 | From Kick Off | Laura Vetterlein | Left Back | 2 | Reading WFC | [a0d2f369-a161-424e-85a2-419a4fc693da] | 8 | NaN | NaN | NaN | West Ham United LFC | 00:00:08.899 | Pass | NaN | 33.2 | 2.8
9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.176197 | NaN | NaN | NaN | NaN | 91685ff0-0d95-497a-9a8d-af14a6851ef6 | 25 | NaN | NaN | 2275038 | 0 | NaN | NaN | NaN | {'recipient': {'id': 15725, 'name': 'Natasha H… | 1 | From Kick Off | Fara Williams | Center Attacking Midfield | 2 | Reading WFC | [dc4d1ac5-1444-438d-8d5b-572b9707048b] | 13 | NaN | NaN | NaN | Reading WFC | 00:00:13.385 | Pass | NaN | 77.4 | 74.1
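
To see why the NaN handling matters, here is a minimal sketch on a made-up three-row column, where the middle row has no location. It also shows one possible version of a faster list-based route, which has to pad the missing rows by hand. The toy data and the variable names are assumptions for illustration only, not anything from the match file.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the location column: two events with coordinates, one without
toy = pd.DataFrame({"location": [[61.0, 40.1], np.nan, [33.2, 2.8]]})

# Approach used in this tutorial: a NaN row becomes NaN in both new columns
via_series = toy["location"].apply(pd.Series)
via_series.columns = ["location_x", "location_y"]

# List-based alternative: every missing row must first be padded to [NaN, NaN]
padded = [loc if isinstance(loc, list) else [np.nan, np.nan] for loc in toy["location"]]
via_list = pd.DataFrame(padded, index=toy.index, columns=["location_x", "location_y"])

print(via_series)
print(via_list)
```

Both routes should give the same two columns here. The list-based one just needs the extra padding step, which is the trade-off mentioned above.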

Now that we have our locations parsed out, we can look at the event type columns, such as pass or shot, which each contain a variety of further information, such as whether the pass was an assist, who received the pass and how long the pass was. All of this information is provided as a Python dictionary within the dataframe. If we isolate the pass column and drop all NaN values, this is what we get.

### Select the pass column
pass_data_raw = match_events_split['pass']

### Drop NaN values from our selected column
pass_data_raw.dropna().head()
6     {'recipient': {'id': 10251, 'name': 'Fara Will...
7     {'recipient': {'id': 15725, 'name': 'Natasha H...
8     {'recipient': {'id': 18147, 'name': 'Kate Long...
9     {'recipient': {'id': 15725, 'name': 'Natasha H...
10    {'recipient': {'id': 22027, 'name': 'Anne Moor...
Name: pass, dtype: object

As we can see, there are dictionaries within dictionaries here, and the information provided might be common across multiple event types within this dataset. Let’s see if we can pull anything further out of here and create a nice little dataframe of the information.

First, we can split the dictionaries into a dataframe of values, rather than keeping a single column of nested dictionaries.

### Convert our dictionaries into a dataframe
pass_data = pass_data_raw.apply(pd.Series)

### Filter our dataframe to the rows that have a pass length
pass_data[pass_data.length >= 0]
  | 0 | aerial_won | angle | assisted_shot_id | body_part | cross | cut_back | deflected | end_location | goal_assist | height | inswinging | length | miscommunication | no_touch | outcome | outswinging | recipient | shot_assist | switch | technique | through_ball | type
6 | NaN | NaN | 2.900027 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [54.1, 41.8] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 7.106335 | NaN | NaN | NaN | NaN | {'id': 10251, 'name': 'Fara Williams'} | NaN | NaN | NaN | NaN | {'id': 65, 'name': 'Kick Off'}
7 | NaN | NaN | 0.990103 | NaN | {'id': 38, 'name': 'Left Foot'} | NaN | NaN | NaN | [77.7, 75.4] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 41.742306 | NaN | NaN | NaN | NaN | {'id': 15725, 'name': 'Natasha Harding'} | NaN | NaN | NaN | NaN | NaN
8 | NaN | NaN | 0.080904 | NaN | {'id': 38, 'name': 'Left Foot'} | NaN | NaN | NaN | [44.3, 3.7] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 11.136427 | NaN | NaN | NaN | NaN | {'id': 18147, 'name': 'Kate Longhurst'} | NaN | NaN | NaN | NaN | NaN
9 | NaN | NaN | -0.132765 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [108.1, 70.0] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 30.972569 | NaN | NaN | {'id': 76, 'name': 'Pass Offside'} | NaN | {'id': 15725, 'name': 'Natasha Harding'} | NaN | NaN | NaN | NaN | NaN
10 | NaN | NaN | 2.101826 | NaN | {'id': 38, 'name': 'Left Foot'} | NaN | NaN | NaN | [6.8, 31.6] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 21.918486 | NaN | NaN | NaN | NaN | {'id': 22027, 'name': 'Anne Moorhouse'} | NaN | NaN | NaN | NaN | {'id': 62, 'name': 'Free Kick'}
757 | NaN | NaN | -1.799721 | NaN | NaN | NaN | NaN | NaN | [112.1, 69.7] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 10.575916 | NaN | NaN | NaN | NaN | {'id': 26570, 'name': 'Amalie Vevle Eikeland'} | NaN | NaN | NaN | NaN | {'id': 67, 'name': 'Throw-in'}
758 | NaN | NaN | 2.310073 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [110.2, 78.1] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 4.601087 | NaN | NaN | NaN | NaN | {'id': 15725, 'name': 'Natasha Harding'} | NaN | NaN | NaN | NaN | NaN
759 | NaN | NaN | -0.851966 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [112.3, 75.9] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 2.126029 | NaN | NaN | {'id': 9, 'name': 'Incomplete'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN
760 | NaN | NaN | -1.830611 | NaN | NaN | NaN | NaN | NaN | [109.5, 72.1] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 8.174350 | NaN | NaN | NaN | NaN | {'id': 10190, 'name': 'Jade Moore'} | NaN | NaN | NaN | NaN | {'id': 67, 'name': 'Throw-in'}
761 | NaN | NaN | -1.843406 | NaN | NaN | NaN | NaN | NaN | [111.1, 70.7] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 9.656604 | NaN | NaN | {'id': 77, 'name': 'Unknown'} | NaN | NaN | NaN | NaN | NaN | NaN | {'id': 67, 'name': 'Throw-in'}

756 rows × 23 columns

Now we can see what our data actually includes. A few of our columns still contain a dictionary of information, with ‘id’ and ‘name’ keys common to those columns, while we also have a list of co-ordinates for our end locations. We can convert these columns one by one, but this would be time consuming to do for every variable in this dataset. For example, we can split a single column like this:

### Split the height column into separate columns
pass_data[['0', 'pass_height_id', 'pass_height_name']] = pass_data['height'].apply(pd.Series)

### Filter the dataframe to find our data
pass_data[pass_data.length >= 0]
  | 0 | aerial_won | angle | assisted_shot_id | body_part | cross | cut_back | deflected | end_location | goal_assist | height | inswinging | length | miscommunication | no_touch | outcome | outswinging | recipient | shot_assist | switch | technique | through_ball | type | 0 | pass_height_id | pass_height_name
6 | NaN | NaN | 2.900027 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [54.1, 41.8] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 7.106335 | NaN | NaN | NaN | NaN | {'id': 10251, 'name': 'Fara Williams'} | NaN | NaN | NaN | NaN | {'id': 65, 'name': 'Kick Off'} | NaN | 1.0 | Ground Pass
7 | NaN | NaN | 0.990103 | NaN | {'id': 38, 'name': 'Left Foot'} | NaN | NaN | NaN | [77.7, 75.4] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 41.742306 | NaN | NaN | NaN | NaN | {'id': 15725, 'name': 'Natasha Harding'} | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | High Pass
8 | NaN | NaN | 0.080904 | NaN | {'id': 38, 'name': 'Left Foot'} | NaN | NaN | NaN | [44.3, 3.7] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 11.136427 | NaN | NaN | NaN | NaN | {'id': 18147, 'name': 'Kate Longhurst'} | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | Ground Pass
9 | NaN | NaN | -0.132765 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [108.1, 70.0] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 30.972569 | NaN | NaN | {'id': 76, 'name': 'Pass Offside'} | NaN | {'id': 15725, 'name': 'Natasha Harding'} | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | High Pass
10 | NaN | NaN | 2.101826 | NaN | {'id': 38, 'name': 'Left Foot'} | NaN | NaN | NaN | [6.8, 31.6] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 21.918486 | NaN | NaN | NaN | NaN | {'id': 22027, 'name': 'Anne Moorhouse'} | NaN | NaN | NaN | NaN | {'id': 62, 'name': 'Free Kick'} | NaN | 1.0 | Ground Pass
757 | NaN | NaN | -1.799721 | NaN | NaN | NaN | NaN | NaN | [112.1, 69.7] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 10.575916 | NaN | NaN | NaN | NaN | {'id': 26570, 'name': 'Amalie Vevle Eikeland'} | NaN | NaN | NaN | NaN | {'id': 67, 'name': 'Throw-in'} | NaN | 3.0 | High Pass
758 | NaN | NaN | 2.310073 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [110.2, 78.1] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 4.601087 | NaN | NaN | NaN | NaN | {'id': 15725, 'name': 'Natasha Harding'} | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | Ground Pass
759 | NaN | NaN | -0.851966 | NaN | {'id': 40, 'name': 'Right Foot'} | NaN | NaN | NaN | [112.3, 75.9] | NaN | {'id': 1, 'name': 'Ground Pass'} | NaN | 2.126029 | NaN | NaN | {'id': 9, 'name': 'Incomplete'} | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | Ground Pass
760 | NaN | NaN | -1.830611 | NaN | NaN | NaN | NaN | NaN | [109.5, 72.1] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 8.174350 | NaN | NaN | NaN | NaN | {'id': 10190, 'name': 'Jade Moore'} | NaN | NaN | NaN | NaN | {'id': 67, 'name': 'Throw-in'} | NaN | 3.0 | High Pass
761 | NaN | NaN | -1.843406 | NaN | NaN | NaN | NaN | NaN | [111.1, 70.7] | NaN | {'id': 3, 'name': 'High Pass'} | NaN | 9.656604 | NaN | NaN | {'id': 77, 'name': 'Unknown'} | NaN | NaN | NaN | NaN | NaN | NaN | {'id': 67, 'name': 'Throw-in'} | NaN | 3.0 | High Pass

756 rows × 26 columns
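
The extra column named '0' in the output above looks to be a by-product of the rows that hold NaN instead of a dictionary: wrapping a single NaN in pd.Series gives one entry labelled 0, and that label ends up as a column of its own. If only the ‘id’ and ‘name’ values are needed, one way around it is to pull those keys out directly. This is a sketch on made-up data, and pick_key is a hypothetical helper name, not something from statsbombpy.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the height column: two dictionaries and one missing value
height = pd.Series(
    [{"id": 1, "name": "Ground Pass"}, np.nan, {"id": 3, "name": "High Pass"}],
    name="height",
)

# Expanding with pd.Series picks up a stray column labelled 0 from the NaN row
expanded = height.apply(pd.Series)
print(expanded.columns.tolist())

def pick_key(value, key):
    """Return value[key] for dictionaries and NaN for anything else (hypothetical helper)."""
    return value.get(key, np.nan) if isinstance(value, dict) else np.nan

# Pulling the keys out one at a time gives only the columns we asked for
pass_height = pd.DataFrame({
    "pass_height_id": height.apply(lambda v: pick_key(v, "id")),
    "pass_height_name": height.apply(lambda v: pick_key(v, "name")),
})
print(pass_height)
```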

Now that was relatively simple to do, but having to change the values for each column we want to split like this will take a fair amount of time. What we could do instead is write a function that loops over each column of the dataframe and applies the split to it.

### Split pass data into a dataframe
pass_data_split = pass_data_raw.apply(pd.Series)

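### Create function to loop through columns
### and apply a function to split the column
### into id and name columns.
def pass_parse_function(data) -> pd.DataFrame:

    df = pd.DataFrame(data)
    dfcolumns = df.columns
    for i in dfcolumns:
        try:
            # Dictionary columns expand to three columns: a stray 0 column
            # (which comes from the NaN rows), plus 'id' and 'name'
            df[['0', str(i) + '_id', str(i) + '_name']] = df[i].apply(pd.Series)
            df = df.drop(i, axis = 1)
        except ValueError:
            # Columns that do not expand to exactly three columns are left as they are
            pass

    return df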

### Run the function using the split dataframe
pass_df = pass_parse_function(pass_data_split)

### View the data from the function
pass_df[pass_df.length >= 0].head(10)
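  | 0 | aerial_won | angle | assisted_shot_id | cross | cut_back | deflected | end_location | goal_assist | inswinging | length | miscommunication | no_touch | outswinging | shot_assist | switch | through_ball | 0 | body_part_id | body_part_name | height_id | height_name | outcome_id | outcome_name | recipient_id | recipient_name | technique_id | technique_name | type_id | type_name
6 | NaN | NaN | 2.900027 | NaN | NaN | NaN | NaN | [54.1, 41.8] | NaN | NaN | 7.106335 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.0 | Right Foot | 1.0 | Ground Pass | NaN | NaN | 10251.0 | Fara Williams | NaN | NaN | 65.0 | Kick Off
7 | NaN | NaN | 0.990103 | NaN | NaN | NaN | NaN | [77.7, 75.4] | NaN | NaN | 41.742306 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 38.0 | Left Foot | 3.0 | High Pass | NaN | NaN | 15725.0 | Natasha Harding | NaN | NaN | NaN | NaN
8 | NaN | NaN | 0.080904 | NaN | NaN | NaN | NaN | [44.3, 3.7] | NaN | NaN | 11.136427 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 38.0 | Left Foot | 1.0 | Ground Pass | NaN | NaN | 18147.0 | Kate Longhurst | NaN | NaN | NaN | NaN
9 | NaN | NaN | -0.132765 | NaN | NaN | NaN | NaN | [108.1, 70.0] | NaN | NaN | 30.972569 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.0 | Right Foot | 3.0 | High Pass | 76.0 | Pass Offside | 15725.0 | Natasha Harding | NaN | NaN | NaN | NaN
10 | NaN | NaN | 2.101826 | NaN | NaN | NaN | NaN | [6.8, 31.6] | NaN | NaN | 21.918486 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 38.0 | Left Foot | 1.0 | Ground Pass | NaN | NaN | 22027.0 | Anne Moorhouse | NaN | NaN | 62.0 | Free Kick
11 | NaN | NaN | 0.760755 | NaN | NaN | NaN | NaN | [37.8, 65.3] | NaN | NaN | 40.175865 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.0 | Right Foot | 2.0 | Low Pass | 9.0 | Incomplete | 31553.0 | Cecilie Redisch Kvamme | NaN | NaN | NaN | NaN
12 | NaN | NaN | -0.237374 | NaN | NaN | NaN | NaN | [71.9, 74.0] | NaN | NaN | 25.515486 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | High Pass | 9.0 | Incomplete | 18146.0 | Leanne Kiernan | NaN | NaN | 67.0 | Throw-in
13 | NaN | NaN | -2.696125 | NaN | NaN | NaN | NaN | [57.3, 25.3] | NaN | NaN | 12.300406 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.0 | Right Foot | 2.0 | Low Pass | 9.0 | Incomplete | 31628.0 | Kristine Leine | NaN | NaN | NaN | NaN
14 | NaN | NaN | 1.010365 | NaN | NaN | NaN | NaN | [66.6, 75.5] | NaN | NaN | 23.139793 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 38.0 | Left Foot | 1.0 | Ground Pass | NaN | NaN | 31553.0 | Cecilie Redisch Kvamme | NaN | NaN | 62.0 | Free Kick
15 | NaN | NaN | -0.167896 | NaN | NaN | NaN | NaN | [71.9, 74.5] | NaN | NaN | 5.984146 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.0 | Right Foot | 1.0 | Ground Pass | NaN | NaN | 18153.0 | Alisha Lehmann | NaN | NaN | NaN | NaN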
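
One thing to be aware of with the try / except approach is that, as far as I can tell, a column is only split when it expands to exactly three columns, so it relies on every dictionary column containing at least one NaN. A possible variant, sketched here on made-up data, checks for dictionaries explicitly instead. split_dict_columns is a hypothetical name, and the sketch assumes every dictionary of interest has ‘id’ and ‘name’ keys, as in the output above.

```python
import numpy as np
import pandas as pd

def split_dict_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each column that holds dictionaries with <col>_id and <col>_name columns."""
    out = df.copy()
    for col in df.columns:
        is_dict = df[col].apply(lambda v: isinstance(v, dict))
        if not is_dict.any():
            continue  # numbers, lists and all-NaN columns are left alone
        for key in ("id", "name"):
            out[f"{col}_{key}"] = df[col].apply(
                lambda v, k=key: v.get(k, np.nan) if isinstance(v, dict) else np.nan
            )
        out = out.drop(columns=col)
    return out

# Made-up pass data: one column with no NaN at all, one with a missing value
toy_passes = pd.DataFrame({
    "length": [7.106335, 41.742306],
    "height": [{"id": 1, "name": "Ground Pass"}, {"id": 3, "name": "High Pass"}],
    "outcome": [np.nan, {"id": 9, "name": "Incomplete"}],
})

result = split_dict_columns(toy_passes)
print(result)
```

Because it never expands a whole dictionary at once, this version also avoids creating the stray '0' column.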

So there we have a function that parses out our columns and applies a new name to each column. This is far simpler and quicker than trying to write out a single line of code for each column we need to pull details from.

I’m not sure if this is the quickest function for doing this, so if anyone who uses Python is reading this, feel free to let me know if there is a better method of doing it.
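
One candidate that might be worth trying is pd.json_normalize, which flattens nested dictionaries in a single call and builds the new column names from the nested keys. To stay on the safe side with missing rows, this sketch drops the NaN entries first and then restores the original row labels. The data is made up, and I have not timed this against the function above on the full match.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the pass column: nested dictionaries plus a non-pass (NaN) row
pass_col = pd.Series(
    [
        {"recipient": {"id": 10251, "name": "Fara Williams"},
         "length": 7.106335,
         "height": {"id": 1, "name": "Ground Pass"}},
        np.nan,
        {"length": 2.126029,
         "height": {"id": 1, "name": "Ground Pass"},
         "outcome": {"id": 9, "name": "Incomplete"}},
    ],
    name="pass",
)

# Keep only the rows that actually hold a dictionary
valid = pass_col.dropna()

# Flatten the nested keys, joining parent and child names with an underscore
flat = pd.json_normalize(valid.tolist(), sep="_")

# Restore the original row labels so the result lines up with the events dataframe
flat.index = valid.index
print(flat)
```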

Hopefully this tutorial provides you with some good information on using Python to look at Statsbomb data. I know there are a few things I can clean up in this process, and hopefully I can learn / show you all those in the future.
