Power Bi - Importing Data
For this new post I decided to make a video tutorial of importing data from a csv in to Power Bi. The video itself is basic, and will appear at the bottom of this post, but I will accompany it below with a few steps and a guide of how I import data in to Bi.
For those eagle eyed viewers, you will notice the data I am importing and use is from the recently released StatsBomb Messi dataset, which you can find more about here and can read my tutorial on importing StatsBomb data in to R here. This is an incredible dataset and I can’t wait to dig in to it and will probably be the focus of many of these Power Bi posts.
Data Import in to Power Bi
When you open Power Bi, you will be greeted with a blank canvas and the ability to create whatever you please. To do that, you will need to import a dataset from one of the variety of sources Bi offers access too. For this tutorial, I used a Text/CSV import from a local source, but you can access anything from SQL to Web based url links. You can use that as an option to import data directly from OneDrive or a variety of other online storage systems. Once the data imports and loads in to the query editor this is where the work begins.
1. Loading Errors
Occassionally, when data loads in to Bi you will receive an error message. When you watch the video, you will see this happens to me and can be the result of a number of factors. In the dataset, I try to adjust values in to a binary outcome (1,0) and it is possibly to miss options. So one example of a loading error is the incorrect data type is detected by Bi. For example, StatsBomb generally provides True or False for outcomes rather than 1 or 0 and all blank rows are given an NA from R. I adjust all NA values to equal 0, but forget about the TRUE or False values causing the data type of the column to be incorrect. Once these are all fixed the data will load correctly and you should receive no errors.
But another possibility, is an incorrect date type column. I have had this before with a dataset from a different locale. Power Bi offers the ability to use a locale to convert a date column. This means, that the data will be detected as being from that location and converted appropriately without causing errors. Power Bi has the ability to detect these errors in the query editor preview, so take your time to check through your dataset before applying any changes.
One of the big aspects for Power Bi is the relationships between your datasets that you import. This helps form the basis of your datasets and how they interact with eachother. For example, I will often use a unique dataset of names or dates that I use to filter across my main datasets. So if you were to have multiple datasets that you wanted to effectively join, this is an easy way of achieving this. Without the relationships active, slicers will not work and more complicated measures and equations would need to be employed, taking away one of the benefits of Power Bi as a visualisation tool.
3. Date tables
I like to keep a master date table that is separate from each dataset, but uses the datasets to form the minimum and maximum dates of the table. To achieve this, in the modelling tab of the ribbon is a button to create a new table. The formula bar will appear and in here you can type in the table name and the formula you want to use to create the table. For a date table, use the Calendar function to create a list of dates between a minimum and maximum date value. I use my full dataset to pull the minimum and maximum match dates, which is something I use frequently across my daily work projects. You could also create a unique list of match dates rather than all dates in the period, it is completely up to you. Once created, make sure the table has a relationship with the date columns of your other datasets so that the filter will work across all tables.
4. Other things to consider
When importing a CSV or Text file, the right encoding type is important if you have special characters in your dataset for names of athletes. You will see through the video that I encounter this issue but with a simple change of the encoding type, the special characters appear correctly.
When publishing to the Power Bi Cloud service, local files cannot be automatically refreshed, you would have to republish the report from Bi Desktop. This is important to consider if you are updating your datasets frequently and want this to be immediately available to end users. One way of solving this is by using a web URL source for a OneDrive document or connecting directly to sharepoint. Sharepoint does have some limitations though so I would personally suggest OneDrive URL link as the best option. This gives you the ability to schedule a refresh for multiple timepoints through the day.
5. Measures and Calculated Columns
Power Bi uses DAX formulas to create measures and calculated columns to visualise specific information that may not be already in your dataset. For example, in the StatsBomb data, shot outcome is a string value (Goal, Off T or Miss etc) and so to visualise a count of this we need to create a measure or a column to adjust this value in to a binary (1,0) form. I went for a calculated column, using an IF function to find where the term “Goal” was and assigning it a value of 1. This allows me to now sum all the 1’s in a column and to visualise this over time. This is one method of using measures or Calculated Columns in Bi but you can use these for a variety of different reasons and both have their advantages and disadvantages, which I will address in a future tutorial.
Power Bi is an incredibly powerful and useful tool for data manipulation and visulisation. Once all data is imported, creation of insightful reports is made easy giving users the ability to run wild with the ideas. I can’t wait to show you more as I go.
Thanks again to StatsBomb for the data used in this video.
Any questions please feel free to contact me