11 Practical 4: Exploratory data analysis in Tableau
During this practical we will go over conducting exploratory analysis in Tableau. We will go over:
- Level of Detail calculations
- Creating overview tables
- Creating scatter plots
If you want to follow along with the demonstration than download the P1_Demonstration_completed.twbx here
11.1 Level of Detail calculations
Level of details calculations can come in handy when we want to make sure we fix our calculations on a specific variable. E.g. in the demonstration example we have lap times from 1996 up until 2014, for all drivers in the F1, for every lap they drove in every race. This results in a large data set with multiple rows per Grand Prix, driver and year. What now if you we want to display the average fastest lap time over all the Grand Prix driven per year? To do that we need to identify the fastest lap for each Grand Prix and then calculate the average. However, if we create a ‘normal’ average in Tableau it will include all lap times. Therefore, we will create a calculations which calculates the fastest time per Grand Prix per year, and from there we can then calculate the average. To do this we will need to create a new variable which only takes in the one value per year per Grand Prix. In Tableau we can do this with what we call Level of Detail (LOD) calculations.
LOD calculations enable us to tell Tableau which dimension in the data source is unique for each row value, or the combination of dimensions. In our example, this would be the Race id (i.e. we want one fastest time per race). Once we know what our fixing dimension is we can create an LOD calculation which will ensure we do not use any of the other lap times in our future calculations. To do this go to Analysis - Create Calculated Field and enter the calculation below.
{FIXED [Race Id]: MIN([LapTimeSeconds])}
In the example above I apply the MIN function, this means Tableau will take the minimum value (i.e. fastest value) of LapTimeSeconds.
Creating calculated fields as we have done in the previous practical means we apply the calculation to each individual data point. LOD calculations and Table calculations (which we will cover later) allow us to apply a calculation to a group of data points.
11.2 Exercises
For these exercises you will use the datafiles cleaned during Practical 2. If you did not save them you can found a copy here.
Exercise 1: First import your Merged_Olympics.csv and world_pop_pivot.csv into Tableau. Join the two together by dragging the world_pop_pivot table into the data connection pane (this will create a relationship). Similar to when you created a join in Tableau Prep, you will need to select a join/relationship variable. This will be Year, NOC and…?
For video demonstration click here
Exercise 2: Can you create a basic overview table which displays the group average for Age, Height, Weight of the athletes by year of participation using LOD calculations?
You can fix an LOD calculation on more than 1 variable.
For video demonstration click here
Steps:
1. Go to Analysis - Create Calculated Field
2. Enter {Fixed [Name], [Year]: AVG([Age])} and name the variable Age corrected
3. Click OK
4. Do the same for Height and Weight but change AVG(Age) to Avg(Height) and Avg(Weight)
5. Create a table by dragging the calculated Age, Height and Weight to the columns and Year to the rows 6. Change year to “Discrete”
7. Change the measure from Sum to Average and change to table format.
8. Note in the video I also add the standard deviation, feel free to try this.
Exercise 3: Next can you create a table which displays total number of medals won by individual athletes?
For video demonstration click here
Steps:
1. Drag Medal and Name to the column and row shelves
2. In the Data Pane change Medal to Measure - Count
3. Drag Medal into the Text Mark card (change to Count if not already done so earlier)
4. Right click on “No Medal” and exclude this column (we do not want to include this data in the total medals count).
5. Go to the Analytics tab (next to the Data tab) and drag over “Totals” to display the “Row Totals” 6. You can choose to hide the Gold, Silver, and Bronze columns (if you exclude them they will impact on the grand total, hiding will still count them but not show them)
Exercise 4: Can you create a table which shows average age, height and weight during the 1988 and 2016 Olympics split by sex. Additionally, can you include the total number of medals won, number of males and females taking part, the total number of events available, total number of different sports taking part and the total number of countries taking part.
For video demonstration click here
Summary of steps:
- Ensure you filter for 1988 and 2016 using Year
- Pull Year and Sex to Rows (or columns depending on how you prefer your table)
- Pull the previoulsy calculated height, weight, and age to columns
- Calculate a Medals variable based on Medal which gives any Gold, Silver, or Bronze medal a 1 and all others a 0 (to ensure we don’t sum up the “No Medals” category).
- Drag the newly calculated variable over to the rows
- Drag the Event, Name, Country, and Sport variable over to the rows
- Change to table if not already done so and ensure you have selected to correct calculation for each variable. Note that if we are interested in number of Countries participating you want to only include each country ones (use the Count(Distinct) calculation for this). Same counts for event, name and sport.
- Do you know why we use Name to count number of females and males and not Sex?
In the video above you saw me create a Medals variable in which I gave every entry with a gold, silver, or bronze medal a 1 and those without a 0. The reason for doing so is that I did not want to include “No Medals” in our medal count (this which was part of our Medal variable). If I had used the filter on Medals it would have excluded the No Medals category, however, this would have applied to all variables and therefore our averages would have not presented the total sample but just the athletes who won a medal. This is something to keep in mind!
We can see from the table in Tableau that over the years the average age has increased a little and the height and weight of athletes hasn’t changed that much. However, we can also see there has been an increase in the number of events, the medals up for grab and countries who take part. We can also see the gap between male and female participants has been reduced.
Next, we are interested in the total number of medals won by each country per year. Now we need to remember that team event medals show up multiple times (e.g. 4*100m sprint is counted as 4 medals for that country). We will again need to use an LOD calculation to account for this.
Exercise 5: Can you create a table which shows the total number of Bronze, Silver, and Gold medals by country during the 2012 and 2016 Olympics?
For the LOD you want to use Event, Year, Country, and Medal as your fixed variables.
For video demonstration click here
Note in the video I create a true false variable we use as a filter. You could have achieved the same if you applied min or max to the Medals variable within the LOD (e.g. { FIXED [Year], [Event], [Country], [Medal]: MAX([Medals only])})
Now we have an overview of the total medals per country, I’m interested to see if this is correlated with the population number of the country.
Exercise 6: Can you create a scatter plot between total medals won per year and population for the Olympics since 2000.
For video demonstration click here
The results above indicate there may be a significant positive correlation between population size and medals won.
Exercise 7: Last, let’s save our work as a tableau packaged workbook.