3 Lecture 1 Handouts

Module introduction and Principles of Data Visualisation

3.1 Today’s Session

  • Introduction to the module and its objectives
  • Why data visualisation?
  • From data to visualisation

3.2 Today’s learning objectives

  • Recall the module learning outcomes
  • Explain the use of data visualisation
  • Describe the principles of data visualisation

3.3 Introduction to the module

3.3.1 Module aims

  • Introduce you to the use of data visualisation in sport data analytics
  • Improve your understanding of the different data visualisation methods used in sport
  • Understand the basic principles of data visualisation
  • Learn how to tailor visualisations based on the different needs of end users

3.3.2 Module learning objectives

  • You will be able to evaluate the different data visualisation methods available in a professional sporting context.
  • You will be able to formulate the steps required to visualise data in a clear and understandable manner.
  • You will be able to manipulate and organise data and identify the appropriate data visualisation method for a given aim and audience.
  • You will be able to argue for and against the use of specific data visualisation methods depending on the message to be shared and the target audience.
  • You will be able to use Tableau to create interactive dashboards sharing sports performance data.

3.3.3 Content

Face-to-face session
- Lectures focus on theoretical understanding of data visualisation principles and process

- Interactive sessions in which case studies will be discussed, group tasks assigned and findings presented

- Practical demonstrations on the use of R and Tableau for data visualisation

Self-study
- A variety of readings, group work tasks and practicals

3.3.4 Assessment

You are required to complete:

  1. A 500-word written critique. You will be given a sport data visualisation and asked to outline its strengths and weaknesses, using published literature to support your critique (20% of the module grade, week 6).
  2. A Tableau data visualisation based on a publicly available sport data set. You will work through the seven steps of data visualisation and use R and Tableau to create your dashboard (80% of the module grade, week 11).

3.3.5 Software

This module makes use of R and Tableau. It is recommended that you install R and RStudio on your personal computer. R and RStudio are available from: https://posit.co/download/rstudio-desktop/

To access Tableau on your personal computer, go to Tableau for Students to register, download the program and obtain a free licence.

3.4 What is data visualisation?

  • Data visualisation is the graphical representation of information and data. 
  • Visual elements like charts, graphs and maps provide an accessible way to see and understand trends, outliers and patterns in data.
  • Data visualisations enable storytelling.
  • Creating effective data visualisations requires skill and knowledge.

3.5 What is data visualisation?

3.6 What is data visualisation?

3.7 Why visualise our data?

  • Approximately 70% of the body's sensory receptors are in our eyes
  • 40% of the cerebral cortex is involved in processing visual information
  • The visual connection to the brain has more bandwidth than other sensory pathways
  • Visual perception is intimately connected to understanding

3.8 Why visualise our data?

  • Our brain is powerful but working memory is limited
  • Working memory is limited to a small number of “chunks” of information
  • Visualisation allows us to consolidate complex statistics so we can process more data simultaneously (seeing the forest along with the trees)
  • The picture is not the end goal: it is what we do with it that is important

3.9 Let’s play a game, how many 7s can you count?

On the next slide you will see a square of numbers. You have 15 seconds to count the 7s.

3.10 Let’s play a game, how many 7s can you count?

3.11 Let’s play a game, how many 7s can you count?

3.12 Let’s play a game, how many 7s can you count?

3.13 Visual perceptions

  • Our perception of data on a typical printed page is associated with several visual variables.

  • We call these visual variables aesthetics.

3.14 Data visualisations

The use of data visualisations can improve:

  • Ease of understanding
  • Engagement and attention
  • Efficient communication
  • Memory retention
  • Cross-disciplinary understanding
  • Presentation flexibility

3.15 Data visualisations in Sport

Visualisations can be used to enhance:

  • Performance Analysis
  • In-Game Insights
  • Player Tracking and Biometrics
  • Fan Engagement
  • Scouting and Recruitment
  • Tactical Analysis
  • Predictive Analytics

3.16 From data to visualisation

3.17 Aesthetics

  • Aesthetics are quantifiable features within a graphic, such as position, shape, size, colour, line width and line type.
  • Aesthetics can take different forms.
  • The type of data we are working with will determine which aesthetic (or combination of aesthetics) we can best use.

3.18 Aesthetics

3.19 Task

  • Map your variable onto a relevant aesthetic

3.20 Aesthetics

  • Not all aesthetics can represent continuous data
    • E.g. shape and line type cannot be used for continuous data
  • Discrete data such as categorical data (ordered or unordered), text, or quantitative discrete variables (e.g. a 1-5 rating scale) can be represented by most aesthetics.

3.21 Using aesthetics

  • Data values are mapped onto aesthetics
  • The mapping between data values and aesthetic values is called a scale
  • When creating a scale, each unique data value needs its own unique aesthetic value
    • This is why shape and line type cannot be used for continuous data: there are only a limited number of distinguishable shapes and line types
  • Visualisations often use three scales, but it is possible to have more than three scales in one visualisation
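To make the idea of a scale concrete, here is a minimal sketch in Python (used here only for illustration; in this module you would build scales in R or Tableau). The helpers `linear_scale` and `discrete_scale` and all example values are hypothetical. A continuous variable is mapped onto a continuous aesthetic (point size) by linear interpolation, while a discrete variable is mapped onto shapes, one unique shape per unique value.

```python
# A minimal sketch of two scales: functions that map data values onto
# aesthetic values. Helper names and example data are hypothetical.

def linear_scale(domain, out_range):
    """Return a function mapping a continuous data value onto a continuous
    aesthetic (e.g. point size) by linear interpolation."""
    d_min, d_max = domain
    r_min, r_max = out_range

    def scale(value):
        fraction = (value - d_min) / (d_max - d_min)
        return r_min + fraction * (r_max - r_min)

    return scale


def discrete_scale(categories, aesthetic_values):
    """Give each unique category its own unique aesthetic value (e.g. a shape).
    Raises ValueError if there are more categories than aesthetic values."""
    unique = list(dict.fromkeys(categories))  # unique values, order preserved
    if len(unique) > len(aesthetic_values):
        raise ValueError(
            f"{len(unique)} categories but only "
            f"{len(aesthetic_values)} distinct aesthetic values"
        )
    return dict(zip(unique, aesthetic_values))


# Hypothetical: distance covered (km) mapped onto point size 1-10
size = linear_scale(domain=(8.0, 12.0), out_range=(1.0, 10.0))
print(size(10.0))  # 5.5

# Hypothetical: playing position mapped onto marker shape
SHAPES = ["circle", "square", "triangle", "diamond"]
shape = discrete_scale(["GK", "DEF", "MID", "FWD", "DEF"], SHAPES)
print(shape["MID"])  # triangle
```

Passing a continuous variable, such as every distinct distance value across a season, to `discrete_scale` quickly raises the `ValueError`: a practical illustration of why shape and line type cannot represent continuous data.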

3.22 Task

Design an imaginary sport visualisation with 3 different scales. Can you make one with 4 different scales?

3.23 Coordinate systems and axes

  • Visualising data requires position scales
  • The most commonly used system is the Cartesian coordinate system
    • x and y coordinates
  • Each grid spacing on the x- and y-axis refers to a step in the variable unit (e.g. 1 league point).
  • If the x- and y-axes use different units, their grid spacings do not need to be the same.
    • Stretch the y-axis to emphasise changes along the y-axis
    • Stretch the x-axis to emphasise changes along the x-axis
  • If x and y use the same units, the grid spacing should be equal so that you do not distort your message.

3.24 Coordinate systems and axes

  • Non-linear axes are common.
    • Log-transformed: often used when values differ by several orders of magnitude.
    • Square root: less often used, but may be useful when your data contain zeros (the logarithm of zero is undefined).
  • When using a log transformation, make it clear whether the axis labels show the original values or the log-transformed values.
  • Another commonly used coordinate system in sport data analytics is the polar coordinate system.
    • The x-axis is circular (x is mapped onto an angle) and y becomes the distance from the centre
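The sketch below illustrates the two non-linear transformations and the polar system in Python, under simple assumptions: the values are hypothetical, and `to_polar_position` is a hypothetical helper that treats x as periodic (one full turn of the circle equals `period`).

```python
import math

# Hypothetical values spanning four orders of magnitude
values = [1, 10, 100, 1000, 10000]

# On a log10 axis, each tenfold increase becomes one equal step
log_positions = [math.log10(v) for v in values]
print(log_positions)  # [0.0, 1.0, 2.0, 3.0, 4.0]

# The log of zero is undefined, but the square root of zero is fine
with_zero = [0, 4, 25, 100]
try:
    [math.log10(v) for v in with_zero]
except ValueError:
    print("log10(0) is undefined")
sqrt_positions = [math.sqrt(v) for v in with_zero]
print(sqrt_positions)  # [0.0, 2.0, 5.0, 10.0]


def to_polar_position(x, r, period):
    """Map x onto an angle around the circle (one full turn = period) and use
    r as the distance from the centre; return Cartesian (x, y) for drawing."""
    theta = 2 * math.pi * x / period
    return r * math.cos(theta), r * math.sin(theta)


# Hypothetical: hour of day on a 24-hour circular axis
print(to_polar_position(6, 1.0, 24))  # a quarter turn: roughly (0.0, 1.0)
```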

3.25 Using colour

  • Distinguish
    • i.e. make the differences between groups or categories clear

3.26 Using colour

  • Represent
    • i.e. use colour to show the values of continuous variables
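One simple way to represent a continuous variable with colour is a sequential scale that interpolates between a light and a dark colour. The Python sketch below assumes linear interpolation in RGB space and uses two arbitrarily chosen blues; `sequential_colour` and the example distances are hypothetical, not part of any library.

```python
def hex_to_rgb(hex_colour):
    """Convert '#rrggbb' to a tuple of three 0-255 integers."""
    hex_colour = hex_colour.lstrip("#")
    return tuple(int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert a tuple of three 0-255 integers to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_colour(value, vmin, vmax, low="#f7fbff", high="#08306b"):
    """Map a continuous value onto a colour between `low` (at vmin) and
    `high` (at vmax) by linear interpolation of the RGB channels."""
    fraction = (value - vmin) / (vmax - vmin)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp values outside the range
    low_rgb, high_rgb = hex_to_rgb(low), hex_to_rgb(high)
    mixed = tuple(round(a + fraction * (b - a)) for a, b in zip(low_rgb, high_rgb))
    return rgb_to_hex(mixed)


# Hypothetical: colour each player by distance covered (km)
for km in (8.0, 10.0, 12.0):
    print(km, sequential_colour(km, vmin=8.0, vmax=12.0))
```

Interpolating in RGB is the simplest approach but is not perceptually uniform; ready-made sequential palettes (for example viridis, available in R via `ggplot2::scale_colour_viridis_c()`) are usually a better choice in practice.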

3.27 Using colour

  • Highlight
    • i.e. focus on one specific group or element within your data

3.28 Types of visualisations

  • Many different types of visualisation are available.
  • Which one to use depends on the data you are displaying
    • Amounts
    • Distributions
    • Proportions
    • Relationships
    • Geospatial data
    • Uncertainty

3.29 Visualising amounts

  • Visualising amounts refers to visualising a value for a set of categories (e.g. Olympic medals per country)
  • Bar plots are most commonly used
    • “Normal”, stacked, grouped
    • Pay attention to labelling
    • Pay attention to ordering
  • Dot plots and heatmaps are alternative options
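Ordering bars by value usually makes a plot of amounts easier to read. As a minimal, library-free sketch, the Python function below sorts categories from largest to smallest and draws a text bar chart; the medal counts are invented for illustration and `text_bar_chart` is a hypothetical helper.

```python
# Hypothetical medal counts, invented for illustration (not real results)
medals = {"Country A": 12, "Country B": 31, "Country C": 7, "Country D": 19}


def text_bar_chart(amounts, width=30):
    """Draw one labelled bar per category, ordered from largest to smallest.
    The largest amount gets a bar `width` characters long."""
    ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
    largest = ordered[0][1]
    label_width = max(len(name) for name in amounts)
    lines = []
    for name, amount in ordered:
        bar = "#" * round(width * amount / largest)
        lines.append(f"{name:<{label_width}} {bar} {amount}")
    return "\n".join(lines)


print(text_bar_chart(medals))
```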

3.30 Examples of amounts

3.31 Visualising distributions

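  • Visualising distributions refers to showing how the values of a variable are spread out (e.g. how many players fall within each range of sprint speeds).
  • Histograms and density plots are most common
    • You need to choose the bin width (histogram) or bandwidth (density plot); the choice is arbitrary and can change what the plot appears to show.
    • Note that density plots can suggest data where none exist (e.g. negative values for a variable that can only be positive), so be aware of this
  • Alternatives are the empirical cumulative distribution function (ECDF) and quantile-quantile plots (q-q plots)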
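The effect of the arbitrary bin width and bandwidth choices can be seen in a short Python sketch. It assumes hypothetical, strictly positive data, a simple left-closed binning rule and a Gaussian kernel; `histogram_counts` and `gaussian_kde` are hypothetical helpers written for illustration.

```python
import math

# Hypothetical, strictly positive measurements (e.g. km covered in a drill)
data = [0.5, 1.2, 1.9, 2.4, 3.1, 3.3, 4.8]


def histogram_counts(values, bin_width, start=0):
    """Count values per bin; each bin is [left, left + bin_width)."""
    counts = {}
    for x in values:
        left = start + math.floor((x - start) / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


def gaussian_kde(values, bandwidth):
    """Return a density function: an average of normal curves, one centred
    on each data point, whose width is set by `bandwidth`."""
    n = len(values)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in values)

    return density


# The same data give different-looking histograms with different bin widths
print(histogram_counts(data, bin_width=1))  # {0: 1, 1: 2, 2: 1, 3: 2, 4: 1}
print(histogram_counts(data, bin_width=2))  # {0: 3, 2: 3, 4: 1}

# The density estimate is positive below zero, where no data can exist
density = gaussian_kde(data, bandwidth=1.0)
print(density(-1.0) > 0)  # True
```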

3.32 Empirical cumulative distribution function

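  • An ECDF ranks all data points by value, from smallest to largest (or vice versa), and plots each value against its rank.

  • To improve readability, the y-axis is often normalised by the maximum rank so that the maximum y-value equals 1 and each y-value reads as the proportion of data points at or below that value.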
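The ranking and normalisation steps can be written in a few lines. The Python sketch below uses hypothetical values, and `ecdf` is a hypothetical helper; tied values simply receive consecutive ranks, which matches plotting every data point.

```python
def ecdf(values):
    """Rank values from smallest to largest and normalise each rank by the
    maximum rank (n), so the largest value gets y = 1."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, rank / n) for rank, x in enumerate(ordered, start=1)]


# Hypothetical: goals scored by one team in five matches
goals = [2, 0, 3, 1, 1]
for x, y in ecdf(goals):
    print(x, y)
```

Reading the plot: the highest point at x = 1 is y = 0.6, i.e. 60% of these matches had one goal or fewer.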

3.33 Empirical cumulative distribution function

3.34 Quantile-quantile plots

  • q-q plots are useful when we want to determine to what extent the observed data points follow a given distribution.

  • q-q plots use ranks to predict where a given data point should fall if the data were distributed according to a specified reference distribution (often a normal distribution).
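A minimal Python sketch of the ranking step for a normal q-q plot follows. It makes two assumptions worth stating: the reference normal distribution uses the sample mean and standard deviation, and rank i of n is converted to the probability (i - 0.5) / n, one of several common conventions. `normal_qq_points` and the example times are hypothetical; `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Return (theoretical quantile, observed value) pairs for a normal q-q
    plot, assuming plotting positions (rank - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mu=mean(ordered), sigma=stdev(ordered))
    return [
        (reference.inv_cdf((rank - 0.5) / n), x)
        for rank, x in enumerate(ordered, start=1)
    ]


# Hypothetical sprint times (s); points close to the line y = x suggest
# the data are approximately normally distributed
times = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0]
for theoretical, observed in normal_qq_points(times):
    print(f"{theoretical:.2f}  {observed:.2f}")
```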

3.35 Visualising proportions

3.36 Visualising proportions

3.37 Visualising proportions

3.38 Visualising proportions

3.39 Visualising associations

  • Scatter plots
  • Correlation diagrams
  • Slope graphs
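A correlation diagram typically displays a correlation coefficient for every pair of variables. The Python sketch below computes a Pearson correlation matrix from first principles; the variable names and values are hypothetical, and `pearson_r` and `correlation_matrix` are illustrative helpers rather than library functions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(variables):
    """Return {(name_a, name_b): r} for every pair of variables."""
    names = list(variables)
    return {(a, b): pearson_r(variables[a], variables[b]) for a in names for b in names}


# Hypothetical per-match values for one player
stats = {
    "distance_km": [9.8, 10.4, 11.1, 10.0, 10.9],
    "sprints": [18, 22, 27, 19, 25],
    "minutes": [80, 90, 90, 85, 90],
}
matrix = correlation_matrix(stats)
names = list(stats)
print(" " * 12 + "".join(f"{name:>12}" for name in names))
for a in names:
    print(f"{a:>12}" + "".join(f"{matrix[(a, b)]:>12.2f}" for b in names))
```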