To help you become familiar with normative data, we’ve asked VALD Data Scientist Joshua Ruddy to create this easy-to-follow guide on the basics of normative data.
By the way, did you know that you can request normative data reports relevant to your organisation from VALD? Read here to learn more.
What are norms?
Normative data (norms) are information from a population of interest that establishes a baseline distribution of results for that particular population.
Norms are usually derived from a large sample that is representative of the population of interest. Opinions on what constitutes an acceptable sample size vary depending on the question that is being addressed. However, it is important to note that the smaller the sample, the more uncertainty there will be around any estimates made.
Uncertainty can be calculated and expressed using statistics such as 95% confidence intervals. Again, whilst it depends on the question you want to answer, it is a good idea to make sure your sample is relatively homogeneous. For example, Figure 1 below shows the concentric peak power results from more than 35,000 countermovement jump tests conducted in NFL (American football) athletes:
However, we know that different positions in American football typically require different anthropometric and athletic attributes. Accordingly, the distribution of results from, for example, Defensive Line athletes versus Wide Receiver athletes could differ significantly:
Evidently, looking at NFL athletes as a whole may not be the most informative approach. However, it should also be noted that subsetting the data and looking at individual positional groups will reduce the size of the sample.
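The trade-off between a more specific sample and a smaller one can be illustrated with a minimal sketch. It assumes a normal-approximation 95% confidence interval for the mean, and the peak power values are simulated placeholders rather than real NFL results. The function name `mean_ci95` and the subset size of 300 are illustrative assumptions only.

```python
import math
import random
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% CI."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Simulated concentric peak power values (W); NOT real athlete data.
rng = random.Random(42)
all_athletes = [rng.gauss(5500, 700) for _ in range(35000)]
one_position = all_athletes[:300]  # hypothetical single positional group

for label, sample in [("all athletes", all_athletes), ("one position", one_position)]:
    mean, lower, upper = mean_ci95(sample)
    print(f"{label}: n={len(sample)}, mean={mean:.0f} W, 95% CI=({lower:.0f}, {upper:.0f})")
```

The interval for the smaller positional group is noticeably wider, which is the added uncertainty that comes with subsetting.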
Considerations before deriving norms
Before deriving norms, it is a good idea to check your data for errors. The following data are from more than 45,000 Nordic tests conducted in European football. We will focus on a two-limb average for maximum Nordic force and the between-limb asymmetry in maximum Nordic force. Looking at a summary of the data, we can see that the minimum for our Nordic force variable is 31 N and the maximum is 2,686 N:
Table 1: Summary statistics for results from over 45,000 Nordic tests conducted in European football.
|##||Min.||1st Quartile||Median||Mean||3rd Quartile||Max.|
Using some domain expertise, we know these limits aren’t feasible. This suggests we have some erroneous results and should clean our data before proceeding. One method to ensure that the norms we derive are as representative of our population as possible is to identify and remove outliers from our sample.
There are many methods for identifying outliers, but one of the most common is to use the interquartile range (IQR):
The IQR is equal to the third quartile (Q3) minus the first quartile (Q1).
Using the IQR method, any results below Q1 – (IQR × 1.5) or any results above Q3 + (IQR × 1.5) are considered outliers and should be removed. Once this has been done, we can recheck our summary and make sure our data look more appropriate:
Table 2: Revised summary statistics after removing outliers.
|##||Min.||1st Quartile||Median||Mean||3rd Quartile||Max.|
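The summary-then-clean workflow described above could be sketched as follows. This is only an illustration: the helper names `summarise` and `remove_iqr_outliers` are hypothetical, and the force values are made up apart from the 31 N and 2,686 N extremes quoted earlier. Quartiles can be computed in several slightly different ways; this sketch assumes the "inclusive" method from Python's `statistics` module.

```python
import statistics


def summarise(values):
    """Six-number summary: min, Q1, median, mean, Q3, max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


def remove_iqr_outliers(values, k=1.5):
    """Keep results within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical Nordic force results (N), including two implausible values.
forces = [31, 352, 371, 389, 401, 405, 412, 428, 447, 459, 2686]

print("before:", summarise(forces))
cleaned = remove_iqr_outliers(forces)
print("after: ", summarise(cleaned))
```

Rechecking the summary after cleaning confirms that the implausible minimum and maximum are no longer present.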
Checking the distribution of your data
When working with continuous data, it is important to be aware of the distribution type of your sample. There are formal methods for assessing distributions, but in this case we are able to visually assess our sample’s distribution using a histogram (see Figure 3).
From this histogram we can see that our Nordic force sample is approximately normally distributed. When looking at between-limb asymmetry, we can use a negative value to denote an asymmetry in favour of the left limb and a positive value to denote an asymmetry in favour of the right limb. In this case, we might expect our between-limb asymmetry sample to also be normally distributed:
However, when considering norms for between-limb asymmetry, we shouldn’t be concerned with whether the asymmetry favoured the left or right limb – the focus should be on the absolute value. If we convert all our negative results to positive results and remove any reference to limb, we can see that our sample’s distribution has changed:
We now have a positively skewed distribution, which is what we should expect for between-limb asymmetry data – more observations closer to 0%, and less frequent occurrences of larger asymmetries.
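The effect of taking absolute values can also be checked numerically. The sketch below uses simulated signed asymmetries (not real test data) and a simple moment-based skewness function; `skewness` is a hypothetical helper written for this example. A value near 0 suggests a symmetric distribution, while a clearly positive value indicates positive skew.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: roughly 0 for symmetric data, > 0 for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


# Simulated signed asymmetries (%): negative = left limb, positive = right limb.
rng = random.Random(0)
signed = [rng.gauss(0, 8) for _ in range(10000)]

# Drop the reference to limb by converting every result to its absolute value.
absolute = [abs(v) for v in signed]

print(f"signed skewness:   {skewness(signed):.2f}")
print(f"absolute skewness: {skewness(absolute):.2f}")
```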
What descriptive statistics should you use?
Once we’re aware of the distribution of our data, we can use this information to decide what statistics to use to describe our data. Whilst it ultimately depends on the question you want to answer, a general rule of thumb is that the mean is appropriate for a normal distribution (e.g. our Nordic force data) but may not be appropriate for a skewed distribution (e.g. our between-limb asymmetry data). The median is considered to be a more robust measure of central tendency that is appropriate in the case of a skewed distribution or outliers. Figure 6 shows a comparison of the mean and median of an arbitrary variable with a normal distribution:
In cases of a normal distribution, the median will closely reflect the mean. In cases of a skewed distribution, the median can provide a more appropriate measure of central tendency and is less affected by outliers than the mean. Given this, it can be a good idea to use the median (and the interquartile range) to describe your data.
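As a small illustration of why the median is described as more robust, the sketch below compares the mean and median of a handful of hypothetical absolute asymmetry values, with and without the single largest result. The numbers are invented for this example.

```python
import statistics

# Hypothetical absolute between-limb asymmetries (%), positively skewed.
asymmetries = [1.2, 2.5, 3.1, 4.0, 4.8, 6.3, 7.9, 9.5, 14.2, 31.0]

mean_all = statistics.fmean(asymmetries)
median_all = statistics.median(asymmetries)
q1, _, q3 = statistics.quantiles(asymmetries, n=4, method="inclusive")

# Remove the largest value and see how much each statistic moves.
trimmed = sorted(asymmetries)[:-1]
mean_shift = abs(mean_all - statistics.fmean(trimmed))
median_shift = abs(median_all - statistics.median(trimmed))

print(f"mean={mean_all:.2f}, median={median_all:.2f}, IQR={q3 - q1:.2f}")
print(f"shift after removing largest value: mean {mean_shift:.2f}, median {median_shift:.2f}")
```

In this example the mean sits above the median and moves further when the extreme value is removed, which is the behaviour the rule of thumb above anticipates.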
What are density curves?
Whilst histograms are useful to show the frequency of results across the range of your data, density curves provide an idealised representation of a sample’s distribution. Figure 7 shows how the density curve compares to the histogram for our Nordic force data:
When looking at a density curve, the height of the line is often misinterpreted as the probability of that particular result occurring. In actuality, it is the area under the density curve that provides us with the probability of a result falling within a particular range. For example, the total area under the density curve will always equal 1, indicating that there is a 100% probability that a result will fall between the observed minimum and maximum. The grey shaded area on Figure 8 shows the area under the curve between 300 N and 320 N:
Through eyeballing the density curve, we could probably guess that the grey shaded area takes up less than 10% of the total area under the curve. However, if we want to properly estimate the probability of a result falling between these two values, we can do so by identifying the total number of results between 300 N and 320 N and dividing this result by the total number of observations in our sample. In this case, the probability of a result falling between the two values is equal to 0.05, or 5%. Whilst this level of specificity can be useful, often the height of the curve alone can tell us whether a particular result is more or less likely. Given this, you may sometimes see density curves without a y-axis. It can also be common to see density curves with the first quartile, second quartile (also known as the median) and the third quartile displayed:
The first, second and third quartiles are equal to the 25th, 50th and 75th percentiles respectively. These lines can be used to make some quick and easy inferences about our sample. We identify that 25% of our data fall below the first quartile (at 349 N), 25% of our data fall above the third quartile (at 454 N) and 50% of our data fall between these two points.
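The counting approach and the area-under-the-curve idea can be compared in a short sketch. It uses simulated Nordic force values, with a mean and spread chosen only to resemble the figures quoted in this guide, and it assumes a fitted normal curve as the idealised density. `proportion_between` is a hypothetical helper.

```python
import random
import statistics


def proportion_between(values, low, high):
    """Share of results with low <= value <= high (an empirical probability)."""
    return sum(low <= v <= high for v in values) / len(values)


# Simulated Nordic force results (N); NOT the real European football dataset.
rng = random.Random(1)
forces = [rng.gauss(405, 78) for _ in range(45000)]

# Counting approach: results in range divided by total observations.
p_counted = proportion_between(forces, 300, 320)

# Area approach: area under a fitted normal density between the same limits.
fitted = statistics.NormalDist.from_samples(forces)
p_area = fitted.cdf(320) - fitted.cdf(300)

# Quartiles split the sample into 25% / 50% / 25%.
q1, median, q3 = statistics.quantiles(forces, n=4)
middle_half = proportion_between(forces, q1, q3)

print(f"counted: {p_counted:.3f}, area under fitted curve: {p_area:.3f}")
print(f"Q1={q1:.0f} N, median={median:.0f} N, Q3={q3:.0f} N, share between Q1 and Q3: {middle_half:.2f}")
```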
Other methods of visualising a sample
Box and whisker plot:
Whilst you may or may not be familiar with density curves, most people will likely have come across another method for visualising the distribution of a sample – the box and whisker plot:
The above figure highlights that the density curve is simply an extension of the box and whisker plot. In addition to showing the minimum, the maximum and the key quartiles, the height of the density curve also gives us an indication of the likelihood of a result occurring.
Another way to visualise the distribution of a sample is the violin plot:
The violin plot is simply a double-sided density curve. It is common to use a point and error bars to indicate the median and IQR. Key information from a sample can also be easily summarised in a table. Key percentiles, such as the 1st (approximately the minimum), the 25th (the first quartile), the 50th (the median), the 75th (the third quartile) and the 99th (approximately the maximum), can provide us with a basic idea of a sample’s distribution. Table 3 displays the key percentiles for our Nordic force sample:
Table 3: The key percentiles for the European football Nordic dataset.
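A percentile table of this kind could be produced along the following lines. This is a sketch on simulated values, not the European football dataset, and `key_percentiles` is a hypothetical helper. Different percentile methods give slightly different results; this example relies on the default method of `statistics.quantiles`.

```python
import random
import statistics


def key_percentiles(values, wanted=(1, 25, 50, 75, 99)):
    """Return {percentile: value} for the requested percentiles."""
    # n=100 yields 99 cut points: index 0 is the 1st percentile, index 98 the 99th.
    cuts = statistics.quantiles(values, n=100)
    return {p: cuts[p - 1] for p in wanted}


# Simulated Nordic force results (N); placeholder data only.
rng = random.Random(2)
forces = [rng.gauss(405, 78) for _ in range(45000)]

table = key_percentiles(forces)
for percentile, value in table.items():
    print(f"{percentile:>2}th percentile: {value:.0f} N")
```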
Normative data applications
Norms can be utilised in a number of different ways. One example is using key quantiles from a normative dataset as thresholds to identify individuals that may require attention. In the figure below, we have Nordic force results for 10 athletes:
The horizontal line at 405 N indicates the median from our European football sample, with the blue shaded area indicating the IQR. Athletes that fall below the first quartile, or the 25th percentile, are easily identified in orange. In the context of Nordic force, we might use this information to prescribe additional eccentric hamstring strengthening exercises to our athletes that fall below the first quartile threshold.
In a clinical setting, we might have a client that is recovering from knee surgery after injuring themselves playing football. After measuring their eccentric knee flexor strength, a common question might be, “Is my score good or bad?” Rather than suggest whether a result is good or bad, we can provide our client with context and show them where they sit relative to our European football cohort:
Stating the percentile into which our client’s result falls can provide our client with sport-specific context and feedback on their physical status and progression.
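Both applications can be sketched in a few lines. The 349 N first quartile is the value quoted earlier in this guide, but the squad results, the client score and the reference sample below are all hypothetical, and `flag_below` and `percentile_rank` are illustrative helper names rather than anything provided by VALD.

```python
import random
from bisect import bisect_right

Q1_THRESHOLD_N = 349  # first quartile (25th percentile) quoted in this guide

# Hypothetical Nordic force results (N) for 10 athletes.
squad = {
    "Athlete A": 298, "Athlete B": 362, "Athlete C": 341, "Athlete D": 415,
    "Athlete E": 470, "Athlete F": 388, "Athlete G": 330, "Athlete H": 452,
    "Athlete I": 405, "Athlete J": 377,
}


def flag_below(results, threshold):
    """Names of athletes whose result falls below the threshold."""
    return [name for name, force in results.items() if force < threshold]


def percentile_rank(reference_sorted, score):
    """Percentage of reference results at or below the given score."""
    return 100 * bisect_right(reference_sorted, score) / len(reference_sorted)


print("Below first quartile:", flag_below(squad, Q1_THRESHOLD_N))

# Simulated reference cohort (N); stands in for a real normative dataset.
rng = random.Random(7)
reference = sorted(rng.gauss(405, 78) for _ in range(45000))

client_score = 360  # hypothetical client result (N)
print(f"Client sits at roughly the {percentile_rank(reference, client_score):.0f}th percentile")
```

Reporting the percentile rather than a "good" or "bad" label keeps the feedback tied to the chosen reference population.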
How can I request normative data reports from VALD?
You can request normative data reports through your Client Success Manager.
Let us know which populations, sports, disciplines and systems you work with and we can provide you with a tailored package.
If you are unsure about how to contact your Client Success Manager, please email firstname.lastname@example.org
How can I see my normative data reports on VALD Hub?
You can access your requested normative data reports on VALD Hub on the Dashboard. See here for a quick guide.