Comparative statistics

10 June, 2013 by David Johnstone

This post might have a complicated-sounding title, but most of this is pretty straightforward. One of the nice things about having a lot of data from a lot of people is that interesting things can be done with that data. Therefore, there are now a few simple charts that show Cycling Analytics users how they compare with the rest of the users. Head over to the statistics page and take a look.

The first few charts show how the user’s resting and maximum heart rates, weight and FTP compare with those of everybody else. They look something like:

This chart shows a histogram of all users’ resting and maximum heart rates, and the solid thinner lines show where this particular user (me!) lies. In the chart legend it says that 34% of users have a lower resting heart rate than me and 43% have a higher resting heart rate. This probably means that 23% have the same resting heart rate (that is, it’s in the same 50–54 BPM grouping), although rounding means these numbers don’t always add up to exactly 100%. If you hover over the chart with your mouse (on the actual page, not this picture), it shows what proportion of the users have a resting heart rate lower than where the mouse is, and what proportion have one that’s equal to where the mouse is.
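
As a rough illustration of how the three legend numbers could be produced (a minimal sketch, not the site’s actual code), each value can be placed in a 5 BPM grouping and the counts rounded to whole percentages. The function name `compare_to_others` and the sample heart rates are made up for this example.

```python
def compare_to_others(values, mine, bin_width=5):
    """Return (lower, same, higher) as whole-number percentages.

    Values are compared by histogram bin, so 50-54 BPM all count as
    "the same" when bin_width is 5 (an assumption based on the post).
    """
    my_bin = mine // bin_width
    bins = [v // bin_width for v in values]
    n = len(bins)
    lower = sum(b < my_bin for b in bins)
    same = sum(b == my_bin for b in bins)
    higher = n - lower - same
    # Each share is rounded separately, so the three may not sum to 100.
    return tuple(round(100 * count / n) for count in (lower, same, higher))


# Hypothetical resting heart rates for three users.
others = [48, 52, 57]
print(compare_to_others(others, mine=52))  # (33, 33, 33)
```

With three users spread over three groupings, each share rounds to 33%, so the total is 99% rather than 100%.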

The most interesting of these charts is one that compares the power curves of all users:

The shaded segments on this chart show the ranges that most users have a peak power in. It’s drawn so that:

  • The top 5% of peak powers are above the top segment.
  • The next 5% of users have a peak power that lies in the top segment (light red).
  • Each subsequent segment shows where the next 10% of users’ peak powers are.
  • The bottom segment (blue) is for users with a peak power in the 5th to 10th percentile.
  • The bottom 5% of peak powers are below the bottom segment.
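
The segment boundaries described in this list sit at the 5th, 10th, 20th, …, 90th and 95th percentiles of peak power for each duration. A minimal sketch of computing them for a single duration, using Python’s standard `statistics.quantiles`, might look like this. The function name `band_boundaries` and the sample data are hypothetical, and the interpolation method the site uses isn’t stated, so `"inclusive"` is an assumption.

```python
from statistics import quantiles


def band_boundaries(peak_powers):
    """Map percentile -> power for the edges of the shaded segments."""
    # n=20 gives 19 cut points at 5%, 10%, 15%, ..., 95%.
    cuts = quantiles(peak_powers, n=20, method="inclusive")
    wanted = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]
    return {p: cuts[p // 5 - 1] for p in wanted}


# Made-up peak powers (in watts) for one duration, one value per user.
sample = [180, 210, 240, 250, 265, 280, 300, 310, 335, 360, 400, 450]
for percentile, watts in band_boundaries(sample).items():
    print(f"{percentile:>2}th percentile: {watts:.0f} W")
```

Repeating this for every duration on the horizontal axis would give the outline of each shaded segment.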

The dark line shows the user’s power curve. A lot more numbers are shown when hovering over the chart with a mouse.

This chart can be drawn in “relative” mode, where the user’s position, relative to other users, is charted instead of the actual power:

Ostensibly, this chart says “you can produce more power for this time period than this percentage of users”, but that’s not always quite true. It’s possible to have a value greater than 100%, since if you’re in the top or bottom 5%, the number it shows is proportional to the 5th/95th percentile value. There’s more information at the bottom of the statistics page.
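
One way to read that rule, purely as an assumption (the exact formula isn’t given here), is that inside the 5th to 95th percentile range the chart shows an ordinary percentile rank, while outside it the number is scaled by the ratio of your power to the boundary value. The sketch below uses a hypothetical `relative_position` function and made-up numbers to show how that reading produces values above 100%.

```python
from bisect import bisect_left


def relative_position(power, peak_powers, p5, p95):
    """Percentage shown in "relative" mode, under the assumed rule.

    p5 and p95 are the 5th and 95th percentile powers for this duration.
    """
    if power > p95:
        # Assumed: scale 95% by how far the power exceeds the boundary.
        return 95 * power / p95
    if power < p5:
        # Assumed: scale 5% by how far the power falls short.
        return 5 * power / p5
    ranked = sorted(peak_powers)
    # Share of users with a strictly lower peak power.
    return 100 * bisect_left(ranked, power) / len(ranked)


# Made-up peak powers (in watts) and boundary values.
peaks = [200, 400, 600, 800, 1000]
print(relative_position(600, peaks, p5=200, p95=1000))   # 40.0
print(relative_position(1200, peaks, p5=200, p95=1000))  # 114.0
```

Under this reading, a rider producing 20% more power than the 95th percentile value would be shown at 114%, which is not a percentage of users at all.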

Since this chart is based on ride data that users have uploaded, and power meters have a habit of occasionally making very high (but wrong) readings, the top power outputs for short time periods are skewed high. I’m fairly sure you don’t actually need to be able to produce over 1950W to get into the top 5% of users for one-second power. There is a little bit of processing done to correct this (which is why it isn’t 100W higher), but I’ll look into doing more. This chart also requires users to have uploaded enough rides so that their power curve accurately represents what they’re capable of doing, so it doesn’t include data from users who have uploaded fewer than ten rides (this number might be changed).

Concluding remarks

Overall, this data isn’t going to help anybody win races, and some of it is barely meaningful from a performance perspective, but hopefully some people will find it as interesting as I do.

Unfortunately, all the data shown at the moment is for males. This will change when there is enough data to produce meaningful statistics for females. In the future, additional segmentation of users (possibly based on age and racing category) will be looked into, but this relies on collecting more data. At the moment, the data sets for each chart vary in size, but are all in the hundreds. There are many other things that might be shown on this page in time. Let me know if you have any good ideas.

A note on privacy: the privacy of users is very important. One of the goals with the charts on this page (and in general, any aggregated statistics released) is that they can’t be used to infer anything at all about any individual. This is why the top 5% of values on the power curve chart are not shown, and the data for the histograms are rounded off to the nearest 1%. As a result, it’s impossible to learn anything about any user, not even that they have an FTP or peak power in any particular range. Furthermore, no data has been provided to the NSA as part of the PRISM program. Get in contact if you have any concerns.

This is the blog of Cycling Analytics, which aims to be the most insightful, most powerful and most user-friendly tool for analysing ride data and managing training. You might be interested in creating an account, or following via Facebook or Twitter.
