General Discussion Triathlon Talk » I'm a data junkie - interesting none the less
Moderators: k9car363, alicefoeller

2009-10-20 3:00 PM

Extreme Veteran
541
Colorado
Subject: I'm a data junkie - interesting none the less
I admit it - I'm a major data junkie. What can I say? Anywho...

Before my last triathlon in September (Fort Collins Club sprint), I was thinking about my finishing times and this naturally led me to consider how I might place in the event. I then started wondering about the relationship between placings in the various events and overall place, so I downloaded the results from the past couple of years and started crunching data.

In the figure below, I have three graphs. 1) Swim Rank vs Overall Rank, 2) Bike Rank vs Overall Rank, and 3) Run Rank vs Overall Rank. I think it is very interesting that the Swim Rank has very little relationship to the Overall Rank. (By the way, my dot is red - I placed 80th overall.)



I can hear you now, "Sure, but this is for a sprint with a 450 meter swim." OK, I also looked at the 5430 Long Course HIM in Boulder, CO from this year.



Again, very little relationship between the Swim Rank and Overall Rank. So, the consistent advice I've heard - Don't sweat the swim, just do your best and then work hard on the bike and run - has data to support it. I've done this same analysis for about a dozen races - sprint, olympic, and HIM - and I consistently get the same results. At least I thought it was interesting. 8^)
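
For anyone who wants to try this on their own race results, here is a minimal Python sketch of the rank-vs-rank comparison described above. The split times are made up for illustration (they are not from the Fort Collins or 5430 results), the totals ignore transition times, and the helper names `ranks` and `pearson_r` are hypothetical. It ranks each discipline and the overall time, then reports R² between each discipline rank and the overall rank.

```python
from math import sqrt


def ranks(values):
    """Rank values from 1 (fastest) upward; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for place, idx in enumerate(order, start=1):
        result[idx] = place
    return result


def pearson_r(xs, ys):
    """Pearson correlation; applied to ranks this is a rank correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Made-up (swim, bike, run) split times in minutes for six athletes.
splits = [
    (9.0, 38.0, 22.0),
    (7.5, 41.0, 24.0),
    (11.0, 36.0, 21.0),
    (8.0, 45.0, 26.0),
    (10.0, 43.0, 28.0),
    (12.0, 47.0, 30.0),
]

# Overall rank comes from total time (transitions ignored in this sketch).
overall_rank = ranks([sum(s) for s in splits])

r_squared = {}
for name, col in (("swim", 0), ("bike", 1), ("run", 2)):
    discipline_rank = ranks([s[col] for s in splits])
    r_squared[name] = pearson_r(discipline_rank, overall_rank) ** 2
    print(f"{name} rank vs overall rank: R^2 = {r_squared[name]:.2f}")
```

With a real results file, the `splits` list would be replaced by the parsed split times for the whole field.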

-Kirk


Edited by KirkD 2009-10-20 3:11 PM


2009-10-20 3:25 PM
in reply to: #2469487

Expert
1087
Portland
Subject: RE: I'm a data junkie - interesting none the less
As a fellow data junkie, this is very interesting!  It would be very interesting to do an analysis of professional athletes only, to see how closely the swim tracks their overall finish place.
2009-10-20 3:29 PM
in reply to: #2469551

Cycling Guru
15134
Fulton, MD
Subject: RE: I'm a data junkie - interesting none the less
I've faked my way through sprints, olympics and halfs with swimming.  And while you cannot win the race on the swim, you can certainly lose it!

I've missed out on overall and podium spots because I swim for sh-t.  But I'm a strong cyclist and runner.

Triathlon is one sport.  Swimming is just a part of it that determines how well you do overall.
2009-10-20 3:35 PM
in reply to: #2469487

Master
1736
Midcoast Maine
Subject: RE: I'm a data junkie - interesting none the less
As a mathematician - this is awesome! Now I can stop sweating the swim. ;-)

Thanks for doing this. (Although, like the previous poster, I would be interested in how it stacks up for the elite/pros)
2009-10-20 3:36 PM
in reply to: #2469487

Elite
5316
Alturas, California
Subject: RE: I'm a data junkie - interesting none the less

Yes, if there is a lot of junk in the data you can try subgroups of data.  One might expect the bottom 50% to have a great deal of variability in which sport is their strongest, but as you move up from there to the 60th, 70th, 80th, and 90th percentiles, and then the top 5% and 1%, you find that there is a nonlinear relationship across the overall finish progression.

In other words, folks who do really well overall are good swimmers and perhaps more balanced across all three events than, say, BOP or MOP athletes.

It would be interesting to see how much of the variance in the overall score each discipline accounts for.  A multivariate analysis of all IM swim, bike and run times for the year would show a decent breakdown of the unique contribution of each factor towards overall time.

Of course it was this same discussion that started the creation of the first IM competition. 

2009-10-20 3:36 PM
in reply to: #2469487

Elite
2645
Phoenix, AZ
Subject: RE: I'm a data junkie - interesting none the less
As a data-driven UI engineer I'm impressed with your work.

As a swimmer, I'm beyond pissed.

Edit:
Which xy plot is the run in your long course data? There are 2 labeled "bike."


Edited by Slidell4life 2009-10-20 3:39 PM


2009-10-20 4:14 PM
in reply to: #2469487

Extreme Veteran
430
Madison, WI
Subject: RE: I'm a data junkie - interesting none the less
Let's get some least squares regression analysis up in this biznatch!

Well I suppose it would help if I actually looked at the data before I asked for something that was already there. Great work!

Edited by tetchypoo 2009-10-20 4:15 PM
2009-10-20 4:16 PM
in reply to: #2469487

Extreme Veteran
541
Colorado
Subject: RE: I'm a data junkie - interesting none the less
Excellent! I'm glad to see that there are many more data junkies out there. I shouldn't be surprised in this sport, should I? I also thought that I would add in that I did this initially to develop a strategy. The question was, if I have to really push myself, how do I prioritize the disciplines? My conclusion is to swim fast but save it up for the bike and run, and that the bike and run are about equal in importance. It worked for me to let me finish 80/244 in my second ever tri. 8^)

Slidell4life: Ooops. I knew I'd end up with a label problem. The figures follow the same format in both sets, so the bottom-middle figure is the run in my 5430 figure. I apologize for the label mix up. I shouldn't be surprised that a UI engineer would catch it. 8^)

Daremo: Agreed, the results are certainly dependent on all three. To your point, notice in the swim graph for 5430 (top left) that overall there isn't much relationship, but there is a tight cluster at the bottom left of the graph suggesting that really fast swimmers tend to cluster as fast overall - and another cluster at the upper right - really slow swimmers tend to cluster slow overall. It isn't a strong relationship, but certainly a trend apparent in the data.

tetchypoo - Actually, I was thinking about seeing if I could develop a model for the overall place using the individual discipline placements and look at the coefficients for each. The relationships between the coefficients should be the same as we're seeing here, so I haven't bothered yet. I was also going to do one based on each discipline's finishing times and then plug in my times to see how close the prediction was.
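
A model like the one described here could be fit with ordinary least squares. The following is only a sketch under stated assumptions: the ranks are made up, the functions `solve` and `least_squares` are hypothetical helpers written for this example, and six athletes is far too few for the coefficients to mean anything. It fits overall place as an intercept plus a coefficient for each discipline placement, using the normal equations.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, y):
    """Fit y ~ b0 + b1*x1 + b2*x2 + ... via the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up (swim rank, bike rank, run rank) for six athletes, plus overall rank.
discipline_ranks = [(3, 2, 2), (1, 3, 3), (5, 1, 1), (2, 5, 4), (4, 4, 5), (6, 6, 6)]
overall_rank = [1, 3, 2, 4, 5, 6]

coefs = least_squares(discipline_ranks, overall_rank)
predicted = [
    coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))
    for row in discipline_ranks
]
residuals = [y - p for y, p in zip(overall_rank, predicted)]

for name, c in zip(("intercept", "swim", "bike", "run"), coefs):
    print(f"{name}: {c:+.3f}")
```

Comparing the swim, bike, and run coefficients then gives a rough sense of how much each placement moves the predicted overall place, holding the others fixed.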

menglo, itsallrelative_Maine, and Baowolf: Below are results for the 5430 Long Course for Pros only, with men shown as blue squares and women as red diamonds. There is much less data - only 8 men and 15 women. But, I think the trends hold.

For the swim, there is generally just a lot of scatter. The women's data tends to swamp out the men's so I broke men out independently in the right figure. There is a relationship there, but it isn't very strong.




For the bike, we see what I think is a closer relationship. The slower women tend to scatter quite a bit, but for those under 90th place, the correlation is quite good. The men's correlation looks very bad - 0.2 - but there is one bad outlier in that data. When I take him out (right figure), the relationship is very strong. I felt I could do that given the small data sample in this case - mathematicians, please correct me if I'm wrong. BTW, the outlier is Simon Thompson who came in 30th while the other 7 men made up the top 7 bike rankings. I wonder if he had a flat.
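
To see how much a single point can move R² in a sample of eight, here is a small sketch. The bike ranks follow the pattern described above (seven athletes in the top 7 and one at 30th), but the overall ranks are made up for illustration and are not the actual 5430 pro results; `r_squared` is a hypothetical helper.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)


# Bike ranks: seven athletes in the top 7 plus one outlier at 30th.
bike_rank = [1, 2, 3, 4, 5, 6, 7, 30]
# Overall ranks are invented for this example.
overall_rank = [2, 1, 3, 5, 4, 7, 8, 6]

r2_all = r_squared(bike_rank, overall_rank)

# Drop the outlier (the athlete ranked 30th on the bike) and recompute.
keep = [i for i, b in enumerate(bike_rank) if b != 30]
r2_trimmed = r_squared(
    [bike_rank[i] for i in keep],
    [overall_rank[i] for i in keep],
)

print(f"R^2 with outlier:    {r2_all:.2f}")
print(f"R^2 without outlier: {r2_trimmed:.2f}")
```

With these invented ranks, R² goes from roughly 0.19 to roughly 0.87 once the one point is removed. That sensitivity is a reason to report both numbers, and to say why the point was dropped, when the sample is this small.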



The run is similar to the bike. One outlier in the men's group - Michael Cupitt who came in 40th on the run. The other 7 men were 1, 2, 3, 4, 5, 12, and 17.




I'm so glad to have found others that thought this was interesting. I have been debating posting it for about a month now, thinking it was entirely too geeky. OK, it's still awfully geeky.

-Kirk



Edited by KirkD 2009-10-20 4:19 PM
2009-10-20 4:18 PM
in reply to: #2469487

Extreme Veteran
430
Madison, WI
Subject: RE: I'm a data junkie - interesting none the less
What's really interesting about that is that the R² is higher for the HIM. What I'm guessing that means is that if you plotted that at an IM distance, you'd probably see an even better correlation. Still, bike and run are much better fits than swim.
2009-10-20 4:21 PM
in reply to: #2469580

Elite
4048
Gilbert, Az.
Subject: RE: I'm a data junkie - interesting none the less
Slidell4life - 2009-10-20 1:36 PM

As a swimmer, I'm beyond pissed.



x2 (Except I'm more exasperated/resigned). There are going to be a lot of strong bikers/runners that can make up some for a lousy swim, but I would almost guarantee that the top 5 in each age group are no worse than top 10 in the swim in their AG for sprint/oly. You have more chance of outliers the longer the distance.

This looks like an attempt to create a personal excuse of "why I don't need to swim more to be competitive fool myself into thinking I'm competitive"

Oh, and I don't find a Kirk in the triathlon results...

John

Edited by tkd.teacher 2009-10-20 4:25 PM
2009-10-20 4:24 PM
in reply to: #2469678

Extreme Veteran
430
Madison, WI
Subject: RE: I'm a data junkie - interesting none the less
tkd.teacher - 2009-10-20 4:21 PM

This looks like an attempt to create a personal excuse of "why I don't need to swim more to be competitive fool myself into thinking I'm competitive"

John
If I may...I think it speaks more to the disproportion of swimming as compared to bike and run, with respect to distance.

Personally, it makes me feel much better about skipping my swim workout tonight, due to being too tired (read:lazy).

Ugh, admitting that makes it much worse. Especially in front of you guys


2009-10-20 4:25 PM
in reply to: #2469487

Champion
5781
Northridge, California
Subject: RE: I'm a data junkie - interesting none the less

Someone else here within the last few months posted that they had done a similar analysis and come up with what sounded like the same basic results:  Low correlation between overall finish and swim finish, strong correlation with bike and run.  Don't remember who it was that posted that, but he mentioned the run turning out to be the strongest predictor of the three.  Not true with these data, but pretty darn close between bike and run.  Interesting stuff.

Macca's race report from Kona (which was linked to from a thread here recently) gave what sounded to me like an example of how you can "lose the race on the swim", even though the swim is the least important of the three disciplines in relationship to final standings.  He came out of the water farther back than he expected and it sounds like he may have wasted energy on the bike getting back into the position he wanted to be in which may in turn have contributed to a tough outing on the run.

2009-10-20 4:25 PM
in reply to: #2469487

Expert
1087
Portland
Subject: RE: I'm a data junkie - interesting none the less
Geeky, yes, interesting, very!  Do what you love because it makes you happy!  Looking at those plots, it seems like this trend even holds true for the pros (for the most part). I think one big problem with looking at this data (after I posted I thought of this) is that they are pushing their bodies to the limit, and with such a "small" sample size it's really hard to find as much of a trend as we see in just the masses.  With them pushing so hard, you have bodies shutting down and DNFing.  This is very interesting and I think it would be an interesting thesis for someone to play with for a PhD.

If I had to guess, this correlation comes from how difficult swimming is compared to biking and running.  When you bike and run, if you just move your feet faster, 99% of the time you will move faster.  With swimming that is not the case, it is SO technique based, and some people have a hard time trying to find that technique.

I don't think that this means that you can skip swim workouts, in fact, the better shape you're in for the swim, the better off you'll be for the bike and consequently, the run.

Edited by menglo 2009-10-20 4:29 PM
2009-10-20 4:26 PM
in reply to: #2469678

Extreme Veteran
541
Colorado
Subject: RE: I'm a data junkie - interesting none the less
tkd.teacher - No, no, no - no excuses here. I've been going to my master's swim class faithfully twice a week for the past 6 months, and my swim has improved a lot. I'm currently working with the instructor to break my swim down bit by bit, and then reconstruct it, hopefully with fewer issues.

8^P


And, in the first analysis of the sprint distance, I'm the red square in the graphs. Hmm....funny. I'm the red square. That seems about right. I didn't run the 5430. I thought the results for the sprint distance would be badly skewed because of all the beginners (aka "noise") in the data. I was surprised to see it generally hold on the HIM.





Edited by KirkD 2009-10-20 4:29 PM
2009-10-20 4:32 PM
in reply to: #2469690

Extreme Veteran
541
Colorado
Subject: RE: I'm a data junkie - interesting none the less
menglo - 2009-10-20 3:25 PM

Geeky, yes, interesting, very!  Do what you love because it makes you happy!  Looking at those plots, it seems like this trend even holds true for the pros (for the most part). I think one big problem with looking at this data (after I posted I thought of this) is that they are pushing their bodies to the limit, and with such a "small" sample size it's really hard to find as much of a trend as we see in just the masses.  With them pushing so hard, you have bodies shutting down and DNFing.  This is very interesting and I think it would be an interesting thesis for someone to play with for a PhD.


I'm with you 100% on the sample size for pros. I may see if I can collect the Pros' IM data for the entire season and do the same analysis. If I'm feeling up to it, I may break it down further into subgroups as one poster suggested.

Can I get an honorary PhD for this work? 8^)
2009-10-20 4:42 PM
in reply to: #2469678

Champion
5781
Northridge, California
Subject: RE: I'm a data junkie - interesting none the less

tkd.teacher - 2009-10-20 2:21 PM

Oh, and I don't find a Kirk in the triathlon results...

John


Maybe a search by the last name on KirkD's BT profile page might turn something up for that race...



Edited by tcovert 2009-10-20 4:44 PM


2009-10-20 4:43 PM
in reply to: #2469487

Pro
5892
New Hampshire
Subject: RE: I'm a data junkie - interesting none the less
If you have buoyancy of granite, should you stop sweating the swim then as well?
2009-10-20 4:49 PM
in reply to: #2469712

Extreme Veteran
541
Colorado
Subject: RE: I'm a data junkie - interesting none the less
tcovert - 2009-10-20 3:42 PM

tkd.teacher - 2009-10-20 2:21 PM

Oh, and I don't find a Kirk in the triathlon results...

John


Maybe a search by the last name on KirkD's BT profile page might turn something up for that race...




Ah - different question than my "red square" answered. Kirk is my middle name, so my tri results are under "Robert" my first name. Look for a Robert that came in 80th overall. Check my race log, too.

2009-10-20 5:49 PM
in reply to: #2469487

Expert
626
Subject: RE: I'm a data junkie - interesting none the less

Beyond geeky but very interesting

2009-10-20 7:02 PM
in reply to: #2469487

Elite
5316
Alturas, California
Subject: RE: I'm a data junkie - interesting none the less
Another more swimmer friendly way to look at the data would be to weight the scores by the amount of time it takes the mean/average person to finish them.  So if the average swim time for an IM were 1:30:00 and average bike time were 7:00:00 and the average run time were 5:30:00 then they would not be equally weighted measures predicting total outcome.  Practically speaking the bike has more minutes so it has a heavier weight on the results.  If we equalized the weight and then reanalyzed the numbers it would be interesting to see what we find.

Total time for my random numbers= 14:00:00. S=10.7%, B=50% and R=39.3% of the total time of the IM.   

If we equalize the weighted contribution of s/b/r does that make swim correlate better?  As it is swim gets a raw deal as it is only allowed 1/4 to 1/5 of the variance of the other two thus limiting its impact on the whole. 
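
The arithmetic above, and one possible way to "equalize the weight", can be sketched as follows. The average times are the illustrative numbers from this post; the four athletes' times and the helper names `to_minutes` and `zscores` are made up for this example. Standardizing each discipline to z-scores is just one reading of the equalizing idea, not an established method for triathlon results.

```python
from statistics import mean, pstdev


def to_minutes(hms):
    """Convert an 'H:MM:SS' string to minutes."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 60 + m + s / 60


# Illustrative average IM splits from the post above.
average = {"swim": "1:30:00", "bike": "7:00:00", "run": "5:30:00"}
minutes = {name: to_minutes(t) for name, t in average.items()}
total = sum(minutes.values())
share = {name: 100 * m / total for name, m in minutes.items()}
for name in share:
    print(f"{name}: {share[name]:.1f}% of total time")


def zscores(times):
    """Rescale times to mean 0 and standard deviation 1."""
    mu, sigma = mean(times), pstdev(times)
    return [(t - mu) / sigma for t in times]


# Made-up split times in minutes for four athletes.
swim = [70.0, 85.0, 95.0, 110.0]
bike = [360.0, 400.0, 430.0, 470.0]
run = [270.0, 300.0, 350.0, 400.0]

# After standardizing, each discipline contributes on the same scale,
# so a lower combined score means a better equal-weight result.
equal_weight_score = [
    sum(z) for z in zip(zscores(swim), zscores(bike), zscores(run))
]
print(equal_weight_score)
```

Ranking athletes by `equal_weight_score` instead of total time would show whether the swim correlates better once every discipline has the same spread.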
2009-10-20 7:31 PM
in reply to: #2469487

Coach
10487
Boston, MA
Subject: RE: I'm a data junkie - interesting none the less
Could the data just be representative of how in general triathletes are terrible swimmers given the little emphasis they place on swim training?

My guess is that if you would do the same analysis for pro racing (i.e. ITU racing) a different picture would emerge.


2009-10-20 7:45 PM
in reply to: #2469487

Master
1584
Fulton, MD
Subject: RE: I'm a data junkie - interesting none the less
I haven't read all of the replies, but I just wanted to say Dude, you rock hard.  That is all.  :-)
2009-10-20 11:18 PM
in reply to: #2469956

Extreme Veteran
541
Colorado
Subject: RE: I'm a data junkie - interesting none the less
Baowolf - 2009-10-20 6:02 PM

Another more swimmer friendly way to look at the data would be to weight the scores by the amount of time it takes the mean/average person to finish them. 


Interesting idea. If I have time I may give that a try. At this point I was only looking at discipline ranking vs overall ranking to see what the relationship was for placement. The take home from what I've done is that placing low in the swim (MOP, probably) does not preclude a good placement overall. I like the time spent idea with differential weighting.

Jorge said:

Could the data just be representative of how in general triathletes are terrible swimmers given the little emphasis they place on swim training?



I'm sure you're right. I haven't done the analysis for pros yet with enough data to really be trustworthy. I may see if I can collect the entire season's IM Pro data and repeat the process for that set. There would be repeated data of sorts - same athlete in different races - but I'm not sure that will matter. The conditions are still somewhat independent due to different courses and different mix of competitors. Anybody have thoughts on that??

jcnipper said:

I haven't read all of the replies, but I just wanted to say Dude, you rock hard.


Why, thanks. 8^) If I spent as much time training as I spend thinking about data, I WOULD rock.

-Kirk



2009-10-21 12:57 AM
in reply to: #2469956

Champion
5781
Northridge, California
Subject: RE: I'm a data junkie - interesting none the less
Baowolf - 2009-10-20 5:02 PM

Another more swimmer friendly way to look at the data would be to weight the scores by the amount of time it takes the mean/average person to finish them.  So if the average swim time for an IM were 1:30:00 and average bike time were 7:00:00 and the average run time were 5:30:00 then they would not be equally weighted measures predicting total outcome.  Practically speaking the bike has more minutes so it has a heavier weight on the results.  If we equalized the weight and then reanalyzed the numbers it would be interesting to see what we find.

Total time for my random numbers= 14:00:00. S=10.7%, B=50% and R=39.3% of the total time of the IM.   

If we equalize the weighted contribution of s/b/r does that make swim correlate better?  As it is swim gets a raw deal as it is only allowed 1/4 to 1/5 of the variance of the other two thus limiting its impact on the whole. 


Questionable assumption there.

Specifically (and I think another BTer addressed this in passing in another thread a few months ago), the spread of finishing times in bike vs. run is not the same (different distribution curve...flatter for the run...e.g., slowest bike time in IM is about 1.89x the fastest, slowest run is 2.65x the fastest), which accounts for why you can argue for a fairly equitable correlation betw. bike finish and run finish to overall placement.
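
The slowest-to-fastest ratio mentioned here is simple to compute from a results list. In the sketch below the split times are made up, and were picked so the ratios land near the 1.89x and 2.65x figures quoted above; `spread_ratio` is a hypothetical helper.

```python
def spread_ratio(times):
    """Ratio of the slowest time to the fastest time in a list of splits."""
    return max(times) / min(times)


# Made-up IM split times in minutes (fastest to slowest finishers).
bike_minutes = [270.0, 330.0, 390.0, 450.0, 510.0]
run_minutes = [170.0, 240.0, 310.0, 380.0, 450.0]

bike_ratio = spread_ratio(bike_minutes)
run_ratio = spread_ratio(run_minutes)

print(f"bike: slowest is {bike_ratio:.2f}x the fastest")
print(f"run:  slowest is {run_ratio:.2f}x the fastest")
```

A wider relative spread on the run means run times separate athletes more than their share of total race time alone would suggest.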
2009-10-21 1:14 AM
in reply to: #2469487

Master
2460
Subject: RE: I'm a data junkie - interesting none the less
I like the data. Pretty cool - I'm familiar with these coming from a science background. My thoughts though -

- Better bike/run correlation with race performance likely has a lot to do with the extensive x-training effects on the legs that overlap between the two sports. Hence, if you're a strong cyclist, odds are good that you'll be a respectable runner, and vice-versa. This will lead to a greater correlation of the lower-body related sports on the overall race result compared with the swim.

- This bike/run phenomenon can definitely be exploited by poor swimmers to climb the standings, as strong swimmers can only excel on one leg of the race, whereas the x-over allows for a predominantly strong runner or cyclist to still do significant damage on the other run/bike discipline, even with very limited training.

- Despite this bike/run bias, the swim is critically important if you want to compete for an AG spot. You don't have to win it, but you have to do well on it in Oly/sprints. Even front of MOP likely won't be good enough. AG winners in SoCal are consistently in the top 10% AG, if not top 5% on the swim. Especially for Oly/Sprints, where the swim is a significant portion of the race.