Course Rating System: Making Sense Out of XC Time Variations

MileSplit has a huge nationwide database of performances that numbers in the tens of millions.  We love to use this data to generate various rankings, virtual meets, and scoring systems.  Sometimes we do something a little more complex, like only using certain meets, but usually we just list performances by time.
Most people understand that this is just a guide and really enjoy it and take it for what it is.  Typically, we receive very little negative feedback for track & field since, while there are weather and track surface variables, in that sport things are more comparable.  However, especially in certain states, we receive a lot of scorn for even listing top cross country times!
This always baffles me a bit.  Of course it should not be taken as gospel!  It is what it is:  a tool for analysis, to be used with other things taken into account.  I never understand why some people insist on ridiculing us for publishing these time-based lists when we never claim them to be anything but what they are.  Perhaps they just don't understand what the lists are supposed to be.
My attitude has always been to take it for what it's worth, but I think generally people run a fast course at some point in the year… so in the end most have their chance at a "PR Course" and it ends up actually being a decent indicator toward season's end.  Maybe that's just because I'm from Florida where we have more PR or "pancake courses" as you Yankees like to call them.  But back on point…
We wanted to find another potential objective ranking tool to leverage our performance database and try to give cross country performances more quantifiable value.  So we are in the process of deriving a new cross country scoring system, the first step of which is assigning values to courses based on their difficulty.
Credit Where Credit is Due
While our system is not using the same methodology that he uses, I want to give credit to creator Bill Meylan.  He is kind of the pioneer in this area for cross country and derived a speed rating system for some New York courses (as well as some national analysis) that provides some of the inspiration and basis for some of the concepts.  Bill and I spoke and compared notes on our two systems.  He was most helpful!
Additionally, Elder Research, a data mining and predictive analytics firm outside of D.C., consulted with us on the formulas and science presented here and provided some additional guidance from their years of experience in the field.
Why do we need to rate the courses?
The answer is pretty self-evident:  if we want to compare times across courses of varying difficulty (and even small distance variations), then we need some way to standardize those times to "level them out".
No rating system can be perfect, nothing can replace head-to-head, and varying weather conditions on any given day can totally throw a wrench into the works.  However, the goal is to come up with a rating system that works when all things are equal. 
What should the rating look like?
For years, many coaches, statisticians, wanna-be statisticians, crazy fans, and official or self-proclaimed cross country pollsters have derived their own systems for doing this.  Typically it is done by subjectively assigning a plus or minus adjustment to times on courses that are considered fast or slow.  So they might decide that Estero Park is 45 seconds slow and subtract that amount from every time run there to do their manual analysis.
This is essentially what we are looking to do.  And people can easily understand this type of add/subtract system.
While this is an easy system because you can pretty much do it in your head, on a technical level I do not like this type of adjustment any more than I like the "add 45 seconds to a 3000 meter time to get a 3200 meter time" rule.  A fixed number of seconds works for one range of times, but for another range it is way off.  So what we will have (at least internally, if not published) will be a factor on each course.  A given course might be 1.025 instead of +30 seconds, but we will publish the course ratings as plus or minus seconds relative to a 19:00 time (which is roughly the median) so that they are easier to understand without a calculator!
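To make the factor idea concrete, here is a minimal sketch in Python.  It is not our production code.  The function names are made up for illustration, and it assumes that a factor above 1.000 means a slower course.

```python
REFERENCE_TIME_S = 19 * 60  # 19:00, the reference time named in the article


def factor_to_seconds(factor, reference_s=REFERENCE_TIME_S):
    """Published form: +/- seconds at the reference time.

    Assumption: a factor above 1.000 means a slower course.
    """
    return (factor - 1.0) * reference_s


def to_neutral_time(course_time_s, factor):
    """Scale a time run on a rated course to a neutral (factor 1.000) course."""
    return course_time_s / factor


def seconds_lost(neutral_time_s, factor):
    """Seconds the course costs a runner with this neutral-course time."""
    return neutral_time_s * (factor - 1.0)


if __name__ == "__main__":
    factor = 1.025  # example factor from the text
    print(f"published rating: {factor_to_seconds(factor):+.1f} s at 19:00")
    for neutral in (16 * 60, 19 * 60, 24 * 60):
        lost = seconds_lost(neutral, factor)
        print(f"{neutral // 60}:00 runner loses {lost:.1f} s")
```

With a factor of 1.025, a 16:00 runner gives up about 24 seconds while a 24:00 runner gives up about 36, which is exactly why a single flat adjustment cannot fit both.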
How is the rating being calculated?
We have written a program to analyze our performance data on a given course within certain pre-defined parameters.  It compares these times to the times run by that athlete on other courses.  My basic premise is that, when you have a large enough sample set, you can derive how fast or slow a course is by comparing the times run there to what the athlete runs elsewhere.  
Put simply, if everyone is running a season best time on that course then it is most likely a fast course! And if everyone is running 45 seconds slower than their season best time then it is probably a slow course. 
The first step is to gather a data set of all of the performances run (within our defined parameters) on that course.  Then we want to take that set of data and break it down into smaller samples, throwing out the outliers--that is, the extremes on either side.
So the computer program will take one sample at the 10th percentile of times on that course, another at the 40th percentile, and a final sample at the 70th percentile.  Combined, these form a single sample that represents the whole field well without too much of either extreme.
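Here is one rough way the percentile sampling could look in Python.  This is a sketch under assumptions:  the article does not say how big each sample is, so the window size and the function name are placeholders.

```python
def percentile_windows(times, percentiles=(10, 40, 70), window=5):
    """Pick a small window of times around each percentile and merge them.

    `window` (performances taken per percentile) is a hypothetical
    parameter; the real sample sizes are not stated in the article.
    """
    ordered = sorted(times)
    n = len(ordered)
    picked = set()
    for p in percentiles:
        center = round(p / 100 * (n - 1))  # index nearest this percentile
        lo = max(0, center - window // 2)
        hi = min(n, lo + window)
        picked.update(range(lo, hi))
    # Return the merged sample, fastest to slowest, without duplicates.
    return [ordered[i] for i in sorted(picked)]


if __name__ == "__main__":
    # 100 made-up finish times in seconds, one second apart.
    field = list(range(1000, 1100))
    print(percentile_windows(field))
```

Because the windows sit inside the field, the very fastest and very slowest finishers never make it into the merged sample.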
After this sample is compiled, we do basically what they do in court to a pool of jurors:  we try to find a reason to throw them out.  So if the athlete does not have at least two times run that year on other courses--throw 'em out.  If they ran really slow compared to what they normally run (perhaps they trained through the meet) then we throw them out.  Or if they just for some reason had a fluke fast day relative to their previous times, we throw that out as well. 
Now we have our sample that we will actually use, filtered only for what we consider good and accurate for comparison.
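The jury selection step can be sketched like this.  The rule about needing two other times comes from the description above, but the cutoffs for "really slow" and "fluke fast" are placeholder numbers, not the ones we actually use, and the function name is made up.

```python
from statistics import mean


def keep_performance(course_time_s, other_times_s,
                     min_other=2, max_slow=1.10, max_fast=0.95):
    """Decide whether a performance stays in the sample.

    other_times_s: the athlete's same-year times on other courses.
    max_slow / max_fast: hypothetical cutoffs on the ratio of this time
    to the athlete's typical time elsewhere.
    """
    if len(other_times_s) < min_other:
        return False  # not enough other races to compare against
    typical = mean(other_times_s)
    ratio = course_time_s / typical
    # Reject off days (ratio too high) and fluke fast days (ratio too low).
    return max_fast <= ratio <= max_slow


if __name__ == "__main__":
    print(keep_performance(1140, [1130]))        # only one other race
    print(keep_performance(1140, [1130, 1150]))  # a normal day
    print(keep_performance(1400, [1130, 1150]))  # way off normal
```

The cutoffs need to be loose enough that a genuinely tough course does not get everyone thrown out for running "slow" there.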
But we have one more step.  It can be easily observed that athletes typically run faster at the end of the year than at the beginning.  The reasons are obvious to anyone serious about cross country:  better fitness, typically better (cooler) weather, and the late season peak that smart athletes and coaches work toward.  Because of this, without some adjustment, early season meets look very slow.  This could make the courses look tougher than they actually are.
So the program will calculate a "peak performance penalty" that is subtracted from the times in early season meets to adjust those times downward to what they would likely be if run during the end of season peak.  While this may not be accurate for any one athlete, after much study of this improvement across thousands of athletes and lots of tweaking of the formula, it seems pretty close in the aggregate.
This penalty is calculated based on the date of the meet (the earlier in the season the more penalty) and how fast the time is.  The latter assumes that faster athletes do not improve as much over the course of the season compared with slower athletes since they are likely already at a higher fitness level.
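To show the shape of such a penalty (not the real formula, which is not given here), below is a made-up Python example.  The seconds-per-week rate, the linear scaling by time, and the function names are all assumptions chosen only so that earlier meets and slower times get a bigger penalty.

```python
from datetime import date


def peak_penalty_seconds(time_s, meet_date, peak_date,
                         seconds_per_week=3.0, reference_s=1140.0):
    """Hypothetical peak performance penalty, in seconds.

    Grows with weeks before the season peak and shrinks for faster times.
    seconds_per_week and the linear scaling are illustrative guesses.
    """
    weeks_out = max(0.0, (peak_date - meet_date).days / 7)
    speed_scale = time_s / reference_s  # below 1 for faster runners
    return seconds_per_week * weeks_out * speed_scale


def adjusted_time(time_s, meet_date, peak_date):
    """Raw time minus the penalty: what it might look like at season's end."""
    return time_s - peak_penalty_seconds(time_s, meet_date, peak_date)


if __name__ == "__main__":
    peak = date(2024, 11, 9)   # made-up peak date
    early = date(2024, 9, 14)  # eight weeks before the peak
    for raw in (960.0, 1140.0, 1440.0):
        print(raw, round(adjusted_time(raw, early, peak), 1))
```

A meet on the peak date gets no penalty at all, and a 16:00 runner is adjusted less than a 24:00 runner from the same early season meet.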
After all of those adjustments, we are finally ready to calculate the course rating.  It is currently based on the following factors in various weights (more may be added and/or the weighting adjusted in future revisions):
- Difference of adjusted time and season best time
- Difference of adjusted time and second best time
- Difference of raw time and season best time
- Difference of adjusted time and season average time
- Difference of raw time and season average time
For clarification, here are the definitions of the above terms:
- Season best time = Best time this season that was not at this course.
- Second best time = 2nd best time this season that was not at this course.
- Raw time = Actual time run at this course.
- Adjusted time = Time from this meet, minus the adjustment for how early or late in the season it was.
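The five factors above could be blended along these lines.  Treat this as a sketch only:  the weights are invented, "season average time" is assumed to mean the average of the athlete's times on other courses, and taking the median across athletes is just one reasonable way to combine them.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Performance:
    raw_s: float         # actual time run at this course
    adjusted_s: float    # raw time minus the early-season adjustment
    other_times_s: list  # same-season times on other courses (2 or more)


# Hypothetical weights; the article does not publish the real ones.
WEIGHTS = {
    "adj_vs_best": 0.30,
    "adj_vs_second": 0.20,
    "raw_vs_best": 0.15,
    "adj_vs_avg": 0.20,
    "raw_vs_avg": 0.15,
}


def weighted_difference(p):
    """Blend the five differences for one athlete, in seconds."""
    ordered = sorted(p.other_times_s)
    best, second = ordered[0], ordered[1]
    avg = mean(ordered)  # assumed meaning of "season average time"
    diffs = {
        "adj_vs_best": p.adjusted_s - best,
        "adj_vs_second": p.adjusted_s - second,
        "raw_vs_best": p.raw_s - best,
        "adj_vs_avg": p.adjusted_s - avg,
        "raw_vs_avg": p.raw_s - avg,
    }
    return sum(WEIGHTS[k] * diffs[k] for k in WEIGHTS)


def course_rating_seconds(performances):
    """Median blended difference; positive suggests a slower course."""
    return median(weighted_difference(p) for p in performances)


if __name__ == "__main__":
    sample = [
        Performance(1200.0, 1190.0, [1150.0, 1160.0, 1170.0]),
        Performance(1010.0, 1005.0, [980.0, 990.0]),
        Performance(1320.0, 1300.0, [1280.0, 1290.0, 1310.0]),
    ]
    print(round(course_rating_seconds(sample), 1))
```

To get from seconds to a factor, one option would be to divide each difference by the athlete's comparison time before blending, so that fast and slow runners count on the same scale.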
Where do we go from here?
The plan is to keep iterating this formula and this experiment.  There are a lot of ways to analyze data and you can derive some really amazing things from it--especially with a large enough data set.  So the intent is to do some additional studies on course rating but also other analysis.  We're just scratching the surface of what we can do with this and what we can learn about performance and athletic development.
Once this course formula gets a few more minor revisions, the plan is to implement a scoring system that will rate cross country athletes, teams, and performances on a new type of measurement scale.  This scale will take into account the course variations, consistency, recency of the performances, and other factors.  We hope this will help with creating a new way to rank athletes and teams that will complement (not replace) other calculations and ranking systems out there (both objective and subjective ones).  It will be in the form of a score instead of a time and will be easy to understand and recognize even for those new to the sport.
We look forward to your feedback and comments on this exciting new project!
How to view these ratings
Keep in mind that this is still under development and that these ratings will only be as good as the data available to us.  In many states our data set is immense, but in others it may not be as complete.  Just keep that in mind!  All the more reason to help support the growth of MileSplit in your state!!!  
Here are some links to the XC venues in each state: