After each weekend, I pull race results for the local area races and scan them to see which Charlotte Runner Club members were out racing. This task came my way earlier in the summer, and I am starting to get the hang of it. Actually, massaging data is something that I do on a regular basis at work, so this is really just an extension.
Since I like an open-door policy, I thought I would share a little bit about the process so others know what it is like.
First order of business: I have a list of race timing service websites that I have collected. For example, Run for Your Life posts its results online. Once they do, I pull them down to my local PC. There I transform this information into something that my program can understand and scan for CRC members.
The output from this scan is a list of CRC members who ran this particular race, where they finished, and what their finish time was.
This sounds fairly easy, doesn't it?
Well, yes, and no.
There are a couple of different parts to this weekly task.
The biggest challenge is pulling down the data. Each race timing service tends to have its own way of doing things. For example, Run for Your Life uses Active.com. Queen City Timing uses what appears to be an in-house results engine. Lee Timing uses still another format.
Basically, what I am saying is that each timing service has its own format, and these formats can range from HTML to plain text to PDF.
This is the point where sites like "Athlinks.com" should get a lot of credit. I deal with a few timing services. They probably deal with thousands, each probably with its own format.
Once I have the data down and cleaned up, I can scan it for CRC members. By cleaned up, I mean there are no extra spaces, tabs, commas, etc. in the racers' names.
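A rough sketch of this kind of cleanup in Python might look like the following. The function name `clean_name` is hypothetical and the sample input is made up; the real program may well go about it differently.

```python
import re

def clean_name(raw: str) -> str:
    """Normalize a racer's name as it appears in a results file."""
    # Turn commas and tabs into spaces so neighboring words do not fuse together.
    no_punct = re.sub(r"[,\t]", " ", raw)
    # Collapse any run of whitespace to a single space and trim both ends.
    return re.sub(r"\s+", " ", no_punct).strip()

print(clean_name("  Bobby \t Aswell,  "))  # Bobby Aswell
```

Running every name through one function like this, both from the results and from the member list, keeps the two sides comparable.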
The scanning process is straightforward. We pull a name from the member list and scan the race results to see if we find this particular name. When we do, we pull this member's statistics out and put them in another file.
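The loop described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the results are taken to be already parsed into rows with `name`, `place`, and `time` fields, the function names are hypothetical, and the sample rows are invented.

```python
import csv
import io

def scan_results(members, results):
    """Return the result rows whose name exactly matches a club member."""
    matches = []
    for member in members:          # pull one name from the member list
        for row in results:         # scan the race results for that name
            if row["name"] == member:
                matches.append(row)
    return matches

def write_matches(matches, out_file):
    """Write the matched rows to a CSV file-like object."""
    writer = csv.DictWriter(out_file, fieldnames=["name", "place", "time"])
    writer.writeheader()
    writer.writerows(matches)

# Invented sample data, for illustration only.
members = ["Jane Doe", "John Smith"]
results = [
    {"name": "John Smith", "place": 3, "time": "18:42"},
    {"name": "Pat Example", "place": 4, "time": "18:55"},
]

matches = scan_results(members, results)
out = io.StringIO()                 # stands in for the "other file"
write_matches(matches, out)
print(out.getvalue())
```

An exact string comparison like this is simple, and it is also the reason a member only shows up when the registered name matches the member list exactly.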
This part has its own set of "gotchas". Upper and lower case letters don't matter because I can correct for that. But if someone registers for a race as "Mike" but is on the CRC member list as "Michael", his results don't get included. This has happened to Mike Kahn at least once that I remember. I hope Bobby doesn't mind, but I will use him as another example. Sometimes Bobby registers as "Bobby Aswell" and my program finds him. Other times, he registers as "Bobby Aswell Jr", so my program skips right over him. For a while, I think he was alternating back and forth, so I just added both variations to my member list. This is an easy fix as long as I remember to do it for each club list that I get.
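One way to handle both the case differences and the name variations is to keep the known variations next to each member and compare everything in a normalized form. The sketch below is a hypothetical illustration in Python, not the actual program; the `members` structure and both function names are assumptions, and "Michael Kahn" as the member-list spelling is inferred from the example above.

```python
def normalize(name):
    """Lowercase and collapse whitespace so case differences do not matter."""
    return " ".join(name.split()).casefold()

def build_lookup(members):
    """Map every normalized spelling of a member's name to the canonical one."""
    lookup = {}
    for canonical, variants in members.items():
        for spelling in [canonical, *variants]:
            lookup[normalize(spelling)] = canonical
    return lookup

# Member-list name -> other spellings seen in race registrations (illustrative).
members = {
    "Bobby Aswell": ["Bobby Aswell Jr"],
    "Michael Kahn": ["Mike Kahn"],
}
lookup = build_lookup(members)

for registered in ["BOBBY ASWELL JR", "mike kahn", "John Smith"]:
    # A value of None means the registered name matched no known member spelling.
    print(registered, "->", lookup.get(normalize(registered)))
```

Keeping the variations in a separate table like this, rather than typed into each fresh club list, would mean they do not have to be re-added every time a new list is pulled.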
The other "gotcha" that gets me on a regular basis is new club members. Since I don't know when someone new joins the club, I periodically go out and pull a new club list. Other times, I get an email from Caitlin or Aaron letting me know that I missed someone. This means I need to pull a new list.
The last "gotcha" that I have seen is when someone shows up in the results but didn't run the race. Some people just have common names like "John Smith". If this other "John Smith" runs a race, then our club "John Smith" gets credit for it.
At the end of the day, there is only so much anyone can do. If I ran a race or saw on Facebook that someone ran a race, then I will do a little fact check to see if they showed up in the results. But our club has nearly 500 members, and I don't know all of them personally, nor am I Facebook friends with all of them. I'd also say that 99% of the time, my program scans successfully.
But if I happen to miss someone, please always let me know. There could be any number of reasons why, but we do our best to make sure everyone's name gets pulled.
If we know about the local race, we should be able to pull a member's results and include them in the newsletter.
Thoughts from the Cool Down Runner