Last year, after an off-the-cuff comment in a meeting with our PR agency, I set about digging around online data sources to see if they could be used to predict who would win Strictly Come Dancing. Looking at search and social data in the build-up to the show launching last autumn, there was a clear front runner – Kara Tointon. It also happened that she was an excellent dancer, and so as the weeks went on it was very easy to keep slicing the data in different ways and confirm that Kara would win.
This year the stakes have been raised as the challenge of examining the data to see who would go out each week was laid at our door. Manfully we accepted the challenge and have been writing a weekly post for the Guardian’s Media Monkey blog outlining what the data says and what that means for who is going to leave Strictly in any given week.
We found out very early on that predicting the loser each week is much more difficult than predicting the winner of the whole show. Who'd have thought it, hey?! The first issue we encountered was collecting clean data for the celebrities. Is there a more generic name out there than Alex Jones? Perhaps John Smith, but after that I'm not so sure! It has taken us until this, the halfway point, to be entirely happy that the data we're looking at is actually about the right people, as we've tweaked our queries each week.
Another issue we have faced has been examining the sentiment behind the buzz around the celebrities. Our experience told us that volume itself was no sign of popularity, as people love to get on social networks to have a good whinge as much as to declare themselves a fan – if not more! Our in-house self-confessed data fiend developed a tool for sentiment analysis (well done, Richard!) that does a pretty good job of sorting the positive from the negative, but no tool out there is 100% accurate. In fact, even paid-for tools such as Brandwatch and Meltwater aim for 70% accuracy – so we're always at risk of being wrong. Just as an example, one week someone tweeted '@bbcstrictly bloody hell that was absolutely fab…u…lous!! Len you are wrong #scd' – our tool put this firmly in the negative camp, but clearly it's not!
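To see why that tweet trips a classifier up, here is a minimal sketch of a naive lexicon-based sentiment scorer (purely illustrative – this is not Richard's tool, and the word lists are invented). Words like "bloody", "hell" and "wrong" outnumber the one positive word, which is itself mangled by the "fab…u…lous" spelling, so the tweet scores negative despite being praise:

```python
# Illustrative lexicon-based sentiment scorer (not the actual tool
# described in the post; the word lists below are made up).
POSITIVE = {"fab", "fabulous", "love", "brilliant", "great"}
NEGATIVE = {"wrong", "hell", "bloody", "awful", "terrible"}

def score(text: str) -> int:
    """Count positive words minus negative words."""
    words = [w.strip(".,!?…#").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweet = "@bbcstrictly bloody hell that was absolutely fab…u…lous!! Len you are wrong #scd"
print(score(tweet))  # negative overall: the scorer can't see the sarcasm
```

The stylised "fab…u…lous" never matches "fabulous" in the lexicon, so the tweet's only compliment contributes nothing – a small illustration of why word-counting approaches top out well short of 100% accuracy.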
All that is before anybody has even danced a dance. We found that the volatility in the dancing performances of the celebrities in the bottom half of the table in the first few weeks made it very difficult to judge what would happen. As the couple that leaves is decided by a combination of the judges' score for their dance and the phone vote, a novice celebrity dancing the paso doble one week and a waltz the next might be near the middle of the judges' scores one week and rock bottom the next. We quickly had to factor this fluctuation into our algorithm, allocating a score for the perceived difficulty of the dance each celebrity was undertaking in a given week.
So how have we done? We’re currently sitting at about a 50% success rate in predicting who will go out each week – so using the data is certainly more effective than randomly guessing! More than anything I think our experience doing this analysis has shown that data on its own isn’t enough. Without understanding the context and the content of the data we would have been way off the mark every week. By examining the source and taking into account the limitations of our data, we can be much more calculated in the way in which we read it. Data is one thing, but insight is another!
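A quick back-of-the-envelope check puts that 50% in context (couple counts here are illustrative, since the number remaining varies week to week): picking one eliminee at random from n remaining couples succeeds only 1/n of the time.

```python
# Random-guessing baseline vs. our ~50% hit rate, for a few
# illustrative field sizes (actual weekly counts vary).
for n in (12, 10, 8):
    baseline = 1 / n
    print(f"{n} couples left: random baseline {baseline:.0%} vs. observed ~50%")
```

Even with only eight couples left, a random pick is right just 12–13% of the time, so a 50% hit rate represents a substantial lift over chance.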
Blog post by Penny Anderson, Consultant at Reform