Bad Data, Perfect Content

By Marc Zao-Sanders



A learning manager is stressed out. She’s inherited two LMSs, three content libraries and a competency framework, and apparently no one likes any of them. The data is a mess, period. Her director says she needs to find six-figure cost savings. That same director has announced a new ‘robust’ performance indicator for her team, the details of which haven’t been divulged yet. And there’s another new idea: appraisals will need to be linked to learning. Her learning strategy proposal is due in three weeks. Meanwhile, she has plenty of her own ideas for making learning happen happily and productively, but she doubts she’ll have the time and space to implement them now.

Most corporate learning situations are pretty messy and difficult. However it may feel, data can help you make sense of it all.

I spent the first two years of my career working with data. I was at a strategy consultancy whose clients, even back in 2001-3, had many millions of lines of product and sales data. My job then was to fill the gaps in it, clean it, analyse it and try to draw useful insights from it. Since then, data has assumed an ever more important role in business. Faster processing, wider bandwidth and more sophisticated technology mean there’s just so much of it, and the resulting regulation, including GDPR, makes it critical for everyone in business, and certainly those in learning, to get a proper handle on it.

Of course, there are myriad uses of data in any given field and learning is no exception. In L&D we use it to make buying decisions, find out who our star performers and departments are, calculate ROI, provide context for appraisals and much more. We focus here on a single problem facing many learning professionals right now: how do I use data to choose the right learning content for my company?

1. Primer: types of data

There are many excellent statistical resources online that list all the different data types and their distinctions. I’ll just point out a few that are both easy to understand and relevant to the learning data you may well be familiar with. If you’re already comfortable with data, skip to the next section.

Qualitative vs quantitative. Quantitative data are those which are easily quantified, such as the playback time of a video. Qualitative data are less easily measured but can be observed and evaluated subjectively, such as the quality, professionalism or light-heartedness of an asset.

Structured vs unstructured. Structured data is data which takes one of a fixed number of predetermined values, eg video/MOOC/infographic/podcast/classroom/article. These values often ship with the product you buy, or they can be user-generated (eg from a survey) if the answers are limited to a menu of fixed options (like the list above). Unstructured data tends to come from free text, like the entries in a comments field. Structured data are easier to analyse, but the insights they offer can be shallow. A special case of structured data is binary data, which can take only one of two values, eg Yes/No, Complete/Incomplete, Long/Short. Binary data are even easier to analyse but, of course, even more limited in the insights they offer.

Granular. Granular means detailed in this context. If you tag an asset ‘Project Management’ it’s not that granular. If you tag it ‘Agile Project Management’, it’s more granular. If you tag it ‘Agile Architecture, Design, & Collaboration 3.0’ it’s more granular still. What’s important is choosing just the right level of granularity: not too detailed, but detailed enough.

Metadata. Metadata is data about the data. So for example, a learning asset is a data point itself and data about that learning asset (language, length, difficulty level, usage, etc) is metadata.
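To make these distinctions concrete, here is a minimal sketch of a metadata record for a single learning asset, written in Python. The field names, the 1-5 quality scale and the `mandatory` flag are hypothetical illustrations rather than any standard schema; each field is commented with the type of data it represents.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fixed menu of formats: a structured field takes one of these values.
ALLOWED_FORMATS = {"video", "MOOC", "infographic", "podcast", "classroom", "article"}


@dataclass
class LearningAsset:
    """Metadata about one learning asset (the asset itself is the data point)."""

    title: str
    playback_minutes: float                            # quantitative: easy to measure
    fmt: str                                           # structured: one of ALLOWED_FORMATS
    mandatory: bool                                    # binary: True or False (hypothetical field)
    tags: list[str] = field(default_factory=list)      # topic tags, coarse or granular
    quality_rating: Optional[int] = None               # qualitative judgement recorded as a 1-5 score
    comments: list[str] = field(default_factory=list)  # unstructured free text

    def __post_init__(self) -> None:
        # Reject values outside the fixed menu so the structured field stays structured.
        if self.fmt not in ALLOWED_FORMATS:
            raise ValueError(f"format {self.fmt!r} is not one of {sorted(ALLOWED_FORMATS)}")


asset = LearningAsset(
    title="Agile in a nutshell",
    playback_minutes=12.5,
    fmt="video",
    mandatory=False,
    tags=["Project Management", "Agile Project Management"],  # coarse, then more granular
)
print(asset)
```

Passing `fmt="webinar"` raises a `ValueError`, which is one cheap way to stop a structured field drifting into free text.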

2. Basic Analytical Skills

If you’re going to be a data scientist or data analyst, you need a lot of skills. If you’re anyone else, you don’t need very many. But your learning career will probably advance faster, and with a greater degree of control, if you can clean, trim, sort and filter data; find anomalies; know when a find-and-replace is OK and when it isn’t; know when and how to use a scatter chart to show that two variables (eg playback length and popularity) are related (eg negatively correlated); know when and how to use PivotTables; and have a sense of what’s possible, when something’s amiss and where the juicy stuff is likely to be.

This sounds a lot like Excel functionality, and it is. Excel is precisely the right tool for the needs of most knowledge workers, which is why a billion of us use it. Learning people should all have the basics of Excel to monitor, measure, interpret and improve how learners are learning. That’s something like items 1, 2, 3, 5, 9, 10, 20, 27, 31, 40, 43, 44, 53 and 55 from the list below. (You can get the full, free, interactive infographic here.)


3. Sources of data

The learning industry often complains about not being taken seriously enough by the business. Data is part of the solution. Don’t just use siloed learning data from your LMS, or even an HR system, to make your business case. Contextualise and triangulate with data that is more likely to make the business sit up and listen. Most likely this will come from the business itself: sales data, traffic data, customer call data. Go further and look beyond your company: use public data sets from Amazon, Google, Reddit and 538, even the CIA.

Let’s make this more concrete. Suppose you introduced a new learning initiative in May and take-up’s been low. Do you think seasonality is the reason? Google Trends could give you some data points to support (or undermine) that hypothesis. And by going outside learning and outside your company, you give the discussion a broader context and greater weight.
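A rough version of that check, sketched in Python with invented numbers: divide monthly take-up by a monthly search-interest index for a related topic (Google Trends reports interest on a relative 0-100 scale). If the ratio stays roughly flat while both series fall, the dip tracks wider seasonal interest. The 20% threshold is an arbitrary assumption, and a stable ratio supports the hypothesis rather than proving it.

```python
# Invented monthly figures for illustration.
months = ["May", "Jun", "Jul", "Aug"]
takeup = [120, 110, 70, 60]   # enrolments in the new initiative
interest = [80, 75, 48, 40]   # 0-100 search-interest index for a related topic

# Enrolments per point of general interest, month by month.
ratios = [t / i for t, i in zip(takeup, interest)]
for month, ratio in zip(months, ratios):
    print(f"{month}: {ratio:.2f} enrolments per index point")

# Relative spread of the ratios: small means take-up moved in step with interest.
mean_ratio = sum(ratios) / len(ratios)
spread = (max(ratios) - min(ratios)) / mean_ratio
THRESHOLD = 0.2  # assumption: under 20% variation counts as "moving in step"
if spread < THRESHOLD:
    print("Take-up tracks seasonal interest: seasonality is a plausible explanation.")
else:
    print("Take-up doesn't track seasonal interest: look for other causes.")
```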


4. Bad data

Many people get stuck at the start. All they see is that there’s loads of data and much of it is terrible. How on earth am I going to turn this mess into something useful?

And data can be bad in so many ways. There’s missing data (empty cells), wrong data (that video is simply not an infographic), typos, corrupted data (something bad happened somewhere) and useless data (it’s correct but... irrelevant).

But there are solutions to all forms of bad data.
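As a first pass, several of these problems can be flagged automatically. Below is a minimal Python sketch, assuming a hypothetical catalogue export with `id`, `title` and `format` columns. It flags empty cells, suggests fixes for likely typos using the standard library’s `difflib.get_close_matches`, and flags values outside the allowed list. It can’t catch wrong-but-valid data (a video labelled as an infographic); that still needs a human spot check.

```python
from difflib import get_close_matches

ALLOWED_FORMATS = ["video", "MOOC", "infographic", "podcast", "classroom", "article"]

# Hypothetical catalogue rows exported from an LMS or content library.
catalogue = [
    {"id": "A1", "title": "Negotiation basics", "format": "video"},
    {"id": "A2", "title": "", "format": "article"},                   # missing title
    {"id": "A3", "title": "Budgeting 101", "format": "infograhpic"},  # typo
    {"id": "A4", "title": "Team rituals", "format": "webinar"},       # not an allowed value
]


def audit(row):
    """Return a list of human-readable problems found in one catalogue row."""
    problems = []
    for key, value in row.items():
        if not str(value).strip():
            problems.append(f"missing {key}")
    fmt = row["format"]
    if fmt and fmt not in ALLOWED_FORMATS:
        # Close string matches are probably typos; anything else is simply invalid.
        suggestion = get_close_matches(fmt, ALLOWED_FORMATS, n=1)
        if suggestion:
            problems.append(f"possible typo in format: {fmt!r} -> {suggestion[0]!r}?")
        else:
            problems.append(f"unknown format: {fmt!r}")
    return problems


for row in catalogue:
    for problem in audit(row):
        print(row["id"], problem)
```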

5. Reassurance

It will probably be OK. Know that:

  • Almost anything’s possible. It may not feel that way to begin with but there are so many shortcuts and bodges and functions and tricks and mapping possibilities. If you have a clear idea of what you want, you can probably achieve it.
  • Basic skills are essential. But basic skills are enough. So sharpen your analytical skills (see section 2). Ironically, this will give you a better idea of when you need to consult people with more specialist data skills.
  • Something simple is better than nothing. Try to get a sense of the data. How many rows should there be? How many columns? How many are you missing? Why might there be gaps? Who would know the answer to that question? As you start to put the pieces together, you develop a scaffold of understanding on which to hang further detail. And as that understanding grows, you’ll gain a better sense of when something is wrong and where the insights lie.
  • But nonsense is worse than nothing. Spot-check and sense-check as you go. Are the totals roughly what you think they should be? Keep a close eye on the big, recognisable numbers. They will be recognisable to scrutinising eyes from the business too!
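Those questions can be turned into a small, repeatable check. This Python sketch assumes you’ve already written down how many rows you expect and roughly what a headline total should be (here, hypothetical total learning hours in an `hours` column). It reports the row count, empty cells per column and whether the total falls within a tolerance you choose; the 10% default is an arbitrary assumption.

```python
def sense_check(rows, expected_rows, expected_total_hours, tolerance=0.10):
    """Compare a data set against rough expectations before trusting any analysis of it."""
    columns = sorted({key for row in rows for key in row})
    total_hours = sum(row.get("hours") or 0 for row in rows)  # blanks count as zero
    return {
        "rows": len(rows),
        "row_count_ok": abs(len(rows) - expected_rows) <= tolerance * expected_rows,
        "missing_by_column": {
            col: sum(1 for row in rows if row.get(col) in (None, "")) for col in columns
        },
        "total_hours": total_hours,
        "total_ok": abs(total_hours - expected_total_hours) <= tolerance * expected_total_hours,
    }


# Hypothetical mini data set: in practice this would be thousands of exported rows.
rows = [
    {"title": "Negotiation basics", "hours": 1.5},
    {"title": "", "hours": None},
    {"title": "Budgeting 101", "hours": 2.5},
]
report = sense_check(rows, expected_rows=3, expected_total_hours=4.0)
for key, value in report.items():
    print(f"{key}: {value}")
```

A failing `row_count_ok` or `total_ok` is a prompt to go and ask someone why, before any chart reaches the business.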

6. Towards perfect content

To make intelligent learning recommendations, we need to understand a vast amount of content from multiple sources, usually with plenty of imperfect metadata. To get to recommendations, we first need the right pool, or universe, of content. Note that even if you have many thousands or even millions of learning assets, a few hundred may be the right number for your company. We see this time and again, even with very large firms.

We go through a process of:

  1. Understanding the problem [2 weeks]. What’s important to the business? Is it low engagement (it usually is)? Are some specific skills missing? Are new joiners less delighted with onboarding than you’d like them to be? Is churn the issue? This sometimes requires taking a couple of uncomfortable steps back to gain the right perspective.
  2. Establishing the universe [1 week]. How much stuff is there? Where is it? What other features are there? Is there a competency framework? Do you want to modify it? What should it look like? Should there be different frameworks for different departments? How do we link these to specific assets?
  3. Shrinking the universe [1 week]. Much content is not useful to anyone: it’s poor quality, out of date or irrelevant. Let’s remove that. This usually reduces the volume tenfold.
  4. Increasing the universe [a few days]. There’s bound to be some brilliant stuff that you don’t have, and in many cases it’s free. Add that in. This usually increases the volume by just a few hundred assets.
  5. Fixing the data [1 week]. This can involve any combination of: filling in data (from methodical spreadsheet gap-filling to AI-based classifier algorithms), cleaning data, developing taxonomies (and making them relevant to the business, eg an existing or anticipated competency framework), and mapping to and from other related data sets.
  6. Establishing how to do updates [1 week]. Set up an ongoing system for keeping the content and its data current. There’s usually both an automated and a manual element to this.
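Steps 3 and 5 are where simple rules go a long way. This Python sketch assumes a hypothetical catalogue with `updated` dates, 1-5 `rating`s and a sometimes-missing `topic`. The five-year age limit, the rating bar and the keyword rules are all illustrative assumptions, and keyword matching sits at the methodical gap-filling end of the spectrum rather than being an AI classifier. Anything the rules can’t tag is left blank for a person to review.

```python
from datetime import date

MAX_AGE_DAYS = 5 * 365  # assumption: not updated in ~5 years counts as out of date
MIN_RATING = 3          # assumption: minimum reviewer rating on a 1-5 scale

# Hypothetical keyword rules mapping title fragments to topics.
KEYWORD_TOPICS = {"scrum": "Agile", "sprint": "Agile", "coaching": "Leadership", "negotiat": "Negotiation"}


def shrink(assets, today):
    """Step 3: drop assets that are out of date or rated below the quality bar."""
    return [
        a for a in assets
        if (today - a["updated"]).days <= MAX_AGE_DAYS and (a.get("rating") or 0) >= MIN_RATING
    ]


def fill_topics(assets):
    """Step 5: fill a missing topic from title keywords (mutates the dicts in place)."""
    for a in assets:
        if not a.get("topic"):
            title = a["title"].lower()
            a["topic"] = next((t for kw, t in KEYWORD_TOPICS.items() if kw in title), None)
    return assets


catalogue = [
    {"title": "Scrum in practice", "updated": date(2022, 3, 1), "rating": 4, "topic": None},
    {"title": "Coaching conversations", "updated": date(2023, 6, 1), "rating": 5, "topic": ""},
    {"title": "Fax etiquette", "updated": date(2009, 1, 1), "rating": 2, "topic": "Office skills"},
    {"title": "Winning the deal", "updated": date(2021, 9, 1), "rating": 3, "topic": None},
]

pool = fill_topics(shrink(catalogue, today=date(2024, 1, 1)))
for a in pool:
    print(a["title"], "->", a["topic"])
```

Here the 2009 asset is dropped, and "Winning the deal" keeps an empty topic because no keyword matched, which is exactly the kind of row to hand to a human.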

That’s a big part of how we ensure that the universe of content is exactly what our clients need. Of course, it’s easier to list six steps than to do the job in the sprawling mess of actual live systems and people. In reality, none of the above is trivial, but all of it is possible. And ending up with a tighter selection of stellar content is well worth it.

