
Learning gunk

Written by Marc Zao-Sanders | Oct 11, 2021 5:34:09 PM

Gunk /ɡʌŋk/
noun INFORMAL
An unpleasantly sticky or messy substance. Often vague and unknown.

Every single company has a glut of learning gunk. It’s accumulated over years. It’s saved in multiple, disparate, rarely-opened folders in corporate cyberspace. Much of it is duplicative (and that is partly why it grows). Much of it is obsolete. Some of it is corrupted. All of it is poorly described and signposted.

It is often made up of files and packages in formats such as .mp4, .mp3, .pptx, .pdf, SCORM 1.2, SCORM 2004, AICC, .zip, .docx, .m4a, .wav, .wma...

When extracted into a spreadsheet it looks a lot like:

It’s a problem

Too much learning gunk means your workforce misses out on the good stuff. There are valuable materials stored there too. Indeed, the most relevant, highest quality, most impactful learning materials you have are those created by your workforce. But the gunk obscures them.

That means your workforce has a bad time. Wading through gunk is unpleasant, so of course they avoid it.

The average learning platform user experience

It’s wasted human effort. Thousands of hours have gone into producing this content, only for it to go to waste. Furthermore, thousands of hours of future effort may be wasted duplicating perfectly valid and usable existing work.

What you can do

What you can’t expect to do is fix all gunk. Plenty of it is irredeemable. But if you take a pragmatic, data-oriented approach, you can filter it to extract the 80% of the value that sits in the most relevant 20% of the content (thanks, Pareto principle). Here’s how.

  1. Get an extraction of the gunk. It will be near-impossible to get an exhaustive list. But a simple extraction from an LXP or LMS would be a great starting point. If you can ask IT to supplement this with a pull of data from a file-sharing platform like SharePoint, so much the better. But don’t shoot for the moon with this first step. Get some quickly rather than all never.
  2. What’s obviously good.
    A small but important portion of the gunk will be good. Highly relevant, high quality, good data. The evidence for this will be partly in the data and partly in what you or colleagues happen to know about it (e.g. the provider / creator is very reliable). This is likely to be considerably less than 5% of the full pool.
  3. What’s obviously bad. Bad means obsolete, duplicative and poor quality. There are clues for each of these types of bad content. Obviously, take a back-up before making any irreversible changes.
    1. Obsolete content. Tends to reveal itself by creation, last-used or publication date. It might also be in the title (obsolete software training for example).
    2. Duplicative content. Software makes it easy to copy files and we went to town. Over years, many different colleagues will have made copies of files intentionally and unintentionally. The clue for these can be in the title (obviously) or the exact file size. Of course you need to run spot checks and create back-ups before deleting anything permanently. 
    3. Poor quality. This is less obvious. Still, there are clues in the data. Those clues might be to do with the subject matter, usage / non-usage data (starts, completions, feedback), and anecdotal feedback on the producer or creator in general.
  4. Devoid-of-data. Web content and library content (Skillsoft, LinkedIn Learning, etc.) always comes with some metadata (it might not be relevant to your high-value skills and business priorities, but there will be something). Proprietary content, created for the most part by your workforce, is not the same. It’s often totally without data.

    Well, two pieces of good news. First, even just 25 words associated with the content is often enough to generate some useful data (see point 5, below). Second, you might be able to generate those 25 words from the main body text. But if there’s really no existing or realistically attainable metadata, you may just have to archive it. (The only exception would be if you suspect that some bits of it are extremely useful for some parts of your workforce. If that’s true, some manual work will be needed to uncover them.)
  5. Generate some fresh, new, useful data with Filtered’s Content Intelligence. As I wrote in Right Content: The Buyer’s Guide to Learning Content:

    The good news is that algorithms were made for this kind of big-data-narrow-scope task. At Filtered, we have developed our own algorithms which apply skills and other tags rapidly and reliably. We only need ~25 words of data to be up and running, and require zero training data. Zero.

    That means we can take, for example, 10,000 proprietary learning content assets, each with very little metadata, and tag them all in the time it takes a human curator to make a cup of tea.
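
The clues in steps 3.1 to 3.3 above can be turned into a rough first-pass triage. What follows is a minimal sketch in Python, not a description of how Filtered's Content Intelligence works. The field names (`id`, `title`, `size_bytes`, `last_used`, `starts`), the three-year cut-off and the title clean-up rules are all assumptions made for illustration; your LMS or LXP export will look different. It only flags candidates for review, so the spot checks and back-ups still apply.

```python
import re
from collections import defaultdict
from datetime import date


def normalise_title(title):
    """Reduce a title to a comparable form (assumed rules, tune for your data)."""
    t = title.lower()
    t = re.sub(r"\.[a-z0-9]{2,5}$", "", t)  # drop a trailing file extension
    # Strip common copy markers: "(1)", "copy", "copy of", "final", "v2"
    t = re.sub(r"\(\d+\)|\bcopy(?: of)?\b|\bfinal\b|\bv\d+\b", " ", t)
    return " ".join(re.sub(r"[^a-z0-9]+", " ", t).split())


def triage(assets, today, max_age_days=3 * 365):
    """Return {asset id: [reasons]} using the date, duplicate and usage clues."""
    by_size = defaultdict(list)
    by_title = defaultdict(list)
    for a in assets:
        by_size[a["size_bytes"]].append(a["id"])
        by_title[normalise_title(a["title"])].append(a["id"])

    flags = {}
    for a in assets:
        reasons = []
        # 3.1 Obsolete: not used for longer than the (assumed) cut-off
        if (today - a["last_used"]).days > max_age_days:
            reasons.append("possibly obsolete")
        # 3.2 Duplicative: same exact file size or same cleaned-up title
        same_size = len(by_size[a["size_bytes"]]) > 1
        same_title = len(by_title[normalise_title(a["title"])]) > 1
        if same_size or same_title:
            reasons.append("possible duplicate")
        # 3.3 Poor quality: here reduced to the crudest usage clue, zero starts
        if a["starts"] == 0:
            reasons.append("never used")
        flags[a["id"]] = reasons
    return flags


# Hypothetical rows standing in for an LMS extraction
assets = [
    {"id": 1, "title": "Excel 2007 Basics.pptx", "size_bytes": 1000,
     "last_used": date(2012, 1, 1), "starts": 40},
    {"id": 2, "title": "Negotiation Skills.pdf", "size_bytes": 2048,
     "last_used": date(2021, 6, 1), "starts": 120},
    {"id": 3, "title": "Copy of Negotiation Skills (1).pdf", "size_bytes": 2048,
     "last_used": date(2021, 5, 1), "starts": 0},
]

for asset_id, reasons in triage(assets, today=date(2021, 10, 11)).items():
    print(asset_id, reasons)
```

Matching on exact file size alone will throw up some false positives, which is why the output is a list of reasons to look at rather than a delete list.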

Your problem is now largely solved. You won’t have solved absolutely everything, but you’ll have solved 80% of it. And even if you’ve only identified and resurfaced 10-15% of high-value content from the gunk, since there’s so much gunk - and also, frankly, since your workforce is probably time-poor - this is exactly the right kind of solution.

What you can do right now

  1. Get a slice of the data described in #1 above. Start to run the kind of analysis described above. If that becomes difficult, feel free to ask us about it.
  2. Send a sample of that data to us and test the Content Intelligence algorithms tangibly by asking for a sample output. AI shouldn’t just exist in marketing literature; its benefits should be tangible and demonstrable, so insist on that with us or any other vendor making bold AI claims.
  3. Decide if that’s valuable for you and your organisation. It has been for all of our clients so far, but you be the judge of that - see step 2, above.
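
For assets that have body text but no metadata, the roughly 25 words mentioned earlier could be pulled out when preparing a sample. The sketch below is one simple way to do that, assuming each asset's text has already been extracted into a `body` field. The field names and the CSV layout are hypothetical and are not a format that Filtered specifies.

```python
import csv
import io


def first_words(body_text, n_words=25):
    """Return the first n_words whitespace-separated words of body_text."""
    return " ".join(body_text.split()[:n_words])


def build_sample(assets, n_words=25):
    """Build CSV text with one row per asset: id, title and a short excerpt."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "title", "excerpt"])  # assumed column names
    for a in assets:
        writer.writerow([a["id"], a["title"], first_words(a["body"], n_words)])
    return buf.getvalue()


# Hypothetical asset with extracted body text
sample_assets = [
    {"id": 7, "title": "Onboarding deck",
     "body": "Welcome to the team. This deck covers the tools, "
             "processes and people you will meet in your first week."},
]

print(build_sample(sample_assets))
```

Taking the opening words is the crudest possible choice. A title slide or abstract, where one exists, is likely to describe the asset better than whatever happens to come first.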

Learning gunk has long been a pernicious, pervasive problem for organisations. You now know how to tackle it.