The Dangers of Data, Part 1

I have been using data to guide decisions for a long time, and I have made a ton of mistakes analyzing that data, mistakes that have led to bad results. There’s a reason people say “garbage in, garbage out,” but how do laypeople know when data analysis is useful and when it isn’t?

Most people never know. To compound the issue, we all have more access to data than ever, and more exposure to unsupported analysis of that data presented as truth. To learn how easy it is for folks to be fooled, read this article.

There is neither space nor time enough to cover every potential issue with data analysis, so what I will offer here is a brief list of common errors to help everyone up their game.

  1. Confusing “Cause” and “Correlation.” I’ve written about this before, but this is, in my experience, the most common error made, and the one most commonly committed by the media as well. If you read the article linked above you will see how data dredging works. To oversimplify it, this amounts to identifying a correlation in the data and using it to assume causation. It (like any data analysis snafu) is a slippery slope.
    In my past I worked for an organization where one division had much, much higher turnover than any of the others. Attempts to manage that turnover, and foster retention, were random and desperate. Lo and behold, however, in the space of just a couple of months, turnover dropped to near zero. The HR leader for that area immediately trumpeted the latest retention effort and how it had turned everything around. The data could certainly demonstrate correlation, but it was premature to announce causation. The HR leader had not taken into account any additional factors, or conducted any comparative analysis to remove all (or as many as possible) variables other than the retention effort (a rough sketch of that kind of check, with made-up numbers, follows this list).
    When we looked at all divisions, turnover had dropped suddenly across every division, role, and location. A survey was launched to ask employees why they were no longer leaving; were they now happy in their roles? The results confirmed a larger factor driving turnover: the economy had quickly turned for the worse. There were suddenly very few jobs and innumerable job seekers. According to our employees, they weren’t happy, but they felt forced to wait the downturn out. What would the outcome have been if we had accepted the initial assumption instead of digging deeper, which gave us the time to really solve the issue of retention? Which brings us to:
  2. Be skeptical, always. Analysis is prone to bias, accounting errors, poor assumptions, and manipulation. Unless you have implicit trust in the source, ask questions. What was the sample size? What was the initial hypothesis? How was the data collected? How are variations being defined (i.e., is what you consider significant the same as what I consider significant)? Are the reported results transferable? If you aren’t sure of the data, do your best to test it yourself. This is core to research: can the results be replicated by others? Most research results you read about in TA cannot offer apples-to-apples comparisons, since the variables are different for each organization (location, cost, supply, demand, etc.). So just because something works for Spectrum Health does not mean it will work for you. There may be a statistical probability that it will work, but there is no definitive proof it will, so simply doing something someone else has done and expecting the same result is a fool’s errand.
  3. Summary statistics don’t tell the whole story, so use them carefully and only if you are prepared to share the underlying detail. A great example of this is time to fill reported as a single number for an entire organization, which is generally a useless number. Why? Unless every position you recruit for is the same, there will be variation. For your organization to truly prepare for and manage vacancies, its leaders deserve more accurate and specific data. For example, in our organization the time to fill for a third-shift RN in the NICU is drastically different from that of a customer service rep in our call center. If we reported a single time-to-fill number, our nursing leadership would be disappointed when we could not deliver in the time reported, and our customer service team would be appalled at how long it took us to fill positions. Neither assumption would be accurate, since we supplied a garbage number instead of the necessary specificity (a small illustration of how the single number misleads also follows the list).
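
If you want to see how simple the comparative check from item 1 can be, here is a minimal sketch in Python. The division names and turnover figures are entirely invented for illustration; the only point is that if turnover fell just as much in divisions that never ran the retention program, the program is an unlikely cause.

    # Hypothetical monthly turnover rates (%) before and after the sudden drop.
    # All names and numbers are invented for illustration only.
    turnover = {
        "division_with_program": {"before": 4.8, "after": 0.4},
        "division_b": {"before": 2.1, "after": 0.3},
        "division_c": {"before": 1.9, "after": 0.2},
    }

    # Naive read: the retention program "worked" because turnover fell
    # in the division that ran it.
    program_drop = (turnover["division_with_program"]["before"]
                    - turnover["division_with_program"]["after"])

    # Comparative check: did turnover also fall where the program did NOT run?
    other_drops = [d["before"] - d["after"]
                   for name, d in turnover.items()
                   if name != "division_with_program"]
    average_other_drop = sum(other_drops) / len(other_drops)

    print(f"Drop where the program ran: {program_drop:.1f} points")
    print(f"Average drop elsewhere:     {average_other_drop:.1f} points")
    # If the drop is roughly the same everywhere, a shared factor (here, the
    # economy) is a more plausible cause than the local retention effort.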
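
And here is an equally rough sketch of why the single organizational time-to-fill number from item 3 misleads. The roles echo the example above, but every figure is made up.

    from statistics import mean

    # Hypothetical days-to-fill per requisition; the roles mirror the article's
    # example, but the numbers are invented.
    time_to_fill = {
        "NICU RN, third shift": [118, 97, 134, 121],
        "Call center rep": [14, 21, 17, 12, 19, 16],
    }

    # The single "organizational" number hides the spread entirely.
    all_fills = [days for fills in time_to_fill.values() for days in fills]
    print(f"Overall average: {mean(all_fills):.0f} days")

    # Reporting by role gives each hiring leader a number they can plan around.
    for role, fills in time_to_fill.items():
        print(f"{role}: {mean(fills):.0f} days over {len(fills)} requisitions")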

Collecting and analyzing data requires patience, caution, and the willingness to keep asking questions to drill down to honest answers, even if those answers disprove your thesis or run contrary to conventional wisdom. I will continue to explore this fascinating topic next month, and I look forward to talent-acquisition organizations becoming bastions of analytics!

Jim D'Amico is a globally recognized TA leader, specializing in building best-in-class TA functions for global organizations. He is an in-demand speaker, author, and mentor, with an intense passion for all things talent acquisition. Jim currently leads Global Talent Acquisition for Celanese, a Fortune 500 chemical innovation company based in Dallas, TX, and is a proud board member of the Association of Talent Acquisition Professionals.


5 Comments on “The Dangers of Data, Part 1”

  1. Also (I’ll call it No. 4): there is the danger of thinking that data, even really good data, that is true for a group is true for a given individual. Such as:
    — You determine through research into your current employees that people who went to college perform better than those who did not. Or people who went to grad school perform better than those who did not. Or people who did not go to college perform better than those who did. Or people who came from one source of hire — say, agencies vs. job boards — perform better or worse. And let’s even say that whatever the group is, it performs much better on average. Maybe that helps you allocate money, but an individual could easily fall into the 20% (or whatever percentage) who don’t fit the mold. They could easily be a great recruit and not fit whatever you found was true in general.
    — You read that women are often this type of candidate, millennials are often that type of candidate, and IT people are motivated by A, while creative people are motivated by B. Again, these things may be pretty solidly true for, say, 75% or 80% of people, but they easily may not apply to an individual. One out of five is a pretty big number of exceptions. A given woman may have the characteristics of what’s supposedly a typical male, and a given male may be motivated by what’s supposedly motivating women.
    I imagine all this comes down to good recruiters who ask, “Everyone’s different, but what would motivate *you* to make a job move?”

  2. Todd, great #4! Data doesn’t produce perfect or “magic” information, and we all have to avoid broad generalizations whenever possible. Additionally, what is true today is subject to change over time based on a variety of environmental factors. Thanks for the great addition!

  3. On-point article, Jim. Those of you who know me know that, given my current role, I love not the data itself but the stories it can tell. So let me add one additional thought for people telling data stories, one that has helped me immensely throughout my career.
    You will never get 100% accuracy when people are involved in entering the data. I would always preface with leadership that the data story is directionally correct given all the possible variables, and that, like most business decisions with lots of complex dependencies, it is a solid enough position to inform the right plan of action.

    Data for the sake of data is just busywork. It must inform a course of action.

    1. Rob, thanks for pointing out that human limitation! We often assume an automatic infallibility of ‘data,’ but forget that it could be fat-fingered me typing in the datum!
