Validation: Sense or Nonsense?

Why don’t physicians bleed patients anymore to let out the “bad” blood? Why did they stop freely administering opiates, radioactive water, and addictive drugs as cure-alls? Why don’t they feel the bumps on your head to diagnose personality type? Because these are voodoo science. But voodoo science is not limited to medicine. It is still alive and well in testing and selection.

In fact, I thought it might be time to revisit some nonsense validation procedures. In other words, it is time to learn whether test usage predicts vendor revenue or your candidate’s job performance.

Occupational Norms

You might have heard it before: We have norms for truck drivers, customer service, managers, fill-in-the-blank positions. Yes, somewhere along the line, an enterprising organization had a bunch of tests, a bunch of data, and a bunch of occupational titles. Someone suggested, “What a great idea if we could examine the norms of each occupation and see if they are different!” So far so good. Then they said, “Why don’t we use this information to sell more tests!” Bad idea. Voodoo science. They should have stopped while they were ahead. I’ll explain why.

Let’s suppose someone surveyed 100 truck drivers (TD), 100 customer service reps (CSR), and 100 managers (MGR). Do you suppose they all perform equally? Were they all high performers? Identical within each group? Hardly! Clustering norms into groups gives no assurance that any single person in a group will conform to the overall norm. Nor does it tell you whether a conforming person is a high or low performer, whether a non-conforming person is a low or high performer, or whether any applicant who fits (or doesn’t fit) the norm will be good or bad.

Test sites that offer occupational norms are interesting, but they are only brain candy. It might be interesting to learn groups are different, but wouldn’t you really like to know whether the applicant can do the job? Deciding to hire or not hire a candidate based on whether he or she matches a group norm is voodoo science.
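
To make the point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the norm means (50 and 60), the spread of scores, the performance scale, and above all the premise that performance within an occupation is unrelated to the test score. Under those made-up conditions, the two occupations show clearly different norms, yet how closely a truck driver matches the truck-driver norm says essentially nothing about how well he or she performs.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_occupation(rng, norm_mean, n=2000):
    # ASSUMPTION: test scores cluster around an occupational norm, while
    # job performance is drawn independently of the test score.
    scores = [rng.gauss(norm_mean, 10) for _ in range(n)]
    performance = [rng.gauss(3.0, 1.0) for _ in range(n)]
    return scores, performance


rng = random.Random(42)  # fixed seed so the run is repeatable
td_scores, td_perf = simulate_occupation(rng, norm_mean=50)
mgr_scores, mgr_perf = simulate_occupation(rng, norm_mean=60)

# The "occupational norms": average test score per occupation.
td_norm = sum(td_scores) / len(td_scores)
mgr_norm = sum(mgr_scores) / len(mgr_scores)

# How far each truck driver sits from the truck-driver norm.
td_distance = [abs(score - td_norm) for score in td_scores]
fit_vs_performance = pearson(td_distance, td_perf)

print(f"Truck driver norm: {td_norm:.1f}, manager norm: {mgr_norm:.1f}")
print(f"Correlation between norm fit and performance: {fit_vs_performance:+.3f}")
```

The norms differ because the occupations differ, not because the test predicts who will do the job well. Only a study that ties scores to performance can show the latter.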

High-Low Groups

OK. Now we are going to ramp it up a bit. A few years back, a skeptical reader told me his test company divided people into a high- and a low-performance group, gave both groups the same test, developed norms, and used those norms to hire people. I said, “Sorry, but no.” He wrote back to say that his boss said there was more than one way to “validate” tests. “Yeah,” I thought, “There is a right way and a voodoo science way.”

Dividing groups into high and low performers has more than a few problems. In addition to the ones I described in occupational norms, we now have this thing called “performance.” In many cases, a performance rating is half fact and half opinion. We just do not know which is which. For example, I have seen many employees who were job duds but skilled at sucking up to their managers. Co-workers knew the employee was a slacker, but their opinions were unimportant. The manager thought the dud did a good job. Accurate testing depends on knowing whether you are measuring actual job skills or suck-up skills. In the meantime, you can add validation based on high-low group averages to validation based on performance group classifications. Both are examples of voodoo science because they tell you nothing about the individual.
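
A rough sketch of why group averages say so little about the individual follows. The numbers are invented: I assume high performers average half a standard deviation higher on the test than low performers, and I assume the performance labels themselves are accurate (which, as noted above, is generous). The hiring rule is simply a cutoff at the midpoint between the two group averages.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# ASSUMPTION: high performers average only half a standard deviation
# higher on the test than low performers (made-up numbers).
high_scores = [rng.gauss(55, 10) for _ in range(5000)]
low_scores = [rng.gauss(50, 10) for _ in range(5000)]

high_mean = sum(high_scores) / len(high_scores)
low_mean = sum(low_scores) / len(low_scores)

# Hire/reject rule built from the group averages: cut at the midpoint.
cutoff = (high_mean + low_mean) / 2

# Count how many individuals the rule puts in the right group.
correct = sum(s >= cutoff for s in high_scores) + sum(s < cutoff for s in low_scores)
accuracy = correct / (len(high_scores) + len(low_scores))

print(f"High-group mean: {high_mean:.1f}, low-group mean: {low_mean:.1f}")
print(f"Share of individuals classified correctly: {accuracy:.1%}")
```

Under these assumptions the two group means are reliably different, yet the cutoff sorts individuals correctly only around 60 percent of the time, not far above a coin flip. A real difference between averages is not the same thing as an accurate prediction about a candidate.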

Training Tests

At one time, I was like everyone else: wowed by a training test. I answered 20 questions about being reserved, quiet, thoughtful, and so forth. I got the results back in the workshop. The scores in the 367-page report indicated I was reserved, quiet, and thoughtful. Amazing! “What a powerful tool for hiring!” I thought. Wrong! Voodoo science.

Training tests can tell us a great deal about a person’s self-descriptions, but only if the test-taker is in touch with reality, honest, and knows himself or herself well. Those are big assumptions. The second thing about a training test is that it was not designed to predict differences in job performance. That is a special condition. For example, do you know equally successful people in your organization who have different personality types? Do you know people with the same personality type who perform at substantially different levels? If you hire salespeople, customer service reps, or managers with the same personality profile, do you also have the luxury of selecting customers, prospects, and subordinates with matching personalities?

Personality differences and similarities must be carefully thought out. The only time they can be used to predict job performance is when you can separate correlation (e.g., shark attacks and ice cream sales are positively correlated) from cause (e.g., sharks attack swimmers who resemble shark food). Whoa! I can hear the keys furiously typing: “But, what about culture and manager match???” Yes, that is also important, but successful hiring requires knowing your priorities, in this order: first, job skills; second, manager chemistry; and third, culture fit. Think about it. When was the last time you heard a manager comment, “Sure, he’s a job-dufus, but we have good chemistry!” or, “Yep, she cannot find her way out of a paper bag, but she really fits our culture!” Voodoo science: Job first, manager and cultural fit second.

Bogus Tests

Can you trust a scale that reports a different number every time you stand on it? A scale that is not calibrated to a uniform standard? A scale that reads weight when you really need volume? This nonsense represents what happens when test vendors fail to follow professional test development standards. A hiring test is supposed to measure something directly related to job performance, deliver stable scores over time, and accurately predict job performance. Anything less will produce bad hires and reject good candidates. If your vendor wants to sell you a hiring test, ask to see the report showing he or she followed professional hiring standards. Oh, yes, be sure to avoid vendors who promote matching candidates to occupational norms, high-low group validations, or cross-market their test for training. Users, not vendors, are responsible for test use. After all, your job is to make sure scores accurately predict performance for your job and your organization.
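
As a minimal sketch of what “stable scores over time” and “accurately predict job performance” look like in numbers, the snippet below computes two correlations on a tiny made-up data set: test-retest reliability (the same six people tested twice) and criterion validity (test score against a later performance rating). The data, the variable names, and the six-person sample are all hypothetical; a professional validation study would use far larger samples and follow published standards.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# HYPOTHETICAL data: six people, same test taken on two occasions,
# plus a job performance rating collected later.
first_sitting = [52, 61, 47, 70, 58, 65]
second_sitting = [50, 63, 49, 68, 60, 66]
performance = [2.9, 3.4, 2.5, 4.1, 3.0, 3.8]

# Stability: do people keep roughly the same standing when retested?
reliability = pearson(first_sitting, second_sitting)

# Prediction: do higher scores go with higher performance?
validity = pearson(first_sitting, performance)

print(f"Test-retest reliability: {reliability:.2f}")
print(f"Criterion validity:      {validity:.2f}")
```

These are the kinds of figures to look for in a vendor’s technical report, computed on people doing your job in your organization. The invented numbers here are unrealistically tidy; real coefficients are usually much lower.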


Performance Prediction

Imagine attending a Witch Doctor convention. People attending the workshops are arguing violently about which color of chicken feather is most effective in curing disease. You suggest they use modern antibiotics. The group hurls back a challenge, “Antibiotics are not perfect … we reject them!” Then they go back to something they know: arguing about chicken feathers. Voodoo science.

Hiring is a probability game with both controllable and uncontrollable variables. We won’t worry about uncontrollable stuff, but we do know that antibiotics are better than feathers. We also know high-quality hiring depends on identifying critical job skills and accurately measuring them. Identification, accuracy, and criticality get more high performers. It’s a fact.

Although they are the chicken feather of choice, casual interview-tests are no better than chance once they have screened out the blatantly unqualified candidates. This is due to applicant faking, unclear questioning techniques, personality factors, unclear objectives, and so forth. Adding behavioral event interview structure helps improve interview accuracy by clarifying critical factors, improving probing techniques, and making it hard for candidates to fake answers. But it is not easy to do and takes time to master.

Now here is the part people would rather not hear. Without getting all statistical, the more a test resembles the critical elements of the job, the more accurate it will probably be. For example, is an interview or actually solving a problem more accurate at predicting problem solving? A pencil-and-paper test or a realistic sales simulation at predicting sales success? A personality test or a planning exercise? Remember. Skills first. Manager and cultural fit second. What you measure will always be what you get. What you ignore (or mis-measure) is always left to chance.

Money Money Money

Here are some final thoughts about organizations both large and small. Line managers know the most about employee performance problems and their associated costs. They have the budget and they feel the pain. Aside from lawsuits and EEOC challenges, HR often has no idea what bad hires cost. I think this is because many HR departments have little budget and less pain. Although HR has the greatest potential to do something about bad hires, it tends to do the least. I think it all comes down to money and time.

HR is seldom willing to spend the money and take the time to calculate employee ROI. Instead it tends to spring for web-screening services that reduce its department workload, but do little to improve employee quality organization-wide. Meanwhile, line managers are left to their own devices. Considering the difference between its perceived and potential value to organizational profit, my advice to HR is to toss out voodoo science practices and work with line managers to calculate the financial benefit of reducing turnover, improving new-hire performance, and reducing training time. Then ask line managers for the budget to do something about it.
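
The calculation does not need to be elaborate to start the conversation. Below is a back-of-the-envelope sketch; the function name, its parameters, and every dollar figure are hypothetical placeholders that a line manager would replace with real numbers. It simply adds the three benefits named above: avoided turnover, better new-hire performance, and shorter training time.

```python
def annual_hiring_benefit(hires_per_year, turnover_before, turnover_after,
                          cost_per_turnover, performance_gain_per_hire,
                          training_days_saved, cost_per_training_day):
    """Rough yearly benefit of better hiring (simplified, illustrative only)."""
    # Fewer new hires leaving, times what each departure costs.
    avoided_turnover = (hires_per_year
                        * (turnover_before - turnover_after)
                        * cost_per_turnover)
    # Extra value produced per new hire because of better selection.
    performance_gain = hires_per_year * performance_gain_per_hire
    # Training days no longer needed, times the daily cost of training.
    training_savings = hires_per_year * training_days_saved * cost_per_training_day
    return avoided_turnover + performance_gain + training_savings


# HYPOTHETICAL inputs: 50 hires a year, turnover falling from 30% to 20%.
benefit = annual_hiring_benefit(
    hires_per_year=50,
    turnover_before=0.30,
    turnover_after=0.20,
    cost_per_turnover=20_000,
    performance_gain_per_hire=3_000,
    training_days_saved=5,
    cost_per_training_day=400,
)

print(f"Estimated annual benefit: ${benefit:,.0f}")
```

With these placeholder inputs the three pieces come to 100,000 + 150,000 + 100,000 = 350,000 dollars a year. The point is not the total but the habit: put a number on bad hires, then compare it with the cost of better selection.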


16 Comments on “Validation: Sense or Nonsense?”

  1. “Why did they stop freely administering opiates, radioactive water, and addictive drugs as cure-alls?”

    Uhhh…I don’t know about the radioactive water but they’re still administering opiates and addictive drugs as cure-alls. It’s a HUGE problem in this country that impacts the workforce but nobody talks about it.

  2. John, what were you under the impression chicken feathers cure?
    All kidding aside, and I didn’t mean to take it down Silly Lane, I agree wholeheartedly w/ this:

    “HR is seldom willing to spend the money and take the time to calculate employee ROI. Instead it tends to spring for web-screening services that reduce their department workload, but do little to improve employee quality organization-wide.”

    There’s no silver bullet. Would that there was but there ain’t.

  3. I respectfully disagree with your article and I don’t believe that this argument is well supported in fact. By taking a “broad-brush” approach and casting doubt on all pre-employment assessment tests, you suggest to me that you haven’t done your homework. Many of the pre-employment assessment tests are developed very specifically for the success predictors for the job and the organization. Not voodoo – it’s science. I have been in recruiting for the past 18 years now and I have seen many trends. I do see that many clients would like to mitigate their hiring risks. This is a sound way to do so.

    In addition, regarding your statements “Skills first. Manager and cultural fit second.” I would argue quite the opposite that it’s vital to measure cultural fit first and skills second. Not to say that skills aren’t important but the fact is that if the cultural fit isn’t a match – the candidate will struggle and will not likely be able to ever demonstrate the technical skills.

    In fact, I would strongly suggest that more organizations would be better served to focus more on the cultural fit and I would be confident that the costs of hiring and turn-over would decrease. Skills can be taught, personality traits cannot. I have had numerous conversations with clients who are looking for the infamous “purple squirrel” skill set – paying excessive amounts in relocation costs, recruiting fees, sponsorships of H1-B visas in some cases and premium skill compensation plans. Perhaps the candidate with measured cultural fit skills and a good foundation in the desired skill sets will actually stay longer and produce more – a much better ROI than the other premium skill candidates. We have seen it time and time again – they take the money and jump ship at the next highest bidder.

    In my experience, success predictors are the true measure of what separates the great from the average. And, to determine what that is, the organization would have this measured. Such science is no accident and it’s not competent to suggest otherwise. I would much prefer to focus on what can be measured and provide my clients with a structured method of selection. This way we are better able to provide consistent results.

    I do agree with your statement that such best practices are not always budgeted for. Author Herbert Heneman III states in his book, “Staffing Organizations,” that the trends may be turning: “Turn-over rates are often used in staffing metrics” (Heneman III pg. 651). This evidence suggests to me that indeed, HR is measuring turn-over and looking at the costs associated with employee ROI. Understanding why an employee leaves an organization is part of the measure.

    We can agree that there are the canned, “off the shelf-generic” products. And yes, I agree, these will not measure success predictors for clients based on corporate culture.

    However, there are options. An organization seeking to be more successful will undertake best practice measures to determine which product is easiest to use. I would recommend to my clients a product that achieves a better candidate experience, saves hiring managers’ time, and is priced affordably. One such product is ( They offer a number of services options that include a tailored assessment feature – thus, allowing a company to find those specific success predictors.

    In sum, I think as Subject Matter Experts in staffing, as professionals we would be wise to support improved hiring strategies that will enable our clients to achieve greater success – the more successful our clients are, the more they need to hire – it’s just smart business.

    References: 2009. 22 March 2010 .
    Heneman III, Herbert G. & Timothy A. Judge. Staffing Organizations. Middleton: McGraw Hill, 2009.

    Submitted by:
    Sandra M. DeChant
    President & Founder
    Human Capital Consulting

  4. Let me see if I can understand what you are saying…
    You claim I have not done my homework. Does an MBA, MS, Ph.D. in selection, and 9 years developing and validating tests and assessments qualify?

    If, as you claim, cultural assessments and something called “Predictors” are more important than skills, why do the EEOC and OFCCP audit organizations using the 1978 Uniform Guidelines on Employee Selection Procedures as their base-line for compliance investigations? This document clearly states tests and assessments (that includes interviews) should be based on job requirements and business necessity (i.e., skills).

    Skills are easier to train? If you have a significant other, is it easy to change his or her behavior? As an executive manager, line manager, head of training for two large organizations, and master trainer for two international training organizations, I feel qualified to say that it is easier to train a chicken to climb a tree than to skills-train an unqualified employee.

    Finally, you cite as your source, a self-reported test written by a vendor. If you call the officer of that company cited as the SIOP expert, I’m pretty sure he would tell you that mental ability is the single greatest success predictor we know of. Self-reported tests are among the bottom. Furthermore, it is virtually impossible to train someone to become more intelligent.

    I could not agree more with your comment that staffing professionals would be wise to support improved hiring strategies…They are already being used by many of the leading organizations in the world, however, in my experience, they are seldom used or understood by external folks.

  5. I see two issues that hamper validation. First, predictor scores are too often composed of manager ratings. These are easier to get; however, they are biased. We can get better performance measures that measure multiple facets of our jobs and that are objective, especially if it is a line job. Second, there are not enough follow-up studies on hiring tests. Many times a validation study is done with incumbents, but no one looks back one or two years later to measure the “skills” of those employees who originally took the test and match them up with the test content.

  6. Right on! Nevertheless, no validation = no assurance the test works. We can, however, if the group is large enough, statistically control for some manager effect. I’ve also had good results by guiding managers through a behavioral anchoring rating workshop…taking each criterion, discussing it, discussing examples of performance at each rating level, and then rating employees.

  7. @Sandra:

    If the reference you mention validating is a neutral, unbiased, double-blind study, please let us know. If not, please cite one that is. Otherwise, it is at best anecdotal. Also, please disclose if you have any financial or personal relationship with the people at

    IMHO, these might serve as guidelines:
    1) If you say something works, cite which neutral, unbiased, double-blind study also says so, or say that the evidence is currently anecdotal.
    2) If you promote something, please disclose if you have any financial or personal relationship with the “promotee”. As an example: I’m always crowing about I DO NOT GET ANY COMPENSATION FROM THEM, AND I DO NOT KNOW ANYONE THERE PERSONALLY, (though I had a long talk with one of their executives). However, I like what they do.


    Keith “Snake Oil Raises My Cholesterol” Halperin

  8. ‘it is easier to train a chicken to climb a tree than to skills-train an unqualified employee.’ Classic!

    Love your work, Dr Williams. Keep up the good fight.

  9. Interesting points. But, I’m unsure if you mean ‘personality tests’ or actual work simulations. If you mean personality tests like Myers-Briggs being used as a tool to predict performance, I agree with you. But, I don’t think many professional testing consultants would recommend using those tools for selection.

    But, if you mean work simulations (like those we use to test technical electrical acumen) I respectfully disagree. For these types of tests, typically there is a thorough validation process involving thousands of respondents along with statistical validation. While still not 100% accurate, they do demonstrate a very high rate of accuracy. And, effective use of these testing tools has had a measurable impact on new hire performance in those tested roles… for us, anyway.

  10. You said: “Skills are easier to train? If you have a significant other, is it easy to change his or her behavior?”

    These two rhetorical questions are unrelated. Behaviors are not skills and skills are not behaviors. No, I cannot change my spouse’s behaviors. But, I can give her information that changes her level of knowledge. That knowledge may or may not change her behavior, but if she learns something, it definitely will have an impact on her skill level.

  11. Hi Steve, This can get deep. I’m not sure what you are referring to, so let me clarify a few areas… A personality test is usually a collection of self-reported descriptive statements (i.e., MBTI, DISC, and so on). It helps people learn they are different from one another.

    A technical knowledge simulation requires performing actual job-skills (i.e., wiring a motor to run clockwise on a 440, 3-phase circuit, analyzing a control circuit schematic, and so forth). It tells you if a person can perform the job immediately.

    A technical simulation is the ability to perform a sample collection of job-RELATED tasks (i.e., hand-eye coordination, climbing a pole, operating lineman’s tools, and so forth). It tells you if the person has the ability to quickly learn the job.

    A mental alertness test measures how fast and how well a person will learn new cognitive tasks. This is important if you expect the person to continually learn and apply new information or solve problems.

    A motivational test tells you whether the person is WILLING to do job-related things.

    They all have different applications depending on what you want to achieve…and, as you probably already know, are only highly predictive (not absolutely perfect) indicators.

  12. My comment regarding spouses and skills refers to “what you measure is what you get”…Changing anything about a person can take considerable time and energy…and, even if he or she had the ability, the candidate may or may not want to accommodate you.

  13. Just to avoid us getting too lovey dovey on skills testing, recall that Roth, et al.’s 2005 meta-analysis brought Schmidt & Hunter’s criterion correlation coefficient down to .33, below GMA, integrity tests, and interviews (structured or unstructured), and about on par with conscientiousness, biodata, and assessment centers. If you buy their analysis. And put stock in criterion-related validation coefficients, which I realize people have raised questions about already in this thread (although I would point out that supervisory performance ratings are pretty much the de facto standard in criterion validation studies).

    I’m still a huge believer in skills testing. And to some extent we’re re-creating here the age-old “what type of test is better” debate, which isn’t the right question.

  14. I have the highest respect for meta analyses…they have their purpose, but they should never be used for more than casual observation. I’m probably preaching to the choir, but consider the following…MA is a statistical study of statistical studies, and it does not control for the quality of its sources; MA usually includes only published studies, while studies that have not been published are generally ignored; and sample sizes are often significantly different, leading to erroneous conclusions. In short…it’s a big mistake to quote specific meta-analysis numbers as sacred.

    My point yesterday, today, and will be tomorrow…the only test (aka interview, application, etc.) that should ever be used for hiring is one backed by a professional-quality study (i.e., following the APA “Principles” and DOL “Guidelines”). If you know of any other way to show a test predicts job performance, then I would be interested in learning about it. As to unmoderated supervisory ratings…there are other ways to increase predictability.
