Good Test? Bad Test?

Get used to it: unless your organization hires everyone who applies, you are testing. Some people (even attorneys who should know better) vigorously deny that their organizations test applicants (pssst: interviews are tests!).

Whether an organization uses verbal questions or written questions, they both have the same objective: to separate qualified applicants from unqualified ones before spending big bucks on salary, benefits, and potential lawsuits. Tests are tests.

Now, let’s discover whether your test is working for you.

A Good Test

Identifying a good test starts with reliability. Suppose an applicant takes a test on Monday and, on his way out, you deliver a carefully aimed blow to the head sufficient to cause short-term memory loss (but not permanent damage).

After he gets out of the hospital, you invite the applicant back to take the same test a second time (with the promise of safe passage). Will he score roughly the same? That is, can you trust the scores to remain consistent from one time to the next?

This is called “test-retest reliability.”

Reliability means you can trust a test to deliver similar scores regardless of when it was taken. Otherwise, you would never know whether it was accurate.

Interviews, for example, are notoriously unreliable. Interviewers tend to like or dislike applicants; they may ask different questions of different candidates; they may think the objective of the interview is to get to know the applicant (wrong answer!); they tend to rate applicants based on personal appearance; and sometimes interviewers just talk about themselves. Interview test-retest reliability is pretty low.

Unreliability is not limited to interviews. It also applies to many popular tests used in training, especially ones that measure personality type. Type-tests are fine for workshops and communication classes, but even some of the most popular ones are riddled with reliability problems. Independent reliability studies show that scores on a popular four-letter type-test tend to change from one administration to the next. So, test authors, which score is the “real” score? The score on Monday? Tuesday? Last month?

Before you subject any applicant to a test, examine the vendor’s manual carefully and search for a section on “reliability.” You want proof the vendor knew enough to study the reliability of:

  • Each test item (item analysis).
  • All test items (inter-item reliability).
  • The first test half compared to the last half (split-half reliability).
  • The same people at two different times (test-retest reliability).
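To make the arithmetic behind two of these checks concrete, here is a minimal sketch in plain Python. All applicants, scores, and item splits are invented for illustration: test-retest reliability is just the correlation between two sittings, and split-half reliability correlates the two halves of one test, then applies the Spearman-Brown correction for full test length.

```python
# Hypothetical sketch: two reliability checks on invented test data.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Test-retest: the same five applicants, two sittings a week apart.
monday = [82, 74, 91, 66, 78]
retest = [80, 75, 93, 64, 77]
print(round(pearson_r(monday, retest), 2))  # near 1.0 → consistent scores

# Split-half: correlate odd-item totals with even-item totals, then
# apply the Spearman-Brown correction for the full-length test.
odd_half  = [40, 36, 47, 31, 39]
even_half = [42, 38, 44, 35, 39]
r_half = pearson_r(odd_half, even_half)
split_half = (2 * r_half) / (1 + r_half)
print(round(split_half, 2))
```

A vendor’s manual should report coefficients like these (from far larger samples); if it reports none at all, assume the worst.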

If you cannot find any reliability data, then your favorite test scores probably change from day to day. The next time you buy a pound of cheese, wouldn’t it be nice to know you were really getting the weight you paid for?

Hopefully you see that unreliable tests are a dead end, especially since most organizations want their tests to predict performance.

Using Test Scores for Prediction

Predicting job performance means that a reliable test score is directly related to job performance. The word “directly” means two things. First, the test measures something that affects job performance. Second, test scores correlate with performance ratings. A typing test, for example, is clearly linked to jobs that require keyboard skills. If your organization still has a typing pool, the scores probably indicate the amount of work a typist can do.

But are keyboard skills always linked to job performance for management? Should we fail candidates who could learn keyboard skills in a few weeks or months? Do we know if applicants are physically unable to operate a keyboard?

Accurate prediction is called “validation,” and if you thought reliability was complicated, you ain’t seen nothin’ yet! Validation requires knowing clearly what skills are necessary for the job, and doing sufficient analysis to show test scores are statistically correlated with job performance (i.e., the test content and job requirements are causally related).

Otherwise, you are predestined to turn away qualified people and hire unqualified ones. Is that wrong-headed or what?
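The statistical core of a validation study can be sketched in a few lines. This hypothetical example (invented scores and ratings, and far too few cases for a real study) computes a validity coefficient: the correlation between current employees’ test scores and their supervisors’ performance ratings.

```python
# Hypothetical sketch of a concurrent validation check. Every number is
# invented; a real study needs a large sample and a qualified analyst.
from statistics import mean, stdev

test_scores = [55, 62, 70, 48, 81, 66, 74, 59]          # employee test scores
ratings     = [3.1, 3.4, 4.0, 2.8, 4.5, 3.6, 4.2, 3.0]  # supervisor ratings

def validity_coefficient(scores, perf):
    """Pearson correlation between test scores and performance ratings."""
    n = len(scores)
    ms, mp = mean(scores), mean(perf)
    cov = sum((s - ms) * (p - mp) for s, p in zip(scores, perf)) / (n - 1)
    return cov / (stdev(scores) * stdev(perf))

r = validity_coefficient(test_scores, ratings)
# A coefficient near 0 means the test predicts nothing; in practice,
# even good selection tests rarely exceed roughly 0.5.
print(round(r, 2))
```

The point is not the arithmetic, which is trivial, but the discipline: without scores and performance measures for the same people, there is nothing to correlate and nothing to validate.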

Why Should I Care?

If your objective is just finding and filling, then you probably don’t. Stop right here, get some coffee, and don’t send me any nasty-grams. I assure you reading the rest of this article would be a colossal waste of your time.

However, if cutting turnover in half, doubling individual productivity, reducing training expenses, and building a solid base of future-qualified employees sounds attractive, then you need to know this. These results are all normal for an organization that uses reliable and valid tests. Why? Their tests screen out unqualified applicants. In case you are wondering, only about one applicant in six (on average) can pass a series of validated tests. Put another way, only about one applicant in six can demonstrate the skills required for the job.

Ever hear about the 80/20 rule, the one where 20% of the people produce 80% of the results? It’s amazingly close to a one-in-six hiring ratio. Think about it. So if you care about making the biggest splash ever in the company pool, then continue reading.

A Bad Test

A bad test is one that an organization uses consistently, that is backed by folklore and plenty of personal anecdotes, but that has never been critically evaluated. Bad tests usually come out of corporate training programs. That is, a workshop participant who answered 10 questions about being a thorough planner was “amazed” when the test reported he or she was exceptionally organized. Next step: use it for hiring!

Folks, personal agreement with test scores is not a reliable and validated way of predicting job performance. It is only a summary of how someone describes himself or herself: a self-reported description. Is the person actually as organized as he or she claims? Or faking? And if not faking, is organization even important to job performance?


Defining the Job

This is a tricky area. The secret is to define the critical skills that directly affect job performance. This might include learning ability, problem-solving skills, persuasiveness, and so forth. The key to defining job requirements is to identify behaviors leading to job success or failure. It sounds weird, but you don’t look for results, just the behaviors that lead to the results.

If you cannot clearly define the key job skills, then there is nothing to test. The 1978 Uniform Guidelines suggest job competencies be based on job requirements and business necessity. I don’t know about you, but that sounds pretty good to me. Amazing! The government recommends organizations test for job requirements and business necessity. If anyone out there can suggest a better basis for a test, I’d like to hear it.

To reiterate, your test first has to be reliable. Then you must know what to explicitly measure. To make sure the test works, determine whether test scores predict job performance. We call this step “validation.”

Throw It On the Wall and See What Sticks Approach

Here is a sure clue to wrong-headed hiring practices. It goes like this. A vendor has a general personality-style test (we’ll make the fanciful assumption that it passes professional reliability standards). The vendor herds high producers into one group and gives them the test. He examines the averages and declares, “Yea, verily, these scores doth become our target!” (Vendors like to use Old English; it sounds so classy!)

Whoa?not so fast.

How does one define high-producer? By results or by actions that lead to results? It makes a big difference. Individuals in the high-producer group could have used different skills to get there. Some might be good politicians. Some might be very smart. Some might be taking credit for others’ work.

What about the confusion between correlation and causation? Just because ice-cream sales and shark attacks are correlated does not mean that one causes the other. Almost anything can be correlated, but not everything is causal. If you sort through enough garbage, you are likely to find correlations between cookie wrappers and hotdogs. So what? Your goal is to find a correlation between hotdogs and hotdog buns.
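A small sketch (all figures invented) shows how a confounder manufactures correlation without causation: ice-cream sales and shark attacks both track temperature, so they correlate strongly with each other even though neither causes the other.

```python
# Hypothetical illustration of a spurious correlation via a confounder.
# None of these numbers are real data.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

temperature = [15, 18, 22, 27, 31, 33, 29, 21]   # monthly average, the confounder
noise = [5, -3, 8, 0, -6, 4, 2, -1]              # invented month-to-month wobble
ice_cream   = [t * 10 + n for t, n in zip(temperature, noise)]  # sales rise with heat
shark_bites = [t - 14 for t in temperature]      # beach crowds rise with heat too

# High correlation, zero causation: both series are driven by temperature.
print(round(pearson_r(ice_cream, shark_bites), 2))
```

The same trap catches test vendors: a trait can correlate with high-producer status because both ride on something else entirely (tenure, territory, luck), not because the trait drives performance.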

The “see what sticks” approach has a few knotty problems. Sure, it looks scientific, but what good are decisions based on wrong-headed performance criteria, wrong-headed clustering techniques, and wrong-headed statistical analysis?

Job-Match Approach

The job-match approach is scientifically similar to the “see what sticks” approach, except worse. Some types of tests claim certain occupations share similar styles: Introversion, Sensing, Thinking, Judging (ISTJ), for example.

Before you use this stereotype for hiring, ask yourself: do all the people in the same occupation do the same thing, and do they all perform equally well? Did their personality style cause them to become engineers? Are these folks extreme ISTJs or marginal ISTJs? Do their organizations all have the same objectives for the job?


Everything starts with the human elements of job requirements and business necessity. Human elements are seldom included in job descriptions or job evaluations. You have to dig for them. If you cannot test/interview for specific human elements, your tests will probably be inaccurate.

All selection tests have to pass rigid standards for reliability and validity. Reliability means the test delivers consistent results time, after time, after time. Validity means the test scores accurately predict job performance; establishing it is careful, deliberate work.

It is a grave mistake to assume any group of performers has equal skills. For example, some salespeople are great repeat sellers, some are great cold callers, and others are great service people.

They all might be high performers but for entirely different reasons. It is a big mistake to assume characteristics or traits correlated with performance actually cause performance.


4 Comments on “Good Test? Bad Test?”

  1. Under the heading of “Job-Match Approach”, Dr. Williams chose an example (ISTJ) drawn from the Myers-Briggs Type Indicator (MBTI), about which Dr. Williams has previously said: (i) “People who take the test may find they are characterized as one kind of person today, and if they take it tomorrow, they will find they are characterized significantly differently,” and (ii) “In most cases, scores on a personality test have little or nothing to do with how well you perform on the job.”

    In fact, CPP, Inc. (publisher of the MBTI) clearly states that it is not ethical to use the MBTI instrument for hiring or for deciding job assignments. MBTI is neither sufficiently reliable nor sufficiently job-related to conform to the U.S. Department of Labor’s Uniform Guidelines on Employee Selection Procedures. Dr. Williams seems to agree. Yet MBTI and a plethora of copycat, four-quadrant personality tests still serve as “pillars of the hiring process” for many unwitting employers.

    Mapping MBTI types onto job titles is not what “Job-Match” means to those who advocate best-in-class practices in hiring and promotion. As a proponent of real job-matching, I was disappointed to see an apparent connection being made between the term “Job-Match” and what Dr. Williams characterized as “similar to the ‘see what sticks’ approach, except worse.”

    In the words of Jim Collins (“Good to Great,” ©2002): “Get the right people on the bus, the wrong people off the bus, and the right people in the right seats.” Getting the right people (employees) in the right seats (employee roles or jobs) is what job matching is all about.

    There are highly reliable, thoroughly validated, job-matching assessment instruments that consistently outperform alternative employee selection procedures (e.g. behavioral tests and standard interviews) by a significant margin.

    Real job matching starts with a thorough “job analysis”; i.e., what does it take to perform well at this job in this employment context? Only then can reliable, valid, and sufficiently comprehensive assessments gauge the extent to which particular candidates may have what it takes.

    Job analyses examine (i) the activities and tasks that make up a job, (ii) the conditions under which they are performed and (iii) what the job requires in terms of aptitudes, attitudes, behaviors, values, interests, knowledge, skills, abilities and other qualities.

    Dr. Williams intentionally used a “bad” example of “job match” to make his point. I want to point out that good job-matching represents a best practice that employers can use to systematically build high-performance organizations. It’s not just me saying it. Job matching is the answer to Jim Collins’s mantra, from a book that has topped best-seller lists for five years now. We’ve known the power of job match for decades, and today’s best online assessments make it more straightforward, more powerful, and more affordable than ever.

  2. Hmmm.
    It seems that if interviews are unreliable predictors, and jobs are often too complex to break down into a series of accurate and valid assessment tests (e.g., “How many tests would be required to accurately measure the skills, traits, and abilities which define an exceptional Engineering Manager or High School Teacher or anything that doesn’t involve highly quantifiable, causally related results?”), then *hiring is fated to be an inherently error-prone process. On the other hand, if it IS theoretically possible to create a series of “useful” tests which can measure predictive ability in a large number of fields, then we might finally be able to end “the tyranny of the resume,” where most individuals are constrained by what they have done as opposed to what they can do….

    A request to the readers: please don’t send me an email about how wonderful and valuable the tests you sell to companies are!


    * I suspect that (like voting systems) there is an inherent trade-off between a number of factors that we consider valuable/good in tests, but I do not know this for certain, and hope I am wrong….

  3. Whatever you call it, selecting qualified people for the job starts with a thorough understanding of critical job elements. You can think of these elements as ‘itty-bitty behaviors’ moving toward results.

    For example: being smart enough to learn and solve job-related problems; being able to plan and implement activities; being able to get things done through people; and so forth. Professionals call this ‘job analysis’.

    Matching people to jobs is a function of knowing exactly what you need (e.g., job-related mental skills, people skills, and motivations discovered from the job analysis), then using professionally-developed tests to evaluate applicants.

    An interview is just another form of ‘test’. It has questions, answers, and a scoring guide. What makes traditional interviews inaccurate is 1) lack of clarity about what itty-bitty skills are needed, 2) poor questions, 3) self-reported answers, and 4) subjective scoring guides.

    Behavioral and situational interviews MAY be more accurate…but only when they are based on a job analysis, interviewers are trained in best-practice questioning techniques, multiple interviewers integrate their individual data, and answers are standardized.

    Anything less and you ‘roll the dice’ with the organization’s money…about 50/50 is the average.

  4. Dr. Williams’ article is a perfect selection tool for picking an assessment tool. I wish all of our prospects had a copy.

    As others mentioned correctly, the human element also comes into play and in several ways. First, with the understanding of the job and defining the correct criteria. An assessment should have a tool to help bring together multiple views of a position and a process for synthesizing a single profile.

    Second in the application of the resulting data. An assessment tool gives you insight into the individual that would not otherwise be available. A competent interviewer will use that insight to ask better questions and reach a better decision.

    A good assessment tool will improve every aspect of the hiring process but will never replace the human interaction needed to finally select an ideal applicant.

    Steve Waterhouse
