Assessments: Can’t Live With ’em, Can’t Live Without ’em

If you have been reading ERE over the last few weeks, you have probably been exposed to assessment argument overload. You might have read claims that unstructured interviews alone were sufficient to survive a guarantee period. You might have read selection scientists quoting numbers showing it took more than interviews to reduce turnover, increase training success, and increase on-the-job performance. And, of course, you might have read a few recruiters immodestly claim they knew more than anyone else on the subject. Well, good luck with that.

It reminded me of a scene from an old Monty Python movie where French soldiers high on a castle rampart shouted taunts at the English below. The English, who spoke no French, had no idea what they were saying. The French, who spoke no English, had no idea their taunts were being ignored. In frustration, the French hurled a cow at the English (i.e., a seldom-used medieval weapon of udder destruction).

Well, taunts can also confuse bystanders who don’t know whom to believe: scientists who study the effectiveness of different assessments under controlled circumstances, or someone with strong opinions and a product to sell. So, let’s see if we can clear away the smoke and mirrors.

Recruiting Objectives and Organization Objectives

I don’t direct these articles toward recruiting firms. The ones I know tell me their main quality-of-hire measure is surviving the guarantee period. Organizations, however, are different. They want lower turnover, successful training completions, and higher individual productivity. I have never heard an organization mention guarantee periods. So, if guarantee periods are your main metric, it’s time to stop reading and have some coffee or tea. If, however, you are a typical organization, keep reading.

Assessment Defined

Assessment is just another term for measurement. Every method used to evaluate applicants is an assessment. That includes application blanks, recruiting sources, photographs, interviews, tests, training workshops, video interviews, and so forth. And, unless you hire everyone who applies, the choice is not whether to assess, but how accurate and consistent you want assessments to be.

Rolling the Dice

Hiring is a game of odds. In spite of statements to the contrary, nothing anyone can say, do, or ask will provide 100% certainty that a specific employee will survive a guarantee period, have long tenure, quickly learn, effectively solve job-related problems, or become a top performer. Anyone who maintains otherwise lives in a parallel universe. However, even though achieving hiring perfection is like reaching the carrot at the end of the stick, we can do a great deal to control our odds of success.

In most cases, a common interview has one good purpose: it screens out blatantly unqualified candidates. The more questions you ask, the more opportunity there is for a candidate to say something wrong. Once candidates pass the interview, however, research shows their odds of success are about 50/50. Starting with that base rate of chance, a smart HR group has potential for improvement, providing, of course, they start with a clear understanding of job requirements and business necessity.

It’s critical to discover the specific competencies associated with job performance or failure. Job descriptions and compensation bands are only one source of data. You need to extract trustworthy competency information from training programs, job holders, job managers, and a visionary manager or two. This is not easy because most people don’t think in competency terms. However, once you have a critical list of job competencies, you can start using assessments to mine three sources: a candidate’s past performance, future intentions, and present-day abilities.

Hiring Competencies: The Candidate’s Tool Box

Many people do not understand hiring competencies. I’ll keep it simple. A hiring competency is not something a candidate accomplishes on the job. That has too many variables. A hiring competency is a specific skill the candidate uses from time to time to get the job done; and, it has to be something we can accurately measure quickly.

On the simplest level, a hiring competency might include skills like learning ability, technical knowledge, problem-solving ability, organization skills, prioritization, coaching skills, persuasive skills, or the like. It might also include the attitudes, interests, and motivations (AIMs) to apply those skills. Think of hiring competencies and AIMs as the candidate’s “personal toolbox.” It’s not a work product left behind at the end of the day.

Measuring Competencies: The Recruiters’ Toolbox

Hiring personnel have the responsibility for quickly and effectively measuring candidate competencies. They need to master questioning techniques that probe the candidate’s past performance while, at the same time, making it hard for the candidate to fake good. These are usually called behavioral event interviews, or BEIs. BEIs gather complete stories, extract competencies, and compare them to job requirements. For example, if my job requires analytical skills, I might ask a candidate to share a time when they had to solve a problem, what the problem was, what they did, and what the result was. Once I learn how the candidate solves problems, I can use that information to predict performance in the new job. But, be cautious …


The high structure of BEI makes it more accurate than garden-variety interviews, but BEI is not perfect. And, BEI is not a set of short questions. Candidates are still motivated to hide weaknesses and often give examples that are not even close to the job. For their part, BEI-trained interviewers must have the skills to dig for data, identify and evaluate hiring competencies, distinguish between hard facts and a good story, and know when to press for details. BEI accuracy requires thinking like a detective. It usually takes months or even years to develop the skills. And, even the best BEI interviewer is only as effective as his or her job competency list.

That is why savvy organizations add other validated tools to the hiring process: something researchers call a multi-trait, multi-method (MTMM) process.

Validation means the tool has been tested and proven against some aspect of job performance. Validated tools include self-report tests, knowledge tests, and general ability tests. I’m intentionally excluding tests such as the MBTI or the DISC, as well as clinical tests like the MMPI. In my professional opinion, broad personality or clinical tests should never be used as hiring tools. There is often little or no proof they predict job performance (you can read about this in some of my earlier articles). Never use any test whose vendor cannot provide documented proof the test was designed to predict job performance. Other assessment tools include simulations that require the candidate to perform critical parts of the job; skill tests that measure cognitive ability or technical knowledge; smart application blanks; and realistic job previews that provide gut-honest descriptions of what it’s like to work the job.

About Correlations

Selection scientists do not trust personal stories or opinions. Because they have learned how easily people can be misled, they trust only tests that measure something necessary for the job and that show a strong correlation with some aspect of performance. Knowing the difficulty of being absolutely, positively correct, they report results as the “strength” of the association between scores and job performance: a correlation ranging from perfect negative (-1.0) through chance (0.0) to perfect positive (+1.0).

Correlations are frustrating for people who insist on certainty, and confusing for people whose last exposure to statistics might have led to periods of prolonged rest and sedation. So let’s put stats on the shelf and examine a few facts that are no-brainers: 100 smart employees will outperform 100 dull ones; 100 motivated employees will outperform 100 unmotivated ones; 100 persuasive salespeople will outperform 100 unpersuasive ones; 100 coaching managers will outperform 100 non-coaching ones; and 100 candidates who demonstrate they can do a job will outperform 100 who can only tell you about it. We may never be 100% accurate on a person-by-person basis (there are too many unexpected events that can affect our decision), but at the group level we can almost always skew the odds heavily in our favor.

So, which assessment methods do you think will deliver the best-performing workforce? Those that start with job descriptions, or those backed with a detailed list of hiring competencies gathered from job holders, managers, and visionary managers? Those that use a few general questions, or those using validated tools such as structured behavioral or situational interviews; simulations that require the candidate to perform critical parts of the job; attitude, interest, and motivation tests; skill tests that measure cognitive ability or technical knowledge; smart application blanks; and realistic job previews that provide gut-honest descriptions of what it’s like to work the job?

In all situations, we’ll use the gold-standard definition of quality of hire: collective turnover, training success, and on-the-job performance. Meanwhile, keep a sharp lookout for flying bulls!


43 Comments on “Assessments: Can’t Live With ’em, Can’t Live Without ’em”

  1. Everyone knows that assessments (of all types – interviews, resume screens, tests, simulations, etc.) don’t predict the future performance of a candidate perfectly. Let’s put that in context:

    The very best assessments (job-related, face-valid, statistically reliable and valid) correlate about 0.35 – 0.50 with future job performance. Some researchers report correlations a bit higher, but those tend not to be repeatable. This means the scores on assessments, at best, account for 10% – 25% of the variability in individual performance (the square of the correlation coefficient).

    That’s right – an absolute maximum of 25% or so of the variation in performance! By the way, to me that statistical reality feels about right and matches what most line managers intuitively believe.

    So, what accounts for the other 75% plus of variation in individual performance? Here’s a partial list:

    – Management/leadership.
    – Training & development experiences on the job.
    – Organizational structure & culture.
    – Business strategy & processes.
    – Completely external factors like the local market conditions, etc.
    – Tools, equipment, and systems available for the job.
    – etc.

    Maybe this is obvious, but it’s worth repeating. 75% to 90% of the variation in individual performance is explained by external, business, and HR/T&D factors completely outside the hiring process and the assessment used.

    Nevertheless, that 10% to 25% that assessments can account for can create a huge organizational and financial impact.

    So, let’s be sure to hire the best using measurement processes that reflect the best science and practice. But, don’t assume the job of management is done when you’ve hired a great population of employees.
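The variance-explained arithmetic in the comment above can be checked directly: the share of performance variance an assessment accounts for is the square of its correlation with performance, so the 0.35 – 0.50 range quoted there squares to roughly 12% – 25%.

```python
# Variance explained is the square of the correlation coefficient,
# so the 0.35 - 0.50 range quoted above squares to roughly 12% - 25%.
for r in (0.35, 0.50):
    print(f"r = {r:.2f}  ->  r^2 = {r * r:.0%} of variance explained")
```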

  2. I’ll throw the same three ideas out there on this thread as I do on others of its type:

    1) Assessment for creative or leadership roles should encompass group social dynamics because those dynamics appear to have a large effect on outcomes.

    2) Misuse of specialized assessment methods is easy and likely more expensive than simple non-use. Assessment vendors have their own agendas and the business is full of quacks. Real care is needed to use various methods in the correct context.

    3) Work simulations are usually the best assessment methods available, on multiple levels.

  3. How long till we hear someone say “Hold on! MY assessment method is foolproof and gives great results! Only $49.95 to get a standardized interview tool (never mind people aren’t standardized).”?

  4. Re Martin Snyder’s comment: There are indeed many quacks in the assessment business. Fortunately, independent observers like Wendell Williams and Charles Handler have made it their mission to help business folks understand how to sort one group from the other. Generally, the “big guys” in the field deliver pretty good quality (SHL, Kenexa, PreVisor, AON, DDI, etc.)

  5. Assessments are a tool — part of the PROCESS. The most intelligent companies hire using a well-thought-out and clearly defined hiring process. This includes effective sourcing and pre-screening, as well as a smart, coordinated, and planned interview process with appropriate feedback systems in place among the interview team. Assessments increase the likelihood of success. With haphazard, non-process-based recruiting, you may get lucky sometimes, but in the end you pay for the lack of attention by making poor hires or not being able to retain your best talent. We use a great assessment tool for our clients that not only shows performance style, but looks at ambitions and drivers for success as well. If you would like to try it out, go to this link and you will get a 24-page report with your results.

  6. The real deal with assessments is not whether you use XYZ assessment(s) or not … it’s how you use them. As a general rule, a wise organization will only choose assessment(s) that accurately predict job “performance” based on business necessity … Then, they will measure critical skills at least twice. Finally, they will put their cheap, yet reasonably accurate assessments first, creating a multiple-hurdle process where a candidate has to pass one phase before proceeding to another.

    It’s wrong-headed to make simple statements about what kind of assessment is best … It all depends on the job.
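The multiple-hurdle process described above can be sketched as a simple sequential filter, cheapest stage first; every assessment name, cost, and cutoff below is a made-up illustration, not a recommendation.

```python
# A multiple-hurdle screen: candidates must pass each stage, ordered
# cheapest-first, before advancing to the next. All names, costs, and
# cutoffs are hypothetical examples.
HURDLES = [
    # (assessment name, cost per candidate, passing score)
    ("smart application blank", 5,   60),
    ("cognitive ability test",  25,  70),
    ("work simulation",         150, 75),
]

def run_hurdles(candidates):
    """Filter candidates stage by stage; return survivors and total spend."""
    total_cost = 0
    for name, cost, cutoff in HURDLES:
        total_cost += cost * len(candidates)   # everyone still in takes this stage
        candidates = [c for c in candidates if c["scores"][name] >= cutoff]
    return candidates, total_cost

pool = [
    {"id": "A", "scores": {"smart application blank": 80, "cognitive ability test": 85, "work simulation": 90}},
    {"id": "B", "scores": {"smart application blank": 55, "cognitive ability test": 95, "work simulation": 95}},
    {"id": "C", "scores": {"smart application blank": 70, "cognitive ability test": 65, "work simulation": 80}},
]

survivors, spend = run_hurdles(pool)
print([c["id"] for c in survivors], spend)
```

Note how the ordering pays off: in this toy pool the expensive simulation is administered to only one of the three candidates, because the cheap stages screened the others out first.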

  7. Wendell,
    Re: BEI accuracy requires thinking like a detective.
    It has been my experience that really good recruiters have a natural curiosity and “instinct” for digging for details in the interview, whereas many HR interviewers have little or no ability to dig deeper for meaningful data beyond one or two obvious follow-up questions.
    I wish some of you gurus out there would address how to overcome that limitation in interviewers who seem to lack that fundamental trait of being a “detective”.

  8. I really hope for the organization’s welfare that everyone involved with assessment understands a few basics:

    1) The only best-type of assessment is one based on business necessity and job requirements.
    2) No single assessment is perfect. Assessments vary in accuracy, complexity and expense.
    3) Smart organizations use a MTMM design, i.e., they measure more than once, use different assessments, and measure all the critical skills.
    4) Assessments should be used intelligently…put the cheap ones first, the expensive ones last, and require the candidate to complete one hurdle before going to the next.
    5) The best results come when you separate and measure the BIG-3: job-fit, job-skills and job-attitude.
    6) Every assessment should have a proven record of results for your organization and your job (e.g., validation).
    7) Beware of assessments that offer “nice to know” information about the candidate accompanied by long narrative reports that few people bother to read.

  9. @ Dr. Williams: Thank you for clarifying and simplifying.
    I particularly like:
    “Never use any test whose vendor cannot provide documented proof the test was designed to predict job performance.”
    A question: is there an optimum time range that allows a reasonable balance between ease and accuracy? E.g., you might have someone take a one-minute assessment – very easy, but not very accurate. You might have them take an 8-hour assessment, which would be very accurate, but very few would be willing to submit to it, except for some specialized positions. Also, is there a similar optimum range for cost?

    @Ken: very thoughtful. Can these extraneous factors be measured or do they simply have to be taken into account as “noise”?

    @Martin: I agree. ISTM that there are relatively few work simulation tests used (at least in the areas I recruit for). Why is this?

    @Brian: Though not in these exact words, it appears that something similar happened already (see below).

    @ Ken: I would appreciate any independent, neutral studies which validate the “big guys” (or anybody else).

    @ Cheryl: Please show us independent, neutral studies which validate your claims.

    @ Patsy: You raised a good point. What can we do if our organization doesn’t have enough trained and experienced “assessors”? It seems that like many other skills, accurate assessment may take considerable training and practice.


  10. Keith – simulations are expensive and complex – no way around it. The ROI has been there for pilots, surgeons, EMTs, etc., and now for CSRs, bank tellers, and other roles sharing similar parameters.

    I believe that one trend that can be counted on is greater investment of time and energy in the hiring process by all parties- an 8 hour process may just be the start a few years from now as the value of a job increases (again on both sides).

  11. Thanks, Martin. I’m a bit puzzled- I understand why the ROI would be there for the highly-paid occupations, but what about for the CSRs which can be obtained (virtually) for a fraction of minimum wage?

    Where in the hiring process does the assessment typically take place? Also, are we talking about using the assessment as a substitute for some other aspect of the hiring process, or as an additional element of it?

    I think that a lengthy assessment process can act as a disincentive, in similar fashion to a long, convoluted, inefficient hiring process.



  12. Keith, the performance difference between poor CSRs and great ones can make a large impact on customer satisfaction and financial performance. Hiring a CSR may not be expensive, but a bad one can do a lot of damage, while the technology is now available to really simulate the job in detail @ pre-hire.

    In terms of long hiring processes, if the logic is there, it’s a two-edged sword: commitment builds on both sides as the process unfolds, but if it’s not understandable or helpful (a risk with much current assessment practice), yes, you are moving backwards and paying to do it.

  13. Keith…as an answer to your short-long question…For most jobs, I find a couple of 60-minute interviews, about 60 minutes of testing and a 15-minute simulation gives me all the critical data I need. After that, I see diminishing returns and measurement redundancy. Of course, each job family is a little different…some take a little more time and some a little less.

  14. Thanks again, Dr. Williams. Does the ~3 1/4 hrs show:
    1) This person can do the job and we should hire them- they are competent.
    2) This person can do the job and we should consider hiring them- they are not incompetent. i.e.,
    do they “pull in the good” or do they “weed out the bad”?

    Does the interview occur because of the testing or in addition to it?

    I would like to see assessments that minimize the need for resumes/profiles/summaries, since assessments show what someone can do as opposed to what they have done. Is this realistic?



  15. Keith..

    The 3+ hours tell us if the person has the essential skills to do the job and whether he or she is likely to use them. For positions with a lot of power, we add a few special tests to minimize dark-side factors. Interviews (i.e., verbal assessments) are where we poke and probe for pre-qualifications and for more detailed information, verify information from on-line tests, and do a chemistry check between the candidate and the hiring manager. Eliminating resumes/profiles/summaries usually requires developing “smart” application forms. Of course, if you are doing low-volume hiring or have to source passive candidates, resumes will probably have to do.

  16. Patsy,
    Whenever I do an assessment training session (interview or exercise), only about 75% of the group are able to get it. For some reason or another, the other 25% either cannot or will not identify or accurately evaluate critical skills. It’s as if they formed their opinions long ago and, right or wrong, will blindly defend them to the end of time. This is particularly true of people who either think they have learned all there is to know, have a self-image to protect, or cannot take their personal bias out of the hiring equation.

  17. As a 30-year psychology-qualified recruiter across all levels, I feel I have some experiences to share in this debate. However, first I have to laugh at Wendell’s attempts to claim the recruiters have a product to sell. Funny, I thought Wendell was selling his views and, of course, “please buy a psychometric test for your interview processes.” I would say both parties are closer together than they care to admit. The only difference I have seen over the years is the continual attempt by purveyors of psychometric testing to knock everyone else’s tests except their own.
    I have used DiSC, Genesys, 16PF, etc., and done even more recruitment without the use of psychometric tools. End result: hardly a scrap of difference in successful placement – short or long term.
    However, of all the tools, only DiSC has given me insight into why a person is really applying for a job or how compatible they are with their future employer. Their personality doesn’t interest me that much – their behaviour, past, present and future, does.
    Cheers from Australia!

  18. Responding to Keith Halperin’s first comment:

    I don’t consider the factors I listed as “extraneous.” In fact, they are at the heart of running a business. How much relative impact on individual performance can be explained by each depends on the particular organization and its situation. For example, an organization with a track record of building systems that add value and are very distinctive from their competitors (say, WalMart with its logistics system & infrastructure over much of the past thirty years) may find that most of the 75% can be explained by those investments. Think how WalMart continued growing and operating profitably in many different locations and countries, and through economic cycles up and down.

    I am not aware of studies that have attempted to quantitatively sort out the different factors in a precise way. My main message was that using assessments in the hiring process can indeed make a very big difference in the quality of hire, but we all need to have realistic expectations when considering how business leaders evaluate assessments and their impact.

  19. Peter,
    Please tell me why, as you state, a 30-year psychology-qualified recruiter uses training and general personality tests that were never designed to predict job performance. Using general personality tests as hiring tools was discredited over 30 years ago. If you want to classify me as a person with something to sell, then please consider me an avid purveyor of how to stamp out unprofessional hiring practices.

  20. Wendell,
    Firstly, in my view, general personality tests were discredited mostly by those with an agenda to sell another product, so I have little faith in most testing. Secondly, it is only because my clients want to use them that I do so. And if I don’t, then another recruiter will get the gig! Sad, but a fact of life.

    Thirdly, there are many things we use today for purposes they weren’t designed for – most technology in use today was designed for another purpose – so really the only measure is: does it work?

    Lastly, although I consider myself a “man of science,” the attempt to bring science into the “art” of recruiting has in my mind been largely unsuccessful – the variables (i.e., people) being too many to control. Most failures I have experienced over time relate to cultural misfit and lack of motivation long after the recruitment process has ended, e.g., the owner who wants a GM to run his business but then can’t let go, or the introvert with good listening skills who fails to impress in an assessment-centre recruitment process for a call centre.

    Statistics show that nearly every recruitment tool has low validity on its own. It is the combination of measures, providing more insight into an individual, that better predicts success. The common factor: the longer you spend with a candidate, the more you can see the real person and their potential.

    As a consultant working in the business world who also tutors at university, I see both environments. Many of the textbooks allocated for use at uni for a “management principles” course (and also posts on the ERE forums) hold up certain businesses as leading lights to study, particularly for recruitment practices: e.g., Starbucks (gone in Oz), McDonald’s (almost went), Enron, GM, etc. As the decisions of these companies were made by employees, some questions must be raised as to the success of their recruitment practices. Be interesting to know what psychometric tools they used!

    The day that recruitment becomes a domain only for scientific process is the day that robots will take over the world.

  21. You claim to speak with knowledge and authority, while at the same time making statements about personality and selection testing that are completely and utterly wrong. I was curious to know if you had any qualified academic training to back them up…or, as I suspect, if they are just personal opinions.

  22. Wendell
    I thought I was clear in expressing that these were my personal opinions gained over 30 years of successful recruiting. Although I am undertaking further studies, it is, by choice, not in the area of psychometric testing.

    However, having an opinion based on 30 years’ observation is not the domain of academics only. So my comments were directed at Wendell Williams, who by his own admission in his bio above is a “bottom-line consultant” – a hired gun just like myself, Lou Adler, and many of the other writers on ERE. As a consultant making a living, you have a barrow to push, as do we all. Therefore, as much as you would like the readers of this forum to see WW the academic when you write articles, you need to be prepared to be seen, rightly or wrongly, as WW the consultant – and we consultants are a cynical lot.
    I have no doubt that by employing your methods you have great success in recruiting winners, but I also enjoy success without the benefit of your or others Psychometric methods so the question remains- why use testing at all?

  23. I am not a psychometrician nor do I play one on TV. However, I have been selling – yes yes I am in sales – validated certification tests and skill assessments (with Kryterion an award-winning test development and delivery company) and validated, EEOC compliant Integrity Tests (American Tescor with Merchants Information Solutions and QuickStaf InsightWorldwide) for 15+ years.

    I have found that virtually every potential HR/Recruiting prospect has significant reservations about the ROI and reliability of assessments. Over and above client referrals, I have always provided third-party documentation on the ROI and the applicable validation studies to prospects. (Note that more often than not it is the ROI report which moves the prospect to a customer.)

    It is important that any vendor of assessments be willing and able to provide benchmarking for the new customer following a statistically valid number of administrations and period of time. Customers who track revenue (in the form of productivity), turnover, and workers’ comp claims prior to implementing an integrity-testing program, for example, will be able to share post-implementation statistics with the vendor to obtain the ROI of the assessment(s) for their specific organization.

    I concur completely with the view that an assessment or integrity test should not be used as the sole determining factor of an applicant’s potential performance. However, without question these tools can save HR personnel a lot of hands-on time with unqualified or high-risk applicants and, in the end, save employers significant costs in regard to background screening, training, turnover, and potential employee theft and violence.

    Please don’t beat me up – as with other posters, this is my personal opinion based upon my experience to date.

  24. Hmm – This issue of accuracy.

    The evidence for “general” personality tests as predictors for important work outcomes is about the same as for any supposed “work-relevant” test – except where the test is targeted very tightly against a very specific work criterion (such as HR Chally do with their tests). I guess this is the point Wendell is making.

    The problem though is that even with such tight coupling between the predictors and criterion, very few of them are that accurate (cross-validated, robust) – to the extent that you could be pretty sure (> 70% predictive accuracy) of the predicted outcome for any individual (partly because the tests rarely show accuracy at these levels, and partly because the statistics relate to group outcomes, not for any specific individual within that group).

    That’s why a whole industry has arisen over 40 years which has specialized in marketing tests as “brand products” rather than concentrating on selling the accuracy of the product to predict important outcomes.

    However, all credible tests do shift the odds slightly in favor of the hirer, and ability/skills tests are the best at this – but still not much good except for the “bleeding obvious” kinds of filtering use (the meta-analyses from Schmidt and Hunter et al. are of course overblown and over-corrected – painting a “what if there was no measurement error” picture – which is practically useless for anyone needing to make a hiring decision in the real world).

    Personally, I think some psychometric tests might actually be doing a far better job than the usual psychometric statistics tell us (because the conventional reliability and validity indices are not optimal, and pearson correlations/effect sizes are simply insufficient to do the job required). However, it’s an empirical issue rather than one for debate.

    I can understand where Peter MacDonald is coming from – the validity data from many test publishers is barely worth a yawn. Correlations around 0.2 – 0.3 if you are lucky; an assumption of perfect bivariate normality and a random sample from whatever population the investigator has decided to sample; an assumption that the attributes (predictors and sometimes criteria) vary both quantitatively and linearly; and an assumption that the validity result published or advertised by a test publisher will actually hold for your specific application. This does not look good when you see just what those correlations depend upon for their interpretation and eventual utility.

    Given the fragility of these, and the necessary subjective interpretation of test scores invited by test publishers as part of their “how to interpret test scores” training courses, is it any surprise that some tests may actually end up being no better than 50/50 successful, as per Peter’s observations?

    I think Wendell (as does Charles Handler and other pundits on ERE) makes some sound points. But, the evidence for the accuracy (and even purported ROI sometimes) of psychometric tests is not as “rosy” as sometimes painted. Many “ROI” investigations are feeble, when you look very closely at what exactly took place. However, to look that closely you need to be a statistician yourself to find those “not quite cricket” kinds of “oversights”!

    I think things are changing – very slowly – as predictive analytics and algorithmic statistics begin to ooze into the market. Plus, the relative success of the tools now used for the prediction of recidivist risk within forensic psychology/psychiatry has provided some food for thought about what tests should accomplish (vis-à-vis prediction rather than interpretation).

    A simple question is how Peter MacDonald could state his considered, experience-laden opinion IF tests were that accurate. The day-to-day evidence would negate any such claim – as it would if someone tried to claim that giving Lasik treatment (or corrective lenses) to people with short-sightedness has very little effect on their vision.

    Clearly, assessment remains a complex and awkward issue – IF you require predictive accuracy as your primary evidence-base for using the test.

    Anyway, I just hate to see people beating themselves and others up over this, when both are correct to an extent in their observations/considered judgments.

  25. Interesting posts

    Jill, I don’t fear being beaten up by aggressive writing and the not-so-subtle verbal assertions that without a PhD in psychometrics one is incapable of clear observation or valid opinion. In a recruitment environment this response would provide me with insight into how an individual deals with conflict and contrary opinions, etc.

    For the record, I actually value using psychometric tools for the additional insights into an individual’s make-up that I couldn’t gain unless I spent considerable time with them in the work environment.

    However, I resist the temptation to treat them as gospel or as being more relevant than many other components of the recruitment process. In this regard I see most of the posters agree. And as I noted previously, most of my placements have not involved using psychometric tools, with no difference in success rate.

    Having said this, a notable exception has been one global consulting client for whom we have found that higher verbal and abstract reasoning scores clearly indicate likely success at learning and assimilating a complicated consulting technology during initial training. As all applicants possess at least one degree, raw intellect, not qualifications, has proven to be the determinant of success with my client.

  26. @Jill: Thank you for your comments and how you expressed them:
    1) You openly discussed that you are selling something: no hidden agenda.
    2) You discuss validated claims, not anecdotes. (Who did the
    3) You expressed opinions and labeled them as such, not the Commandments coming down from Sinai.
    I suggest these be the new standards for ERE commentary.

    @Paul: Could you simplify what your points are? I’m afraid I don’t understand what you’re saying.

    @Peter: I have read that GMA is a “good” predictor of “success”. If so, why isn’t it used more?

  27. @Peter: I think I can answer your question about GMA – “general mental abilities”.

    Not surprisingly, your global consulting client finds that higher scores on reasoning tests indicate success in certain aspects of initial job performance. In fact, GMA is the single strongest predictor of future job performance. Smarter employees have a leg up in many aspects of job performance for most jobs.

    Methinks Halperin is baiting us with his question, “why isn’t it [GMA] used more”. I’m quite sure he knows the answer. Dr. Williams can comment eloquently I suspect.

    Bottom line: tests of GMA show adverse impact in study after study. I.e., various minority groups score lower on average than whites. This is one of the big conundrums in assessment: the “best” predictors create the biggest discrimination problems for users. The federal EEOC requires all practices using assessment for any decisions about people to avoid adverse impact, if there are any practical means of doing so.

    This requirement is one of several reasons that personality inventories and work simulations frequently show up in assessment batteries: they do indeed add additional accuracy to the assessment and importantly, they usually show far less or even no adverse impact.
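To make the adverse-impact point concrete, here is a minimal sketch of the EEOC “four-fifths rule” check, with hypothetical applicant counts (the 0.8 threshold is the Uniform Guidelines rule of thumb):

```python
# Sketch of the EEOC "four-fifths rule" check for adverse impact.
# All counts are hypothetical, purely for illustration.
def adverse_impact_ratio(selected_a, applicants_a, selected_b, applicants_b):
    """Ratio of the lower group's selection rate to the higher group's."""
    rate_a = selected_a / applicants_a
    rate_b = selected_b / applicants_b
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# e.g. 15 of 100 minority applicants hired vs. 30 of 100 majority applicants
ratio = adverse_impact_ratio(15, 100, 30, 100)
print(round(ratio, 2))   # 0.5 -- below 0.8, so the four-fifths rule is violated
```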

  28. Ken, I can’t recall exactly, but didn’t Schmidt and Hunter find work samples (simulations) to equal GMA? And IIRC, there was some discussion about GMA having a variable correlation based on job type, while work samples were more steady across domains.

    The issue of GMA v. race is pretty loaded- I was highly impressed by this book, which is both scholarly and approachable on the subject:

  29. @Martin: Well… as in all these discussions, it depends on the specific situation – job and organization. But, tests of GMA are cheap and easy to use while work samples or simulations are expensive to properly develop and expensive to administer. Technology is beginning to help in making work samples and simulations more affordable, but the cost differences remain.

    “The Genius in All of Us” is indeed illuminating. One important conclusion might be that we should all strive to create an organizational climate and leadership approach that encourages everyone to unleash their full potential; leadership and culture are among the major factors affecting the 75% – 90% of job performance that lies beyond choosing which candidates to hire. But, that’s not the problem we all try to solve with pre-hire assessments for screening and selection. We still need to choose which candidates to hire.

    I’m staying out of the legitimately loaded question of GMA and race. My points are very practical: what tools and processes can be used in a practical way to improve recruiting, screening, and selection decisions? Dr. Williams offers good advice in his earlier response.

  30. @Peter: Regarding your earlier comment “why use testing at all?” I think of this as very similar to investment management. There are indeed Warren Buffetts out there who consistently beat the odds over the course of an entire career. While some of Buffett’s methods appear transferable, we really don’t know why or how he’s had long-term investing success. But, we clearly do know that for every Warren Buffett, Peter Lynch, or Bill Gross, there are literally thousands of financial types who consistently and significantly underperform the market.

    I don’t know you and maybe you’re a Warren Buffett of picking people. But, we can’t design approaches that rely on one-in-a-million skill, but rather we need practical approaches that reliably work for the thousands of recruiters and hiring managers that face the challenge of choosing which candidate to hire. This is where assessments of all types fit in to help out.

  31. There has been at least one meta-analysis follow-up to Schmidt & Hunter’s study suggesting that the value for work samples was inflated (which I was depressed to read*). Regarding GMA, if you look at the data (yes, much of it is old), the utility depends greatly upon the complexity of the job. IMHO looking for the “best” testing method kinda misses the whole point, but it’s important to be aware of the scientific findings.

    The thing that scares me about some of these comments isn’t that people have an agenda–reader beware. What scares me is that there seems to be an ongoing debate about the “real” value of good assessment, which we stuck a fork in like, I dunno, 80 years ago, and the fake competition between personal opinion and research findings. I’m just not getting how we can claim experience trumps research–is this just a result of people feeling threatened?


  32. @Bryan, et al: It appears we have one validated type of testing (GMA) which is often (and may be inherently) discriminatory. We have another method (work simulation) which is expensive and, if the listing above is correct, not so valuable after all.

    I think that it is perfectly acceptable to say that your product/service works and has resulted in large numbers of satisfied customers. (As I have said elsewhere, “anecdotal” can still “work”.) However, if you say your product doesn’t need/is too good for scientific, neutral, peer-reviewed validation, that these standards are too low for your wondrous product/service, or something to the effect that you won’t allow your product/service to be validated, then I think you invalidate your product/service’s potential value.


  33. Keith Halperin asked:
    @Paul: Could you simplify what your points are- I’m afraid I don’t understand what you’re saying.

    Ok – try these:

    1. The test validity data for both general and many “specific-target-focused” psychometric tests is not very accurate. There are exceptions – but these are because of a very tight-coupling between predictors and criteria (as with Saville Wave).

    2. Because of this fundamental inaccuracy of tests, many users interpret the test score magnitudes subjectively (rather than rely upon cut-scores) – which can introduce a layer of “error” over and above the raw test score predictive accuracy. I say “can”; I do not mean always.

    3. Given #2, Peter MacDonald’s experiential observation about the difficulty perceiving any effect of using psychometrics might be a reasonable and justifiable viewpoint.

    4. The magnitudes of validities in the Schmidt and Hunter/meta-analytic work are lower (for practical purposes) than as given, because most of the coefficients are “in a perfect world” format (i.e. the observed values are corrected for restriction of range, estimated unreliability of both criterion and predictors, and estimated corrections for rater unreliability).

    5. #4 doesn’t mean the validities are useless, merely lower than presented. This is a complex issue (re: corrections to observed parameters) with much evidence and argument pro and con.

    6. Bottom line, it’s the actual demonstrated actuarial predictive accuracy using real data which calls the shots (e.g. the kinds of statement like 70% selected using this test went on to perform as predicted, 30% did not). Not much of this evidence exists in this “obvious to digest” format, and some which reports 80-90% accuracies is nearly always incorrect (capitalization on chance, too many predictors vs cases etc.).

    7. I wish I hadn’t said anything! I don’t do “generalizations” very well; sorry. For background, my website is at: That will better explain why I find it hard to make generalizations.
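For readers wanting to see point 4 concretely, here is a sketch of two standard corrections (Thorndike’s Case II range-restriction correction, plus disattenuation for criterion unreliability). All input values are invented for illustration; the point is only that the “perfect world” coefficient can be roughly double the observed one:

```python
import math

# Hypothetical illustration of the corrections described in point 4.
def correct_for_criterion_unreliability(r_obs, r_yy):
    """Disattenuate an observed validity for criterion unreliability r_yy."""
    return r_obs / math.sqrt(r_yy)

def correct_for_range_restriction(r_obs, u):
    """Thorndike Case II correction; u = unrestricted SD / restricted SD."""
    return (u * r_obs) / math.sqrt(1 + r_obs**2 * (u**2 - 1))

r = 0.25                                           # observed r in the hired sample
r = correct_for_range_restriction(r, 1.5)          # incumbents are range-restricted
r = correct_for_criterion_unreliability(r, 0.52)   # assumed rating reliability
print(round(r, 2))   # 0.5 -- roughly double the observed coefficient
```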

  34. Thanks Paul.

    Keith Halperin asked:
    @Paul: Could you simplify what your points are- I’m afraid I don’t understand what you’re saying.

    Ok – try these:

    1. The test validity data for both general and many “specific-target-focused” psychometric tests is not very accurate.
    KEITH: I understand this.

    There are exceptions – but these are because of a very tight-coupling between predictors and criteria (as with Saville Wave).
    KEITH: I DON’T understand this.

    2. Because of this fundamental inaccuracy of tests, many users interpret the test score magnitudes subjectively (rather than rely upon cut-scores) – which can introduce a layer of “error” over and above the raw test score predictive accuracy. I say “can”; I do not mean always.
    KEITH: Lots of people misinterpret the test?

    3. Given #2, Peter MacDonald’s experiential observation about the difficulty perceiving any effect of using psychometrics might be a reasonable and justifiable viewpoint.
    KEITH: No, I don’t understand this point.

    4. The magnitudes of validities in the Schmidt and Hunter/meta-analytic work are lower (for practical purposes) than as given, because most of the coefficients are “in a perfect world” format (i.e. the observed values are corrected for restriction of range, estimated unreliability of both criterion and predictors, and estimated corrections for rater unreliability).
    KEITH: In the “real world” things aren’t quite as good as Schmidt and Hunter claim?

    5. #4 doesn’t mean the validities are useless, merely lower than presented. This is a complex issue (re: corrections to observed parameters) with much evidence and argument pro and con.
    KEITH: Don’t throw out the tests, just take them with more of “a grain of salt”?

    6. Bottom line, it’s the actual demonstrated actuarial predictive accuracy using real data which calls the shots (e.g. the kinds of statement like 70% selected using this test went on to perform as predicted, 30% did not). Not much of this evidence exists in this “obvious to digest” format, and some which reports 80-90% accuracies is nearly always incorrect (capitalization on chance, too many predictors vs cases etc.).
    KEITH: No, I don’t understand this point, either.

    7. I wish I hadn’t said anything! I don’t do “generalizations” very well; sorry. For background, my website is at: That will better explain why I find it hard to make generalizations.
    KEITH: Yes, this makes sense now. Most of us (like me) are poor Northern Hemisphere dwellers, and the blood runs to our feet, thus making us more stupid than you Southern Hemisphere dwellers, where the blood rushes to your head, enriches your brain, and makes you smarter than we are. No?


    Keith “Hopefully Not as Dumb as I Seem” Halperin

  35. Hello Keith

    I was not trying to say anyone was dumb, or that I’m “cleverer” or “above” these discussions. I know you were probably being tongue-in-cheek with the Southern Hemisphere stuff – but some may still have felt that I was claiming some kind of superiority. I’m not. It’s more to do with the breadth of my view of these matters, from the perspective of measurement theory and the philosophy of science, as well as from the applied perspective of someone who advises upon/creates/constructs assessment instruments and prediction-oriented evidence-bases.

    Probably the best “snapshot” of how I approach matters of test usage is the recent presentation I gave, which is available for download from my website: Taxonomies, traits, dispositions, motivations, and personality dynamics: How now to interpret personality test scores?

    None of which might be of much use to practitioners – except insofar as how they might conceive of a personality “trait”, and interpret test scores as something more fundamental than the summed responses of a few descriptive behaviorally-oriented items.

    Anyway, not much more I can say really without digging an even deeper hole for myself!

    Regards .. Paul

  36. If I can chime in here…in my experience, the most useful tool for practitioners is learning how to do a good job analysis, gaining agreement with the client about good/bad candidate answers, and mastering BEI techniques.

  37. Hello Bryan

    I think Wendell has made the usual sensible recommendations in his article.

    And much will depend upon how you will use a test score; as a cut-score for pre-screen or “screen-out” filtering, or as something to be interpreted contextually along with ancillary information, to arrive at a selection decision.

    The former requires actuarial-style evidence (e.g. ROC analysis or typical 2×2 decision table statistics) of predictive accuracy for a particular cut-score. Validity coefficients (Pearson correlations) are useless as “evidence” for cut-score work.
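As a sketch of those 2×2 decision-table statistics, with counts invented for illustration, this is the kind of actuarial accuracy a cut-score evidence-base would report:

```python
# Hypothetical 2x2 decision table for a single cut-score: hire/reject
# decisions crossed with later success/failure. All counts are invented.
true_pos, false_pos = 70, 30    # selected and succeeded / selected but failed
false_neg, true_neg = 20, 80    # rejected but would have succeeded / rightly rejected

sensitivity = true_pos / (true_pos + false_neg)   # successes correctly selected
specificity = true_neg / (true_neg + false_pos)   # failures correctly screened out
ppv = true_pos / (true_pos + false_pos)           # of those hired, share who succeed

print(round(sensitivity, 2), round(specificity, 2), round(ppv, 2))  # 0.78 0.73 0.7
```

The positive predictive value is the figure behind statements like “70% selected using this test went on to perform as predicted.”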

    The latter need rely only upon validity coefficient evidence, because criterion predictive accuracy is secondary to the interpretative task facing the recruiter. That the test is mostly inaccurate as a predictor of job performance (or whatever) is acknowledged implicitly by both test publisher and user; so the strategy is that the user will try to improve the accuracy by interpreting the test score/s along with other candidate information.

    For cut-score work, you or the test publisher must have, or must create, an evidence-base which properly justifies whatever cut you choose. The veracity of that evidence-base is your protection against legal challenge, because you are making employment decisions based upon the test scores themselves. You are looking for robust (cross-validated) analyses. One-shot, single-sample analysis is insufficient unless the sample size is huge (many thousands) or can be shown to be truly representative of a constrained-size “population”.

    As to the “but I use test scores as but one of many sources of information to arrive at a decision” argument: as long as the publisher has something which looks like a passable validity coefficient or two to justify the use of the test, that’s sufficient to justify its inclusion. But, because of the degree of subjectivity in making a judgment by combining test scores and many other kinds of information, unless the judgment itself is tested empirically for its validity (accuracy), you have no evidence-base for the validity/accuracy of your selection process as a protection against legal challenge.

    This is how forensic risk assessment has improved from initially relying almost solely on “expert” clinical judgment, through to actuarial probabilities of risk, then attempting “Structured Professional Judgment” (modifying actuarial risk probabilities with a kind of structured clinical judgment). The latter has the problem that the judgment is partly dependent upon the unique skill/lack of skill of each practitioner to adjust the almost objective actuarial test results.

    Knowing what I do about psychometric tests and the measurement assumptions upon which many rely, I have a healthy suspicion about any evidence-base put forward for them by test publishers/academics. Some evidence-bases really are “as good as can possibly be constructed”, some not so good. Recognizing which is which is not easy.

    But, you can recognize the “good” publishers as the ones who take the time to explain what looks like complex gobbledygook in a simple and straightforward way. The dodgy ones either browbeat you with complex technical stuff or make sweeping statements for which the actual evidence remains “incomplete”, or is reported without the crucial details which would enable more robust critical evaluation.

    Frankly, it’s not easy these days as the test market has expanded exponentially; Wendell and other “gurus” who write these ERE articles are probably doing the best they can to broadly advise without actually becoming a reader’s paid personal on-hand expert!

    Anyway, just my limited attempt at some recommendations.

    Regards .. Paul

  38. Thoughts on integrity testing to screen out those likely to steal, engage in violent behavior, or exhibit otherwise unproductive behaviors?
    Do we use overt questions, or are we better served by personality-style inventories that capture conscientiousness, social desirability, etc.?
    Opinions on where integrity testing fits into the process are of interest! Thanks.
