
Validity of selection methods in HR

Article Summary: The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 85 Years of Research Findings
Schmidt & Hunter, Psychological Bulletin (1998)
 
Abstract
The article summarizes the practical and theoretical conclusions of 85 years of research in employee selection. Based on meta-analytical findings, the article presents the predictive validity of 19 psychometric and other selection methods for candidate performance in work and training, as well as the combined validity of GMA (General Mental Ability) with each of the other 18 methods.
The three combinations with the highest validity, and therefore with the greatest value for predicting job performance, are:
(1) GMA with work sample tests (average validity of 0.63)
(2) GMA with integrity test (average validity of 0.65)
(3) GMA with structured interview (average validity of 0.63)
An additional advantage of the last two combinations is that they can be used for selection of both experienced workers and workers without experience.
The article discusses the significance of these findings, both from a practical perspective and for developing theories for predicting job performance.
 
Introduction
The single most important property of a selection method is its predictive validity.
Research has found that using methods with good predictive validity leads to higher employee performance and better learning of job-related skills.
The well-known conclusion from 85 years of research in the field is that when candidates have no previous experience in a similar type of work, the best predictor of success is general mental ability (GMA), typically measured as part of a psychometric test. However, additional methods also have good predictive validity.
The current article reviews 19 methods for employee selection. It presents findings collected over 85 years on the effectiveness of these methods. It also reviews the effectiveness of various combinations of these methods.
Factors determining the practical value (utility) of a selection tool
1. Predictive validity: As mentioned, using methods with good predictive validity leads to higher employee performance and better learning of job-related skills.
2. Variance in job performance: If performance variance were zero, every employee hired for the job would perform at the same level. In that case the utility of the selection tool would be zero: it would not matter who is hired, because performance would be identical. If, on the other hand, performance variance is large, the selection process matters a great deal and the best candidates need to be identified; the selection tool then has to sort candidates effectively. This second case, not the first, is what typically characterizes reality.
Note that the variance that determines the practical value (utility) of a selection tool is the variance of the candidate pool, since employees are selected from the pool of candidates, not from the pool of people already in the position.
Job performance variance can be measured in two ways:
a. Dollar value: the standard deviation of the dollar value of employee output has been found to be at least 40% of the average salary. For example, if the average salary for a given position is $40,000, the dollar standard deviation is at least $16,000. On this scale, employees can be placed on a normal curve, and this serves as an index of performance variance.
b. Output as a percentage of median output: a specific employee's output is divided by the output of an employee at the 50th percentile, and the ratio is multiplied by 100. Research has found a different standard deviation for each occupational level: 19% for unskilled or semi-skilled jobs, 32% for jobs that require skill, and 48% for managerial and professional occupations.
3. Selection ratio: the proportion of candidates who are hired for the job. The selection ratio typically ranges between 0.30 and 0.70.
The article presents three formulas for calculating the practical value of a selection tool per employee, per year of work, based on the three parameters presented above: validity, variance in performance, and selection ratio.
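The summary does not reproduce those formulas. Purely as an illustration, the sketch below implements the standard Brogden-Cronbach-Gleser utility estimate commonly used in this literature; the numeric inputs (GMA validity of 0.51, the $16,000 standard deviation from the example above, and a selection ratio of 0.30) are assumptions chosen for the example.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def mean_score_of_hires(selection_ratio: float) -> float:
    """Average standardized predictor score of those hired under
    top-down selection: phi(z_cut) / selection_ratio, where z_cut is
    the cut score on the standard normal distribution."""
    z_cut = NormalDist().inv_cdf(1.0 - selection_ratio)
    density_at_cut = exp(-z_cut ** 2 / 2) / sqrt(2 * pi)
    return density_at_cut / selection_ratio


def utility_gain_per_hire_per_year(validity: float,
                                   sd_dollar_value: float,
                                   selection_ratio: float) -> float:
    """Estimated dollar gain, per hire per year, from selecting with a
    tool of the given validity rather than selecting at random."""
    return validity * sd_dollar_value * mean_score_of_hires(selection_ratio)


# Assumed inputs: GMA validity 0.51, SDy = 40% of a $40,000 salary
# (the $16,000 example above), and a selection ratio of 0.30.
print(utility_gain_per_hire_per_year(0.51, 16_000, 0.30))  # roughly $9,500
```

As the sketch shows, the estimated gain grows directly with the tool's validity, with the dollar variance of performance, and with how selectively the organization can hire.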
     
    Validity of selection tools: Review and summary of findings from 85 years of research.

Validity studies of psychotechnical selection tools began as early as the 1920s. Different studies of the same tool yielded different validity estimates, even for the same position. As a result, the belief developed that these differences reflected a true state of affairs and that the differing results were the product of different settings (even with an identical tool and an identical role). This became known as the theory of situational specificity, and it dominated the field until the 1970s, when it was shown that the differing results were largely due to statistical artifacts, above all the use of small samples. To correct for these artifacts, meta-analyses were conducted that combined the separate samples. These studies refuted the theory of situational specificity and showed that there is essentially no variance in the validity of a given tool for a given role across situations and, moreover, almost no variance in its validity across different roles.
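As an illustration of how sampling error alone can produce this pattern, here is a minimal simulation (not from the article; the per-study sample size of 68 and the true validity of 0.50 are assumptions for the example). It draws many small validity studies from a single population with one true validity and shows how widely the observed correlations scatter.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_observed_validities(true_r=0.50, n_per_study=68,
                                 n_studies=500, seed=0):
    """Draw many small validity studies from one population with a
    single true predictor-criterion correlation and return the
    observed correlation from each study."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_studies):
        xs = [rng.gauss(0, 1) for _ in range(n_per_study)]
        # Criterion = true_r * predictor + noise, so corr(x, y) = true_r.
        noise_sd = (1 - true_r ** 2) ** 0.5
        ys = [true_r * x + rng.gauss(0, noise_sd) for x in xs]
        observed.append(pearson(xs, ys))
    return observed


rs = simulate_observed_validities()
# Even though every "study" comes from the same population, the observed
# validities scatter widely, e.g. roughly from 0.2 to 0.7.
print(min(rs), max(rs))
```

The scatter arises entirely from small samples, with no real differences between settings, which is exactly the point the meta-analyses established.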
     
    Predictive validity for success at work:
    The article summarizes in a table the validities found for 19 psychotechnical selection tools in meta-analyses covering 85 years of research.

Selection tool | Predictive validity
GMA tests (aptitude tests) | 0.51
Work sample tests (work simulations) | 0.54
Integrity tests | 0.41
Conscientiousness tests | 0.31
Structured interviews | 0.51
Unstructured interviews | 0.38
Job knowledge tests (professional tests) | 0.48
Job tryout procedures | 0.44
Peer ratings | 0.49
T & E behavioural consistency method | 0.45
Reference checks (recommendations) | 0.26
Years of experience in the profession | 0.18
Biographical data measures | 0.35
Assessment centers | 0.37
T & E point method | 0.11
Years of education | 0.10
Occupational tendencies | 0.10
Graphology | 0.02
Age | -0.01


    GMA has been found to be an optimal tool for selecting employees for a variety of positions. Its predictive validity is high, its cost is relatively low compared to other tools, and it is also suitable for candidates who lack knowledge or experience in the position. The theoretical basis regarding intelligence is the broadest and most established, and therefore when measuring intelligence as part of the psychotechnical test, it is clearer what is being measured than when conducting an interview or relying on integrity tests. A work sample test is slightly more valid, but is not suitable for candidates without knowledge and experience, and its cost is high (it must be adapted to each position uniquely). A structured interview was also found to be a tool with high predictive validity, but sometimes such an interview also examines professional knowledge and therefore is not suitable for those lacking knowledge and experience. In addition, its cost is high relative to GMA tests.
     
    Predictive validity for success in training:
GMA was also found to be the most effective predictor of learning on the job (acquiring information in the position and learning from training programs).

    Below is an additional table, presenting the validity of various tools in predicting performance in training programs, that is, predicting job-related learning ability:

Selection tool | Validity for training performance
GMA tests | 0.56
Integrity tests | 0.38
Interviews (structured and unstructured) | 0.30
Peer ratings | 0.36
Reference checks (recommendations) | 0.23
Years of experience in the profession | 0.01
Biographical data measures | 0.30
Years of education | 0.20
Occupational tendencies | 0.18
     
    The added validity for predicting success at work:
Thanks to the special status of GMA tests, they can be considered the central selection tool; the other 18 tools are complementary to GMA.
    When combining several psychotechnical selection tools, the following question should be asked: to what extent will each additional selection tool (besides GMA) improve the predictive validity beyond the prediction of 0.51 that GMA tests provide? The degree of improvement in predictive validity and selection efficiency obtained when adding tools to GMA depends not only on the validity of the additional tools but also on the correlation between GMA and the additional tools. The smaller the correlation between the two, the greater the overall predictive validity of the selection.

Selection tool | Added validity [1]
Work sample tests (work simulations) | 0.12
Integrity tests | 0.14
Conscientiousness tests | 0.09
Structured interviews | 0.12
Unstructured interviews | 0.04
Job knowledge tests (professional tests) | 0.07
Job tryout procedures | 0.07
Peer ratings | 0.07
T & E behavioural consistency method | 0.07
Reference checks (recommendations) | 0.06
Years of experience in the profession | 0.03
Biographical data measures | 0.01
Assessment centers | 0.02
T & E point method | 0.01
Years of education | 0.01
Occupational tendencies | 0.01
Graphology | 0.00
Age | 0.00

    Work Simulation Tests
These are hands-on exercises that simulate the tasks the candidate will be required to perform in the position. When these simulations are combined with GMA, they add 0.12 to the predictive validity, bringing the combined predictive validity of the two tools to 0.63.

    Integrity Tests
    These are used to reduce behaviors such as drinking, drug use, theft, and embezzlement in the workplace. The tests predict these behaviors but are also a predictor with high validity for general performance in the position. Although their validity is lower than that of work simulations, their contribution to improving validity when added to GMA is higher than that of simulations: 0.14. The reason for this: the correlation between integrity tests and GMA tests is zero. Integrity tests have the highest addition to validity when combined with GMA.
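The following is a minimal sketch (not taken from the article) of the standard formula for the multiple correlation of two predictors with a criterion. It illustrates why a supplement that is uncorrelated with GMA adds more validity than an equally valid supplement that overlaps with it; the 0.60 intercorrelation in the second case is an assumption chosen for the example.

```python
def combined_validity(r_gma, r_supplement, r_between):
    """Multiple correlation of the criterion with the optimally weighted
    combination of GMA and one supplementary predictor."""
    numerator = r_gma ** 2 + r_supplement ** 2 - 2 * r_gma * r_supplement * r_between
    return (numerator / (1 - r_between ** 2)) ** 0.5


r_gma = 0.51  # validity of GMA tests, from the first table above

# Two hypothetical supplements with the same validity (0.41) but a
# different correlation with GMA.
for label, r_between in [("uncorrelated with GMA", 0.00),
                         ("correlated 0.60 with GMA", 0.60)]:
    r_total = combined_validity(r_gma, 0.41, r_between)
    print(f"{label}: combined validity {r_total:.2f}, "
          f"increment over GMA {r_total - r_gma:+.2f}")

# With a zero intercorrelation the combined validity comes out at about
# 0.65, consistent with the figure reported above for GMA plus an
# integrity test; with an intercorrelation of 0.60 the increment shrinks
# to about +0.02.
```

This is why the zero correlation between integrity tests and GMA matters so much: the supplement contributes information that GMA does not already capture.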

    Interviews
Structured interview: a uniform set of questions developed on the basis of a job analysis, with a uniform scoring procedure. The validity of structured interviews is much higher than that of unstructured ones, but they are also more expensive (the questions must be derived from a job analysis). The improvement in validity of structured interviews, when added to GMA, is 0.12, for a combined validity of 0.63. This combination offers high validity at a reasonable cost and is therefore recommended for selection.
Unstructured interview: no uniform format and no uniform scoring process; a general assessment is given based on the interviewer's impression.

    Professional Knowledge Tests (job knowledge tests)
    These are relevant only for candidates with knowledge in the required field. The organization itself usually builds tests that examine the abilities for the position. This is a more expensive and lengthy process than building a structured interview. Their added validity is 0.07.

    Job Tryouts
    In this method, candidates are accepted for a trial period of 6 to 8 months, after which supervisors evaluate the candidates' performance relative to a predetermined performance level.
    The method has several serious problems:
    1. Employment for an extended period is very expensive. In addition, poor workers cause significant financial losses during this time.
    2. Supervisors form personal relationships with candidates during the trial period, which biases their reports.
    3. The effort is unnecessary, as a poor worker tends to be eliminated from the job during such a long employment period anyway.
    There is no information on the validity of the method in predicting training performance.

    Peer Ratings
    In this method, peers rate the potential or performance of the employee. A prominent limitation of this method is that it can only be applied to candidates who have access to their work colleagues, meaning candidates from within the organization. Seemingly, the method has two additional disadvantages - peer judgment can be influenced by friendship with the candidate and their popularity, or by secret agreements ("you scratch my back, I'll scratch yours"). However, these concerns are not supported by research.

    T & E Behavioural Consistency Method
    This method relies on an established psychological principle that past performance is the best predictor of future performance. As a first step in building tests in this method, it is necessary to define which dimensions of achievement distinguish those with excellent performance in the position from those with low performance (for example, the ability to get people to perform a task). This information is obtained from experienced supervisors in a very structured way. As a second step, scoring scales are prepared to rate specific achievements in the defined dimensions. In performing tests in this method, candidates are asked to describe their past achievements in the defined dimensions, and the examiner gives these achievements scores according to the scales prepared in advance. The candidates' achievements can come from work in similar positions, but also from activity in the community, university, and so on.
    The cost of building tests in this method in time and money is almost equal to the cost of building professional knowledge tests, and checking the test results is not simple either. Therefore, the investment may be worthwhile mainly when looking for candidates for a high-level position. There is no information on the validity of the method in predicting training performance.

    Reference Checks
    In this method, previous employers of the candidate are contacted to obtain information about their work performance, integrity, and so on. The relevance of the findings of existing research on the subject is questionable. This is because the willingness of U.S. employers to provide details about former employees has been in a process of change in recent decades. In the 70s and 80s, several employers were harmed by lawsuits from former employees, and therefore ceased to provide information about former employees. However, in the 90s, laws were enacted in many U.S. states to protect employers in this area, and employers may return to providing information. If this happens, the validity of this method may increase.

    Work Experience
In this method, the number of years of experience in identical or similar positions is counted, without taking the employee's performance in previous positions into account. Research has shown that the number of years is a successful predictor only up to about 5 years of experience; most professional knowledge appears to be accumulated during the first 5 years. This also implies that the prediction horizon is at most 5 years, in contrast to GMA, which predicts without a time limit. As for predicting training performance, previous experience does not predict it, but it is not negatively related to it either.

    Biographical Data Measures
    In this method, the candidate is asked to answer questions about their life experience in general, not necessarily at work - for example, their involvement in sports. To develop such tests, questions are selected that are supposed to measure skills relevant to the position. It is difficult to build such tests well, since the assessment of skills here is not direct, and one must be careful of hidden assumptions. However, it is relatively easy to use the method. This method has some value, but due to its high correlation with GMA, it does not have high added value.

    Assessment Centers
    In this method, candidates spend a period of one day to several days at a specific site. During this time, they conduct group discussions, participate in business games, take personality and psychometric skills tests, and usually undergo structured in-depth interviews. On average, an assessment center includes 7 tasks and lasts two days. Similar to biographical data measures, the results here are good, but due to the high correlation with GMA, they do not have high added value. Also, there is no information on the use of assessment centers as a predictor of performance in training.
    However, organizations claim that they derive various insights about the candidate from assessment centers. In addition, when it comes to management levels, assessment centers predict to some extent the employee's advancement in hierarchy and salary level.

    T&E Point Method for Evaluating Previous Experience and Training
    This is a collection of methods, known by different names, but they have a common principle - a point is given for each year of relevant experience, as well as for each year of education of the candidate. There is no assessment of the candidate's past performance. This method is very simple to use, and it is very common in US government institutions. The validity of the method is low, and there is no benefit in adding it to GMA tests.

    Number of Years of Education
    Simple counting of years of study. The validity of the method is low, and there is no benefit in adding it to GMA tests. The method has some value in predicting the candidate's performance in training (correlation 0.20).

    Occupational Tendencies
    Supposedly, employee performance may improve when they are employed in a job that matches their interests. In practice, this is not a good predictor. It seems that tendencies mainly affect the stage of choosing a profession, but are not significant later on. The method has some value in predicting the candidate's performance in training (correlation 0.18).

    Graphology
    This method is not common in the USA, but it is widespread in other countries, such as France and Israel. Research has found that it has no validity at all, and graphologists are unable to help predict the candidate's job performance. This finding contradicts intuition, which tends to link a person's character with the characteristics of their handwriting. But research clearly shows that there is no such connection. Therefore, it should be assumed that graphology will not contribute anything to predicting employee performance in training either.

    Age
    The candidate's age also does not predict their performance in a role at all. Although there is no information on the validity of the age variable in predicting employee performance in training, it probably will not have real value there either. This is because there is a correlation between age and number of years of experience, and as we recall, there is no correlation between number of years of experience and employee performance in training.

    Limitations of the Research Presented
    In the field of research, there are several topics that the research described in the article does not address, mainly due to lack of data in the research literature:
      • The validity of a combination of tests that does not include GMA tests.
      • The validity of a combination of tests that includes, in addition to GMA, at least two more tests.
  • The validity of different test combinations for subgroups in the population. It should be noted that, in general, the research indicates that each test by itself is predictively fair, in the sense that the average prediction for each subgroup corresponds to that subgroup's actual average performance.
     
     Summary and Implications
Two combined methods stand out as having both high validity and practical ease of use:

• GMA with an integrity test: validity of 0.65 for job performance and 0.67 for training performance.
• GMA with a structured interview: validity of 0.63 for job performance and 0.59 for training performance.
It was also found that there is a direct relationship between the validity of the selection method and the monetary value the organization derives from the employee's work. The use of psychometric tests as detailed above is therefore recommended in most cases and can give the organization a real competitive advantage.
     
     
    [1] Incremental validity: The extent to which the tool improves predictive validity beyond GMA tests.
     
     
     
