In an effort to streamline administration and heighten accessibility of 360-degree feedback for Canadian public sector clients, the Personnel Psychology Centre (PPC) has initiated a pilot project testing a web-based version of Full Circle Appraisal (FCA), the PPC's principal 360-degree feedback tool. A central feature of this project will be evaluating the measurement equivalence of the web-based and paper-and-pencil versions of FCA. Potential deviations from present paper-and-pencil data may reflect added error variance owing to the web medium; therefore, inspection of measurement equivalence requires close conceptual as well as empirical scrutiny.
The balance of research examining the equivalence of computer vs. paper-and-pencil psychological measures has centered on cognitive ability tests. In a meta-analytic review of the literature investigating the measurement equivalence between computer and paper-and-pencil cognitive ability tests, Mead and Drasgow (1993) reported cross-mode correlations of .91 for timed power tests and .72 for speeded tests, suggesting speededness moderates measurement equivalence in ability tests.
The rather sparse research addressing non-cognitive measurement equivalence has yielded less definitive results. King and Miles (1995) failed to detect appreciable differences attributable to administration mode (computer vs. paper-and-pencil) for any of four work-related measures evaluated. More recently, however, evidence appears to be mounting that both social desirability and state affect moderate ratings across media. More specifically, relative to paper-and-pencil formats, computer-mediated assessment appears to decrease socially desirable responding (e.g., Richman, Kiesler, Weisband, & Drasgow, 1999), and increase negative affect (e.g., George, Lankford, & Wilson, 1992; Schulenberg & Yutrzenka, 1999). The increase in negative affect owing to computer-mediated assessment is largely transmitted via computer anxiety/aversion (George, Lankford, & Wilson, 1992; Schulenberg & Yutrzenka, 1999).
Trait affect (often operationalized as personality constructs such as agreeableness and emotional stability) and state affect (reflecting general mood, or more targeted emotion - e.g., "liking") have each been linked to specific components of ratee performance, including organizational citizenship behaviours and contextual performance (e.g., Findley, Giles, & Mossholder, 2000; Organ & Ryan, 1995; Van Scotter & Motowidlo, 1996); however, the relationship between rater affect and rater judgments of overall performance has not been systematically explored. Some authors (e.g., Conway, 1998; Robbins & DeNisi, 1994; Varma, DeNisi, & Peters, 1996) have argued that past ratee performance may fuel rater affect, in turn influencing overall performance judgments. Contextual performance shapes the "organizational, social, and psychological context" surrounding task activities and processes (Borman & Motowidlo, 1993, p. 71), relies heavily on the interpersonal domain, and has been found to account for nearly 50% of the variance in overall performance ratings above and beyond task performance (Motowidlo & Van Scotter, 1994). It therefore appears that rater state affect may, in large part, be determined by observed ratee contextual performance. Consistent with this proposition, work by Allen and Rush (1998), Ferris, Judge, and Rowland (1994), and various Leader-Member Exchange Theory researchers (e.g., Heneman, Greenberger, & Anonyuo, 1989) suggests rater affect should mediate the influence of ratee contextual performance on rater overall performance judgments. Following from this literature, Figure 1 depicts a general performance appraisal process model outlining the roles of ratee and rater affect, ratee contextual performance, and other key variables in propelling judgments of overall performance. Supplementing this model, Figure 2 delineates the manner in which state negative affect induced through a web-based assessment medium may mediate overall performance ratings.
In the latter model it is projected that certain individuals will experience computer anxiety/aversion, in turn, precipitating negative state affect. Due to a mood-congruency effect (e.g., Sinclair, 1988), raters displaying negative affect will accord greater weight to unfavourable information, thereby resulting in lower overall performance ratings, particularly with respect to negative behaviours.
The purpose of the present paper is to preface psychometric analyses pertaining to the measurement equivalence of web vs. paper-and-pencil versions of FCA with an examination of the magnitude and mechanism by which rater affect influences FCA ratings.
Based on these hypotheses, if rater affect influences overall performance judgments, and a significant portion of this relationship is attributable to ratee contextual and task performance, then rater affect may not reflect rater bias, but true score variance. Conversely, should rater affect not mediate the relationship between contextual and task performance and overall performance judgments in this manner, one may expect negative state affect resulting from future web administration to introduce substantive rating error to the assessment process. Inasmuch as negative wording of questions may prompt a mood-congruency effect among raters experiencing negative affect, the moderating effects of item wording are of particular interest in the analyses presented.
Data were collected from federal public service departments administering an FCA process between March 1999 and April 2001. The items in the FCA were behavioural statements designed to assess the 14 competencies from the Profile of Public Service Leaders at the Middle Manager level. These competencies are: Cognitive Capacity, Creativity, Visioning, Action Management, Organizational Awareness, Teamwork, Partnering, Interpersonal Relationships, Communications, Stamina and Stress Resistance, Ethics and Values, Personality, Behavioural Flexibility, and Self-Confidence. Supervisory ratings of 105 public service Middle Managers from these FCAs were the focal objects of analysis. Pursuant to the foregoing hypotheses, supervisory ratings of affect toward ratee, contextual performance, task performance, and overall performance were each subjected to statistical analysis. In accordance with Borman and Motowidlo's (1993) distinction between contextual and task performance, two of the FCA competencies, Teamwork and Action Management, were identified as content-valid measures of contextual and task performance, respectively, for the Middle Manager position. The rater affect measure was formulated from FCA items in Interpersonal Relationships and Ethics and Values that had distinctive affective overtones. The overall performance construct was assessed as a composite measure of all items from the remaining 10 competencies in the questionnaire (excluding those comprising contextual performance, task performance, and rater affect). Sample negative items and internal consistency estimates for each of the constructs are listed below. Alphas are reported for positive and negative items combined, positive items only, and negative items only:
Rater affect: "Does not foster trust: lacks respect for others' principles" alpha = .85; .75; .81 (4 items overall)
Contextual performance: "Makes little effort at finding joint solutions" alpha = .86; .77; .83 (6 items overall)
Task performance: "Plans work without attention to quality or timelines" alpha = .84; .83; .81 (6 items overall)
Overall performance: "Applies a narrow understanding of the organizational vision" alpha = .98; .97; .98 (60 items overall)
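For readers who wish to reproduce internal consistency estimates of this kind, the following is a minimal sketch of Cronbach's alpha in Python. It is not the procedure used in the study (all analyses were run in SPSS); the function name and the small ratings matrix are hypothetical and serve only to illustrate the formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondent rows (one rating per item)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*item_scores))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: 5 raters x 4 items (NOT data from the study).
ratings = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

Computing alpha separately for the positive items, the negative items, and the combined set, as in the list above, would amount to calling the function on the corresponding column subsets.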
SPSS was used for all statistical analyses. To address Hypothesis 1, two hierarchical regressions on overall performance were carried out. For the first regression, contextual performance was entered first, followed by task performance in the second block. The second regression entered task performance first, followed by contextual performance. R²-change indices revealed that task performance accounts for 12.2% (p < .001) of variance in overall performance ratings above and beyond contextual performance. Moreover, contextual performance captures significant incremental variance (R²-change = 10.0%, p < .001) in overall performance ratings beyond task performance. Consistent with Hypothesis 1, contextual and task performance represent distinct components of job performance. In view of this outcome, the role of rater affect in overall performance judgments should be assessed from the perspectives of both contextual and task performance. On a broader scale, future research assessing the measurement equivalence of FCA paper-and-pencil and web-based media should examine differential relationships owing to these two conceptualizations of job performance.
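The incremental-variance logic of the two hierarchical regressions can be sketched as follows. This is an illustrative Python sketch rather than the SPSS analysis reported here; it relies on the standard closed-form expression for R² with two predictors, and the function names and score vectors are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_change(y, entered_first, entered_second):
    """Return (R2 at step 1, R2 at step 2, R2 change) for a two-block regression."""
    r_y1 = pearson(y, entered_first)
    r_y2 = pearson(y, entered_second)
    r_12 = pearson(entered_first, entered_second)
    r2_step1 = r_y1 ** 2
    # Closed-form R2 for a regression with two predictors.
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return r2_step1, r2_full, r2_full - r2_step1


# Hypothetical composite scores for 8 ratees (NOT data from the study).
contextual = [3, 4, 2, 5, 4, 3, 1, 5]
task = [2, 4, 3, 5, 3, 4, 2, 4]
overall = [3, 4, 2, 5, 4, 4, 1, 5]

# Regression 1: contextual entered first, task second.
_, full_a, task_increment = r2_change(overall, contextual, task)
# Regression 2: task entered first, contextual second.
_, full_b, contextual_increment = r2_change(overall, task, contextual)

print(f"R2 change for task beyond contextual: {task_increment:.3f}")
print(f"R2 change for contextual beyond task: {contextual_increment:.3f}")
```

The full-model R² is the same regardless of entry order; only the increment credited to the second block differs, which is why both orders are needed to show that each performance component contributes unique variance.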
To more systematically isolate the effects of contextual and task performance on overall performance judgments and gauge the mediating role of rater affect (Hypotheses 2, 3, and 4), six regressions were conducted to test the two path models. These two affiliated models probed the direct effects of task and contextual performance on overall performance judgments as well as their indirect tracings through rater affect (see Figure 3). Positively-worded items were evaluated in the first model; negatively-worded items in the second model.
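A simplified sketch of how direct and indirect effects can be derived from a set of regressions is given below. It assumes a single indirect path per predictor (performance component to rater affect to overall judgment) and standardized variables, which may be simpler than the tracings in Figure 3; the correlation matrix, variable labels, and function names are hypothetical and are not drawn from the FCA data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def std_betas(corr, predictors, outcome):
    """Standardized regression weights computed from a correlation matrix."""
    rxx = [[corr[i][j] for j in predictors] for i in predictors]
    rxy = [corr[i][outcome] for i in predictors]
    return dict(zip(predictors, solve(rxx, rxy)))


def decompose(corr, performance, mediator, outcome):
    """Direct and indirect (via mediator) effects for each performance component."""
    a = std_betas(corr, performance, mediator)             # performance -> affect
    b = std_betas(corr, performance + [mediator], outcome) # all -> overall
    return {
        p: {"direct": b[p], "indirect": a[p] * b[mediator]}
        for p in performance
    }


# Hypothetical correlation matrix (NOT the study's data).
names = ["contextual", "task", "affect", "overall"]
matrix = [
    [1.00, 0.00, 0.50, 0.60],
    [0.00, 1.00, 0.30, 0.62],
    [0.50, 0.30, 1.00, 0.75],
    [0.60, 0.62, 0.75, 1.00],
]
corr = {r: dict(zip(names, row)) for r, row in zip(names, matrix)}

effects = decompose(corr, ["contextual", "task"], "affect", "overall")
for name, e in effects.items():
    print(name, round(e["direct"], 2), round(e["indirect"], 2))
```

Running the same decomposition once on positively-worded items and once on negatively-worded items would yield the two affiliated models described above.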
With respect to positive items, the beta for the direct effect of contextual performance on rater overall performance judgments is .40; the total indirect effect through rater affect is .21. The direct effect for task performance is .55, compared to .18 for the indirect effect. These data lend credence to Hypothesis 2: rater affect partly (albeit not strongly) mediates the relationship between both contextual and task performance and overall performance judgments. Furthermore, consistent with Hypothesis 3, rater affect is a somewhat stronger mediator of contextual than task performance. Taken together, results for positive behaviours lend moderate support to Hypotheses 2 and 3.
Turning to negative behaviours, the direct effect of contextual performance on overall performance judgments is .38 and the total indirect effect through rater affect .35. The direct effect of task performance is .54 and the indirect effect .32. As predicted in Hypothesis 4, these data indicate that rater affect is a more potent mediator of overall performance ratings for negative relative to positive FCA items. In addition, results for negative behaviours reinforce Hypotheses 2 and 3; both of these hypotheses are more definitively supported with respect to negative behaviours relative to positive behaviours. In concert, these findings signal that rater affect will play a more prominent role in driving overall performance ratings for negative behaviours. Moreover, rater affect and overall performance judgments are fuelled to a greater extent by observed ratee contextual performance than observed ratee task performance.
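To make the comparison across the two path models easier to see, the reported direct and indirect effects can be expressed as the share of each total effect that runs through rater affect. The ratio below is an illustrative summary computed from the coefficients reported above, not an index reported in the original analyses, and the variable names are hypothetical.

```python
# (direct effect, total indirect effect via rater affect), as reported above.
reported = {
    ("positive", "contextual"): (0.40, 0.21),
    ("positive", "task"): (0.55, 0.18),
    ("negative", "contextual"): (0.38, 0.35),
    ("negative", "task"): (0.54, 0.32),
}


def indirect_share(direct, indirect):
    """Proportion of the total effect (direct + indirect) that is indirect."""
    return indirect / (direct + indirect)


shares = {key: indirect_share(*vals) for key, vals in reported.items()}
for (wording, component), share in shares.items():
    print(f"{wording:8s} {component:10s} indirect share = {share:.2f}")
```

On this summary the indirect share is larger for contextual than for task performance within each model, and larger for negative than for positive items, which mirrors the pattern described for Hypotheses 3 and 4.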
This study offers three primary pragmatic implications with regard to future testing of FCA measurement equivalence. First, insofar as rater overall performance judgments are driven by rater affect, particularly for negatively-worded items, the impact of rater affect on the evaluation of negatively-phrased behaviours should be judiciously monitored in the web-mediated version of FCA. If rater affect is significantly related to overall performance judgments but does not encompass true score variance (as evidenced in correlations with observed ratee contextual performance and task performance), then the option of dropping negative items from the questionnaire should be entertained. Given this eventuality (i.e., rater affect not being correlated with observed ratee contextual and task performance), in conjunction with evidence of non-equivalence between media, rater affect may be categorized as a rater bias stemming from a negative rater mood-congruency effect attributable to computer anxiety/aversion. Second, to more clearly tease out the foregoing mood-congruency interpretation, the web-based FCA pilot should use a more direct measure of rater affect in which rater "mood" is distinguished from rater affect toward the ratee. Third, to safeguard against same-source bias, peer evaluations of ratee contextual performance and ratee task performance should be employed to obtain a more precise estimate of the relationship between these two performance components and rater affect.
In closing, this study furnishes conceptual and empirical grounds for examining the measurement equivalence of paper-and-pencil and web-based versions of FCA. Results indicate that rater affect and its reflection of valid true score variance (observed contextual and task performance) must be duly considered and incorporated in psychometric analyses of web-based FCA. Inasmuch as rater bias may be best traced through negative behaviours underlying contextual performance and rater affect, a foremost consideration will be the influence of contextual performance on rater affect, particularly with respect to negatively-phrased FCA items.
Allen, T.D. & Rush, M.C. (1998). The effects of organizational citizenship behavior on performance judgments: a field study and a laboratory experiment. Journal of Applied Psychology, 83(2), 247-260.
Borman, W.C. & Motowidlo, S.J. (1993). Expanding the criterion domain to include elements of contextual performance. In N. Schmitt & W.C. Borman (Eds.), Personnel Selection in Organizations (pp. 71-98). San Francisco, CA: Jossey-Bass.
Conway, J.M. (1998). Understanding method variance in multitrait-multirater performance appraisal matrices: examples using general impressions and interpersonal affect as measured method factors. Human Performance, 11(1), 29-55.
Ferris, G.R., Judge, T.A., & Rowland, K.M. (1994). Subordinate influence and the performance evaluation process: test of a model. Organizational Behavior and Human Decision Processes, 58, 101-135.
Findley, H.R., Giles, W.F., & Mossholder, K.W. (2000). Performance appraisal process and system facets: relationships with contextual performance. Journal of Applied Psychology, 85(4), 634-640.
George, C.E., Lankford, J.S., & Wilson, S.E. (1992). The effects of computerized versus paper-and-pencil administration on measures of negative affect. Computers in Human Behavior, 8, 203-209.
Heneman, R.L., Greenberger, D.B., & Anonyuo, C. (1989). Attributions and exchanges: the effects of interpersonal factors on the diagnosis of employee performance. Academy of Management Journal, 32(2), 466-476.
King, W.C. & Miles, E.W. (1995). A quasi-experimental assessment of the effect of computerizing non-cognitive paper-and-pencil measurements: a test of measurement equivalence. Journal of Applied Psychology, 80(6), 643-651.
Mead, A.D. & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: a meta-analysis. Psychological Bulletin, 114(3), 449-458.
Motowidlo, S.J. & Van Scotter, J.R. (1994). Evidence that task performance should be distinguished from contextual performance. Journal of Applied Psychology, 79, 475-480.
Organ, D.W. & Ryan, K. (1995). A meta-analytic review of attitudinal and dispositional predictors of organizational citizenship behavior. Personnel Psychology, 48, 775-802.
Richman, W.L., Kiesler, S., Weisband, S., & Drasgow, F. (1999). A meta-analytic study of social desirability distortion in computer-administered questionnaires, traditional questionnaires, and interviews. Journal of Applied Psychology, 84(5), 754-775.
Robbins, T.L. & DeNisi, A.S. (1994). A closer look at interpersonal affect as a distinct influence on cognitive processing in performance evaluations. Journal of Applied Psychology, 79(3), 341-353.
Schulenberg, S.E. & Yutrzenka, B.A. (1999). The equivalence of computerized and paper-and-pencil psychological instruments: implications for measures of negative affect. Behavior Research Methods, Instruments, and Computers, 31(2), 315-321.
Sinclair, R.C. (1988). Mood, categorization breadth, and performance appraisal: the effects of order of information acquisition and affective state on halo, accuracy, information retrieval, and evaluations. Organizational Behavior and Human Decision Processes, 42, 22-46.
Van Scotter, J.R. & Motowidlo, S.J. (1996). Interpersonal facilitation and job dedication as separate facets of contextual performance. Journal of Applied Psychology, 81, 525-531.
Varma, A., DeNisi, A.S., & Peters, L.H. (1996). Interpersonal affect and performance appraisal: a field study. Personnel Psychology, 49, 341-360.