[04/18/2024] AI for Assessment

A Conceptual Framework for Investigating and Mitigating Machine-Learning Measurement Bias (MLMB) in Psychological Assessment

Summary

This paper addresses growing concerns about bias and unfairness in the use of artificial intelligence (AI) and machine learning (ML) for psychological assessment. It introduces the concept of machine-learning measurement bias (MLMB) and develops a conceptual framework for investigating and mitigating MLMB from a psychometric perspective.

  • Concerns over bias and unfairness in AI and ML applications

  • Introduction of machine-learning measurement bias (MLMB)

  • Definition of MLMB as differential functioning of trained ML models between subgroups

  • Manifestation of MLMB in differential predicted score levels and predictive accuracies across subgroups

  • Sources of bias in ML models: data bias and algorithm-training bias

  • Importance of addressing measurement bias in ML assessments to avoid disparities and discrimination

  • Lack of methodological guidelines for defining and investigating ML bias

  • Focus on bias in ML measurements used to infer individuals' psychological attributes

  • Proposal of a conceptual framework for investigating and mitigating MLMB

  • Emphasis on the need for new statistical and algorithmic procedures to address bias (one simple illustration is sketched below)
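
As one concrete illustration of what such a procedure could look like, the Python sketch below recalibrates predictions linearly within each subgroup. This is a generic post-hoc adjustment on hypothetical inputs, not the specific framework the paper proposes; the function names and setup are assumptions for illustration.

```python
# Generic post-hoc mitigation sketch (not the paper's specific procedure):
# recalibrate predictions within each subgroup so that, on held-out data,
# predicted scores track ground-truth scores equally well for every group.
import numpy as np

def fit_subgroup_recalibration(y_true, y_pred, group):
    """Fit a linear map (slope, intercept) from predictions to truth per subgroup."""
    params = {}
    for g in np.unique(group):
        m = group == g
        slope, intercept = np.polyfit(y_pred[m], y_true[m], deg=1)
        params[g] = (slope, intercept)
    return params

def apply_subgroup_recalibration(y_pred, group, params):
    """Apply the fitted per-subgroup linear maps to new predictions."""
    out = np.empty_like(y_pred, dtype=float)
    for g, (slope, intercept) in params.items():
        m = group == g
        out[m] = slope * y_pred[m] + intercept
    return out
```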

Takeaway

How is machine-learning measurement bias defined in the paper?

  • Machine-learning measurement bias (MLMB) is defined in the paper as differential functioning of the trained ML model between subgroups. One empirical manifestation is a trained ML model producing different predicted score levels for individuals from different subgroups even though they have the same ground-truth level on the underlying construct of interest; another is the model yielding different predictive accuracies across subgroups. Both manifestations are illustrated in the sketch below.
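
The following minimal Python sketch shows how the two manifestations can be checked empirically. The data are simulated and the setup is hypothetical (ground-truth construct scores, model predictions, and a binary subgroup label); it mirrors the logic of the two checks, not the paper's procedure.

```python
# Simulated illustration of the two MLMB manifestations: a score-level shift
# (different predicted levels at the same ground truth) and an accuracy gap.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)            # 0 = reference subgroup, 1 = focal subgroup
y_true = rng.normal(0.0, 1.0, n)         # same underlying distribution in both groups
# Simulate a biased model: systematic offset plus extra noise for the focal group
y_pred = y_true + 0.5 * group + rng.normal(0.0, 0.2 + 0.4 * group, n)

for g in (0, 1):
    m = group == g
    shift = np.mean(y_pred[m] - y_true[m])                 # manifestation 1: level shift
    rmse = np.sqrt(np.mean((y_pred[m] - y_true[m]) ** 2))  # manifestation 2: accuracy gap
    print(f"subgroup {g}: mean(pred - true) = {shift:.2f}, RMSE = {rmse:.2f}")
```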

Fig. 1. Simplified process of machine-learning modeling.
Fig. 2. Measurement bias (MB) and machine-learning measurement bias (MLMB). MB and MLMB Case 1 represents a noncompensatory bias that creates different predicted subgroup distributions despite the same underlying subgroup distributions. MB and MLMB Case 2 represents a compensatory bias that creates equivalent predicted subgroup distributions even though there is measurement bias.
Fig. 4. Expanding the Brunswik lens model to identify the sources of machine-learning measurement bias: an illustration using personality as the focal construct. Areas highlighted in blue represent possible sources of machine-learning measurement bias. “Platform-based personality” refers to the personality construct measured by the input data (e.g., online personality assessed via social media data) used in machine-learning models to predict self-reported personality.

Psychological Measurement in the Information Age: Machine-Learned Computational Models

Summary

This paper discusses how psychological science can benefit from and contribute to emerging approaches in computing and information sciences, particularly machine-learned computational models (MLCMs). The authors highlight the potential of MLCMs to transform psychological measurement by combining the prowess of computers with human inferencing abilities, enabling the analysis of unstructured data sets in real time and improving objectivity. They explain how MLCMs are developed with supervised machine-learning techniques and contrast them with traditional computational models. The paper emphasizes the importance of considering context and intended use when interpreting MLCM performance, as well as addressing concerns related to fairness, bias, interpretability, and responsible use.

  • Psychological science can benefit from emerging approaches in computing and information sciences

  • Machine-learned computational models (MLCMs) can transform psychological measurement

  • MLCMs combine computer capabilities with human inferencing abilities

  • MLCMs enable analysis of unstructured data sets in real time and improve objectivity

  • Development of MLCMs involves supervised machine learning techniques

  • MLCMs are contrasted with traditional computational models

  • Importance of considering context and intended use when interpreting MLCM performance

  • Addressing concerns related to fairness, bias, interpretability, and responsible use in MLCMs adoption

  • Advocacy for the adoption of MLCMs in psychological science for enhanced measurement practices and research advancements

Fig. 2. The four main approaches to computational modeling. The approaches differ in whether features, parameters, and structure are prespecified. Handcrafted and traditional psychological models are more explanatory, whereas standard machine-learning and deep-neural-learning models are more predictive. Note that the approaches are not mutually exclusive and can be combined in multiple ways; explanation and prediction goals are combined in blended approaches. In the case of deep-neural-learning models, raw data can be input directly; for the other model types, features are first prespecified and then computed from raw data and used as inputs for learning. Parameters are prespecified for handcrafted models only, and model structure is prespecified for both types of traditional models. Annotations are labels provided by humans to guide the machine-learning process by providing a supervisory signal. They are needed in the model-training phase for supervised machine learning. Technically, a response variable (not shown) is needed for traditional models, but such variables are not considered to be annotations. Dotted, solid black, and solid red lines indicate requirements for minimal, typical, and substantial human knowledge engineering, respectively.
Fig. 3. The basic pipeline for training standard machine-learned computational models (MLCMs). The arrows denote the flow of information processing; red arrows denote steps that are involved in the training process only and are skipped once MLCMs have been trained.
Fig. 4. Standard versus deep-learning approaches for training a machine-learned computational model to classify different types of spoken discourse from audio. In the standard approach, n-grams derived from training data are used to produce a random forest model. In the deep-learning approach, contextual semantics are learned from large corpora in a pretraining phase, and the deep neural network is then fine-tuned on the training data. (A minimal sketch of the standard approach follows these captions.)
Fig. 5. Selected example cases of machine-learned computational models in four domains of psychological assessment, aligned with respect to Newell’s (1990) four bands of action for the input modality and psychological construct assessed. See Table 1 for additional details about the examples.
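
To make the standard route in Fig. 4 concrete, here is a minimal Python sketch of an n-gram-plus-random-forest pipeline. The toy transcripts and labels are hypothetical, not the paper's data or model; a deep-learning route would instead fine-tune a pretrained network on the same labels.

```python
# Minimal sketch of the "standard" MLCM approach: n-gram features extracted
# from transcribed audio feed a random-forest classifier. Toy data only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

texts = [
    "today we will cover the next chapter",   # lecture
    "please open your books to page ten",     # lecture
    "what do you all think about this idea",  # discussion
    "i disagree because the data suggest",    # discussion
]
labels = ["lecture", "lecture", "discussion", "discussion"]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # unigram and bigram count features
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(texts, labels)                   # training-only step (red arrows in Fig. 3)
print(model.predict(["let us turn to the next chapter"]))
```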

How Well Can an AI Chatbot Infer Personality? Examining Psychometric Properties of Machine-inferred Personality Scores

Summary

This paper explores the feasibility of measuring personality indirectly through an artificial intelligence (AI) chatbot. The study examines the psychometric properties of machine-inferred personality scores, including reliability, factorial validity, convergent and discriminant validity, and criterion-related validity. Undergraduate students engaged with an AI chatbot and completed a self-report Big-Five personality measure. Key findings indicate that machine-inferred personality scores showed acceptable reliability, a factor structure comparable to that of self-reported scores, good convergent but relatively poor discriminant validity, low criterion-related validity, and incremental validity over self-reported scores in some analyses.

  • The study explores measuring personality through an AI chatbot using machine learning algorithms.

  • Participants were undergraduate students who engaged with an AI chatbot and completed a self-report Big-Five personality measure.

  • Machine-inferred personality scores showed acceptable reliability, a factor structure comparable to self-reported scores, good convergent but poor discriminant validity, low criterion-related validity, and incremental validity over self-reported scores in some analyses (see the sketch after this list).

  • The research emphasizes the need for further validation and examination of the psychometric properties of machine-inferred personality scores.

  • The study discusses the potential of AI-based personality assessment and its implications for future research and practical applications.
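
The convergent/discriminant logic can be illustrated with a multitrait-multimethod-style correlation matrix. The Python sketch below uses simulated scores (hypothetical data, not the study's): convergent validity corresponds to same-trait correlations across the two methods, discriminant validity to different-trait correlations, which should be comparatively low.

```python
# Hypothetical convergent vs. discriminant validity check: correlate
# machine-inferred Big-Five scores with self-report scores trait by trait.
import numpy as np

rng = np.random.default_rng(1)
n = 200
self_report = rng.normal(size=(n, 5))          # columns = O, C, E, A, N
# Simulated machine scores: tied to the matching trait, plus shared method noise
shared = rng.normal(size=(n, 1))
machine = 0.6 * self_report + 0.5 * shared + rng.normal(scale=0.6, size=(n, 5))

R = np.corrcoef(machine, self_report, rowvar=False)[:5, 5:]  # 5x5 cross-method block
convergent = np.diag(R)                        # same trait, different method
discriminant = R[~np.eye(5, dtype=bool)]       # different trait, different method
print("mean convergent r:", convergent.mean().round(2))
print("mean |discriminant| r:", np.abs(discriminant).mean().round(2))
```

In these terms, "relatively poor discriminant validity" means the off-diagonal cross-method correlations are high relative to the same-trait (diagonal) correlations.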

Machine learning uncovers the most robust self-report predictors of relationship quality across 43 longitudinal couples studies

Summary

This paper examines predictors of relationship quality in romantic relationships using machine-learning techniques across 43 longitudinal datasets from 29 laboratories. The study aimed to quantify how predictable relationship quality is and to identify its most robust predictors. Relationship-specific predictors such as perceived-partner commitment, appreciation, and sexual satisfaction, as well as individual-difference predictors such as life satisfaction and attachment styles, emerged as the strongest predictors of relationship quality. Actor-reported variables were more predictive than partner-reported variables, and individual differences and partner reports added no predictive value beyond actor-reported relationship-specific variables. Changes in relationship quality over time were largely unpredictable from self-report variables. This research contributes to understanding the factors that shape relationship quality and highlights the importance of individuals' own perceptions and experiences in shaping relationship outcomes.

  • Relationship quality is a crucial psychological construct with significant implications for health and well-being.

  • Machine-learning techniques, specifically random forests, were used to analyze 43 longitudinal datasets from 29 laboratories (a minimal sketch of this strategy follows this list).

  • Key predictors of relationship quality included perceived-partner commitment, appreciation, sexual satisfaction, and individual differences like life satisfaction and attachment styles.

  • Actor-reported variables were more predictive than partner-reported variables.

  • Individual differences and partner reports did not add predictive value beyond actor-reported relationship-specific variables.

  • Changes in relationship quality over time were largely unpredictable from self-report variables.
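
The analytic strategy, a random forest predicting relationship quality from self-report variables with predictor importance used to rank them, can be sketched as below. Everything here is simulated and hypothetical (variable names, effect sizes, sample size); it mirrors the logic, not the paper's actual pipeline, which pooled 43 datasets.

```python
# Hypothetical sketch: random forest predicts relationship quality, and
# permutation importance ranks the self-report predictors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 500
cols = ["perceived_commitment", "appreciation", "sexual_satisfaction", "life_satisfaction"]
X = rng.normal(size=(n, len(cols)))            # simulated self-report predictors
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(scale=0.7, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
imp = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
for name, score in sorted(zip(cols, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```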

Fig. 1. Antecedents and consequences of relationship quality (19). Schematic depiction of the field of relationship science. In their work, relationship scientists use an extensive assortment of overlapping individual-difference and relationship-specific constructs. These constructs predict the way couple members behave toward and interact with each other, which in turn affects relationship quality and a variety of consequential outcomes. These processes are themselves embedded in social networks as well as broader cultural and historical structures.
Fig. 4. Mediational pathway implied by current findings. This figure depicts the mediational model implied by the equivalent predictive power of the “all predictors” vs. “actor relationship” models in Figs. 2 and 3: any effects of self-reported individual differences or partner-reported relationship variables on relationship quality are likely mediated by the actor-reported relationship variables. Individual-difference × relationship-variable interactions and actor × partner interactions are not depicted because they are likely quite small. Other constructs in Fig. 1 (e.g., broader contextual forces) are not depicted because they were not examined in this study.
