[08/08/2024] Text-Based Trait and Cue Judgments
Gap: While most studies have examined the raw performance of machine learning and artificial intelligence (AI) models, psychometric analyses of AI models as judges are rare. Yet, the importance of psychometric analyses of AI models is rising because publicly available large language models (LLMs), such as ChatGPT (OpenAI, 2024a), now even allow laypeople to interact with AI.
RQ1: To what extent do humans and LLMs judge the same constructs?
RQ2: How well do LLMs perform as judges? (path A_a in Figure 1)
RQ3: How similar are LLM judgments to human judgments and to judgments of other LLMs?
RQ3.1: How similar are text-based LLM judgments to human judgments and to those from other LLMs? (c-path in Figure 1)
RQ3.2: How reliable are text-based LLM judgments?
RQ4: How much do LLM-judged textual cues explain trait criteria and judgments? (V and U paths in Figure 1)
RQ4.1 (cue validity): How much do cues predict trait criteria?
RQ4.2 (cue utilization): How much do cues predict trait judgments?
This analysis will help assess the validity and utilization of cue judgments produced by LLMs.
RQ5: How do LLMs compare to humans and other technologies regarding judgment performance? (Comparing A, V, and U paths in Figure 1)
RQ5.1: How do LLMs compare to humans and other LLMs regarding trait judgment performance? (Comparing A paths in Figure 1)
RQ5.2: How do LLMs compare to humans and LIWC analyses regarding cue judgment performance? (Comparing V and U paths in Figure 1)
RQ6: How well do LLMs perform when explicitly judging the Big Five personality traits?
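The quantities behind RQ2 and RQ4 can be illustrated with a minimal sketch. It assumes that judgment performance (A paths), cue validity (V paths), and cue utilization (U paths) are each expressed as a Pearson correlation across text authors. That is a common lens-model convention, but the summary above does not state which statistic the study uses. The function names and the toy numbers are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / sqrt(sxx * syy)


def lens_model_paths(criterion, judgment, cue):
    """Assumed mapping of lens-model paths to correlations (illustrative only).

    criterion: trait criterion per text author (e.g., a self-report score)
    judgment:  trait judgment per text author (from an LLM or a human)
    cue:       one judged textual cue per text author
    """
    return {
        "accuracy_A": pearson(judgment, criterion),   # judgment vs. criterion
        "validity_V": pearson(cue, criterion),        # cue vs. criterion
        "utilization_U": pearson(cue, judgment),      # cue vs. judgment
    }


# Hypothetical toy data, one value per text author (not from the study).
toy_criterion = [3.0, 4.5, 2.0, 5.0, 3.5]
toy_judgment = [2.5, 4.0, 2.5, 4.5, 3.0]
toy_cue = [1.0, 3.0, 0.5, 4.0, 2.0]

toy_paths = lens_model_paths(toy_criterion, toy_judgment, toy_cue)
for name, value in toy_paths.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, comparing judges (RQ5) amounts to computing the same three values once per judge, for example one LLM and one human rater, and comparing them path by path.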