[08/08/2024] Text-Based Trait and Cue Judgments

Preregistration_Creative-Text-Based Trait and Cue Judgments.pdf

Gap: While most studies have examined the raw performance of machine learning and artificial intelligence (AI) models, psychometric analyses of AI models as judges are rare. Yet such analyses are becoming more important because publicly available large language models (LLMs), such as ChatGPT (OpenAI, 2024a), now allow even laypeople to interact with AI.

RQ1: To what extent do humans and LLMs judge the same constructs?

RQ2: How well do LLMs perform as judges? (path A_a in Figure 1)

RQ3: How similar are LLM judgments to human judgments and to the judgments of other LLMs?

RQ3.1: How similar are text-based LLM judgments to human judgments and to those from other LLMs? (c-path in Figure 1)
RQ3.2: How reliable are text-based LLM judgments?

RQ4: How much do LLM-judged textual cues explain trait criteria and judgments? (V and U paths in Figure 1)

RQ4.1: How much do LLM-judged cues predict trait criteria (cue validity; V paths) and trait judgments (cue utilization; U paths)?
RQ4.2: This analysis will help assess the validity and utilization of cue judgments produced by LLMs.

RQ5: How do LLMs compare to humans and other technologies regarding judgment performance? (Comparing A, V, and U paths in Figure 1)

RQ5.1: How do LLMs compare to humans and other LLMs regarding trait judgment performance? (Comparing A paths in Figure 1)
RQ5.2: How do LLMs compare to humans and LIWC analyses regarding cue judgment performance? (Comparing V and U paths in Figure 1)
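When LLM and human judges rate the same targets against the same criterion, their accuracy correlations are dependent, so a test for dependent correlations is one option for the comparison. The sketch below implements Williams's t as presented by Steiger (1980); using this particular test is an assumption for illustration, not necessarily the preregistered comparison method.

```python
from math import sqrt


def williams_t(r_jk: float, r_jh: float, r_kh: float, n: int) -> float:
    """Williams's t for two dependent correlations that share variable j.

    r_jk: e.g., LLM accuracy (criterion j with LLM judgment k)
    r_jh: e.g., human accuracy (criterion j with human judgment h)
    r_kh: correlation between the LLM and human judgments
    n:    number of targets; the statistic has n - 3 degrees of freedom
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Determinant of the 3 x 3 correlation matrix of j, k, h
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r_kh) ** 3
    return (r_jk - r_jh) * sqrt((n - 1) * (1 + r_kh) / denom)


# Hypothetical values: LLM accuracy .45, human accuracy .30, LLM-human consensus .50, 200 targets
print(round(williams_t(0.45, 0.30, 0.50, 200), 2))
```

A p-value can then be read from a t distribution with n - 3 degrees of freedom (e.g., with SciPy's `scipy.stats.t.sf`).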

RQ6: How well do LLMs perform when explicitly judging the Big Five personality traits?