
[09/02/2024] Introduction to Measurement Theory Chapters 1, 2 (2.1-2.8) and 3.

Chapter 1: Introduction

  1. The Use of Tests: Tests are tools for obtaining samples of behavior and are widely used in education, clinical settings, industry, and government.

  2. Definition of Measurement: Measurement involves systematically assigning numbers to individuals to represent their properties.

  3. History of Testing and Measurement: Discusses the evolution of testing from ancient civil-service exams in China to modern psychological and educational assessments.

  4. Organization of the Book: Outlines the book’s chapters, covering statistics, classical theory, reliability, validity, test construction, scoring, scaling, and modern controversies.

  5. Standards for Test Users: Emphasizes the importance of technical competence, fairness, and the ethical use of tests according to established standards.

Chapter 2: A Review of Basic Statistical Concepts

2.1 Introduction

  • Overview: The chapter introduces the mathematical foundations necessary to understand measurement theory. It emphasizes that measurement theory is rooted in statistics and mathematical concepts.

  • Purpose: To ensure readers have the necessary background in statistics, this chapter reviews key statistical concepts and skills.

2.2 Levels of Measurement

  • Four Levels: Measurement can occur at four levels: nominal, ordinal, interval, and ratio.

    • Nominal: Assigns distinct numbers to categories without implying order or magnitude. Example: labeling hair colors as "1" for red, "2" for brown.

    • Ordinal: Involves distinctiveness and order. Higher numbers indicate more of a property, but intervals between values are not necessarily equal. Example: ranking people by height.

    • Interval: Has distinctiveness, order, and equal intervals but lacks an absolute zero. Example: Fahrenheit temperature scale.

    • Ratio: Includes all four characteristics: distinctiveness, order, equal intervals, and an absolute zero. Example: measuring length in inches.

  • Importance: The level of measurement affects the choice of statistical techniques for data analysis.
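
A minimal Python sketch of why the interval/ratio distinction matters, using hypothetical temperatures and lengths. Converting Fahrenheit to Celsius changes the ratio between two temperatures, so "twice as hot" is not meaningful on an interval scale, while ratios of differences survive the conversion. Converting inches to centimetres leaves ratios of lengths intact because both scales share a true zero.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def inches_to_cm(inches):
    """Convert inches to centimetres (1 inch = 2.54 cm)."""
    return inches * 2.54


# Interval scale (hypothetical temperatures): the zero point is arbitrary,
# so the ratio of two values depends on the unit chosen.
ratio_f = 80 / 40                                                # 2.0 in Fahrenheit
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)  # about 6.0 in Celsius

# Ratios of *differences* are unchanged, because the intervals are equal.
diff_ratio_f = (80 - 60) / (60 - 40)
diff_ratio_c = (fahrenheit_to_celsius(80) - fahrenheit_to_celsius(60)) / (
    fahrenheit_to_celsius(60) - fahrenheit_to_celsius(40)
)

# Ratio scale (hypothetical lengths): a true zero means ratios survive a unit change.
ratio_in = 30 / 15
ratio_cm = inches_to_cm(30) / inches_to_cm(15)

print(ratio_f, round(ratio_c, 2))
print(round(diff_ratio_f, 2), round(diff_ratio_c, 2))
print(ratio_in, round(ratio_cm, 2))
```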

2.3 Common Statistical Notation and Definitions

  • Constants vs. Variables:

    • Constants: Represent fixed, unchanging values, often symbolized by lower-case letters or Greek letters.

    • Variables: Represent quantities that can change and are typically symbolized by capital italic letters.

    • Subscripts: Used to differentiate multiple variables (e.g., $X_1$, $X_2$ for different scores).

  • Discrete vs. Continuous Variables:

    • Discrete Variables: Can take on only separate, countable values, with no values in between (e.g., the number of correct answers on a test).

    • Continuous Variables: Can take on any value within a range (e.g., time taken to complete a task).

  • Summation Notation:

    • The summation sign ($\Sigma$) indicates that the values following the sign should be added together, e.g., $\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$.

    • Summation Rules: Three main rules simplify arithmetic involving summation:

      • Rule 1: Summing a constant $c$ over $n$ terms is equivalent to multiplying the constant by $n$: $\sum_{i=1}^{n} c = nc$.

      • Rule 2: The sum of a variable multiplied by a constant equals the constant times the sum of the variable: $\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$.

      • Rule 3: The summation sign can be distributed over each term when summing more than one term: $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$.
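
The three summation rules can be checked numerically with a short Python sketch; the scores `X`, `Y` and the constant `c` are hypothetical values chosen only for illustration.

```python
# Hypothetical scores for n = 5 examinees on two tests, plus a constant.
X = [3, 7, 2, 9, 4]
Y = [1, 0, 5, 2, 6]
c = 4
n = len(X)

# Rule 1: summing the constant c over n terms equals n * c.
rule1 = sum(c for _ in range(n)) == n * c

# Rule 2: a constant multiplier can be moved outside the summation sign.
rule2 = sum(c * x for x in X) == c * sum(X)

# Rule 3: the summation sign distributes over a sum of terms.
rule3 = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)

print(rule1, rule2, rule3)  # True True True
```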

2.4 Distributions and Probabilities

  • Frequency Distributions:

    • Frequency Distribution: Shows how often each value of a discrete variable occurs.

    • Relative Frequency: Proportion of times a variable takes on each value, with the sum of all relative frequencies equaling 1.

  • Probability:

    • Defined as the relative frequency of a value in a population.

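    • The probability that a variable $X$ takes on the value $X_i$ is denoted $p(X = X_i)$.

    • Cumulative Probability: For a discrete variable, the probability that $X$ falls within a range of values is the sum of the probabilities of the individual values in that range; the cumulative probability $p(X \le X_i)$ is the case in which the range runs from the lowest value up to $X_i$.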
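
A minimal Python sketch of a frequency distribution and the probabilities built from it, assuming a hypothetical set of ten quiz scores that is treated as the entire population (so each relative frequency is a probability). `fractions.Fraction` keeps the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical number-correct scores for a population of 10 examinees.
scores = [2, 3, 3, 4, 5, 3, 2, 4, 3, 1]
N = len(scores)

# Frequency distribution: how often each value occurs.
freq = Counter(scores)

# Relative frequencies (here, probabilities); they sum to 1.
rel_freq = {x: Fraction(f, N) for x, f in sorted(freq.items())}

p_equals_3 = rel_freq[3]                                       # p(X = 3)
p_2_to_4 = sum(p for x, p in rel_freq.items() if 2 <= x <= 4)  # p(2 <= X <= 4)
p_at_most_3 = sum(p for x, p in rel_freq.items() if x <= 3)    # p(X <= 3)

print(dict(freq), p_equals_3, p_2_to_4, p_at_most_3)
```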

2.5 Descriptive Statistics

  • Central Tendency:

    • Mode: The most frequently occurring score in a distribution.

    • Median: The middle score when all scores are ranked in order (with an even number of scores, the average of the two middle scores).

    • Mean: The arithmetic average of all scores, calculated by summing all scores and dividing by the number of scores.

  • Variability:

    • Range: The difference between the highest and lowest scores.

    • Variance: The average of the squared deviations from the mean, indicating the spread of scores.

    • Standard Deviation: The square root of the variance, providing a measure of variability in the same units as the data.

  • Skewness: Describes the degree and direction of a distribution's asymmetry.

    • Symmetrical Distribution: Left and right sides are mirror images.

    • Positively Skewed: Tail is longer on the right.

    • Negatively Skewed: Tail is longer on the left.
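
A minimal sketch of these descriptive statistics using Python's standard `statistics` module, on hypothetical scores with one unusually high value. The skewness index used here (the mean cubed standardized deviation, or moment coefficient) is an assumed choice; the notes do not specify a formula, and other indices exist.

```python
import statistics

# Hypothetical scores; the single high score (11) stretches the right tail.
scores = [2, 3, 3, 4, 5, 3, 2, 4, 3, 11]

mode = statistics.mode(scores)        # most frequent score
median = statistics.median(scores)    # even N: mean of the two middle scores
mean = statistics.mean(scores)
score_range = max(scores) - min(scores)

# pvariance/pstdev divide by N, matching "average squared deviation";
# statistics.variance/stdev would divide by N - 1 instead.
variance = statistics.pvariance(scores)
sd = statistics.pstdev(scores)

# Moment coefficient of skewness: positive when the right tail is longer.
skewness = sum(((x - mean) / sd) ** 3 for x in scores) / len(scores)

print(mode, median, mean, score_range, variance, round(sd, 3), round(skewness, 3))
```

With a positive skew, the mean (pulled toward the long right tail) exceeds the median, which is visible in this example.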

2.6 Inferential Statistics

  • Purpose: Inferential statistics allow researchers to make generalizations about a population based on a sample.

  • Populations and Samples:

    • Population: The entire group of individuals being studied.

    • Sample: A subset of the population used to make inferences about the population.

  • Random Sampling: Ensures that every individual in the population has an equal chance of being included in the sample, reducing bias.

  • Hypothesis Testing vs. Estimation:

    • Hypothesis Testing: Involves formulating a hypothesis about the population and using sample data to test it.

    • Estimation: Uses sample statistics to estimate population parameters, often presented with confidence intervals.
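
A minimal Python sketch of random sampling and estimation, assuming a simulated (hypothetical) population of normally distributed scores. `random.Random.sample` draws a simple random sample, the sample mean serves as the point estimate of the population mean, and an approximate 95% confidence interval uses the large-sample normal approximation; the 1.96 multiplier is an assumption of this sketch rather than a formula from the notes.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 scores (normal, mean 50, SD 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # population parameter (unknown in practice)

# Simple random sample: every member has the same chance of being drawn.
sample = rng.sample(population, k=100)

# Estimation: point estimate plus an approximate 95% confidence interval
# (large-sample normal approximation, multiplier 1.96).
m = statistics.fmean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5
ci_low, ci_high = m - 1.96 * se, m + 1.96 * se

print(f"mu = {mu:.2f}, estimate = {m:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Repeating the sampling with many different seeds would give intervals that contain `mu` roughly 95% of the time; any single interval either does or does not.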

2.7 The Normal Distribution

  • Definition: A symmetric, bell-shaped distribution in which the mean, median, and mode coincide, most scores fall near the middle, and progressively fewer fall toward the extremes.

  • Standard Normal Distribution: A specific normal distribution with a mean of 0 and a standard deviation of 1.

  • Probability Calculation: The area under the curve between two points represents the probability of observing a score in that range.

  • Z-Scores: Standardized scores, $z = (X - \mu)/\sigma$, that express each score as its distance from the mean in standard-deviation units, allowing scores from different distributions to be compared.
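
The area and z-score ideas can be sketched with Python's standard-library `statistics.NormalDist`; the two test scales below (mean 100, SD 15 and mean 50, SD 10) are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1

# Probability = area under the curve between two points (here z = -1 and z = +1).
p_within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)


def z_score(x, mean, sd):
    """Standardize a raw score: distance from the mean in SD units."""
    return (x - mean) / sd


# Hypothetical: 130 on a test with mean 100, SD 15 vs. 65 on a test with mean 50, SD 10.
z_a = z_score(130, 100, 15)
z_b = z_score(65, 50, 10)

# Proportion of a normal population scoring below the first examinee.
p_below_a = std_normal.cdf(z_a)

print(round(p_within_1sd, 4), z_a, z_b, round(p_below_a, 4))
```

Although 130 and 65 are on different raw scales, the z-scores (2.0 vs. 1.5) show that the first examinee stands further above their group's mean.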

2.8 The Pearson Correlation Coefficient

  • Purpose: Measures the strength and direction of the relationship between two variables.

  • Formula:

    • The Pearson correlation coefficient ($r$) is the covariance of the two variables divided by the product of their standard deviations: $r_{XY} = \dfrac{\operatorname{Cov}(X, Y)}{s_X s_Y}$.

  • Interpretation:

    • $r = 1$ indicates a perfect positive linear relationship.

    • $r = -1$ indicates a perfect negative linear relationship.

    • $r = 0$ indicates no linear relationship between the variables; a strong non-linear relationship can still produce $r = 0$.
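
A minimal sketch of the covariance-over-standard-deviations formula in plain Python, using small hypothetical data sets. The last case is a perfect U-shaped relationship that still yields $r = 0$, because $r$ captures only linear association.

```python
import statistics


def pearson_r(x, y):
    """Pearson r: covariance of x and y divided by the product of their SDs."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # pstdev also divides by n, so the n's cancel consistently.
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive linear relationship
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative linear relationship

# Perfect but non-linear (U-shaped) relationship: y = x squared.
x_sym = [-2, -1, 0, 1, 2]
r_curve = pearson_r(x_sym, [v ** 2 for v in x_sym])

print(round(r_pos, 6), round(r_neg, 6), round(r_curve, 6))
```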

Chapter 3: Classical True-Score Theory

3.1 The Assumptions of Classical True-Score Theory

  • Overview: Classical true-score theory is a fundamental concept in measurement, providing a framework to understand the relationship between observed scores and true scores.

  • True Score and Error:

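    • True Score (T): The score a person would obtain if there were no measurement error; formally, the expected value (long-run average) of that person's observed scores over many independent administrations of the same test.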

    • Observed Score (X): The score that is actually obtained, which includes both the true score and some error component.

    • Error (E): The difference between the observed score and the true score, which is assumed to be random.

  • Key Assumptions:

    • Linearity: The observed score is the sum of the true score and the error term: $X = T + E$.

    • Uncorrelated True Scores and Errors: The true score and the error are uncorrelated ($\rho_{TE} = 0$).

    • Error Expectation: The expected error is zero: across many independent measurements of the same person, positive and negative errors cancel out, so the long-run average observed score equals the true score.

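    • Uncorrelated Errors: Errors on two different tests (or two administrations) are uncorrelated with each other, and the error on one test is uncorrelated with the true score on the other.

    • Constant Variance of Errors: Applied treatments often add that the error variance is the same at every level of the true score, but this is a simplifying convenience rather than one of the formal assumptions of the classical model.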

3.2 Summary of Classical True-Score Theory

  • Formula: The observed score can be expressed as $X = T + E$, where $X$ is the observed score, $T$ is the true score, and $E$ is the error.

  • Reliability: A key concept in classical true-score theory, reliability refers to the consistency of a measurement. It is defined as the proportion of observed-score variance that is due to true-score variance: $\text{Reliability} = \dfrac{\sigma_T^2}{\sigma_X^2}$, where $\sigma_T^2$ is the variance of true scores and $\sigma_X^2$ is the variance of observed scores.

3.3 Conclusions Derived from Classical True-Score Theory

  • Implications for Measurement:

    • High Reliability: Indicates that the observed scores are a good reflection of true scores, with minimal error.

    • Low Reliability: Suggests that a substantial portion of the variance in observed scores is due to error, making the measurement less dependable.

  • Impact on Research:

    • Reliability is necessary, though not sufficient, for valid conclusions in research: if a test is unreliable, much of the variation in its scores is error, and the results may be misleading.

3.4 (Optional) Proofs of Conclusions Derived from Classical True-Score Theory

  • Mathematical Proofs: This section provides detailed mathematical proofs of the key conclusions of classical true-score theory. It includes derivations that show how reliability is related to true and observed score variances and the conditions under which specific formulas hold true.
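
A compact sketch of the kind of derivation this section covers, written under the classical assumptions $X = T + E$ and $\rho_{TE} = 0$ (so $\operatorname{Cov}(T, E) = 0$). Here $\rho_{XX'}$ denotes reliability ($\sigma_T^2/\sigma_X^2$) and $\rho_{XT}$ the correlation between observed and true scores; this is an illustrative outline, not a reproduction of the book's own proofs.

$$
\begin{aligned}
\sigma_X^2 &= \operatorname{Var}(T + E) = \sigma_T^2 + \sigma_E^2 + 2\operatorname{Cov}(T, E) = \sigma_T^2 + \sigma_E^2,\\
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_X^2 - \sigma_E^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2},\\
\operatorname{Cov}(X, T) &= \operatorname{Cov}(T + E, T) = \sigma_T^2 + \operatorname{Cov}(E, T) = \sigma_T^2,\\
\rho_{XT}^2 &= \left(\frac{\sigma_T^2}{\sigma_X \sigma_T}\right)^2 = \frac{\sigma_T^2}{\sigma_X^2} = \rho_{XX'}.
\end{aligned}
$$

In words: observed-score variance splits into true-score and error parts, and reliability equals the squared correlation between observed and true scores.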

3.5 Vocabulary

  • Key Terms:

    • True Score (T): The score that reflects the actual level of the attribute being measured, free from error.

    • Observed Score (X): The score obtained from a measurement, which includes both the true score and error.

    • Error (E): The random component that causes the observed score to differ from the true score.

    • Reliability: The extent to which a measurement is free from error and consistently reflects the true score.

3.6 Study Questions

  • Example Questions:

    • What are the key assumptions of classical true-score theory?

    • How is reliability defined in classical true-score theory?

    • Why is it important to understand the relationship between true scores, observed scores, and error in psychological measurement?

3.7 Computational Problems

  • Practice Problems: This section provides computational exercises designed to reinforce the concepts covered in the chapter. Problems may involve calculating reliability, understanding the relationship between true scores and observed scores, and applying the formulas introduced in the chapter.

Key Takeaways:

  • Classical true-score theory provides the foundation for understanding the relationship between an individual's true ability or characteristic and the score observed in a test.

  • Reliability is a crucial concept derived from this theory, indicating how much of the observed score variance is due to true differences among individuals.

  • The assumptions of the theory, such as error being random and independent of true scores, are essential for ensuring that the observed scores provide a meaningful estimate of the true scores.
