
[08/12/2024] ML Ratings for O*Net



Using Machine Learning to Develop Occupational Interest Profiles and High-Point Codes for the O*NET System

Key pages in the report: 11, 78, and 82 (Rating Instructions and Rating Sheet).

Steps

  • Step 1: Acquiring and Preparing Data for Initial Modeling Work

    • Collected and prepared data from various O*NET databases.

    • Constructed a combined dataset from 2008-2013 O*NET data.

    • Addressed data alignment issues across different O*NET-SOC taxonomies.

    • Implemented quality checks on RIASEC ratings and adjusted them based on agreement among raters.

  • Step 2: Developing and Evaluating Initial RIASEC Prediction Models

    • Developed models to predict RIASEC ratings using various machine learning techniques.

    • Created features from occupational text using methods like Bag-of-Words (BoW) and SBERT embeddings.

    • Evaluated different models using cross-validation and selected the best-performing ones.

  • Step 3: Generating Preliminary OIPs and High-Point Codes

    • Generated preliminary Occupational Interest Profiles and High-Point Codes based on the initial models.

  • Step 4: Identifying Occupations for Inclusion in Analyst-Expert Rating Data Collections

    • Identified occupations to be included in further data collections by analysts and experts.

  • Step 5: Collecting and Evaluating Analyst and Expert RIASEC Ratings

    • Conducted rater recruitment and training.

    • Collected ratings from analysts and experts and cleaned the data.

    • Evaluated the reliability and agreement of the collected ratings.

  • Step 6: Refining and Evaluating Final RIASEC Prediction Models for Future Use

    • Refined the prediction models by incorporating additional features such as alternate titles (AT), Detailed Work Activities (DWA), and Intermediate Work Activities (IWA).

    • Conducted further evaluations to ensure the models balanced predictive accuracy with practical considerations.

  • Step 7: Finalizing OIPs and High-Point Codes for O*NET 28.1

    • Finalized the Occupational Interest Profiles and High-Point Codes for publication in the O*NET 28.1 database.

    • Provided guidance for future updates to the OIPs and High-Point Codes.

Analysis

  • Cross-Validation and Model Fitting:

    • The team employed a five-fold cross-validation strategy to evaluate the models. This involved splitting the data into training and test sets, tuning hyperparameters, and evaluating model performance using root mean squared error (RMSE) as the primary metric.

    • Elastic Net (EN) regression was used for most model fitting, as it outperformed Sparse Partial Least Squares (SPLS) regression in most cases.
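The cross-validation setup described above can be sketched with scikit-learn. This is a minimal illustration, not the report's actual pipeline: the feature matrix `X` and rating vector `y` are synthetic stand-ins, and the penalty grid is an arbitrary example.

```python
# Sketch of five-fold cross-validation with elastic-net regression,
# tuned and scored by RMSE. All data here is synthetic.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # stand-in for text-derived features
y = X[:, 0] * 2 + rng.normal(size=100)  # stand-in for a RIASEC rating

grid = GridSearchCV(
    ElasticNet(max_iter=5000),
    param_grid={"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.5, 0.8]},
    scoring="neg_root_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("CV RMSE:", -grid.best_score_)
```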

  • Residual Models and Ensemble Predictions:

    • Residual models were developed to account for variance unexplained by the initial models. These models were trained on different feature sets (e.g., alternative titles, Detailed Work Activities (DWA), Intermediate Work Activities (IWA)).

    • An ensemble model was then trained using predictions from the residual models as inputs. This ensemble approach was intended to improve prediction accuracy by combining the strengths of the individual models.
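The residual-model idea above can be sketched as follows: fit a base model on one feature set, fit a second model to the base model's residuals using a different feature set, then combine the two component predictions in a linear ensemble. This is a hedged toy version with synthetic data; the variable names and hyperparameters are illustrative assumptions, not the report's specification.

```python
# Toy residual-model + ensemble pipeline on synthetic data.
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(1)
X_text = rng.normal(size=(200, 10))   # e.g. description-text features
X_alt = rng.normal(size=(200, 10))    # e.g. alternative-title features
y = 4 + X_text[:, 0] + 0.5 * X_alt[:, 0] + rng.normal(scale=0.3, size=200)

# Base model on the first feature set.
base = ElasticNet(alpha=0.1).fit(X_text, y)
residuals = y - base.predict(X_text)

# Residual model: predict what the base model missed, from other features.
resid_model = ElasticNet(alpha=0.1).fit(X_alt, residuals)

# Linear ensemble over the component predictions, bounded to the 1-7 scale.
preds = np.column_stack([base.predict(X_text), resid_model.predict(X_alt)])
ensemble = LinearRegression().fit(preds, y)
final = np.clip(ensemble.predict(preds), 1, 7)
```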

  • Evaluation Metrics:

    • The primary metrics used to evaluate the models were RMSE and multiple R (correlation coefficient). RMSE was favored for making decisions as it provides insights into both the strength of the linear relationship and the scale correspondence between predicted and actual values.

    • Multiple R values were also reported but were not used for decision-making due to potential scaling issues observed during the analysis.
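A small numeric example makes the RMSE-versus-correlation point concrete: predictions can correlate perfectly with the truth yet sit on the wrong part of the scale, which RMSE exposes but correlation cannot. The numbers below are invented for illustration.

```python
# Correlation is blind to a constant shift; RMSE is not.
import numpy as np

actual = np.array([2.0, 3.0, 4.0, 5.0, 6.0])  # e.g. expert ratings, 1-7 scale
shifted = actual + 1.5                         # right ordering, wrong level

r = np.corrcoef(actual, shifted)[0, 1]
rmse = np.sqrt(np.mean((actual - shifted) ** 2))
print(r)     # 1.0 — perfect correlation despite the offset
print(rmse)  # 1.5 — RMSE flags the scale mismatch
```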

  • Final Model Evaluation:

    • The final predictions combined the ensemble predictions with the residual model predictions and were bounded to the 1-7 rating scale used for the RIASEC ratings.

    • The final model performance was compared to existing benchmarks, showing strong correlations between expert ratings and final predictions, particularly for Realistic, Investigative, Artistic, and Social interests.

  • Residual Analysis:

    • A detailed residual analysis was performed to understand how well the predictions matched the expert ratings across different job families and job zones. This analysis decomposed the variance in residuals to identify whether differences were attributable to job families, job zones, or individual occupations.

High-Point Codes

Step 1: Convert RIASEC Ratings to Proportions

First, assume you have RIASEC ratings for occupations in a DataFrame df with columns corresponding to the six RIASEC dimensions; each rating is divided by the occupation's row total to yield proportions.
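A minimal sketch of this step, assuming a pandas DataFrame `df` whose R/I/A/S/E/C columns hold 1-7 ratings (the sample values are invented):

```python
# Convert each occupation's RIASEC ratings to proportions that sum to 1.
import pandas as pd

riasec = ["R", "I", "A", "S", "E", "C"]
df = pd.DataFrame(
    {"R": [6.2, 2.1], "I": [3.0, 2.5], "A": [1.5, 1.0],
     "S": [2.0, 6.5], "E": [2.5, 4.0], "C": [3.8, 1.9]},
    index=["Occupation A", "Occupation B"],
)
props = df[riasec].div(df[riasec].sum(axis=1), axis=0)
```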

Step 2: Assign High-Point Codes Based on Threshold

Now, we need to identify the top 3 RIASEC dimensions for each occupation and apply the threshold:

“Retain only those high points for an occupation where the RIASEC dimension assigned to that high point had a proportion greater than .17 (i.e., a variable high-point code system). For example, if a third high-point code for an occupation listed a RIASEC dimension with a rating proportion of .15, no third high-point code was listed for that occupation.”
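The quoted rule can be implemented as a small helper. This is a hedged sketch: `props` is assumed to be a DataFrame of RIASEC proportions (sample values invented), and `high_point_codes` is a hypothetical function name, not one from the report.

```python
# Keep each occupation's top-3 RIASEC dimensions, retaining a high point
# only if its proportion exceeds .17 (variable high-point code system).
import pandas as pd

props = pd.DataFrame(
    {"R": [0.40, 0.10], "I": [0.25, 0.15], "A": [0.05, 0.05],
     "S": [0.10, 0.45], "E": [0.15, 0.20], "C": [0.05, 0.05]},
    index=["Occupation A", "Occupation B"],
)

def high_point_codes(row, threshold=0.17, max_codes=3):
    top = row.sort_values(ascending=False).head(max_codes)
    return "".join(top[top > threshold].index)

codes = props.apply(high_point_codes, axis=1)
print(codes)  # Occupation A -> "RI", Occupation B -> "SE"
```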

Step 3: Evaluate Agreement Between Predicted and Expert High-Point Codes

For the evaluation, you need a DataFrame df_expert with expert-assigned codes to compare with your predictions:
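One simple way to score agreement, assuming predicted and expert codes are stored as strings aligned on the same occupation index (both agreement measures below are illustrative choices, not the report's exact metrics):

```python
# Compare predicted high-point codes against expert-assigned codes.
import pandas as pd

pred = pd.Series({"Occupation A": "RI", "Occupation B": "SE"})
expert = pd.Series({"Occupation A": "RI", "Occupation B": "SC"})

exact = (pred == expert).mean()                # full-code agreement
first = (pred.str[0] == expert.str[0]).mean()  # first high point only
print(exact, first)  # 0.5 1.0
```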

Step 4: Review and Validate Final High-Point Codes

You can then review and validate the final high-point codes by inspecting the agreement scores and flagging occupations where the predicted and expert codes disagree.

Tables

Table 2.4. Cross-Validated RMSE Results for Best Specifications for Initial 14 Models for Each RIASEC Dimension

  • Model: Lists the different models used, each representing a different method of text processing or feature extraction. For example:

    • M5: Task SBERT Embeddings

    • M2: SBERT Embeddings for Concatenated Text from Titles, Descriptions, and Tasks

    • M8: SBERT Combined High/Low RIASEC Occupation Similarity

  • N: The number of samples in the training data. For most models, this is 731.

  • # of Features: The number of features used in each model. For instance, M1 uses 1082 features (Bag of Words method), while M5 uses 768 features (Task SBERT Embeddings).

  • Average Cross-Validated RMSE Across Hold-Out Training Folds: The average RMSE calculated during cross-validation on the training data. Lower values indicate better model performance with less prediction error.

  • Cross-Validated RMSE in Testing Data: The RMSE for each model on the testing data during cross-validation. Again, lower values are better and indicate that the model performs well on unseen data.

Table 2.10. Summary of Correlations Among Predictions from Initial Models Used as Features in First-Stage Ensembles

  • Mean Correlations: The mean correlation values are generally high, especially for the Realistic (R), Social (S), and Enterprising (E) dimensions, indicating that the models tend to agree with each other on these dimensions.

  • Text Models Only (M2 – M10): These models show strong agreement, with mean correlations above 0.85 for all dimensions, and a very high maximum correlation reaching 1.00 for some dimensions.

  • Text Models and KSA/GWA Models (M2 – M14): The inclusion of KSA/GWA models slightly reduces the mean correlations, especially in the Investigative (I), Artistic (A), and Conventional (C) dimensions, where the mean correlations drop to around 0.78-0.82.

  • Standard Deviation: The standard deviation is low across the board, indicating that the correlations are consistently high among models, with slightly more variability in the Artistic (A) and Conventional (C) dimensions.

  • Minimum and Maximum Correlations: The minimum correlations for the M2 – M14 range are notably lower, especially in the Artistic (A) and Conventional (C) dimensions, suggesting that some models might not agree as well when incorporating KSA/GWA ratings.

Table 2.11 and Table 2.12 present the regression coefficients and relative importance estimates for the best first-stage ensembles for each RIASEC dimension. These tables provide insights into how different models contribute to the prediction of each RIASEC dimension and the relative importance of each model in the ensemble.

  • B (Regression Coefficient): The coefficient represents the contribution of each model to the prediction of a particular RIASEC dimension. A higher absolute value indicates a stronger contribution.

  • RI (Relative Importance): This value reflects the proportion of the ensemble R² that is attributable to the given model, based on a general dominance analysis. Higher values indicate that the model is more important in the ensemble for predicting that particular dimension.

Source: Using Machine Learning to Develop Occupational Interest Profiles and High-Point Codes for the O*NET System, available at the O*NET Resource Center.