[08/12/2024] ML Ratings for O*NET
Using Machine Learning to Develop Occupational Interest Profiles and High-Point Codes for the O*NET System
Key pages in the report: 11, 78, 82 (Rating Instructions and Rating Sheet)
Step 1: Acquiring and Preparing Data for Initial Modeling Work
Collected and prepared data from various O*NET databases.
Constructed a combined dataset from 2008-2013 O*NET data.
Addressed data alignment issues across different O*NET-SOC taxonomies.
Implemented quality checks on RIASEC ratings and adjusted them based on agreement among raters.
Step 2: Developing and Evaluating Initial RIASEC Prediction Models
Developed models to predict RIASEC ratings using various machine learning techniques.
Created features from occupational text using methods like Bag-of-Words (BoW) and SBERT embeddings.
Evaluated different models using cross-validation and selected the best-performing ones.
Step 3: Generating Preliminary OIPs and High-Point Codes
Generated preliminary Occupational Interest Profiles and High-Point Codes based on the initial models.
Step 4: Identifying Occupations for Inclusion in Analyst-Expert Rating Data Collections
Identified occupations to be included in further data collections by analysts and experts.
Step 5: Collecting and Evaluating Analyst and Expert RIASEC Ratings
Conducted rater recruitment and training.
Collected ratings from analysts and experts and cleaned the data.
Evaluated the reliability and agreement of the collected ratings.
Step 6: Refining and Evaluating Final RIASEC Prediction Models for Future Use
Refined the prediction models by incorporating additional features such as alternate titles (AT), Detailed Work Activities (DWA), and Intermediate Work Activities (IWA).
Conducted further evaluations to ensure the models balanced predictive accuracy with practical considerations.
Step 7: Finalizing OIPs and High-Point Codes for O*NET 28.1
Finalized the Occupational Interest Profiles and High-Point Codes for publication in the O*NET 28.1 database.
Provided guidance for future updates to the OIPs and High-Point Codes.
Cross-Validation and Model Fitting:
The team employed a five-fold cross-validation strategy to evaluate the models. This involved splitting the data into training and test sets, tuning hyperparameters, and evaluating model performance using root mean squared error (RMSE) as the primary metric.
Elastic Net (EN) regression was used for most of the model fitting, as it outperformed Sparse Partial Least Squares (SPLS) regression in most cases.
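As a rough sketch of this stage, assuming a feature matrix X (e.g., text embeddings) and a vector y of RIASEC ratings for a single dimension, both simulated here, one might tune an Elastic Net with five-fold cross-validation scored by RMSE; the grid values and fold settings are illustrative, not the report's exact configuration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

# Illustrative feature matrix (X) and ratings for one RIASEC dimension (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(731, 100))          # e.g., text embeddings
y = rng.uniform(1, 7, size=731)          # analyst ratings on the 1-7 scale

# Five-fold cross-validation, tuning Elastic Net hyperparameters and
# scoring with (negative) RMSE so lower error wins.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(
    ElasticNet(max_iter=10000),
    param_grid={"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]},
    scoring="neg_root_mean_squared_error",
    cv=cv,
)
grid.fit(X, y)
print("Best params:", grid.best_params_)
print("Cross-validated RMSE:", -grid.best_score_)
```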
Residual Models and Ensemble Predictions:
Residual models were developed to account for variance unexplained by the initial models. These models were trained on different feature sets (e.g., alternate titles, Detailed Work Activities (DWA), Intermediate Work Activities (IWA)).
An ensemble model was then trained using predictions from the residual models as inputs. This ensemble approach was meant to improve prediction accuracy by combining the strengths of individual models.
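A minimal sketch of the residual-plus-ensemble idea, using simulated data in place of real base predictions and feature sets; the variable names (base_pred, X_at, X_dwa, X_iwa) and the use of out-of-fold predictions are assumptions for illustration, not the report's exact pipeline:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 731
y = rng.uniform(1, 7, size=n)            # ratings for one RIASEC dimension
base_pred = y + rng.normal(0, 0.8, n)    # stand-in for initial-model predictions
X_at  = rng.normal(size=(n, 50))         # alternate-title features (placeholder)
X_dwa = rng.normal(size=(n, 50))         # DWA features (placeholder)
X_iwa = rng.normal(size=(n, 50))         # IWA features (placeholder)

def residual_predictions(X_feat, y, base_pred):
    """Fit a model to the residuals the base predictions leave behind,
    returning out-of-fold predicted residuals."""
    residuals = y - base_pred
    model = ElasticNet(alpha=0.1, max_iter=10000)
    return cross_val_predict(model, X_feat, residuals, cv=5)

# One residual model per feature set; their predictions become ensemble inputs.
resid_inputs = np.column_stack([
    residual_predictions(X_at, y, base_pred),
    residual_predictions(X_dwa, y, base_pred),
    residual_predictions(X_iwa, y, base_pred),
])

# Combine the residual-model predictions, then add the estimate back to the base.
ensemble = LinearRegression().fit(resid_inputs, y - base_pred)
final_pred = base_pred + ensemble.predict(resid_inputs)
```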
Evaluation Metrics:
The primary metrics used to evaluate the models were RMSE and multiple R (correlation coefficient). RMSE was favored for decision-making because it reflects both the strength of the linear relationship and how closely the predictions correspond to the actual ratings in scale.
Multiple R values were also reported but were not used for decision-making because of scaling issues observed during the analysis; a correlation can be high even when predictions are off in level or spread.
Final Model Evaluation:
The final predictions were a combination of the ensemble predictions and the residual model predictions, bounded to the 1-7 rating scale used for the RIASEC dimensions.
The final model performance was compared to existing benchmarks, showing strong correlations between expert ratings and final predictions, particularly for Realistic, Investigative, Artistic, and Social interests.
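The bounding step mentioned above amounts to clipping the combined predictions to the rating scale; a minimal sketch with placeholder values:

```python
import numpy as np

# final_pred: combined ensemble + residual-model predictions (placeholder values here)
final_pred = np.array([0.6, 3.4, 7.9, 5.2])
final_ratings = np.clip(final_pred, 1.0, 7.0)   # constrain to the 1-7 RIASEC scale
print(final_ratings)                             # [1.  3.4 7.  5.2]
```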
Residual Analysis:
A detailed residual analysis was performed to understand how well the predictions matched the expert ratings across different job families and job zones. This analysis decomposed the variance in residuals to identify whether differences were attributable to job families, job zones, or individual occupations.
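One simple way to approximate such a decomposition, assuming a DataFrame with residuals (expert rating minus prediction) plus job-family and job-zone labels, is to compute the share of residual variance that lies between groups; the column names and data below are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "residual":   rng.normal(0, 0.5, 300),                  # expert rating minus prediction
    "job_family": rng.choice(["Management", "Healthcare", "Production"], 300),
    "job_zone":   rng.choice([1, 2, 3, 4, 5], 300),
})

def variance_decomposition(df, group_col, value_col="residual"):
    """Share of residual variance attributable to between-group differences."""
    grand_mean = df[value_col].mean()
    group_means = df.groupby(group_col)[value_col].transform("mean")
    between = ((group_means - grand_mean) ** 2).mean()
    total = ((df[value_col] - grand_mean) ** 2).mean()
    return between / total

for col in ["job_family", "job_zone"]:
    print(col, round(variance_decomposition(df, col), 3))
```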
First, let's assume you have RIASEC ratings for occupations in a DataFrame df with columns corresponding to the RIASEC dimensions.
Now, we need to identify the top 3 RIASEC dimensions for each occupation and apply the threshold:
“Retain only those high points for an occupation where the RIASEC dimension assigned to that high point had a proportion greater than .17 (i.e., a variable high-point code system). For example, if a third high-point code for an occupation listed a RIASEC dimension with a rating proportion of .15, no third high-point code was listed for that occupation.”
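A sketch of that logic, assuming df has one row per occupation with columns R, I, A, S, E, C on the 1-7 scale; dividing each rating by the row total is one plausible way to form the proportions, not necessarily the report's exact definition:

```python
import pandas as pd

# df: one row per occupation, columns are the six RIASEC ratings (1-7 scale).
df = pd.DataFrame(
    {"R": [6.1, 2.0], "I": [4.8, 2.5], "A": [1.5, 6.4],
     "S": [2.2, 5.9], "E": [2.0, 4.1], "C": [3.9, 1.8]},
    index=["Occupation 1", "Occupation 2"],
)

THRESHOLD = 0.17  # minimum proportion for a high point to be retained

def high_point_codes(row, n=3):
    # Convert ratings to proportions of the row total (one plausible definition).
    proportions = row / row.sum()
    top = proportions.sort_values(ascending=False).head(n)
    # Keep only high points whose proportion exceeds the threshold.
    return "".join(top[top > THRESHOLD].index)

df["high_point_code"] = df.apply(high_point_codes, axis=1)
print(df["high_point_code"])
```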
For the evaluation, you need a DataFrame df_expert with expert-assigned codes to compare with your predictions:
You can then review and validate the high-point codes by checking the agreement scores:
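A simple way to check agreement, using hypothetical predicted and expert-assigned codes for the same occupations:

```python
import pandas as pd

# Stand-in for the result of the previous step: predicted high-point codes per occupation.
df = pd.DataFrame({"high_point_code": ["RIC", "ASE"]},
                  index=["Occupation 1", "Occupation 2"])

# Hypothetical expert-assigned codes for the same occupations.
df_expert = pd.DataFrame({"high_point_code": ["RIC", "SAE"]},
                         index=["Occupation 1", "Occupation 2"])

comparison = df.join(df_expert, rsuffix="_expert")

# Agreement on the full code and on the first high point only.
comparison["full_match"] = (
    comparison["high_point_code"] == comparison["high_point_code_expert"]
)
comparison["first_point_match"] = (
    comparison["high_point_code"].str[0] == comparison["high_point_code_expert"].str[0]
)
print(comparison[["full_match", "first_point_match"]].mean())
```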
Table 2.4. Cross-Validated RMSE Results for Best Specifications for Initial 14 Models for Each RIASEC Dimension
Model: Lists the different models used, each representing a different method of text processing or feature extraction. For example:
M5: Task SBERT Embeddings
M2: SBERT Embeddings for Concatenated Text from Titles, Descriptions, and Tasks
M8: SBERT Combined High/Low RIASEC Occupation Similarity
N: The number of samples in the training data. For most models, this is 731.
# of Features: The number of features used in each model. For instance, M1 uses 1082 features (Bag of Words method), while M5 uses 768 features (Task SBERT Embeddings).
Average Cross-Validated RMSE Across Hold-Out Training Folds: The average RMSE calculated during cross-validation on the training data. Lower values indicate better model performance with less prediction error.
Cross-Validated RMSE in Testing Data: The RMSE for each model on the testing data during cross-validation. Again, lower values are better and indicate that the model performs well on unseen data.
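The feature sets behind models such as M1 (Bag-of-Words) and M5 (task SBERT embeddings) can be approximated along these lines; the vectorizer settings and the SBERT model name (all-mpnet-base-v2, which produces 768-dimensional vectors) are illustrative choices rather than the report's documented configuration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sentence_transformers import SentenceTransformer

task_texts = [
    "Analyze data to identify trends and prepare reports.",
    "Repair and maintain industrial machinery and equipment.",
]

# Bag-of-Words features (M1-style): one column per vocabulary term.
bow = CountVectorizer(stop_words="english")
X_bow = bow.fit_transform(task_texts)          # sparse matrix, n_docs x vocab size

# SBERT embeddings (M5-style): one dense 768-dimensional vector per text.
sbert = SentenceTransformer("all-mpnet-base-v2")
X_sbert = sbert.encode(task_texts)             # shape (n_docs, 768)

print(X_bow.shape, X_sbert.shape)
```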
Table 2.10. Summary of Correlations Among Predictions from Initial Models Used as Features in First-Stage Ensembles
Mean Correlations: The mean correlation values are generally high, especially for the Realistic (R), Social (S), and Enterprising (E) dimensions, indicating that the models tend to agree with each other on these dimensions.
Text Models Only (M2 – M10): These models show strong agreement, with mean correlations above 0.85 for all dimensions, and a very high maximum correlation reaching 1.00 for some dimensions.
Text Models and KSA/GWA Models (M2 – M14): The inclusion of KSA/GWA models slightly reduces the mean correlations, especially in the Investigative (I), Artistic (A), and Conventional (C) dimensions, where the mean correlations drop to around 0.78-0.82.
Standard Deviation: The standard deviation is low across the board, indicating that the correlations are consistently high among models, with slightly more variability in the Artistic (A) and Conventional (C) dimensions.
Minimum and Maximum Correlations: The minimum correlations for the M2 – M14 range are notably lower, especially in the Artistic (A) and Conventional (C) dimensions, suggesting that some models might not agree as well when incorporating KSA/GWA ratings.
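Summary statistics like these can be reproduced from a matrix of model predictions; a sketch with simulated predictions standing in for models M2-M10:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
truth = rng.uniform(1, 7, 200)
# Simulated predictions from several models, each noisy around the same signal.
preds = pd.DataFrame(
    {f"M{i}": truth + rng.normal(0, 0.5, 200) for i in range(2, 11)}
)

corr = preds.corr()
# Keep each pairwise correlation once (upper triangle, excluding the diagonal).
pairs = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
off_diag = pairs.stack()

print("Mean:", off_diag.mean().round(2))
print("SD:  ", off_diag.std().round(2))
print("Min: ", off_diag.min().round(2))
print("Max: ", off_diag.max().round(2))
```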
Table 2.11 and Table 2.12 present the regression coefficients and relative importance estimates for the best first-stage ensembles for each RIASEC dimension. These tables provide insights into how different models contribute to the prediction of each RIASEC dimension and the relative importance of each model in the ensemble.
B (Regression Coefficient): The coefficient represents the contribution of each model to the prediction of a particular RIASEC dimension. A higher absolute value indicates a stronger contribution.
RI (Relative Importance): This value reflects the proportion of the ensemble R² that is attributable to the given model, based on a general dominance analysis. Higher values indicate that the model is more important in the ensemble for predicting that particular dimension.
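For a small number of ensemble inputs, general dominance weights can be computed by brute force: each predictor's incremental R² is averaged over all subsets of the other predictors, first within each subset size and then across sizes. A sketch with simulated data (model names are placeholders):

```python
from itertools import combinations
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def r_squared(X, y, cols):
    if not cols:
        return 0.0
    return LinearRegression().fit(X[list(cols)], y).score(X[list(cols)], y)

def general_dominance(X, y):
    """Average each predictor's incremental R^2 over subsets of the others,
    first within each subset size, then across sizes."""
    importance = {}
    for j in X.columns:
        others = [c for c in X.columns if c != j]
        size_means = []
        for k in range(len(others) + 1):
            increments = [
                r_squared(X, y, subset + (j,)) - r_squared(X, y, subset)
                for subset in combinations(others, k)
            ]
            size_means.append(np.mean(increments))
        importance[j] = np.mean(size_means)
    return pd.Series(importance, name="relative_importance")

# Toy example with three ensemble inputs.
rng = np.random.default_rng(0)
y = rng.uniform(1, 7, 200)
X = pd.DataFrame({
    "M2": y + rng.normal(0, 0.5, 200),
    "M5": y + rng.normal(0, 0.8, 200),
    "M8": rng.normal(0, 1.0, 200),
})
print(general_dominance(X, y).round(3))   # weights sum to the full-model R^2
```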