Modeling Approach

Design principles

We build our models with four principles in mind:

  • INCLUSIVITY - our models should promote access and recognize talent in all places
  • INTERPRETABILITY - you should know exactly what our models are doing
  • ACCURACY - our models do a good job at predicting specific and general outcomes
  • CREATIVITY - we embrace new approaches and novel connections

We are aware that, in some instances, the above principles come into conflict. In these cases, we are proactive and transparent with our users about the decisions we make and our rationale.

The importance of Skills AND Outcomes

Mapping talent on the basis of skills and outcomes leads to accuracy and inclusivity. That’s why AdeptID balances both techniques in its approach.

Skills-based models

We look beyond the job title to see the underlying skills a person has developed. This allows us to map the distance between any two jobs based on those underlying skills.

→ While this approach is not yet widely adopted by the market, it is not particularly novel. It is straightforward, if a bit simplistic.
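
For illustration, one minimal way to compute such a distance is the cosine distance between two occupations' skill vectors. The sketch below is purely illustrative; the skill dimensions and values are hypothetical, not drawn from our taxonomy:

```python
import numpy as np

def job_distance(skills_a: np.ndarray, skills_b: np.ndarray) -> float:
    """Cosine distance between two jobs' skill vectors (0 = identical profiles)."""
    cos_sim = np.dot(skills_a, skills_b) / (
        np.linalg.norm(skills_a) * np.linalg.norm(skills_b))
    return 1.0 - cos_sim

# Hypothetical 4-skill vectors, e.g. (customer service, data entry, SQL, scheduling)
bank_teller  = np.array([0.9, 0.8, 0.1, 0.4])
data_analyst = np.array([0.3, 0.6, 0.9, 0.2])

print(job_distance(bank_teller, data_analyst))  # smaller = closer occupations
```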

Outcomes-based models

We take into account real employment data to surface which skills are associated with successful results.

Using real outcomes makes our models more empirical and frees us from having to adopt a single, rigid taxonomy.

→ This approach is novel and requires a lot of data. If it is unmonitored and lacks the proper safeguards, it can be highly susceptible to bias.

How we define “Skills”

We have an intentionally open-ended definition of “skills”. For us, skills are descriptors of an individual’s capacities that can serve as predictors of employment and/or educational outcomes.

To our models, skills are numeric (scalar), predictive features used to describe individuals and occupations. Our sources provide us with >10,000 scalar attributes describing >1,000 distinct occupations. Rather than focusing on a single data source for skills, we use collaborative filtering and singular value decomposition (SVD) to combine skill attributes across a variety of taxonomies into a set of latent skill variables. (see Modeling Techniques for further detail)
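
The sketch below illustrates the general idea of compressing many raw skill attributes into a smaller set of latent variables via truncated SVD. The matrix is randomly generated and the component count is an arbitrary assumption; this is not our production pipeline:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Stand-in data: rows = occupations, columns = raw scalar skill attributes.
rng = np.random.default_rng(0)
occupation_skill_matrix = rng.random((1000, 10000))

# Compress thousands of correlated raw attributes into latent skill variables.
svd = TruncatedSVD(n_components=100, random_state=0)
latent_skills = svd.fit_transform(occupation_skill_matrix)

print(latent_skills.shape)  # (1000, 100): each occupation as 100 latent skills
```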

How we define “Outcomes”

Our approach to modeling allows us to train models based on different types of outcomes. The conventional definition of a “successful” outcome is a person getting hired for a job in a new occupation. Our models look at hundreds of thousands of past attempted transitions - people trying and either succeeding or failing to get a job in a new occupation - in order both to predict the success of similar transitions and to understand the factors contributing to that success (which allows us to identify skills gaps and training opportunities).
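
A hypothetical schema for one such training example (the field names are illustrative, not our actual data model; the codes shown are the O*NET codes for Tellers and Data Scientists):

```python
from dataclasses import dataclass

@dataclass
class AttemptedTransition:
    """One historical attempt to move into a new occupation (hypothetical schema)."""
    prior_occupations: list[str]  # codes for the person's past jobs
    target_occupation: str        # occupation the person applied to
    hired: bool                   # binary outcome label the model learns to predict

example = AttemptedTransition(
    prior_occupations=["43-3071.00"],  # Tellers
    target_occupation="15-2051.00",    # Data Scientists
    hired=True,
)
```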

We recognize that “success” shouldn’t just be defined by getting a job, but by thriving in it (e.g. getting promoted, experiencing steady wage gains, or maintaining good health). In private contexts, we have been able to train models to predict and understand these “higher order” definitions of success. However, the core models available through the Mobility API are associated with this more straightforward measure.

Data sources

Skills

  • O*NET
  • Commercially available skills taxonomies, which we have indexed to one another (allowing us to map to any given partner taxonomy)

Outcomes

  • Hiring decisions from employers
  • Training provider placement “outcomes”

Macroeconomic / Other contextual

  • Occupational Employment and Wage Statistics (OEWS) from the Bureau of Labor Statistics (BLS)
  • Integrated Postsecondary Education Data System (IPEDS)
  • Current Population Survey (CPS) from the Census Bureau
  • National (US) Jobs Postings Feeds (EMSI)

Modeling Techniques

Our system combines several forms of AI.

Our core match product recommends talent to jobs and training on the basis of work history, education, skills (both attested & inferred), assessments, and individual preferences. Each of these components has models associated with them, which we ensemble together to create a holistic understanding of “fit” between person and job.
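
In schematic terms, ensembling can be as simple as a weighted combination of component scores. The component names and weights below are purely illustrative assumptions, not our production configuration:

```python
# Purely illustrative: combine component model scores into one "fit" score.
component_scores = {
    "work_history": 0.72,
    "education":    0.55,
    "skills":       0.81,
    "assessments":  0.60,
    "preferences":  0.90,
}
weights = {  # hypothetical weights; a real ensemble would learn these
    "work_history": 0.35,
    "education":    0.10,
    "skills":       0.30,
    "assessments":  0.15,
    "preferences":  0.10,
}
fit_score = sum(weights[k] * v for k, v in component_scores.items())
print(f"fit: {fit_score:.2f}")
```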

In several of these component models, we use NLP approaches and vectorization to determine the semantic meaning of semi-structured text (including job histories, job descriptions, and free responses). This allows our models to “understand” talent, training, and demand (jobs).
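
The snippet below shows the general vectorize-then-compare pattern, using TF-IDF as the simplest stand-in vectorizer; production systems would typically use learned embeddings, and the example texts are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Vectorize semi-structured text so semantic comparisons become geometry.
documents = [
    "Managed branch cash operations and customer transactions",   # a job history
    "Seeking analyst to build reports and handle customer data",  # a job posting
]
vectors = TfidfVectorizer().fit_transform(documents)
print(cosine_similarity(vectors[0], vectors[1]))  # higher = more similar
```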

Our approach to matching uses logistic regression applied to a binary classification problem. We train models on individual-level, historic hiring data from applicant tracking systems (ATS) and on other anonymized employment outcomes data from our user base.

To avoid overfitting to specific sectors of the workforce, we augment the employment outcomes data we have collected from our partners with synthetic data generated using the underlying skill distances between occupations. As the breadth of real outcomes data increases, we will refresh this synthetic data and/or remove it from our models.
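
One way such synthetic examples could be generated is sketched below; the cosine-distance rule and threshold are illustrative assumptions, not our actual generation scheme:

```python
import numpy as np

def synthetic_transition_label(origin_skills: np.ndarray,
                               target_skills: np.ndarray,
                               threshold: float = 0.3) -> int:
    """Label a synthetic transition as plausible (1) when two occupations'
    skill profiles are close, implausible (0) otherwise. Illustrative only."""
    cos_sim = np.dot(origin_skills, target_skills) / (
        np.linalg.norm(origin_skills) * np.linalg.norm(target_skills))
    return int((1.0 - cos_sim) < threshold)
```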

To train our models, we construct representation spaces to describe the entities associated with the hiring prediction: talent, employers, and training providers (see below). Our initial public-facing models use only the talent representation space described below. Private-context (and future public) models incorporate additional features for employers and training providers.

Our current models use ~3,000 predictive features based on occupational skills. The first group of features describes the skills associated with an individual’s prior work history; these features are constructed using the average skill values of each of the individual’s prior jobs. The next set of features describes the skills associated with the target occupation of the transition. A third set of features describes the distances between prior work history and target occupations on the basis of constituent skills. The features in each of these categories represent the variation between occupations across all >10,000 raw skill attributes in our dataset. We achieve this compressed representation using singular value decomposition (SVD) on the matrix of occupations and skills described below (see Talent representation space).
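
Sketched in code, the three feature groups might be assembled as follows; the shapes and the use of a simple difference for the distance features are assumptions for illustration:

```python
import numpy as np

def transition_features(prior_job_skills: np.ndarray,
                        target_skills: np.ndarray) -> np.ndarray:
    """Assemble the three illustrative feature groups for one transition.

    prior_job_skills: (n_prior_jobs, n_latent_skills) latent skills of past jobs
    target_skills:    (n_latent_skills,) latent skills of the target occupation
    """
    history  = prior_job_skills.mean(axis=0)  # group 1: averaged prior-work skills
    target   = target_skills                  # group 2: target occupation skills
    distance = history - target               # group 3: skill gaps between the two
    return np.concatenate([history, target, distance])

# With ~1,000 latent skills per group, this yields ~3,000 features per example.
```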

With this formulation, we train our logistic regression model with regularization. After training, the model allows us both to make novel predictions for new individuals and to determine which skills and/or attributes are most important to successfully transitioning into new employment (by investigating the learned model weights).
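
A minimal sketch of this training and weight-inspection step, using scikit-learn on randomly generated stand-in data rather than real outcomes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in training set: rows = attempted transitions, columns = skill features,
# y = whether the transition succeeded (hired or not).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

# L2-regularized logistic regression; C controls regularization strength.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)

# The learned weights indicate which features matter most to a successful
# transition (here, features 0 and 1 by construction).
top = np.argsort(np.abs(model.coef_[0]))[::-1][:5]
print("most influential features:", top)
```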

Algorithmic Bias

Any system that uses historic outcomes to build predictive models runs the risk of inheriting the biases in those historic examples. This requires AdeptID (and all developers of such models) to build tools and processes that prevent algorithmic bias from influencing the predictions its algorithms make. At AdeptID, our approach rests on several pillars:

  • Third-party audits - all AdeptID models in production have passed independent audits
  • Partnerships with mission-aligned organizations (Year Up, Grads of Life, and others)
  • Continuous monitoring of our algorithms for bias
  • Technical solutions that counteract biases present in training data
  • Ongoing research - we are committed to evaluating and integrating all available methods for bias removal into our predictive models; this will always be an area of active research at AdeptID

Measuring Disparate Impact

Disparate impact refers to practices in employment, housing, and other areas that adversely affect one group of people sharing a protected characteristic more than another, even though the rules applied by employers or landlords are formally neutral. Wherever possible (i.e. in cases where we have received demographic data), we monitor our models for disparate impact.
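
A simple illustration of the kind of check involved; the four-fifths threshold is a common rule of thumb in this area, not a statement of our specific audit methodology:

```python
def disparate_impact_ratio(protected_rate: float, reference_rate: float) -> float:
    """Ratio of selection rates between a protected group and the most-favored
    group. Under the common "four-fifths" rule of thumb, values below 0.8
    flag potential disparate impact."""
    return protected_rate / reference_rate

# E.g. the model recommends 18% of group A candidates vs. 25% of group B.
print(disparate_impact_ratio(0.18, 0.25))  # 0.72 -> below 0.8, warrants review
```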

In addition to having audits performed by third parties, we actively test internal and external tools to mitigate disparate impact.