Grades 9–10 · AI Leaders · Activity 03 of 06 · Bias Audit Project
🗂️ Reference Card · Use Alongside Your Audit

Bias Taxonomy & Root Cause Guide

Keep this page open while completing your Evidence Ledger (Activity 02). Use the taxonomy to label bias types precisely; use the root cause list to explain why the bias exists.

Part A — 5 Types of Algorithmic Bias
1. Representation Bias
Training data does not include enough examples from all groups, so the model performs worse on underrepresented populations.
Example: Face recognition trained mostly on images of lighter-skinned faces makes more errors on darker-skinned faces. This is not because it was programmed to discriminate, but because it saw very few examples of darker faces.
2. Historical Bias
Training data accurately reflects a past that was itself biased. The AI learns and perpetuates past inequalities as if they were neutral facts.
Example: A hiring AI trained on 10 years of records from a company that hired mostly men learns that "successful employee = male" — because that is what the historical data shows.
3. Linguistic Bias
The model rewards specific registers, dialects, or vocabulary patterns, typically those associated with dominant groups or formal schooling, and penalises speakers and writers who use other varieties.
Example: An essay grader trained on formal academic English penalises African American Vernacular English (AAVE) as "grammatically incorrect" — despite AAVE being a fully rule-governed dialect.
4. Measurement Bias
The metric used to train or evaluate the model does not accurately capture what it is intended to measure, creating systematic errors.
Example: A recidivism prediction tool uses arrest rates as a proxy for "criminal behaviour." But arrest rates are themselves influenced by discriminatory policing, so the model encodes those disparities.
5. Proxy Bias
A variable that correlates with a protected characteristic (race, gender, etc.) is used as an input, embedding discrimination indirectly even when the protected attribute is excluded.
Example: A loan algorithm excludes race but uses zip code. In a segregated city, zip code correlates strongly with race — the model discriminates by proxy.
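The zip code example can be tried out with a few lines of Python. This is only a sketch: the applicant records, the group labels "A" and "B", and the `approve` rule are all invented for illustration and do not come from any real lender. The rule never looks at an applicant's group, yet the approval rates still differ by group because zip code and group are linked in the data.

```python
from collections import Counter

# Invented toy records (assumption, not real data): (zip_code, group).
# Most of group A lives in zip 10001; most of group B lives in zip 20002.
applicants = [
    ("10001", "A"), ("10001", "A"), ("10001", "A"), ("10001", "B"),
    ("20002", "B"), ("20002", "B"), ("20002", "B"), ("20002", "A"),
]

def approve(zip_code):
    """Hypothetical decision rule: it only ever sees the zip code."""
    return zip_code == "10001"

def approval_rate_by_group(records):
    """Return the share of approved applicants within each group."""
    approved = Counter()
    total = Counter()
    for zip_code, group in records:
        total[group] += 1
        if approve(zip_code):
            approved[group] += 1
    return {group: approved[group] / total[group] for group in total}

rates = approval_rate_by_group(applicants)
print(rates)  # {'A': 0.75, 'B': 0.25}
```

In an audit, a gap like this between groups is a signal to ask which input variables might be standing in for a protected characteristic.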
Part B — 5 Root Causes of Bias in AI Systems
RC1
Underrepresentation in Training Data
The dataset used to train the model did not include sufficient examples from all demographic groups, causing the model to generalise poorly for underrepresented populations.
RC2
Historical Data Reflecting Societal Inequalities
Training data was sourced from records (hiring, criminal justice, lending) that themselves reflected past discrimination. The model learned those patterns as ground truth.
RC3
Use of Proxy Variables
Variables that correlate with protected characteristics (address, school name, extracurriculars) were included as features, allowing discrimination to occur through indirect pathways.
RC4
Optimisation for a Biased Objective Metric
The model was trained to maximise a metric (engagement, clicks, historical approval rates) that was itself the product of biased human behaviour, amplifying those biases at scale.
RC5
Lack of Diverse Perspectives on the Development Team
The team designing, building, and testing the model lacked demographic diversity, making it more likely that bias patterns affecting underrepresented groups went unnoticed during development and testing.
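One way to look for RC1 during an audit is to count how many examples each group has and how often the model is right for each group. The sketch below uses invented results and made-up group names, so treat it as an illustration of the idea rather than a real measurement.

```python
from collections import defaultdict

# Invented toy results (assumption, not real data): (group, model_was_correct).
results = (
    [("well_represented", True)] * 9
    + [("well_represented", False)] * 1
    + [("underrepresented", True)] * 2
    + [("underrepresented", False)] * 2
)

def accuracy_by_group(rows):
    """Return {group: (number_of_examples, accuracy)} for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, was_correct in rows:
        total[group] += 1
        correct[group] += int(was_correct)
    return {group: (total[group], correct[group] / total[group]) for group in total}

stats = accuracy_by_group(results)
for group, (count, accuracy) in stats.items():
    print(group, count, accuracy)
```

A group with few examples and noticeably lower accuracy is a hint that RC1 may apply, although a small sample on its own does not prove it.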
Bias types found in my audit
1. Representation
2. Historical
3. Linguistic
4. Measurement
5. Proxy
Root causes that apply (circle)
RC1 — Underrepresentation
RC2 — Historical data
RC3 — Proxy variables
RC4 — Biased metric
RC5 — Team diversity
Primary root cause for my audit (write RC number + 1-sentence justification):