Negative R² in Interviews: It’s Not a Bug—It’s Your Model Losing to the Mean


Many candidates assume R² must lie between 0 and 1. That’s a common misconception. R² is defined as:

R² = 1 − SSE / SST

where SSE is the model's sum of squared errors and SST is the total sum of squares (the sum of squared deviations from the mean ȳ). If SSE > SST, the ratio SSE/SST exceeds 1 and R² becomes negative. In plain language: your model predicts worse than a trivial predictor that always outputs the mean ȳ.
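You can verify this directly from the definition. A minimal sketch with made-up numbers, where a hypothetical model predicts the exact reverse of the true trend:

```python
import numpy as np

# Hypothetical data: a model that predicts the reverse of the true trend
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

sse = np.sum((y - y_pred) ** 2)    # model's sum of squared errors
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - sse / sst

print(r2)  # -3.0: SSE (40) is four times SST (10)
```

Nothing in the formula caps the ratio SSE/SST, so R² can be arbitrarily negative.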

Why R² goes negative

  • Baseline reminder: the mean predictor ȳ has SSE = SST, so it has R² = 0 by definition. Negative R² simply means your model's SSE is larger than the baseline's.
  • Common causes:
    • Model misspecification: e.g., fitting a straight line to strongly nonlinear data.
    • Evaluating on a test set after overfitting the train set: the model generalizes poorly so SSE_test outgrows SST_test.
    • Forcing a model without an intercept: removing the intercept often increases SSE relative to SST.
    • Bad preprocessing or target leakage issues that break generalization.
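The no-intercept cause is easy to reproduce. In this sketch (made-up numbers), the data follow y = x + 100, and the best zero-intercept slope is computed directly from the normal equation b = Σxy / Σx²:

```python
import numpy as np

# Hypothetical data with a large offset: y = x + 100
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x + 100

# Least-squares slope for a no-intercept model y ≈ b*x
b = np.sum(x * y) / np.sum(x ** 2)

sse = np.sum((y - b * x) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst
print(r2)  # far below zero: the zero-intercept line cannot capture the offset
```

The mean predictor sits at ȳ = 103 and misses by at most 2, while the forced-through-origin line misses by as much as 70, so its SSE dwarfs SST.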

Quick numeric example

Suppose SST = 100 and SSE = 150. Then

R² = 1 − 150 / 100 = 1 − 1.5 = −0.5.

Interpretation: your model's predictions are so poor that always forecasting the mean of y would have been better.

What to say (and why) in an interview

Be concise and precise. For example:

"Negative R² means the model's SSE exceeds the total variance SST, so it performs worse than the baseline predictor ȳ. Typical causes are misspecification (e.g., linear model for nonlinear data), evaluating on a test set after severe overfitting, or removing the intercept. Remedies include adding an intercept, transforming features, regularization, or choosing a more appropriate model."

That answer shows you know the formula, the intuition (comparison to the mean), and practical causes and fixes.

How to fix or avoid it

  • Restore/allow an intercept unless there's a good reason not to.
  • Try feature transformations (polynomials, logs) or non-linear models.
  • Use regularization (Ridge, Lasso) to prevent overfitting.
  • Validate on a properly held-out test set or with cross-validation.
  • Check data preprocessing and leakage issues.
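As a sketch of the first remedy: refitting offset data (made-up numbers, y = x + 100) with an intercept via a degree-1 least-squares fit recovers the relationship entirely:

```python
import numpy as np

# Hypothetical offset data: y = x + 100
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x + 100

# Degree-1 least-squares fit WITH an intercept
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

sse = np.sum((y - y_pred) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - sse / sst
print(r2)  # ≈ 1.0: the intercept term absorbs the offset
```

The same data fit without an intercept yields a deeply negative R², so a single modeling choice can swing the metric from worse-than-baseline to a near-perfect fit.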

Bottom line

Negative R² isn't a bug in the metric — it's a signal. It tells you the model is doing worse than the dumb baseline that always predicts the mean. Say that clearly in interviews and follow up with likely causes and fixes.
