📊 VIF: The Mathematics Explained

Understanding Multicollinearity Through Real-World Examples

🎯 The Two Different R² Values - DON'T MIX THEM UP!

✅ MAIN MODEL R²

Predicting Your Target

Features → Target Variable

Example:

[TV, Facebook, Email] → Sales
R² = 0.92
HIGH IS GOOD! 📈

This means your features explain 92% of sales variation. Excellent!

🚨 VIF CALCULATION R²

Predicting One Feature from Others

Other Features → One Feature

Example:

[Facebook, Email] → TV
R² = 0.85
HIGH IS BAD! ⚠️

This means the other features explain 85% of the variation in TV spend, so TV carries little independent information. Multicollinearity alert!

📊 Main Model (What We Want)

X₁, X₂, X₃ → Y (Target)
R² = 0.95 ✅

🔍 VIF Check (What We Test)

X₂, X₃ → X₁ (Another Feature)
R² = 0.85 🚨

⚡ The Dangerous Paradox

A model with R² = 0.95 and high VIF looks amazing, but its coefficients are unstable, so decisions based on them can fail in production!

A model with R² = 0.85 and low VIF looks worse on paper but gives coefficients you can actually trust!
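
The paradox can be checked with a small simulation. The sketch below is an illustration on synthetic data, not part of the original analysis: it refits a two-feature model on many random samples and compares how much the first coefficient jumps around when the features are independent versus when they are correlated at 0.95. The function names (`fit_two_predictors`, `coefficient_spread`) and every parameter value are assumptions chosen for the demo.

```python
import random
import statistics


def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y ~ x1 + x2 (with intercept); returns (b1, b2)."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # shrinks toward 0 as x1 and x2 become collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def coefficient_spread(rho, n=100, trials=300, seed=0):
    """Std. dev. of the fitted x1 coefficient across repeated synthetic samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has unit variance and a correlation of about rho with x1.
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        # Assumed true model: both coefficients are 1, noise has sd 1.
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(fit_two_predictors(x1, x2, y)[0])
    return statistics.stdev(estimates)


spread_low = coefficient_spread(rho=0.0)
spread_high = coefficient_spread(rho=0.95)
sqrt_vif = (1 / (1 - 0.95 ** 2)) ** 0.5  # two features: VIF = 1 / (1 - rho²)

print(f"spread of b1, independent features: {spread_low:.3f}")
print(f"spread of b1, correlation 0.95:     {spread_high:.3f}")
print(f"ratio: {spread_high / spread_low:.2f} (theory: roughly sqrt(VIF) = {sqrt_vif:.2f})")
```

The true coefficient is the same in both runs; only the correlation between features changes, and the estimate becomes several times noisier.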

🎯 The Core Formula (With Context)

VIFᵢ = 1 / (1 - R²ᵢ)

Critical Understanding:
This R²ᵢ is NOT your model's R²!
R²ᵢ = How well the OTHER features predict feature i
High R² in main model = GOOD
High R² in VIF calculation = BAD
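
Why a high feature-to-feature R² is bad follows from the standard OLS result for the variance of a coefficient. The symbols σ² (error variance), n (number of rows), and sᵢ² (sample variance of feature i) are notation introduced here for illustration; they do not appear in the original.

```latex
\operatorname{Var}(\hat{\beta}_i)
  = \frac{\sigma^2}{(n-1)\, s_i^2} \cdot \frac{1}{1 - R_i^2}
  = \frac{\sigma^2}{(n-1)\, s_i^2} \cdot \mathrm{VIF}_i
```

The first factor is what the variance would be if feature i were uncorrelated with the others; VIF multiplies it. The standard error grows by the square root, so VIF = 10 means a standard error about 3.2 times larger.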

Scenario        | Main Model R² | VIF R² (Feature-to-Feature) | Interpretation
Ideal Model     | 0.90 ✅       | 0.20 ✅                     | Perfect! High prediction, low multicollinearity
Deceptive Model | 0.95 ✅       | 0.90 🚨                     | Dangerous! Looks good but unstable
Honest Model    | 0.75 ⚠️       | 0.30 ✅                     | Reliable! Lower R² but trustworthy

🛒 Real Case: E-commerce Disaster

When High R² Fooled Amazon's Data Team

The Misleading Model

Main Model:
Sales ~ TV + Google + YouTube + Email
R² = 0.94 (Looks amazing! 🎉)

But VIF Analysis:
VIF(Google) = 25
VIF(YouTube) = 28
Because: Google ↔ YouTube R² = 0.96

Result: The model suggested cutting TV and boosting YouTube. Sales dropped 15%!

The Fixed Model

Main Model:
Sales ~ TV + Digital_Combined + Email
R² = 0.88 (Lower but honest 📊)

VIF Analysis:
VIF(TV) = 2.1
VIF(Digital) = 2.3
VIF(Email) = 1.5

Result: Clear insights, stable coefficients, 20% sales increase!

💡 Key Lesson: They gave up 0.06 of R² (0.94 → 0.88) but gained a model that actually worked! The extra R² was misleading: several features were explaining the same variance.
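
A quick way to spot a pair like Google and YouTube is the pairwise correlation: a feature's VIF is at least 1 / (1 - r²) for any single other feature, because adding more predictors can only raise the feature-to-feature R². The sketch below uses made-up weekly spend figures (not the case's real data), and the helper names `pearson_r` and `vif_lower_bound` are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_lower_bound(xs, ys):
    """VIF implied by one pairwise correlation; the full VIF is at least this."""
    return 1.0 / (1.0 - pearson_r(xs, ys) ** 2)


# Synthetic weekly spend (assumption): Google and YouTube follow one shared budget.
rng = random.Random(42)
digital_budget = [rng.uniform(20, 80) for _ in range(104)]
google = [0.6 * b + rng.gauss(0, 1.5) for b in digital_budget]
youtube = [0.4 * b + rng.gauss(0, 1.5) for b in digital_budget]
tv = [rng.uniform(30, 90) for _ in range(104)]

# Before: the two digital channels are near-duplicates of each other.
print(f"Google vs YouTube:      VIF >= {vif_lower_bound(google, youtube):.1f}")

# After: one combined column, checked against the remaining TV feature.
digital_combined = [g + y for g, y in zip(google, youtube)]
print(f"Digital_Combined vs TV: VIF >= {vif_lower_bound(digital_combined, tv):.2f}")
```

Summing the two channels is only one possible fix; it answers "how much digital spend?" but gives up the ability to tell Google from YouTube.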

📐 Complete VIF Calculation Example

Marketing Data:

Week | TV ($k) | Facebook ($k) | Instagram ($k) | Sales ($k)
1    | 50      | 20            | 18             | 500
2    | 60      | 25            | 23             | 580
3    | 40      | 15            | 14             | 420
4    | 70      | 30            | 28             | 650

Step 1: Main Model R² (what we want high):
Sales ~ TV + Facebook + Instagram
R² = 0.92 ← This is GOOD!
Our features explain 92% of sales variance ✅

Step 2: VIF Calculation for Facebook (check for multicollinearity):
Facebook ~ TV + Instagram
R² = 0.95 ← This is BAD!
Other features can predict Facebook too well 🚨

Step 3: Calculate VIF:
VIF = 1 / (1 - 0.95) = 1 / 0.05 = 20.0
Coefficient variance inflated 20x (standard errors about 4.5x larger)! Coefficients unreliable!

The Trap: Model has high R² (0.92) but high VIF (20). It fits the training data well, but it can't separate the TV, Facebook, and Instagram effects, so its coefficients (and any budget decision based on them) can't be trusted!

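
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method used for the numbers in this example: the data is synthetic (generated with a fixed seed), the regression is solved through the normal equations, and the names `solve_linear_system`, `r_squared`, and `vif_table` are hypothetical helpers.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(target, predictors):
    """R² of an OLS fit of `target` on `predictors` (list of columns) plus intercept."""
    rows = [[1.0] + list(vals) for vals in zip(*predictors)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def vif_table(features):
    """Map each feature name to its VIF, using the other features as predictors."""
    result = {}
    for name, column in features.items():
        others = [col for other, col in features.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(column, others))
    return result


# Synthetic spend data (assumption): Facebook and Instagram both track TV spend.
rng = random.Random(7)
tv = [rng.uniform(40, 70) for _ in range(200)]
features = {
    "TV": tv,
    "Facebook": [0.4 * t + rng.gauss(0, 1) for t in tv],
    "Instagram": [0.38 * t + rng.gauss(0, 1) for t in tv],
}
sales = [5 * t + 4 * f + 3 * i + rng.gauss(0, 20)
         for t, f, i in zip(*features.values())]

# Step 1: main model R² (features -> target).
print(f"main model R² = {r_squared(sales, list(features.values())):.2f}")
# Steps 2 and 3: one auxiliary regression per feature, then VIF = 1 / (1 - R²).
for name, vif in vif_table(features).items():
    print(f"VIF({name}) = {vif:.1f}")
```

For real work, a library routine such as `variance_inflation_factor` from `statsmodels.stats.outliers_influence` is the more usual choice; the hand-rolled solver here only keeps the sketch dependency-free.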
🏠 Scenario: Real Estate Model

Zillow's Two R² Story

Model Performance:

Price ~ SqFt + Beds + Baths + Location
Main Model R² = 0.89 ✅
(Explains 89% of price variation)

VIF Check:

Bedrooms ~ SqFt + Baths + Location
VIF Calculation R² = 0.78 ⚠️
VIF = 1/(1-0.78) = 4.55
(Moderate multicollinearity)

Decision: Keep model but monitor. Main R² is good, VIF is borderline acceptable.

🏥 Scenario: Patient Risk Model

Hospital's Confusing R² Results

Model Component                | R² Value | Context    | Good or Bad?
Risk ~ BMI + Weight + BP + Age | 0.85     | Main Model | Good! ✅
Weight ~ BMI + BP + Age        | 0.92     | VIF Check  | Bad! 🚨
BMI ~ Weight + BP + Age        | 0.91     | VIF Check  | Bad! 🚨
Age ~ BMI + Weight + BP        | 0.15     | VIF Check  | Good! ✅

💡 Solution: Remove Weight (keep BMI), since BMI is computed from Weight (BMI = Weight/Height²) and the two carry nearly the same information. New model R² drops to 0.82 but the VIF problems are solved!

📊 Quick Reference: Which R² Am I Looking At?

Main Model R²
Features → Target
Higher = Better
Goal: > 0.70

VIF Calculation R²
Other Features → One Feature
Lower = Better
Danger: > 0.80

VIF Formula
VIF = 1/(1-R²)
R² = 0.80 → VIF = 5
R² = 0.90 → VIF = 10
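
The conversions in this reference card are easy to script. A minimal sketch follows; the function names `vif_from_r2` and `r2_from_vif` are illustrative, not from any library.

```python
def vif_from_r2(r2):
    """VIF for a feature whose feature-to-feature regression has the given R²."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R² must be in [0, 1); R² = 1 means perfect collinearity")
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Inverse mapping: the feature-to-feature R² that corresponds to a given VIF."""
    if vif < 1.0:
        raise ValueError("VIF is never below 1")
    return 1.0 - 1.0 / vif


for r2 in (0.0, 0.50, 0.80, 0.90, 0.95):
    print(f"R² = {r2:.2f} -> VIF = {vif_from_r2(r2):.1f}")
# prints VIF = 1.0, 2.0, 5.0, 10.0, 20.0
```

The inverse is handy for reading thresholds the other way round: a VIF cutoff of 5 is the same as saying no feature may be more than 80% explained by the others.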

The Complete Picture:

R² in VIF | VIF Value | Main Model R² | Model Status
0.50      | 2.0       | 0.95          | Excellent Model ✅
0.80      | 5.0       | 0.90          | Good but Monitor ⚠️
0.90      | 10.0      | 0.95          | Deceptive Model 🚨
0.95      | 20.0      | 0.98          | Will Fail in Production 💥

🏭 Scenario: Tesla Battery Production

When Perfect R² is Actually Terrible

The Deceptive Success:

Main Model:
Battery_Life ~ Temperature + Pressure + Humidity + Speed
R² = 0.96 (Management loves this! 🎊)

But wait... VIF Analysis:
Temperature ~ Pressure + Others: R² = 0.85, VIF = 6.67
Pressure ~ Temperature + Others: R² = 0.85, VIF = 6.67

Physics says: P ∝ T at constant volume (Gay-Lussac's Law)
They're measuring nearly the same thing!

What happened: The model couldn't determine whether temperature or pressure was the key factor. When the team changed the pressure settings, temperature changed too, and production failed.

Solution: Use only temperature (or only pressure), accept R² = 0.91, get stable production.

🎯 Remember: The Two R² Rule

📈 Main Model R²

What: How well features predict target

Want: HIGH (>0.7)

Means: Good prediction

🔍 VIF Calculation R²

What: How well features predict each other

Want: LOW (<0.8)

Means: Independent features

⚖️ The Trade-off

Accept: Lower main R²

To get: Lower VIF

Result: Reliable model

🏆 The Golden Rule:
"When you act on the coefficients, a model with R²=0.85 and VIF<5 is a safer bet in production than a model with R²=0.95 and VIF>10!"

High R² + High VIF = House of Cards 🃏
Moderate R² + Low VIF = Rock Solid Foundation 🏛️