📊 VIF: The Mathematics Explained

Understanding Multicollinearity Through Real-World Examples

🎯 The Two Different R² Values - DON'T MIX THEM UP!

✅ MAIN MODEL R²

Predicting Your Target

Features → Target Variable

Example:

[TV, Facebook, Email] → Sales
R² = 0.92
HIGH IS GOOD! 📈

This means your features explain 92% of sales variation. Excellent!

🚨 VIF CALCULATION R²

Predicting One Feature from Others

Other Features → One Feature

Example:

[Facebook, Email] → TV
R² = 0.85
HIGH IS BAD! ⚠️

This means other features can predict TV spend. Multicollinearity alert!
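
To make the two R² values concrete, here is a minimal sketch using NumPy on synthetic data. The spend figures, coefficients, and the `r_squared` helper are all hypothetical and chosen only for illustration; they are not taken from a real campaign.

```python
import numpy as np

def r_squared(X, y):
    """R² of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic weekly data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 200
facebook = rng.normal(20, 5, n)
email = rng.normal(10, 2, n)
tv = 2.0 * facebook + rng.normal(0, 4, n)   # TV budget tracks Facebook
sales = 5 * tv + 3 * facebook + 4 * email + rng.normal(0, 20, n)

# Main model R²: features -> target (high is good).
main_r2 = r_squared(np.column_stack([tv, facebook, email]), sales)

# VIF calculation R²: other features -> one feature (high is bad).
aux_r2 = r_squared(np.column_stack([facebook, email]), tv)

print(f"Main model R²:      {main_r2:.2f}")
print(f"VIF calculation R²: {aux_r2:.2f}")
```

Both numbers come from the same helper; the only difference is what sits on the left-hand side of the regression.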

📊 Main Model (What We Want)

X₁, X₂, X₃ → Y (Target)
R² = 0.95 ✅

≠

🔍 VIF Check (What We Test)

X₂, X₃ → X₁ (Another Feature)
R² = 0.85 🚨

⚡ The Dangerous Paradox

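A model with R² = 0.95 and high VIF looks amazing, but its coefficients are unstable, so decisions based on them can fail in production!

A model with R² = 0.85 and low VIF looks worse on paper, but its coefficients can be trusted when you act on them!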

🎯 The Core Formula (With Context)

VIF_i = 1 / (1 - R²_i)

Critical Understanding:
This R²_i is NOT your model's R²!
R²_i = How well OTHER features predict feature i
High R² in main model = GOOD
High R² in VIF calculation = BAD
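
The formula is short enough to write as a helper. This is a sketch; the function names `vif_from_r2` and `se_inflation` are made up for this example. The square-root relationship is the standard one: a coefficient's standard error grows by √VIF compared with the case where the feature is uncorrelated with the others.

```python
import math

def vif_from_r2(r2_i: float) -> float:
    """Variance inflation factor for feature i, given its auxiliary R²."""
    if not 0.0 <= r2_i <= 1.0:
        raise ValueError("R² must lie in [0, 1]")
    if r2_i == 1.0:
        # Feature i is an exact linear combination of the other features.
        return math.inf
    return 1.0 / (1.0 - r2_i)

def se_inflation(r2_i: float) -> float:
    """Factor by which the coefficient's standard error grows: sqrt(VIF)."""
    return math.sqrt(vif_from_r2(r2_i))

for r2 in (0.20, 0.80, 0.90, 0.95):
    print(f"R²_i = {r2:.2f} -> VIF = {vif_from_r2(r2):.2f}, "
          f"standard error x{se_inflation(r2):.2f}")
```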

Scenario	Main Model R²	VIF R² (Feature-to-Feature)	Interpretation
Ideal Model	0.90 ✅	0.20 ✅	Perfect! High prediction, low multicollinearity
Deceptive Model	0.95 ✅	0.90 🚨	Dangerous! Looks good but unstable
Honest Model	0.75 ⚠️	0.30 ✅	Reliable! Lower R² but trustworthy

🛒 Real Case: E-commerce Disaster

When High R² Fooled Amazon's Data Team

The Misleading Model

Main Model:
Sales ~ TV + Google + YouTube + Email
R² = 0.94 (Looks amazing! 🎉)

But VIF Analysis:
VIF(Google) = 25
VIF(YouTube) = 28
Because: Google ↔ YouTube R² = 0.96, which alone implies VIF ≥ 1 / (1 - 0.96) = 25

Result: The model suggested cutting TV and boosting YouTube. Sales dropped 15%!

The Fixed Model

Main Model:
Sales ~ TV + Digital_Combined + Email
R² = 0.88 (Lower but honest 📊)

VIF Analysis:
VIF(TV) = 2.1
VIF(Digital) = 2.3
VIF(Email) = 1.5

Result: Clear insights, stable coefficients, 20% sales increase!

💡 Key Lesson: They gave up 0.06 of R² (0.94 → 0.88) but gained a model that actually worked! The extra fit was worth little: Google and YouTube were explaining the same variance, so the model could not tell which one deserved the credit.
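
The effect of merging two overlapping channels can be sketched on synthetic data. Everything below is assumed for illustration: the spend distributions, the `vif` helper, and the choice to form the combined feature as a plain sum. The helper relies on a standard identity: the VIF of each feature equals the matching diagonal entry of the inverse correlation matrix.

```python
import numpy as np

def vif(X):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic spend data (hypothetical numbers, not from any real company).
rng = np.random.default_rng(1)
n = 300
tv = rng.normal(50, 10, n)
google = rng.normal(30, 8, n)
youtube = google + rng.normal(0, 1.6, n)   # moves almost in lockstep with Google
email = rng.normal(10, 3, n)

before = vif(np.column_stack([tv, google, youtube, email]))

# Remedy: replace the two overlapping channels with one combined feature.
digital = google + youtube
after = vif(np.column_stack([tv, digital, email]))

print("Before:", dict(zip(["TV", "Google", "YouTube", "Email"], before.round(1))))
print("After: ", dict(zip(["TV", "Digital", "Email"], after.round(1))))
```

A sum is only one way to combine the channels; an average or a principal component would serve the same purpose here.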

📐 Complete VIF Calculation Example

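Marketing Data (illustrative rows; the R² values in the steps below are round example figures, not fitted from these four weeks, where Facebook = TV/2 - 5 exactly):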

Week	TV ($k)	Facebook ($k)	Instagram ($k)	Sales ($k)
1	50	20	18	500
2	60	25	23	580
3	40	15	14	420
4	70	30	28	650

1. Main Model R² (what we want to be high):

Sales ~ TV + Facebook + Instagram
R² = 0.92 ← This is GOOD!
Our features explain 92% of sales variance ✅

2. VIF Calculation for Facebook (check for multicollinearity):

Facebook ~ TV + Instagram
R² = 0.95 ← This is BAD!
Other features can predict Facebook too well 🚨

3. Calculate VIF:

VIF = 1 / (1 - 0.95) = 1 / 0.05 = 20.0
The variance of the Facebook coefficient is inflated 20x (its standard error about 4.5x)! The coefficient is unreliable!

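The Trap: The model has a high R² (0.92) but also a high VIF (20). It fits the training data well, yet it can't separate the TV, Facebook, and Instagram effects, so decisions based on individual coefficients are unreliable, and predictions can degrade if the channels stop moving together!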
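
The three steps can be run for every feature at once. The sketch below uses synthetic weekly data rather than the table, because four rows are too few to estimate these regressions meaningfully. The data-generating numbers and the helper names `r_squared` and `vif_per_feature` are assumptions made for this example.

```python
import numpy as np

def r_squared(X, y):
    """R² of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def vif_per_feature(X, names):
    """Regress each column on all the others and apply VIF = 1 / (1 - R²)."""
    result = {}
    for j, name in enumerate(names):
        others = np.delete(X, j, axis=1)       # every feature except column j
        r2_j = r_squared(others, X[:, j])      # the "VIF calculation R²"
        result[name] = 1.0 / (1.0 - r2_j)
    return result

# Synthetic weekly spend in $k (assumed, loosely shaped like the table).
rng = np.random.default_rng(42)
n = 104
tv = rng.normal(55, 12, n)
facebook = 0.5 * tv - 5 + rng.normal(0, 2.5, n)
instagram = facebook - 2 + rng.normal(0, 1.8, n)

X = np.column_stack([tv, facebook, instagram])
vifs = vif_per_feature(X, ["TV", "Facebook", "Instagram"])
for name, value in vifs.items():
    print(f"VIF({name}) = {value:.1f}")
```

If statsmodels is available, its `variance_inflation_factor` function performs the same per-column calculation; the loop above just keeps each step visible.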

🏠 Scenario: Real Estate Model

Zillow's Two R² Story

Model Performance:

Price ~ SqFt + Beds + Baths + Location
Main Model R² = 0.89 ✅
(Explains 89% of price variation)

VIF Check:

Bedrooms ~ SqFt + Baths + Location
VIF Calculation R² = 0.78 🚨
VIF = 1/(1-0.78) = 4.55
(Moderate multicollinearity)

Decision: Keep model but monitor. Main R² is good, VIF is borderline acceptable.

🏥 Scenario: Patient Risk Model

Hospital's Confusing R² Results

Model Component	R² Value	Context	Good or Bad?
Risk ~ BMI + Weight + BP + Age	0.85	Main Model	Good! ✅
Weight ~ BMI + BP + Age	0.92	VIF Check	Bad! 🚨
BMI ~ Weight + BP + Age	0.91	VIF Check	Bad! 🚨
Age ~ BMI + Weight + BP	0.15	VIF Check	Good! ✅

💡 Solution: Remove Weight (keep BMI) since BMI = Weight/Height². New model R² drops to 0.82 but VIF problems solved!
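
The VIF Check rows in the table translate directly into VIF values. This short sketch applies the formula to the three R² values listed above and flags anything over 5, the cut-off this guide uses. The table has no row for BP, so it is left out here too.

```python
VIF_THRESHOLD = 5.0  # the "VIF < 5" cut-off used in this guide

# Auxiliary R² values copied from the VIF Check rows of the table.
aux_r2 = {"Weight": 0.92, "BMI": 0.91, "Age": 0.15}

vifs = {name: 1.0 / (1.0 - r2) for name, r2 in aux_r2.items()}
flagged = [name for name, value in vifs.items() if value > VIF_THRESHOLD]

for name, value in vifs.items():
    status = "remove or combine" if name in flagged else "ok"
    print(f"{name}: R² = {aux_r2[name]:.2f}, VIF = {value:.2f} ({status})")
```

Weight and BMI flag together because each one largely determines the other, which is why dropping just one of them is enough.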

📊 Quick Reference: Which R² Am I Looking At?

Main Model R²
Features → Target
Higher = Better
Goal: > 0.70 (rule of thumb; depends on the domain)

VIF Calculation R²
Features → Feature
Lower = Better
Danger: > 0.80 (VIF > 5)

VIF Formula
VIF = 1/(1-R²)
R² = 0.80 → VIF = 5
R² = 0.90 → VIF = 10
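
The formula also runs backwards, which helps when a tool reports a VIF and you want the R² behind it. A small sketch follows; the function name `r2_from_vif` is made up for this example.

```python
import math

def r2_from_vif(vif: float) -> float:
    """Invert VIF = 1 / (1 - R²) to recover the auxiliary R²."""
    if vif < 1.0:
        raise ValueError("VIF is never below 1")
    return 1.0 - 1.0 / vif

for vif in (1, 2, 5, 10, 20):
    r2 = r2_from_vif(vif)
    # With only two features, the auxiliary R² is the squared pairwise
    # correlation, so sqrt(R²) is the |correlation| that yields this VIF.
    print(f"VIF = {vif:>2} -> R² = {r2:.2f}, two-feature |r| = {math.sqrt(r2):.3f}")
```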

The Complete Picture:

R² in VIF	VIF Value	Main Model R²	Model Status
0.50	2.0	0.95	Excellent Model ✅
0.80	5.0	0.90	Good but Monitor ⚠️
0.90	10.0	0.95	Deceptive Model 🚨
0.95	20.0	0.98	Likely to Fail in Production 💥

🏭 Scenario: Tesla Battery Production

When Perfect R² is Actually Terrible

The Deceptive Success:

Main Model:
Battery_Life ~ Temperature + Pressure + Humidity + Speed
R² = 0.96 (Management loves this! 🎊)

But wait... VIF Analysis:
Temperature ~ Pressure + Others: R² = 0.85, VIF = 6.67
Pressure ~ Temperature + Others: R² = 0.85, VIF = 6.67

Physics says: P ∝ T at constant volume (Gay-Lussac's Law)
They're measuring nearly the same thing!

What happened: The model couldn't determine whether temperature or pressure was the key factor. When the team changed the pressure settings, temperature changed too, and production failed.

Solution: Use only temperature (or only pressure), accept R² = 0.91, get stable production.
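
Why the model "couldn't determine" the key factor can be shown with a small simulation: refit the same regression on many fresh samples and watch how much one coefficient moves. This is a sketch on synthetic standardized data, not battery measurements. The `coef_std` helper, the sample sizes, and the true coefficients of 1.0 are all assumptions.

```python
import numpy as np

def coef_std(rho, n=200, trials=400, seed=0):
    """Std. dev. of the fitted x1 coefficient across repeated samples."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]             # correlation between x1 and x2
    estimates = []
    for _ in range(trials):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 1, n)
        A = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        estimates.append(coef[1])              # coefficient on x1
    return float(np.std(estimates))

independent = coef_std(rho=0.0)
collinear = coef_std(rho=0.85 ** 0.5)          # feature-to-feature R² = 0.85

ratio = (collinear / independent) ** 2
print(f"Coefficient variance inflated about {ratio:.1f}x")
```

The measured ratio should land in the neighbourhood of the VIF for R² = 0.85 (about 6.67), with some simulation noise.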

🎯 Remember: The Two R² Rule

📈 Main Model R²

What: How well features predict target

Want: HIGH (>0.7)

Means: Good prediction

🔍 VIF Calculation R²

What: How well features predict each other

Want: LOW (<0.8)

Means: Independent features

⚖️ The Trade-off

Accept: Lower main R²

To get: Lower VIF

Result: Reliable model

🏆 The Golden Rule:
"A model with R²=0.85 and VIF<5 is a safer bet in production than a model with R²=0.95 and VIF>10 whenever you act on its coefficients!"

High R² + High VIF = House of Cards 🃏
Moderate R² + Low VIF = Rock Solid Foundation 🏛️