📊 VIF: The Mathematics Explained
Understanding Multicollinearity Through Real-World Examples
🎯 The Two Different R² Values - DON'T MIX THEM UP!
✅ MAIN MODEL R²
Predicting Your Target
Features → Target Variable
Example:
[TV, Facebook, Email] → Sales
R² = 0.92
HIGH IS GOOD! 📈
This means your features explain 92% of sales variation. Excellent!
🚨 VIF CALCULATION R²
Predicting One Feature from Others
Other Features → One Feature
Example:
[Facebook, Email] → TV
R² = 0.85
HIGH IS BAD! ⚠️
This means other features can predict TV spend. Multicollinearity alert!
📊 Main Model (What We Want)
X₁, X₂, X₃
↓
Y (Target)
R² = 0.95 ✅
≠
🔍 VIF Check (What We Test)
X₂, X₃
↓
X₁ (Another Feature)
R² = 0.85 🚨
⚡ The Dangerous Paradox
A model with R² = 0.95 and high VIF looks amazing but can fail in production: its coefficients are unstable, so decisions built on them go wrong!
A model with R² = 0.85 and low VIF looks worse but is often the more reliable choice in practice!
🎯 The Core Formula (With Context)
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
Critical Understanding:
This R²ᵢ is NOT your model's R²!
R²ᵢ = How well OTHER features predict feature i
High R² in main model = GOOD
High R² in VIF calculation = BAD
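To make the formula concrete, here is a minimal Python sketch that converts an auxiliary R²ᵢ into a VIF and back. The function names `vif_from_r2` and `r2_from_vif` are our own illustrations, not a library API, and treating an R² within 1e-12 of 1 as an infinite VIF is likewise an assumption of this sketch.

```python
import math


def vif_from_r2(r2: float) -> float:
    """VIF_i = 1 / (1 - R²_i), where R²_i comes from regressing feature i on the OTHER features."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("an OLS R² with intercept lies in [0, 1]")
    if r2 >= 1.0 - 1e-12:  # assumption: treat R² this close to 1 as perfect collinearity
        return math.inf
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif: float) -> float:
    """Inverse of the formula: R²_i = 1 - 1 / VIF_i (a VIF is always >= 1)."""
    if vif < 1.0:
        raise ValueError("a VIF is always >= 1")
    return 1.0 - 1.0 / vif


# Auxiliary R² values that appear in the scenarios on this page.
for r2 in (0.20, 0.50, 0.78, 0.80, 0.85, 0.90, 0.95):
    print(f"R² = {r2:.2f} -> VIF = {vif_from_r2(r2):.2f}")
```

For R² = 0.80, 0.90 and 0.95 it prints VIFs of 5.00, 10.00 and 20.00, and 0.78 gives 4.55, matching the figures used elsewhere on this page. The inverse is handy for reading thresholds: a VIF cut-off of 5 is the same as an auxiliary R² cut-off of 0.80.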
Scenario | Main Model R² | VIF R² (Feature-to-Feature) | Interpretation |
Ideal Model | 0.90 ✅ | 0.20 ✅ | Perfect! High prediction, low multicollinearity |
Deceptive Model | 0.95 ✅ | 0.90 🚨 | Dangerous! Looks good but unstable |
Honest Model | 0.75 ⚠️ | 0.30 ✅ | Reliable! Lower R² but trustworthy |
🛒 Real Case: E-commerce Disaster
When High R² Fooled Amazon's Data Team
The Misleading Model
Main Model:
Sales ~ TV + Google + YouTube + Email
R² = 0.94 (Looks amazing! 🎉)
But VIF Analysis:
VIF(Google) = 25
VIF(YouTube) = 28
Because: Google ↔ YouTube R² = 0.96
Result: Model suggests cutting TV, boosting YouTube. Sales dropped 15%!
The Fixed Model
Main Model:
Sales ~ TV + Digital_Combined + Email
R² = 0.88 (Lower but honest 📊)
VIF Analysis:
VIF(TV) = 2.1
VIF(Digital) = 2.3
VIF(Email) = 1.5
Result: Clear insights, stable coefficients, 20% sales increase!
💡 Key Lesson: They gave up 6 percentage points of R² (0.94 → 0.88) but gained a model that actually worked!
The extra R² came from overlapping features: Google and YouTube were explaining the same variance, so the model could not tell which one deserved the credit.
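The VIF figures in this case hang together with the pairwise one. Adding regressors to an OLS fit never lowers its R², so Google's auxiliary R² (Google regressed on TV, YouTube and Email) is at least the R² of Google regressed on YouTube alone. Assuming the quoted Google ↔ YouTube R² = 0.96 is that simple pairwise R², the bound works out as:

```latex
R^2_{\text{Google}}
  = R^2(\text{Google} \sim \text{TV} + \text{YouTube} + \text{Email})
  \;\ge\; R^2(\text{Google} \sim \text{YouTube}) = 0.96
\quad\Longrightarrow\quad
\mathrm{VIF}_{\text{Google}} = \frac{1}{1 - R^2_{\text{Google}}} \;\ge\; \frac{1}{1 - 0.96} = 25
```

The same argument gives VIF(YouTube) ≥ 25, consistent with the reported 28 (an auxiliary R² of 1 − 1/28 ≈ 0.964).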
📐 Complete VIF Calculation Example
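Marketing Data (R² values in the steps below are illustrative; a regression on just these four rows gives different numbers):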
Week | TV ($k) | Facebook ($k) | Instagram ($k) | Sales ($k) |
1 | 50 | 20 | 18 | 500 |
2 | 60 | 25 | 23 | 580 |
3 | 40 | 15 | 14 | 420 |
4 | 70 | 30 | 28 | 650 |
1. Main Model R² (What we want high):
Sales ~ TV + Facebook + Instagram
R² = 0.92 ← This is GOOD!
Our features explain 92% of sales variance ✅
2. VIF Calculation for Facebook (Check for multicollinearity):
Facebook ~ TV + Instagram
R² = 0.95 ← This is BAD!
Other features can predict Facebook too well 🚨
3. Calculate VIF:
VIF = 1 / (1 - 0.95) = 1 / 0.05 = 20.0
Facebook's coefficient variance is inflated 20x (its standard error by √20 ≈ 4.5x) versus an uncorrelated feature. Coefficients unreliable!
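The Trap: Model has high R² (0.92) but high VIF (20).
It fits the training data well, but it can't separate the TV, Facebook, and Instagram effects, so its coefficients are unreliable
and its predictions can break down as soon as the spending mix drifts from the historical pattern!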
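The three steps above can be sketched in plain Python with no third-party libraries: fit each auxiliary regression by ordinary least squares with an intercept, read off R², and apply 1 / (1 − R²). The helper names `_solve`, `r_squared` and `vif` are our own, not a library API (statsmodels offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for real work). Running it on the four weeks in the table shows something stronger than the illustrative R² = 0.95: in these rows Facebook is exactly 0.5 × TV − 5, so Facebook ~ TV + Instagram fits perfectly (R² = 1), VIF(Facebook) is infinite, and the full Sales ~ TV + Facebook + Instagram fit cannot be solved at all.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    tol = 1e-12 * max(abs(m[i][i]) for i in range(n))  # assumption: relative singularity cut-off
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            raise ValueError("singular system: some predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """R² of an OLS fit of y on the predictor columns plus an intercept.

    Centring every column removes the intercept from the normal equations.
    """
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xc = []
    for col in predictors:
        mu = sum(col) / n
        xc.append([v - mu for v in col])
    k = len(xc)
    xtx = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(xc[i], yc)) for i in range(k)]
    beta = _solve(xtx, xty)
    sse = sum((yc[t] - sum(beta[i] * xc[i][t] for i in range(k))) ** 2 for t in range(n))
    sst = sum(v * v for v in yc)
    return 1.0 - sse / sst


def vif(columns: Sequence[Sequence[float]], i: int) -> float:
    """VIF of feature i: regress it on all OTHER features, then apply 1 / (1 - R²)."""
    others = [col for j, col in enumerate(columns) if j != i]
    r2 = r_squared(columns[i], others)
    return math.inf if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)


# The four weeks from the Marketing Data table (all values in $k).
tv = [50, 60, 40, 70]
facebook = [20, 25, 15, 30]
instagram = [18, 23, 14, 28]
sales = [500, 580, 420, 650]

print("R²(Facebook ~ TV + Instagram) =", r_squared(facebook, [tv, instagram]))
print("VIF(Facebook) =", vif([tv, facebook, instagram], 1))
print("R²(Sales ~ TV + Instagram) =", round(r_squared(sales, [tv, instagram]), 4))
try:
    r_squared(sales, [tv, facebook, instagram])
except ValueError as err:
    print("Sales ~ TV + Facebook + Instagram:", err)
```

Solving the normal equations directly is fine for a handful of columns; for real data a QR- or SVD-based least-squares routine (or statsmodels) is numerically safer.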
🏠 Scenario: Real Estate Model
Zillow's Two R² Story
Model Performance:
Price ~ SqFt + Beds + Baths + Location
Main Model R² = 0.89 ✅
(Explains 89% of price variation)
VIF Check:
Bedrooms ~ SqFt + Baths + Location
VIF Calculation R² = 0.78 🚨
VIF = 1/(1-0.78) = 4.55
(Moderate multicollinearity)
Decision: Keep model but monitor. Main R² is good, VIF is borderline acceptable.
🏥 Scenario: Patient Risk Model
Hospital's Confusing R² Results
Model Component | R² Value | Context | Good or Bad? |
Risk ~ BMI + Weight + BP + Age | 0.85 | Main Model | Good! ✅ |
Weight ~ BMI + BP + Age | 0.92 | VIF Check | Bad! 🚨 |
BMI ~ Weight + BP + Age | 0.91 | VIF Check | Bad! 🚨 |
Age ~ BMI + Weight + BP | 0.15 | VIF Check | Good! ✅ |
💡 Solution: Remove Weight (keep BMI) since BMI = Weight/Height².
New model R² drops to 0.82 but VIF problems solved!
📊 Quick Reference: Which R² Am I Looking At?
Main Model R²
Features → Target
Higher = Better
Goal: > 0.70
VIF Calculation R²
Features → Feature
Lower = Better
Danger: > 0.80
VIF Formula
VIF = 1/(1-R²)
R² = 0.80 → VIF = 5
R² = 0.90 → VIF = 10
The Complete Picture:
VIF Calculation R² | VIF Value | Main Model R² | Model Status |
0.50 | 2.0 | 0.95 | Excellent Model ✅ |
0.80 | 5.0 | 0.90 | Good but Monitor ⚠️ |
0.90 | 10.0 | 0.95 | Deceptive Model 🚨 |
0.95 | 20.0 | 0.98 | Will Fail in Production 💥 |
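The bands in this table and the thresholds in the quick reference can be encoded as a small helper. This is a sketch under a stated assumption: the cut-offs of 5 and 10 are this page's rules of thumb, not universal constants, and `vif_status` is a hypothetical name. The sample values are the VIFs quoted in the scenarios on this page.

```python
def vif_status(vif_value: float) -> str:
    """Classify a VIF using this page's rules of thumb (assumed cut-offs: 5 and 10)."""
    if vif_value < 1.0:
        raise ValueError("a VIF is always >= 1")
    if vif_value < 5.0:    # VIF calculation R² below 0.80
        return "acceptable"
    if vif_value < 10.0:   # VIF calculation R² from 0.80 up to 0.90
        return "monitor"
    return "severe"        # VIF calculation R² of 0.90 or more


# VIF values quoted in the scenarios on this page.
quoted = {
    "E-commerce: VIF(Google)": 25.0,
    "E-commerce: VIF(YouTube)": 28.0,
    "E-commerce fix: VIF(Digital)": 2.3,
    "Real estate: VIF(Bedrooms)": 4.55,
    "Tesla: VIF(Temperature)": 6.67,
}
for name, value in quoted.items():
    print(f"{name}: {value} -> {vif_status(value)}")
```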
🏭 Scenario: Tesla Battery Production
When a Near-Perfect R² Is Actually Terrible
The Deceptive Success:
Main Model:
Battery_Life ~ Temperature + Pressure + Humidity + Speed
R² = 0.96 (Management loves this! 🎊)
But wait... VIF Analysis:
Temperature ~ Pressure + Others: R² = 0.85, VIF = 6.67
Pressure ~ Temperature + Others: R² = 0.85, VIF = 6.67
Physics says: P ∝ T at constant volume (Gay-Lussac's Law)
They're largely measuring the same thing!
What happened: The model couldn't determine whether temperature or pressure was the key factor.
When they changed the pressure settings, temperature changed too, and production failed.
Solution: Use only temperature (or only pressure), accept R² = 0.91, get stable production.
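The physics explains why the two readings overlap so much. For a fixed amount of gas held at constant volume, Gay-Lussac's law makes pressure proportional to absolute temperature, so in the idealised, noise-free case the auxiliary R² would be exactly 1 and the VIF unbounded; the observed 0.85 reflects a weaker, noisy version of that link. A short sketch, assuming constant volume and temperature in kelvin:

```latex
\frac{P}{T} = k
\;\Longrightarrow\; P = kT
\;\Longrightarrow\; R^2(P \sim T) = 1,
\qquad \mathrm{VIF} = \frac{1}{1 - R^2} \to \infty \text{ as } R^2 \to 1,
\qquad \text{whereas the observed } R^2 = 0.85 \text{ gives } \mathrm{VIF} = \frac{1}{1 - 0.85} \approx 6.67
```

Measuring temperature in °C does not change the conclusion: P = k(T + 273.15) is still linear in T.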
🎯 Remember: The Two R² Rule
📈 Main Model R²
What: How well features predict target
Want: HIGH (>0.7)
Means: Good prediction
🔍 VIF Calculation R²
What: How well features predict each other
Want: LOW (<0.8)
Means: Independent features
⚖️ The Trade-off
Accept: Lower main R²
To get: Lower VIF
Result: Reliable model
🏆 The Golden Rule:
"When decisions depend on the coefficients, a model with R²=0.85 and VIF<5 beats a model with R²=0.95 and VIF>10 in production!"
High R² + High VIF = House of Cards 🃏
Moderate R² + Low VIF = Rock Solid Foundation 🏛️