Research
Using Machine Learning to Analyze and Predict India's GDP Growth Based on Socioeconomic Indicators
This project explores the use of machine learning to predict GDP growth for India. According to Goldman Sachs, India is projected to become the world's second-largest economy by 2075. The model uses socioeconomic indicators such as education enrollment rates, public debt ratios, and consumption patterns as inputs to a regression model that forecasts GDP growth. By identifying key features and their correlations with GDP growth, the model offers insight into economic trends that can inform policy planning.
Goal
Uncover patterns in GDP growth
Loaded a dataset of socioeconomic indicators from the World Bank and handled missing values using linear interpolation and backfill. Normalized features with Z-score standardization for consistent scaling and better model performance.
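The cleaning step above can be sketched roughly as follows. This is a minimal illustration using pandas, not the project's actual code: the column names and values are hypothetical toy data, and the final forward fill for trailing gaps is an added assumption that the text does not mention.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two indicators indexed by year, with gaps.
raw = pd.DataFrame(
    {
        "school_enrollment_tertiary": [np.nan, 10.0, np.nan, 14.0, 16.0],
        "inflation_consumer_prices": [8.0, np.nan, 6.0, 5.0, np.nan],
    },
    index=pd.Index([2000, 2001, 2002, 2003, 2004], name="year"),
)


def clean_and_standardize(df: pd.DataFrame) -> pd.DataFrame:
    # Linear interpolation fills only gaps that lie between two observed values.
    filled = df.interpolate(method="linear", limit_area="inside")
    # Backfill covers leading gaps (no earlier observation to interpolate from).
    filled = filled.bfill()
    # Assumption: forward fill any trailing gaps so no NaN values remain.
    filled = filled.ffill()
    # Z-score standardization: zero mean and unit variance per column.
    return (filled - filled.mean()) / filled.std(ddof=0)


standardized = clean_and_standardize(raw)
print(standardized.round(2))
```

In a real forecasting setup, the mean and standard deviation would usually be computed on the training years only and then applied to the test years, so that no information from the test period leaks into the features.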
Read research papers to understand the relationships between candidate features and GDP growth. Selected the indicators with the highest correlations to GDP growth through statistical analysis. Reduced multicollinearity by examining relationships among the features themselves, to make predictions more robust.
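One way the selection step might look in code is sketched below. The thresholds (`min_abs_corr`, `max_pair_corr`), the function name, and the toy columns are all hypothetical; the text does not state which cutoffs or procedure were actually used. The idea is to rank features by absolute Pearson correlation with the target and skip any feature that is nearly collinear with one already kept.

```python
import numpy as np
import pandas as pd


def select_features(df, target, min_abs_corr=0.2, max_pair_corr=0.9):
    """Keep features correlated with the target, dropping near-duplicates."""
    corr = df.corr()  # pairwise Pearson correlations
    # Rank candidates by absolute correlation with the target, strongest first.
    ranked = corr[target].drop(target).abs().sort_values(ascending=False)
    selected = []
    for name in ranked.index:
        if ranked[name] < min_abs_corr:
            break  # all remaining features are weaker still
        # Skip a feature that is nearly collinear with one already kept.
        if any(abs(corr.loc[name, kept]) > max_pair_corr for kept in selected):
            continue
        selected.append(name)
    return selected


# Hypothetical toy data: two nearly identical features and one unrelated one.
t = np.arange(20.0)
toy = pd.DataFrame(
    {
        "gdp_growth": t + 2.0 * np.cos(t),
        "final_consumption": t,
        "services_value_added": t + 0.1 * np.sin(t),  # near-duplicate of the above
        "unrelated": (-1.0) ** t,  # alternating pattern, weak link to the target
    }
)

chosen = select_features(toy, target="gdp_growth")
print(chosen)
```

Only one of the two near-duplicate columns survives, and the weakly correlated column is dropped.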
Built a regression model using XGBoost to predict GDP growth. Split the data into training (pre-2010) and testing (post-2010) sets. Evaluated performance using Mean Squared Error (MSE) and correlation coefficient.
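The split and the two metrics can be illustrated with a short sketch. It makes several assumptions: 2010 itself is placed in the test set (the text does not say which side it falls on), the data are synthetic, and `LeastSquaresBaseline` is a hypothetical stand-in regressor, not the XGBoost model described above. Because `xgboost.XGBRegressor` also exposes `fit` and `predict`, it could be substituted for the stand-in.

```python
import numpy as np


def chronological_split(years, X, y, first_test_year=2010):
    """Rows before first_test_year go to training, the rest to testing."""
    years = np.asarray(years)
    train_mask = years < first_test_year
    return X[train_mask], y[train_mask], X[~train_mask], y[~train_mask]


def mean_squared_error(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])


class LeastSquaresBaseline:
    """Hypothetical stand-in with fit/predict; not the model from the text."""

    def fit(self, X, y):
        design = np.column_stack([X, np.ones(len(X))])  # add intercept column
        self.coef_, *_ = np.linalg.lstsq(design, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


# Synthetic data: one feature with an exact linear relationship to the target.
years = np.arange(2000, 2020)
x = np.linspace(0.0, 1.0, len(years))
X, y = x.reshape(-1, 1), 2.0 * x + 1.0

X_train, y_train, X_test, y_test = chronological_split(years, X, y)
model = LeastSquaresBaseline().fit(X_train, y_train)
predictions = model.predict(X_test)

test_mse = mean_squared_error(y_test, predictions)
test_r = pearson_r(y_test, predictions)
print(test_mse, test_r)
```

Splitting by year rather than at random keeps the evaluation honest for a forecasting task, since the model never sees data from the period it is tested on.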
Validated predictions by comparing model outputs with World Bank GDP forecasts as an external check on the model's accuracy and reliability.
Feature | Correlation with GDP Growth |
---|---|
GDP Growth | 1.0 |
School Enrollment, Tertiary, Female | 0.21 |
Public and Publicly Guaranteed Debt Service | -0.12 |
Prevalence of Stunting, Height for Age | -0.35 |
School Enrollment, Secondary | -0.33 |
Terms of Trade Adjustment | -0.32 |
Inflation, Consumer Prices | -0.03 |
Children Out of School, Male | -0.37 |
Taxes on Goods and Services | -0.27 |
Pupil-teacher Ratio, Primary | -0.22 |
Services, Value Added | 0.72 |
GNI Growth | 1.0 |
Final Consumption Expenditure | 0.85 |
Real Interest Rate | -0.15 |
Changes in Inventories | 0.41 |
Electric Power Consumption | 0.3 |
Air Transport, Passengers Carried | 0.23 |
The model achieved an MSE of 5.4%, compared with 9.8% for the World Bank's model. One reason is that my model captured the COVID-19 dip in 2020, which the World Bank's model did not. The model takes the 19 features from the previous year as input to predict the next year's GDP growth. It also achieved a correlation coefficient of 0.95, indicating that its predictions tracked GDP growth closely over the years. This research won third place in the 2024 National Student Data Corps Data Science Symposium for its potential impact on the field of economics.
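Using the previous year's features to predict the current year's growth amounts to lagging the inputs by one year. A minimal pandas sketch of that construction follows; the function name, column names, and values are hypothetical, and the project's actual feature pipeline may differ.

```python
import pandas as pd


def add_lagged_features(df: pd.DataFrame, target: str, lag: int = 1) -> pd.DataFrame:
    """Pair each year's target with feature values from `lag` years earlier."""
    features = df.drop(columns=[target])
    # shift(lag) moves each feature value down so that row t holds year t - lag.
    lagged = features.shift(lag).add_suffix(f"_lag{lag}")
    combined = pd.concat([lagged, df[target]], axis=1)
    # The earliest rows have no prior-year features and are dropped.
    return combined.dropna()


# Hypothetical toy data indexed by year.
history = pd.DataFrame(
    {
        "final_consumption": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gdp_growth": [7.0, 8.0, 9.0, 10.0, 11.0],
    },
    index=pd.Index([2015, 2016, 2017, 2018, 2019], name="year"),
)

supervised = add_lagged_features(history, target="gdp_growth")
print(supervised)
```

Each remaining row pairs a year's GDP growth with the feature values observed one year earlier, so a model trained on it only ever uses information that would have been available at prediction time.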
This research could give policymakers a more accurate and reliable tool for forecasting GDP growth. By identifying the key indicators that influence economic performance, the model can inform policy decisions and help governments allocate resources more effectively. This matters especially in developing countries like India, where economic growth is critical for poverty reduction and social development. Given that I designed, built, and trained this model within a week, there is substantial room for further research and development.