Darief

Indian Economic Research

Using Machine Learning to Analyze and Predict India's GDP Growth Based on Socioeconomic Indicators

This project explores the use of machine learning to predict India's GDP growth. According to Goldman Sachs, India's economy is projected to become the world's second-largest by 2075. My model uses socioeconomic indicators such as education enrollment rates, public debt ratios, and consumption patterns as inputs to a regression model that forecasts GDP growth. By identifying key features and examining their correlations with GDP growth, the model offers insight into economic trends and can support policy planning.

Tools Used

Python
Pandas
NumPy
Seaborn
Scikit-learn
XGBoost
Data Handling

Goal

Uncover patterns in GDP growth

Methodology


  1. Data Preprocessing

    Loaded a World Bank dataset of socioeconomic indicators and handled missing values using linear interpolation and backfilling. Normalized features with Z-score standardization so that all indicators share a consistent scale.
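A minimal sketch of these two preprocessing steps, written in plain Python so it runs without dependencies. The function names are hypothetical; with pandas, a roughly similar result would typically come from `series.interpolate().bfill()` followed by subtracting the mean and dividing by the standard deviation. How trailing gaps are handled is an assumption here, not something the project specifies.

```python
from statistics import mean, pstdev


def interpolate_then_backfill(values):
    """Fill None gaps in a yearly series (hypothetical helper).

    Interior gaps are filled by linear interpolation between the nearest
    observed years; leading gaps are backfilled with the first observed value.
    Assumption: trailing gaps carry the last observed value forward.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    # Linear interpolation between each pair of consecutive observed points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    # Backfill: leading gaps take the first observed value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    # Trailing gaps keep the last observed value (assumed behavior).
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    return filled


def zscore(values):
    """Standardize to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Usage with a toy series (not real World Bank data):
filled = interpolate_then_backfill([None, 2.0, None, 4.0, None])
print(filled)  # [2.0, 2.0, 3.0, 4.0, 4.0]
print(zscore(filled))
```

With a time-ordered train/test split like this project's, the mean and standard deviation would ideally be computed on the training years only and then reused for the test years, so that no information from later years leaks into training.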

  2. Feature Engineering

    Selected the indicators with the strongest correlations (positive or negative) with GDP growth through statistical analysis. Reduced multicollinearity by examining the relationships among features to make predictions more robust. Read economic research papers to understand how each feature relates to GDP growth.
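One way to combine correlation ranking with a multicollinearity check is a greedy filter, sketched below in plain Python. It ranks features by absolute Pearson correlation with GDP growth (so strong negative relationships count as much as positive ones), stops at a relevance threshold, and skips any feature that is too correlated with one already kept. The thresholds `min_abs_r=0.2` and `max_pair_r=0.9` and the function names are illustrative assumptions, not values from the project, and the judgment drawn from the research literature is not captured by code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(features, target, min_abs_r=0.2, max_pair_r=0.9):
    """Greedy relevance/redundancy filter (hypothetical thresholds).

    features: dict mapping feature name -> list of yearly values
    target: list of GDP growth values aligned with each feature list
    """
    # Relevance: absolute correlation with the target, so negative links count.
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    kept = []
    for name in ranked:
        if relevance[name] < min_abs_r:
            break  # every remaining feature is even less relevant
        # Redundancy: skip features nearly collinear with one already kept.
        if all(abs(pearson(features[name], features[k])) < max_pair_r for k in kept):
            kept.append(name)
    return kept


# Toy usage: "a_copy" duplicates "a", and "noise" is uncorrelated with the target.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a_copy": [2.0, 4.0, 6.0, 8.0, 10.0],
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "noise": [1.0, -1.0, 1.0, -1.0, 1.0],
}
print(select_features(features, target))  # ['a', 'b']
```

A greedy pass like this keeps the most relevant member of each group of collinear indicators; a variance inflation factor check is a common alternative for the multicollinearity step.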

  3. Modeling

    Built an XGBoost regression model to predict GDP growth. Split the data chronologically into training (pre-2010) and testing (post-2010) sets. Evaluated performance using mean squared error (MSE) and the correlation coefficient between predicted and actual GDP growth.
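The lagged, chronological setup can be sketched as follows. Each target year is paired with the previous year's indicator values (the one-year lag described in the Results), and the pairs are split at 2010. Treating 2010 itself as a test year, excluding the target's own lagged value from the inputs, and the toy field names are all assumptions. The XGBoost fit is left out so the sketch runs with only the standard library.

```python
def make_lagged_pairs(rows, target="gdp_growth"):
    """Pair each year's target with the previous year's indicators.

    rows: dict mapping year -> dict of indicator values, including the target.
    Assumption: the target's own lagged value is excluded from the inputs.
    Returns (target_year, lagged_features, target_value) tuples.
    """
    pairs = []
    for year in sorted(rows):
        prev = rows.get(year - 1)
        if prev is None:
            continue  # no previous year available to supply inputs
        lagged = {name: value for name, value in prev.items() if name != target}
        pairs.append((year, lagged, rows[year][target]))
    return pairs


def chronological_split(pairs, cutoff=2010):
    """Train on target years before the cutoff; test on the cutoff and later.

    Assumption: 2010 itself belongs to the test set.
    """
    train = [p for p in pairs if p[0] < cutoff]
    test = [p for p in pairs if p[0] >= cutoff]
    return train, test


# Toy usage with made-up values and a hypothetical feature name:
rows = {
    2008: {"gdp_growth": 1.0, "services_va": 10.0},
    2009: {"gdp_growth": 2.0, "services_va": 11.0},
    2010: {"gdp_growth": 3.0, "services_va": 12.0},
    2011: {"gdp_growth": 4.0, "services_va": 13.0},
}
train, test = chronological_split(make_lagged_pairs(rows))
print([p[0] for p in train], [p[0] for p in test])  # [2009] [2010, 2011]
```

The training features could then be turned into a matrix, for example `[list(f.values()) for _, f, _ in train]`, and passed with the targets to `xgboost.XGBRegressor().fit(X, y)`; this assumes every year reports the same indicators in the same order. Splitting by year rather than at random keeps later years out of training, which mirrors how the model would be used to forecast.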

  4. Comparison

    Validated predictions by comparing model outputs with World Bank GDP forecasts, benchmarking the model's accuracy against an established economic forecast.
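A sketch of this comparison: each forecast series is scored against the same actual GDP growth values using MSE and the Pearson correlation coefficient. The function names, the series labels, and the toy numbers below are hypothetical; they are not the project's data or results.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error, in the square of the target's units."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def compare_forecasts(actual, forecasts):
    """Score each named forecast series against the same actual values."""
    return {
        name: {"mse": mse(actual, pred), "r": pearson_r(actual, pred)}
        for name, pred in forecasts.items()
    }


# Toy usage (made-up numbers, not the project's results):
actual = [1.0, 2.0, 3.0]
scores = compare_forecasts(actual, {"model": [1.0, 2.0, 4.0], "benchmark": [3.0, 2.0, 1.0]})
print(scores["model"]["mse"], scores["benchmark"]["mse"])
```

Scoring both series over exactly the same years matters: a single year with a large shock, such as 2020, can dominate MSE because errors are squared.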

Feature                                       Correlation with GDP growth
GDP Growth                                     1.0
School Enrollment, Tertiary, Female            0.21
Public and Publicly Guaranteed Debt Service   -0.12
Prevalence of Stunting, Height for Age        -0.35
School Enrollment, Secondary                  -0.33
Terms of Trade Adjustment                     -0.32
Inflation, Consumer Prices                    -0.03
Children Out of School, Male                  -0.37
Taxes on Goods and Services                   -0.27
Pupil-teacher Ratio, Primary                  -0.22
Services, Value Added                          0.72
GNI Growth                                     1.0
Final Consumption Expenditure                  0.85
Real Interest Rate                            -0.15
Changes in Inventories                         0.41
Electric Power Consumption                     0.3
Air Transport, Passengers Carried              0.23

Results

My model achieved an MSE of 5.4, compared with 9.8 for the World Bank's forecasts (GDP growth is measured in percent, so both values are in squared percentage points). One reason is that my model captured the COVID-19 dip in 2020, which the World Bank's forecasts missed. My model takes the previous year's values of 19 features as inputs, which makes it more accurate at predicting the following year's GDP growth. It also achieved a correlation coefficient of 0.95 between predicted and actual GDP growth across the years evaluated. This research won third place at the 2024 National Student Data Corps Data Science Symposium, where it was recognized for its potential impact on economics.

Impacts

This research could meaningfully improve economic forecasting by giving policymakers a more accurate and reliable tool for predicting GDP growth. By identifying the key indicators that influence economic performance, the model can inform policy decisions and help governments allocate resources more effectively. This is especially relevant in developing countries like India, where economic growth is critical for poverty reduction and social development and where such a model could help shape the country's future. Given that I designed, built, and trained this model within a week, there is substantial room for further research and development.