Precise Identification of Multi-Regional Relative Poverty: A Two-Stage Knowledge-Distilled Adaptive Framework

Guanghuang Liu, Taiyuan Normal University

DOI: https://doi.org/10.71204/hefd5f67

Keywords:

Relative Poverty Prediction, Gradient-Boosted Trees, Knowledge Distillation, Spatial Heterogeneity, Extremely Imbalanced Classification, Bayesian Optimization

Abstract

As poverty-reduction strategy shifts comprehensively toward alleviating relative poverty, precisely identifying populations in multidimensional relative poverty has become critical for social governance. However, existing data-driven models run into algorithmic bottlenecks, including spatial heterogeneity, sparse regional samples, and extreme class imbalance, when applied to settings with a vast territory and large regional disparities. To address these problems, this study proposes a two-stage knowledge-distilled adaptive gradient boosting framework (TKDAF). In the prior-knowledge extraction stage (Stage I), a structure-regularized gradient-boosted tree model serves as the base model; combined with SHAP (Shapley-value-based, game-theoretic) attribution, it quantifies and extracts the global objective weights of poverty-inducing features across regions. In the spatial adaptive enhancement stage (Stage II), the core prediction model, S-DAGB (Spatial-Adaptive Distilled Gradient Boosting), integrates several regularization and feature-enhancement mechanisms: nonlinear reconstruction of the feature space guided by the Stage I prior knowledge, a dynamic class-weighting mechanism based on the effective number of samples (ENS), and spatially adaptive hyperparameter optimization with the Tree-structured Parzen Estimator (TPE) Bayesian algorithm. Empirical results on the 2020 China Family Panel Studies (CFPS) multidimensional dataset show that S-DAGB overcomes the generalization bottleneck of deep tree models in the sample-constrained Northeast region (93.52% accuracy) and significantly improves precision in central and western China, where features are highly heterogeneous and classes are extremely imbalanced. By reducing false positives, the model limits the misallocation of poverty-alleviation resources.
This study provides an algorithmic solution that combines high accuracy with interpretability for precisely identifying relative poverty under complex data distributions.
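The abstract names two quantitative ingredients without giving their formulas: global feature weights extracted from SHAP attributions in Stage I, and ENS-based class weights in Stage II. The sketch below shows one common way to compute each, under stated assumptions: (i) a feature's global weight is its mean absolute SHAP value, normalized so all weights sum to one, and (ii) class weights follow the effective-number formula E_n = (1 − β^n)/(1 − β), with each class weighted in proportion to 1/E_n. The function names, the β values, and the toy data are hypothetical illustrations, not the paper's implementation; the abstract does not specify how S-DAGB makes the weighting "dynamic".

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Dict, List


def ens_class_weights(labels: Sequence[Hashable], beta: float = 0.999) -> Dict[Hashable, float]:
    """Class weights based on the effective number of samples (ENS).

    For a class with n samples, E_n = (1 - beta**n) / (1 - beta). Each class
    weight is proportional to 1 / E_n and the weights are rescaled so that
    they sum to the number of classes present in `labels`.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    # 1 / E_n written directly as (1 - beta) / (1 - beta**n).
    inverse_ens = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    scale = len(inverse_ens) / sum(inverse_ens.values())
    return {c: w * scale for c, w in inverse_ens.items()}


def per_sample_weights(labels: Sequence[Hashable], class_weights: Dict[Hashable, float]) -> List[float]:
    """Expand class weights into one weight per training sample."""
    return [class_weights[y] for y in labels]


def global_shap_weights(shap_values: Sequence[Sequence[float]]) -> List[float]:
    """Global feature weights: mean |SHAP| per feature, normalized to sum to 1.

    `shap_values` is an (n_samples x n_features) matrix of attributions,
    for example as produced for a fitted tree model by a SHAP explainer.
    """
    if not shap_values:
        raise ValueError("shap_values must not be empty")
    n_samples = len(shap_values)
    n_features = len(shap_values[0])
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(n_features)
    ]
    total = sum(mean_abs)
    if total == 0.0:
        # No feature carries attribution mass: fall back to uniform weights.
        return [1.0 / n_features] * n_features
    return [m / total for m in mean_abs]


if __name__ == "__main__":
    # Hypothetical toy data: 90 non-poor (0) vs 10 relatively poor (1) households.
    labels = [0] * 90 + [1] * 10
    cw = ens_class_weights(labels, beta=0.99)
    print(cw)  # the minority class 1 receives the larger weight
    weights = per_sample_weights(labels, cw)

    toy_shap = [[0.2, -0.1], [-0.4, 0.1], [0.0, 0.1]]
    print(global_shap_weights(toy_shap))
```

With β = 0 every class gets the same weight, and as β approaches 1 the weights approach inverse class frequency, so β controls how aggressively the minority (relatively poor) class is up-weighted. The per-sample weights could be passed to the `sample_weight` argument of a scikit-learn-style `fit` method (XGBoost's and LightGBM's classifiers accept one), and β could plausibly be one of the hyperparameters searched by TPE; both are assumptions about usage rather than details given in the abstract.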

