
Orthogonal Feature Screening Method for Relative Poverty Identification Based on Pairwise Sample Comparison

    Abstract: To address feature selection for relative poverty identification, an orthogonal screening method based on pairwise sample comparison is proposed. Pairwise comparison is used to collect two paired sample sets, one of pairs in which relative poverty exists and one of pairs in which it does not. A feature subset evaluation function is then designed on the principle of pulling same-class samples closer together and pushing different-class samples further apart, and features are screened by orthogonal experimental design. To validate the method, 356 households registered as poor and 212 unregistered households from the Dabie Mountain area were taken as samples. Four paired sample sets were randomly constructed to screen four groups of key features, and seven classifiers (logistic regression, decision tree, support vector machine, deep neural network, random forest, Boosting, and naive Bayes) were used for performance testing. The results show that, except for the decision tree, the other six classifiers achieve accuracy, sensitivity, specificity, and AUC values above 90% on all four groups of key features. The identification performance of features screened from different sample sets varies little, and all four groups of key features match the identification performance of the full feature set. The method is simple in principle and easy to apply, and it screens identification features effectively in settings where a relative poverty classification standard is lacking or difficult to establish.
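The abstract does not give the form of the evaluation function or the size of the orthogonal array, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the score of a feature subset is the mean squared distance over pairs in which relative poverty exists minus the mean squared distance over pairs in which it does not, and it uses the standard two-level L4(2^3) orthogonal array for three toy features. The names `sq_dist`, `subset_score`, and `screen_features`, and the toy data, are hypothetical and not taken from the paper.

```python
from statistics import mean

# Standard two-level orthogonal array L4(2^3): one column per feature,
# 1 = feature included in the trial, 0 = feature excluded.
L4 = [
    (1, 1, 1),
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
]

def sq_dist(a, b, mask):
    """Squared Euclidean distance using only the features switched on in mask."""
    return sum((x - y) ** 2 for x, y, m in zip(a, b, mask) if m)

def subset_score(mask, gap_pairs, no_gap_pairs):
    """Assumed score (not the paper's): high when pairs with relative poverty
    are far apart and pairs without it are close together."""
    far = mean(sq_dist(a, b, mask) for a, b in gap_pairs)
    near = mean(sq_dist(a, b, mask) for a, b in no_gap_pairs)
    return far - near

def screen_features(gap_pairs, no_gap_pairs, design=L4):
    """Run one trial per design row, then keep each feature whose mean score
    when included exceeds its mean score when excluded."""
    scores = [subset_score(row, gap_pairs, no_gap_pairs) for row in design]
    kept = []
    for j in range(len(design[0])):
        on = mean(s for s, row in zip(scores, design) if row[j] == 1)
        off = mean(s for s, row in zip(scores, design) if row[j] == 0)
        if on > off:
            kept.append(j)
    return kept

# Toy household pairs with three features scaled to [0, 1] (made-up values).
gap_pairs = [  # pairs in which relative poverty exists
    ((0.9, 0.2, 0.8), (0.1, 0.3, 0.3)),
    ((0.8, 0.7, 0.9), (0.2, 0.6, 0.2)),
]
no_gap_pairs = [  # pairs in which relative poverty does not exist
    ((0.5, 0.1, 0.5), (0.5, 0.9, 0.4)),
    ((0.3, 0.8, 0.6), (0.4, 0.2, 0.6)),
]

kept = screen_features(gap_pairs, no_gap_pairs)
print(kept)  # [0, 2]
```

In this toy data, feature 1 differs mostly within pairs that show no relative poverty, so including it lowers the assumed score and the screening drops it. An orthogonal array needs only four trials here instead of the eight possible feature subsets, which is the practical appeal of the design as the number of features grows.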

     
