Citation: CHANG Zhipeng, CHEN Wenhe. An Orthogonal Selection Method for Identification Features of Relative Poverty Based on Pairwise Sample Comparison[J]. Journal of Anhui University of Technology (Natural Science). DOI: 10.12415/j.issn.1671-7872.24029

An Orthogonal Selection Method for Identification Features of Relative Poverty Based on Pairwise Sample Comparison

Abstract: To address the problem of selecting features for identifying relative poverty, an orthogonal selection method based on pairwise sample comparison is proposed. First, two types of paired sample sets are collected by pairwise comparison: pairs in which relative poverty exists and pairs in which it does not. Second, a feature subset evaluation function is constructed on the idea of pulling samples of the same class closer together and pushing samples of different classes further apart. Finally, an orthogonal experiment, which is comparatively simple in principle, is used to select features. To validate the method, 356 registered poor households and 212 non-registered households from the Dabie Mountain area were taken as research subjects. Four paired sample sets were constructed at random and used to select four groups of key features, and seven classifiers (logistic regression, decision tree, support vector machine, deep neural network, random forest, Boosting, and naive Bayes) were used to evaluate the identification performance of these groups. The results show that for six classifiers (logistic regression, support vector machine, deep neural network, random forest, Boosting, and naive Bayes), the identification accuracy, sensitivity, specificity, and AUC all exceed 90% on each of the four groups of key features; that key features selected from different paired sample sets differ little in identification performance; and that each of the four groups matches the identification performance obtained with all features. The proposed method is simple in principle and easy to apply, and it can select identification features of relative poverty when a classification standard for relative poverty is lacking or difficult to formulate.
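The abstract names three steps (paired samples, a pull-together/push-apart evaluation function, and an orthogonal experiment) but does not give their exact form. The following Python sketch shows one way such a pipeline could look; it is an illustration under stated assumptions, not the authors' implementation. The names `l8_array`, `subset_score`, and `orthogonal_select` are hypothetical, the score (mean squared distance of "relative poverty exists" pairs minus that of "no relative poverty" pairs) is an assumed stand-in for the paper's evaluation function, and the two-level L8(2^7) array limits the sketch to seven candidate features.

```python
import numpy as np


def l8_array():
    """Two-level L8(2^7) orthogonal array: 1 = feature included, 0 = excluded."""
    rows = np.arange(8)
    a, b, c = (rows >> 2) & 1, (rows >> 1) & 1, rows & 1
    # Seven mutually orthogonal columns built from three base columns by XOR.
    return np.stack([a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c], axis=1)


def subset_score(X, similar_pairs, dissimilar_pairs, mask):
    """Assumed score: larger when dissimilar pairs are far apart and
    similar pairs are close together on the selected features."""
    Z = X[:, mask.astype(bool)]

    def mean_sq_dist(pairs):
        i, j = np.asarray(pairs).T
        return ((Z[i] - Z[j]) ** 2).sum(axis=1).mean()

    return mean_sq_dist(dissimilar_pairs) - mean_sq_dist(similar_pairs)


def orthogonal_select(X, similar_pairs, dissimilar_pairs, tol=1e-9):
    """Run the 8 trials of the array and keep features whose inclusion
    raises the average score (a simple range analysis)."""
    X = np.asarray(X, dtype=float)
    design = l8_array()
    if X.shape[1] != design.shape[1]:
        raise ValueError("this sketch handles exactly 7 candidate features")
    scores = np.array(
        [subset_score(X, similar_pairs, dissimilar_pairs, row) for row in design]
    )
    # Effect of feature k: mean score with k included minus mean with k excluded.
    effect = np.array(
        [
            scores[design[:, k] == 1].mean() - scores[design[:, k] == 0].mean()
            for k in range(design.shape[1])
        ]
    )
    return np.flatnonzero(effect > tol), effect
```

Here `similar_pairs` and `dissimilar_pairs` are lists of row-index pairs into `X`. Because the assumed score is additive over features, each effect reduces to that feature's own contribution; a non-additive evaluation function would make the orthogonal design do more work, and more candidate features would need a larger orthogonal array.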

     
