Application of Ensemble Genetic Algorithm in Feature Gene Selection

  • Abstract: Combining the advantages of Filter and Wrapper methods, this paper proposes a feature selection method based on an ensemble genetic algorithm (FSEGA) for selecting feature genes from gene expression profile data. An information index, defined from the distribution of each gene across positive and negative samples, is used to filter out noise genes. During recursive feature elimination, candidate gene subsets are generated according to the ensemble weights of the genes, and the candidate subset with the highest AUC (area under the receiver operating characteristic curve) in classification tests is selected as the feature gene subset of the data set. A support vector machine (SVM) is used in the fitness function of the algorithm, and the relationship between FSEGA and the classifier algorithm is studied. Gene selection experiments were carried out on five tumor gene expression profile data sets. The results show that the feature gene sets selected by the proposed ensemble feature selection method carry rich class information and are well reproducible, improving the stability and robustness of tumor feature gene selection.
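The subset-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define how the ensemble weights are computed, so averaging per-run weights is an assumption here, and a simple sum-of-expression score stands in for the SVM used in the paper. The gene names, weights, and expression values are toy data invented for illustration.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair-counting statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count correctly ordered positive/negative pairs; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_weights(weight_runs):
    """Average each gene's weight over several runs (assumed aggregation rule)."""
    genes = weight_runs[0].keys()
    return {g: mean(run[g] for run in weight_runs) for g in genes}


def candidate_subsets(weights):
    """Nested subsets obtained by dropping the lowest-weighted gene each round."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return [ranked[:k] for k in range(len(ranked), 0, -1)]


def select_best_subset(weights, score_subset):
    """Return the candidate subset with the highest AUC, together with that AUC."""
    best = max(candidate_subsets(weights), key=score_subset)
    return best, score_subset(best)


# Toy data (hypothetical): 3 positive and 3 negative samples, 3 genes.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "g1": [5, 6, 7, 1, 2, 3],  # separates the two classes
    "g2": [4, 5, 3, 2, 1, 2],  # separates the two classes
    "g3": [1, 9, 2, 8, 3, 7],  # noise
}
# Per-run gene weights, e.g. from repeated genetic algorithm runs (hypothetical).
weight_runs = [
    {"g1": 0.9, "g2": 0.6, "g3": 0.1},
    {"g1": 0.8, "g2": 0.7, "g3": 0.3},
]


def score_subset(genes):
    """Stand-in for the SVM: score each sample by summed expression, then take AUC."""
    sample_scores = [sum(expr[g][i] for g in genes) for i in range(len(labels))]
    return auc(labels, sample_scores)


best, best_auc = select_best_subset(ensemble_weights(weight_runs), score_subset)
print(best, best_auc)
```

In a realistic setting, `score_subset` would train and test a classifier such as an SVM under cross-validation rather than use a fixed scoring rule, and ties between candidate subsets might be broken in favor of the smaller subset.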

     
