Abstract:
Combining the advantages of filter and wrapper methods, a feature selection method based on an ensemble genetic algorithm (FSEGA) is proposed for selecting feature genes from gene expression profile datasets. An information index, defined from the distribution of positive and negative samples for each gene, is used to filter out noise genes. During recursive informative elimination, candidate gene subsets are generated according to the integrated weights of the genes. The candidate subset that achieves the highest AUC (area under the receiver operating characteristic curve) in classification tests is selected as the feature gene subset of the dataset. A support vector machine (SVM) is used in the fitness function of the algorithm, and the relationship between the FSEGA method and the classification algorithm is studied. Gene selection experiments were carried out on five tumor gene expression profile datasets. The results show that the feature gene sets selected by the proposed integrated feature selection method are rich in category information and highly repeatable, which improves the stability and robustness of tumor feature gene selection.
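
The filter-then-eliminate pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the FSEGA implementation: the abstract does not give the formula for the information index, so a signal-to-noise ratio is assumed here as a stand-in; a nearest-centroid scorer replaces the SVM so that the example needs only the Python standard library; and the genetic algorithm and the integration of weights across ensemble members are omitted. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean, pstdev


def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def snr_weight(X, y, j):
    """ASSUMED stand-in for the information index of gene j:
    |mean_pos - mean_neg| / (std_pos + std_neg)."""
    pos = [row[j] for row, lab in zip(X, y) if lab == 1]
    neg = [row[j] for row, lab in zip(X, y) if lab == 0]
    return abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + 1e-12)


def centroid_scores(X_train, y_train, X_test, genes):
    """Stand-in for the SVM decision function: project each test sample onto
    the difference of the class centroids, using only the given genes."""
    mu_pos = [mean([r[j] for r, l in zip(X_train, y_train) if l == 1]) for j in genes]
    mu_neg = [mean([r[j] for r, l in zip(X_train, y_train) if l == 0]) for j in genes]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]
    mid = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    return [sum(wk * (row[j] - mk) for wk, mk, j in zip(w, mid, genes))
            for row in X_test]


def select_genes(X_train, y_train, X_val, y_val, keep_after_filter):
    """Filter genes by weight, then repeatedly drop the lower-weighted half and
    keep the candidate subset with the highest validation AUC."""
    n_genes = len(X_train[0])
    weights = {j: snr_weight(X_train, y_train, j) for j in range(n_genes)}
    # Filter step: keep only the highest-weighted genes.
    subset = sorted(weights, key=weights.get, reverse=True)[:keep_after_filter]
    best_auc, best_subset = -1.0, []
    while subset:
        a = auc(centroid_scores(X_train, y_train, X_val, subset), y_val)
        if a >= best_auc:  # on ties, prefer the smaller (later) subset
            best_auc, best_subset = a, list(subset)
        subset = subset[: len(subset) // 2]  # elimination step
    return best_subset, best_auc


# Synthetic data (hypothetical): genes 0 and 1 are informative, the rest noise.
rng = random.Random(0)


def make_samples(n_per_class, n_genes=20):
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[0] += 3.0 * label
            row[1] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y


X_train, y_train = make_samples(20)
X_val, y_val = make_samples(10)
subset, best = select_genes(X_train, y_train, X_val, y_val, keep_after_filter=8)
print("selected genes:", subset, "validation AUC:", best)
```

Halving the subset at each step is only one possible elimination schedule, chosen here for brevity; removing one gene at a time would explore more candidate subsets at a higher cost.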