Extraction of Result Set for Gas Chromatography-Mass Spectrometry Database Search Based on Random Mapping

Abstract: Random mapping-based mass spectral library search is a fast way to match a query molecule's mass spectrum against those of standard compounds. It builds a result set (candidate set) from the few candidate molecules with the highest matching similarity. Because there is no sound basis for choosing this cutoff, however, the method easily misses some correct results, which lowers the identification rate. To address this problem, we applied statistical methods to the result sets produced by the existing random mapping-based search and found that: among successfully matched molecules, 96.60% have a matching similarity greater than 0.85; and among molecules whose correct match is not the top-ranked candidate, 97.19% have a similarity that differs from the highest similarity by no more than 0.07. Based on these findings, we improve the existing random mapping-based mass spectral library search method and propose a more accurate result set extraction method, called dynamic interception. Experimental results show that the proposed method raises the identification rate of the existing method by 1.89%, with an average matching accuracy of 98.60%, so that molecules are identified qualitatively with greater accuracy and the robustness of the algorithm is further improved.
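The abstract reports two statistics but does not spell out exactly how they are combined into a cutoff. The following is a minimal Python sketch, assuming the dynamic interception rule keeps every candidate whose similarity exceeds 0.85 and lies within 0.07 of the highest similarity, contrasted with a fixed top-k cut. The names `Candidate`, `top_k`, and `dynamic_interception`, the `tol` parameter, and the sample scores are hypothetical illustrations, not the paper's code; the random mapping step that produces the similarities is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A library compound returned by the search, with its matching similarity."""
    name: str
    similarity: float


def top_k(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Fixed-size result set: the k most similar candidates, whatever their scores."""
    return sorted(candidates, key=lambda c: c.similarity, reverse=True)[:k]


def dynamic_interception(
    candidates: List[Candidate],
    min_similarity: float = 0.85,  # threshold taken from the 96.60% statistic
    max_gap: float = 0.07,         # gap taken from the 97.19% statistic
    tol: float = 1e-9,             # hypothetical: absorbs float error at the gap boundary
) -> List[Candidate]:
    """Variable-size result set (assumed rule, not the paper's implementation):
    keep candidates scoring above min_similarity AND within max_gap of the best."""
    ranked = sorted(candidates, key=lambda c: c.similarity, reverse=True)
    if not ranked:
        return []
    best = ranked[0].similarity
    return [
        c for c in ranked
        if c.similarity > min_similarity and best - c.similarity <= max_gap + tol
    ]


if __name__ == "__main__":
    # Hypothetical similarity scores for one query spectrum.
    hits = [
        Candidate("A", 0.97),
        Candidate("B", 0.93),
        Candidate("C", 0.91),
        Candidate("D", 0.86),
        Candidate("E", 0.60),
    ]
    print([c.name for c in top_k(hits, 2)])              # ['A', 'B']
    print([c.name for c in dynamic_interception(hits)])  # ['A', 'B', 'C']
```

With these sample scores, a fixed top-2 cut drops C even though its score is within 0.07 of the best, while the assumed dynamic rule keeps it and rejects D (gap 0.11). If even the best score is at or below 0.85, this sketch returns an empty set; the abstract does not say how the paper handles that case.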

     
