Abstract:
Random mapping-based mass spectral library search algorithms offer a fast way to match the mass spectra of target molecules against those of standard compounds: they select the candidate molecules with the highest similarity values to form a candidate set. However, because there is no sound basis for setting the selection threshold, this strategy can miss correct results, which reduces identification accuracy. To address this problem, statistical methods are used in this paper to analyze the candidate sets produced by the original method. The results show that, among query molecules for which a correct match can be obtained, 96.60% have a similarity value greater than 0.85; among the cases in which the correct match does not have the highest similarity, 97.19% show a similarity difference of less than 0.07 between the correct match and the top-ranked candidate. Based on these findings, an accurate candidate set extraction algorithm, called the dynamic interception algorithm, is proposed to improve the current random mapping-based mass spectral library search approach. The experimental results show that the proposed method increases the identification accuracy of the existing method by 1.89%, reaching an average of 98.60%, so that molecules are identified more accurately and the robustness of the algorithm is improved.
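
The two statistics above suggest one simple way a candidate set could be cut off dynamically rather than at a fixed rank. The following is a minimal sketch under stated assumptions, not the algorithm as specified in the paper: the function name `extract_candidates`, the input format (pairs of compound name and similarity), and the rule of applying both thresholds together are hypothetical, and only the values 0.85 and 0.07 come from the reported results.

```python
from typing import List, Tuple

# Values taken from the statistics reported in the abstract. How the
# paper's dynamic interception algorithm combines them is NOT stated
# there; requiring both conditions at once is an assumption.
MIN_SIMILARITY = 0.85
MAX_GAP_FROM_TOP = 0.07


def extract_candidates(
    scored: List[Tuple[str, float]],
    min_similarity: float = MIN_SIMILARITY,
    max_gap: float = MAX_GAP_FROM_TOP,
) -> List[Tuple[str, float]]:
    """Keep library hits that are similar enough and close to the top hit.

    `scored` holds (compound_name, similarity) pairs for one query
    spectrum (a hypothetical input format).
    """
    if not scored:
        return []
    # Rank hits from most to least similar.
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    top_similarity = ranked[0][1]
    # Retain a hit only if it exceeds the absolute threshold and lies
    # within the allowed gap below the top-ranked similarity.
    return [
        (name, sim)
        for name, sim in ranked
        if sim > min_similarity and (top_similarity - sim) < max_gap
    ]


hits = [("A", 0.95), ("B", 0.90), ("C", 0.86), ("D", 0.80)]
print(extract_candidates(hits))
```

With this rule the size of the candidate set varies per query: a query with one dominant hit yields a single candidate, while a query with several near-tied hits keeps all of them for later verification.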