Our article, “Comparative Analysis of Chemical Structure String Representations for Neural Machine Translation,” has been published in Lecture Notes in Computer Science (LNCS) as part of the ICANN 2025 conference proceedings. In this work, we compare SMILES, DeepSMILES, and SELFIES as chemical structure string representations for neural machine translation (NMT). Using transformer-based models, we trained translation between each representation and IUPAC nomenclature. All three performed comparably; SMILES achieved a slightly higher exact-match accuracy of 99.30% with stereochemistry and 99.21% without. When we scaled from 1 million to 10 million and then 50 million compounds, the gaps between the representations stayed small and narrowed further at the larger sizes. These results support continued use of SMILES for transformer-based NMT in cheminformatics, given its mature tooling and broad ecosystem.
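For readers unfamiliar with the metric, exact-match accuracy can be sketched as the share of predicted strings that are identical to their reference strings. The snippet below is only an illustration under stated assumptions: the function name `exact_match_accuracy` is hypothetical, a match is taken to be plain string identity after trimming whitespace, and the example strings are made up rather than drawn from the paper. The evaluation in the article may normalize strings differently.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions identical to their reference string."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Assumption: a match is plain string identity after trimming whitespace.
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


# Illustrative strings only, not data from the paper.
predicted = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
reference = ["CCO", "c1ccccc1", "CC(=O)O", "CCC"]
score = exact_match_accuracy(predicted, reference)
print(f"{score:.2f}%")  # prints "75.00%"
```

Because the metric gives no partial credit, a single wrong character counts as a miss, which is why small differences between representations are still visible at accuracies above 99%.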
Rajan, K., Zielesny, A., Steinbeck, C. (2026). Comparative Analysis of Chemical Structure String Representations for Neural Machine Translation. In: Senn, W., et al. Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions. ICANN 2025. Lecture Notes in Computer Science, vol 16072. Springer, Cham. https://doi.org/10.1007/978-3-032-04552-2_2