MICER: a pre-trained encoder-decoder architecture for molecular image captioning

Jiacai Yi, Chengkun Wu, Xiaochen Zhang, Xinyi Xiao, Yanlong Qiu, Wentao Zhao, Tingjun Hou, Dongsheng Cao. Bioinformatics, October 2022.

Automatic recognition of chemical structures from molecular images provides an important avenue for the rediscovery of chemicals. Traditional rule-based approaches rely on expert knowledge and struggle with diverse drawing styles. We propose MICER, an encoder-decoder architecture in which an encoder pre-trained on large-scale molecular images learns robust visual representations and an attention-based decoder translates each molecular image into a SMILES string. Fine-tuning the pre-trained model substantially improves performance on molecular image captioning benchmarks and achieves state-of-the-art accuracy in optical chemical structure recognition.
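To make the attention-based decoding idea concrete, the following is a minimal sketch of a single attention step, not MICER's actual implementation. The abstract does not specify the scoring function, so plain dot-product attention is assumed here; the function name `attend`, the toy region features, and the decoder state are all hypothetical. At each decoding step the decoder state is scored against every image-region feature, the scores are normalized with a softmax, and the weighted sum of features becomes the context used to predict the next SMILES token.

```python
import math


def attend(features, query):
    """Dot-product attention (an assumed scoring function, for illustration).

    features: list of image-region feature vectors from the encoder.
    query:    current decoder state, same dimensionality as each feature.
    Returns the attention weights and the resulting context vector.
    """
    # Score each image region against the decoder state.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]

    # Numerically stable softmax over the region scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the region features.
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy values only: three hypothetical image regions and one decoder state.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In this toy case the first and third regions align equally well with the decoder state and so receive equal weight, while the second region receives less. A full decoder would repeat this step for every output token, feeding the context into a recurrent or Transformer layer that emits the next SMILES character.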