Breakthroughs in artificial intelligence, especially deep learning, have had a major impact on scientific research. With growing computational power and data volumes, deep learning has become a powerful tool for data mining, causal analysis, and decision optimization.
We use deep learning to extract useful information from large-scale spectral data. On the one hand, mining the relationship between spectral signals and molecular structures assists in peak assignment and spectral interpretation; on the other hand, efficient and accurate classification algorithms offer promising solutions for practical applications, such as rapid bacterial detection and disease diagnosis based on Raman spectroscopy. Our current research focuses on:
1. Constructing the relationship between vibrational spectra and molecular structure
Vibrational spectroscopy provides rich molecular fingerprint information, which is important for analyzing molecular structure. Currently, the analysis of vibrational spectra requires extensive expert knowledge, and the assignment of some peaks is not consistent across the literature. If reliable spectrum-structure relationships can be established, the accuracy of existing spectral analysis will be greatly improved. Traditional methods depend heavily on theoretical calculations (e.g., DFT), but their high computational cost limits their application to complex molecules. In recent years, data-driven methods have offered new ways to address this problem. We are using deep learning to model molecules and their vibrational spectra jointly, enabling structure-based spectral prediction and spectrum-based molecular generation.
2. Feature extraction and practical applications of biological Raman spectroscopy
Raman spectroscopy is well suited to the detection of biological samples, especially the in situ monitoring of life processes. However, interpreting biological Raman spectra is hampered by weak signals, numerous interferences, and poor reproducibility. Machine learning excels at extracting key information from large numbers of complex signals and is therefore widely used for feature extraction from biological Raman spectra. As research deepens and applications expand, however, spectra acquired with different detection systems and in different characterization environments differ significantly, so models currently have to be rebuilt for each dataset. A general and efficient spectral model that can be fine-tuned for different datasets would greatly reduce the cost of both data acquisition and modeling. We are combining deep learning with a series of pre-training strategies to achieve this goal.
In addition, characterization instrumentation is critical for scientists to understand reaction processes. Improving the resolution of existing instruments relies on both hardware tuning and algorithm optimization, and the digitization and automation of instruments can improve characterization accuracy and research efficiency. We have developed deep learning-based spectral/image processing algorithms that greatly improve the spatial and temporal resolution of existing characterization instruments. We are also actively building a digital instrument platform, including but not limited to a unified Raman spectroscopy database and responsive online cloud-based algorithm services. In the future, we will develop reinforcement learning-based algorithms to support instrument decision-making. Our current research focuses on:
1. Enhancing the spatio-temporal resolution of characterization instruments
There is a clear trade-off between the signal-to-noise ratio/resolution of Raman spectra and the sampling speed. In situ detection of transient chemical reactions would be greatly facilitated if signals could be acquired at high temporal resolution without sacrificing spectral quality. To this end, we have developed a series of algorithms, including CNN-based super-resolution reconstruction, PEER, CLRMA, and Signal2Signal, which rely on fast acquisition followed by reconstruction of the resulting low-quality signals. We will continue to explore more efficient and lightweight solutions.
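The trade-off between signal quality and sampling speed can be illustrated with a small simulation. The sketch below is not one of the algorithms named above; it only shows the conventional baseline they aim to improve on, namely averaging repeated acquisitions. It assumes a single idealized Lorentzian band, additive Gaussian noise that is independent between frames, and hypothetical axis and noise values chosen purely for illustration. Under those assumptions, averaging N frames raises the signal-to-noise ratio by roughly a factor of sqrt(N), at the price of N times the acquisition time.

```python
import math
import random


def lorentzian(x, center, width, height):
    """Idealized single Raman band (Lorentzian line shape)."""
    return height / (1.0 + ((x - center) / width) ** 2)


def simulate_frame(clean, noise_sigma, rng):
    """One fast acquisition: clean spectrum plus independent Gaussian noise."""
    return [y + rng.gauss(0.0, noise_sigma) for y in clean]


def average_frames(frames):
    """Point-by-point mean of several acquisitions."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def snr(measured, clean):
    """Peak height of the clean signal divided by the RMS residual noise."""
    residual = [m - c for m, c in zip(measured, clean)]
    rms = math.sqrt(sum(r * r for r in residual) / len(residual))
    return max(clean) / rms


# Hypothetical values for illustration only.
rng = random.Random(0)
axis = [400 + 2 * i for i in range(500)]          # Raman shift axis in cm^-1
clean = [lorentzian(x, 1000.0, 8.0, 1.0) for x in axis]


def snr_after_averaging(n_frames, noise_sigma=0.5):
    """SNR obtained by averaging n_frames noisy acquisitions."""
    frames = [simulate_frame(clean, noise_sigma, rng) for _ in range(n_frames)]
    return snr(average_frames(frames), clean)


snr_1 = snr_after_averaging(1)
snr_16 = snr_after_averaging(16)
snr_64 = snr_after_averaging(64)

for n, value in ((1, snr_1), (16, snr_16), (64, snr_64)):
    print(f"{n:3d} frame(s): SNR = {value:.2f}, acquisition time = {n}x")
```

In this toy setting, a 4-fold gain in signal-to-noise ratio costs a 16-fold increase in acquisition time, which is why reconstructing a high-quality spectrum from a single fast, noisy acquisition is attractive for transient processes.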
2. Building a cloud-based algorithm platform
Algorithms cannot be iterated and improved without feedback from users. To obtain timely feedback and lower the barrier to using the algorithms above, we have developed a web service called Raman cloud, where users can process their spectra and images instantly and interactively. We will collect user feedback, continue to iterate, and explore using deep learning to recommend the best spectral processing strategy to each user. In the future, we hope to embed the service in the instrument itself, breaking down the barriers between hardware and software and building an intelligent instrument capable of autonomous analysis and decision-making.