Study on the Selection of Spectral Preprocessing Methods
authors:Pengyao Diwu, Xihui Bian*, Zifang Wang, Wei Liu
keywords:Preprocessing method; Complex sample; Partial least squares; Parameter optimization; Method selection
source:Journal
specific source:Spectroscopy and Spectral Analysis 2019, 39 (9), 2800-2806
Issue time:2019
Spectral signals of complex samples are usually disturbed by stray light, noise, baseline drift and other undesirable factors, which can affect the final qualitative and quantitative analysis results. Therefore, it is necessary to preprocess the raw spectra before modeling. Choosing a proper method from the existing spectral preprocessing methods is a difficult problem. One strategy is to choose the optimal preprocessing method by observing the characteristics of the spectral signal directly, which does not require modeling and is more interpretable. However, this can be difficult and subjective when interferences are subtle or multiple, and it may lead to misleading results. Another strategy is based on modeling performance, which does not require observing the spectral characteristics, but numerous preprocessing methods need to be investigated, which is time-consuming for large datasets. It is therefore necessary to explore which selection strategy is more scientific and reasonable. In this study, nine datasets were used to investigate the necessity of preprocessing and the choice of preprocessing methods by arranging and combining 10 preprocessing methods. Firstly, the number of latent variables of partial least squares (PLS), the window size of first derivative (1st Der), second derivative (2nd Der) and Savitzky-Golay (SG) smoothing, and the wavelet function and decomposition scale of continuous wavelet transform (CWT) were optimized, respectively. Then, no preprocessing and 10 preprocessing methods, including 1st Der, 2nd Der, CWT, multiplicative scatter correction (MSC), standard normal variate (SNV), SG smoothing, mean centering, normalization, Pareto scaling and auto scaling, were combined in the order of baseline correction, scattering correction, smoothing and scaling. A total of 120 preprocessing methods and their combinations were obtained.
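The total of 120 is consistent with picking one option, or none, at each of the four stages: 4 × 3 × 2 × 5 = 120. The Python sketch below enumerates the combinations under that reading. The assignment of each method to a stage is an assumption inferred from the stated order, not a grouping spelled out in the abstract, and the names `STAGES`, `enumerate_pipelines` and `describe` are illustrative.

```python
from itertools import product

# Assumed grouping of the 10 methods into the four stages named in the abstract.
# "None" means the stage is skipped.
STAGES = {
    "baseline correction": ["None", "1st Der", "2nd Der", "CWT"],
    "scattering correction": ["None", "MSC", "SNV"],
    "smoothing": ["None", "SG smoothing"],
    "scaling": ["None", "Mean centering", "Normalization",
                "Pareto scaling", "Auto scaling"],
}


def enumerate_pipelines(stages=STAGES):
    """Return every combination with exactly one choice per stage, in stage order."""
    return [tuple(choice) for choice in product(*stages.values())]


def describe(pipeline):
    """Render a pipeline as a readable string, dropping skipped stages."""
    steps = [step for step in pipeline if step != "None"]
    return " + ".join(steps) if steps else "No preprocessing"


pipelines = enumerate_pipelines()
print(len(pipelines))          # 4 * 3 * 2 * 5 = 120
print(describe(pipelines[0]))  # the all-"None" entry: No preprocessing
```

Keeping the stage order fixed is what keeps the search space at 120; allowing arbitrary orderings of the same methods would make it far larger.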
Finally, the characteristics of the spectral signals and the root mean squared error of prediction (RMSEP) with PLS for the 120 preprocessing methods were analyzed for the nine datasets and for the same dataset with different components. The results show that, compared with observing the characteristics of spectral signals, the optimal preprocessing method can be selected more accurately according to the modeling performance for the spectra and the predicted components. For most datasets, an appropriate preprocessing method can improve the modeling performance. For different datasets, the optimal preprocessing method differs because the datasets differ in information content and complexity. For the same dataset, the optimal preprocessing methods for different components also differ, even though the spectra are the same. Thus, it can be concluded that no universal preprocessing method exists; the optimal preprocessing method depends on the spectra and the predicted components. Furthermore, sorting and combining the existing preprocessing methods according to their preprocessing purpose is an effective way to select the optimal preprocessing method.
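Selection by modeling performance comes down to computing RMSEP for each candidate and keeping the lowest value. A minimal sketch follows, using the standard definition RMSEP = sqrt(mean((y − ŷ)²)) over the prediction set. The helper names and the toy numbers are hypothetical and are not data from the study. In practice each prediction list would come from a PLS model fitted on spectra preprocessed with the corresponding method.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over a prediction set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def select_best(y_true, predictions):
    """Return (name, RMSEP) of the candidate with the lowest RMSEP.

    `predictions` maps a preprocessing-method name to the values predicted
    by a model built after that preprocessing.
    """
    scores = {name: rmsep(y_true, y_pred) for name, y_pred in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy values for illustration only; they are not results from the paper.
y_ref = [1.0, 2.0, 3.0, 4.0]
toy_predictions = {
    "No preprocessing": [1.5, 2.5, 2.5, 4.5],
    "SNV + Auto scaling": [1.1, 1.9, 3.1, 3.9],
}
print(select_best(y_ref, toy_predictions))
```

Because the conclusion is that the optimum depends on both the spectra and the predicted component, a selection like this would be repeated for each dataset and each component rather than reused.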