Step-by-step example

Written by and copyright Teodor Krastev

 

The objective: to classify an unknown sample, based on its mass spectra, as being (or not being) one of a number of known substances.

This example follows the shortest, most automated route to classification. It will work in the majority of cases with decent data. Classifion will estimate the quality of your data for you.

1. You have to take at least 7 measurements per substance (assuming 3 or more substances). The typical number of measurements is 15 (+/-5). Keep the same conditions (as usual) for all the measurements.

2. Examine the data visually and discard the spectra that are obviously bad. Including them in the training set probably won't harm the classification, but it will increase the processing time for optimization.

3. Convert the mass-spectra into XY (mass/intensity) ASCII files with the ".txt" extension. The delimiter between the X and Y values is a tab character (ASCII 9) or a space character (ASCII 32). It is recommended that all the files be in the same directory.

4. Open Classifion and create a new spec-tree (File menu).

5. Create a separate group for each substance (File menu) and add the respective mass-spectra to each of them.

6. Press "from All groups" for the boundaries in Options (on the right). Check that all the options are unchecked except "Normalize", which should be checked. On the Optimize page of the PCAMD module, check that the preset is Medium correction.

7. Open Autopilot from the Macro menu. Press Fly and enjoy the view.

8. Examine the Autopilot results.
 
The first column shows the results of the optimization: the number of spectra excluded as inconsistent (bad) and the compactness of the cluster in dispersion units (should be around 1).

The second column is the dispersion of the MD distribution over all the samples identified as a positive match. The smaller this number, the better. However, a big difference between the compactness (shown in brackets in the first column) and this dispersion suggests that the number of valid samples for that group is dangerously low.

The third column contains type I errors, or false negatives. This type of error will nearly always be present, for many reasons: different matrix effects, different measurement conditions, misalignment of the instrument, human error, etc. Having this type of error does not necessarily mean that the classification works poorly; it could be caused by any of the reasons mentioned. Of course, supervised training could improve the results.

The fourth column contains type II errors, or false positives. If you have this type of error, it usually means that the training needs to be supervised.
The final purpose of your classification determines which type of error is more important, and hence which one to optimize for.
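To make the two error columns concrete, here is a minimal Python sketch that counts both kinds of error for one group. It follows this page's naming (type I = false negative, type II = false positive). The function name `count_errors` and its inputs are hypothetical and only illustrate the bookkeeping; this is not how Classifion computes its Autopilot table internally.

```python
def count_errors(true_labels, accepted, group):
    """Count both error types for one group (illustrative helper, not part of Classifion).

    true_labels: the known substance of each spectrum
    accepted:    True where the spectrum was reported as a positive match to `group`
    """
    # Type I (false negative): the spectrum belongs to the group but was rejected.
    false_negatives = sum(
        1 for label, ok in zip(true_labels, accepted) if label == group and not ok
    )
    # Type II (false positive): the spectrum belongs elsewhere but was accepted.
    false_positives = sum(
        1 for label, ok in zip(true_labels, accepted) if label != group and ok
    )
    return false_negatives, false_positives


# Made-up labels, purely for illustration.
labels = ["A", "A", "A", "B", "B"]
matched_a = [True, False, True, True, False]
print(count_errors(labels, matched_a, "A"))  # (1, 1)
```

Here one "A" spectrum was rejected (one type I error) and one "B" spectrum was accepted as "A" (one type II error).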
 

9. Once you have your list of training sets in the Classify module, you can classify any other sample you have. Just load and select the group with the unknown specs (more than one measurement of your sample is recommended) and click Analyze Act. Group.
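The data preparation in steps 1 and 3 can be scripted before anything is loaded into Classifion. The following is a minimal Python sketch, assuming your spectra are already available as lists of masses and intensities. The helper names `write_xy`, `read_xy` and `groups_below_minimum` are hypothetical and not part of Classifion. Only the file layout (one mass/intensity pair per line, tab or space delimited, ".txt" extension) and the minimum of 7 measurements come from the steps above.

```python
from pathlib import Path


def write_xy(path, masses, intensities):
    """Write one spectrum as tab-delimited mass/intensity pairs, one pair per line."""
    if len(masses) != len(intensities):
        raise ValueError("masses and intensities must have the same length")
    lines = [f"{m}\t{i}" for m, i in zip(masses, intensities)]
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


def read_xy(path):
    """Read an XY file back as a list of (mass, intensity) tuples."""
    pairs = []
    for line in Path(path).read_text(encoding="ascii").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # split() with no argument accepts both tab and space delimiters
        x, y = line.split()
        pairs.append((float(x), float(y)))
    return pairs


def groups_below_minimum(groups, minimum=7):
    """Return the names of groups holding fewer than `minimum` spectra."""
    return sorted(name for name, files in groups.items() if len(files) < minimum)
```

For example, `groups_below_minimum({"caffeine": files_a, "aspirin": files_b})` (with made-up group names) lists the substances that still need more measurements before training.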

For more information, download Classifion and see the help.