Hi

I'm in the final stage of preparing an mzIdentML parser for submission to 
Bioconductor (https://github.com/thomasp85/mzID). The parser is intended to be 
fairly minimal and to interpret the content of the mzIdentML file as little as 
possible.

One feature I would like to include, though, is annotating each scan with an 
mzR-compatible acquisition number for better interoperability between the two 
parsers.

The HUPO specification for the mzIdentML format states that each scan in the 
file is labelled with a spectrumID and a reference to the MS data file. 
Furthermore, each MS data file should have a spectrum ID format specified 
according to the controlled vocabulary.

The content of the spectrumID can thus be, for example, 'scanID=<someInteger>', 
'spectrum=<someInteger>' or 'scan=<someInteger>', or something more elaborate 
such as 'sample=<someInteger> period=<someInteger> cycle=<someInteger> 
experiment=<someInteger>', depending on the instrument producing the MS data.
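
To make the formats above concrete, here is a minimal sketch of splitting a 
spectrumID into its named integer fields. It is written in Python purely for 
illustration (mzID itself is an R package), and the function name 
parse_spectrum_id is hypothetical. It assumes every field has the form 
'key=<integer>', as in the examples above.

```python
import re

# Matches one 'key=<integer>' pair, e.g. 'scan=12' or 'cycle=3'.
# Assumption: every field in the spectrumID follows this pattern.
_PAIR = re.compile(r"(\w+)=(\d+)")


def parse_spectrum_id(spectrum_id):
    """Return the fields of a spectrumID as a dict of name -> integer."""
    return {key: int(value) for key, value in _PAIR.findall(spectrum_id)}


single = parse_spectrum_id("scan=12")
multi = parse_spectrum_id("sample=1 period=1 cycle=20 experiment=2")
```

Here single holds one field and multi holds four, which is exactly the case 
where it is unclear which value should be used.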

When an MS data file is parsed by mzR, all of this is conveniently dropped and 
replaced by an acquisitionNum that uniquely identifies the scan. This is quite 
easy to handle for spectrumIDs consisting of a single identifier, e.g. 
'scan=<someInteger>', but for spectrumIDs with more than one identifier the 
mapping is less clear, and I don't like guessing.
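
The cautious behaviour I have in mind could be sketched as follows, again in 
Python for illustration only. The function name is hypothetical, and the 
assumption that a lone integer in the spectrumID equals mzR's acquisitionNum 
is exactly the part I would like confirmed. Anything with more than one field 
is left unresolved rather than guessed.

```python
import re


def unambiguous_acquisition_num(spectrum_id):
    """Return the integer of a single-field spectrumID, else None.

    Assumption (unverified): for single-field IDs such as 'scan=12',
    the integer corresponds to the acquisitionNum reported by mzR.
    """
    fields = re.findall(r"(\w+)=(\d+)", spectrum_id)
    if len(fields) == 1:
        # Only one identifier, so there is nothing to choose between.
        return int(fields[0][1])
    # Zero or several identifiers: refuse to guess.
    return None


easy = unambiguous_acquisition_num("scan=12")
fuzzy = unambiguous_acquisition_num("sample=1 period=1 cycle=20 experiment=2")
```

With this approach easy gets a value and fuzzy stays None until the correct 
mapping is known.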

So the question is: how can I ensure that I extract the right value from the 
spectrumID to obtain an mzR-compatible acquisitionNum? I realize that the 
generation of the acquisitionNum in mzR is probably handled by the RAMP module, 
but I hope some of the mzR folks (or others) can help.

best

Thomas Pedersen, PhD student at the Technical University of Denmark (DTU)
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
