Hi,
I am using the devel version of Bioconductor for the development of my package chimera. While testing a new function in chimera that relies on the Rsubread package, I ran into a problem converting a SAM file generated by Rsubread into a BAM file.
I used the asBam function from Rsamtools and got the following error:

In doTryCatch(return(expr), name, parentenv, handler) :
  Parse error at line 14667325: sequence and quality are inconsistent

asBam runs fine if I truncate the SAM file at line 14667324, but I get the above error as soon as the file extends to line 14667325.

The line that causes the problem is the following:

HWI-ST169:273:D0YW6ACXX:2:1201:4070:162856 141 * 0 0 * * 0 0 AAAAAAGGGTTGAATTATTTTCACTTGCCCACGTAGTTTATGAATGTGGGAAATAGCTTCAAAGACAGATTAAATGATTTGCCCAAGGCCACAGAAAAGAG @@@FFFFFHABHHJGGBFIGIFHGIJHGJGJIFBGHDBG9BDAFIIDHIIGCHCHI<GACC@ADHHHE;7?@DEFED>@;ACCC>ABB;AAD<BC> 77 * 0 0 * * 0 0 CATGGATGAGGAGAATGAGGATTTTGCGCCGGCTGCTCAGAAGATACCGTGAATCTAAGAAGATCGATCGCCACATGTATCACAGCCTGTACCTGAAGGGG @@@DD?BADHF<D<ACG>FFE;BBF@B?@C@F:(?1.=)))883)8=7@(65??EEBDEC37;;>???=BB@<BBCCACBDDCC:?BCBC:@#########

Does anybody have an idea of what is wrong with this line?
Is there any way to validate the SAM file before running asBam, so that lines that might cause problems in the BAM conversion can be detected and filtered out?
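One possible pre-check, sketched in Python rather than R and not part of Rsamtools or Rsubread, is to stream the SAM file, pass header lines through, and drop alignment lines that either have fewer than the 11 mandatory tab-separated columns or whose QUAL (column 11) is neither '*' nor the same length as SEQ (column 10). The second rule comes from the SAM specification and appears to be the condition the "sequence and quality are inconsistent" message describes. The function names is_consistent and filter_sam are hypothetical, and this is not a full validator: it does not check CIGAR, FLAG, or optional tags.

```python
from typing import Callable, Iterable, List


def is_consistent(line: str) -> bool:
    """Hypothetical helper: check one SAM line for column count and SEQ/QUAL agreement."""
    if line.startswith("@"):
        # Header lines (@HD, @SQ, @RG, @PG, @CO) are passed through unchanged.
        return True
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        # Fewer than the 11 mandatory SAM columns (this also catches blank lines).
        return False
    seq, qual = fields[9], fields[10]
    if qual == "*":
        # No qualities stored: allowed whatever SEQ contains.
        return True
    if seq == "*":
        # Qualities present but no sequence: not allowed by the spec.
        return False
    # Otherwise one quality character is required per base.
    return len(seq) == len(qual)


def filter_sam(lines: Iterable[str], write: Callable[[str], object]) -> List[int]:
    """Hypothetical helper: write consistent lines, return 1-based numbers of dropped lines."""
    dropped = []
    for lineno, line in enumerate(lines, start=1):
        if is_consistent(line):
            write(line)
        else:
            dropped.append(lineno)
    return dropped
```

Usage, with placeholder file names: `with open("in.sam") as src, open("filtered.sam", "w") as dst: dropped = filter_sam(src, dst.write)`. The file is processed one line at a time, so even a 14-million-line SAM file is never held in memory, and the returned line numbers point straight at the offending records without bisecting the file by hand. One caveat: if only one mate of a pair is dropped, the remaining mate's FLAG still marks it as paired, so downstream tools may treat it as an orphan.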
Cheers
Raf

########
sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets methods
[8] base

other attached packages:
[1] Rsamtools_1.13.16     Biostrings_2.29.3 GenomicRanges_1.13.15
[4] XVector_0.1.0         IRanges_1.19.8        BiocGenerics_0.7.2

loaded via a namespace (and not attached):
[1] bitops_1.0-5   stats4_3.0.0   zlibbioc_1.7.0

--

----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
MBC Centro di Biotecnologie Molecolari
Via Nizza 52, Torino 10126
tel.   ++39 0116706457
Fax    ++39 0112366457
Mobile ++39 3333827080
email: raffaele.calog...@unito.it
       raffaele[dot]calogero[at]gmail[dot]com
www:   http://www.bioinformatica.unito.it

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
