Hi Robert,
Thanks for the bug report and reproducible example. Now fixed in release
1.12.2 and devel 1.13.4.
I've also updated the docs to better explain how the Seqinfo objects are
propagated / merged when supplied as 'genome'.
Valerie
On 10/23/2014 06:45 AM, Robert Castelo wrote:
hi there,
in my package VariantFiltering i have an example VCF file from a Hapmap
CEU trio including three chromosomes only to illustrate its vignette.
i've come across a problem with the function readVcf() in
VariantAnnotation that may be specific of the situation of a VCF file
not having all chromosomes, but which it will be great for me if this
could be addressed.
the problem is reproduced as follows:
===========================
library(VariantAnnotation)
fl <- file.path(system.file("extdata", package="VariantFiltering"),
"CEUtrio.vcf.bgz")
vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl)))
Error in GenomeInfoDb:::makeNewSeqnames(x, new2old = new2old,
seqlevels(value)) :
when 'new2old' is NULL, the first elements in the
supplied 'seqlevels' must be identical to 'seqlevels(x)'
====================
this is caused because although i'm providing the Seqinfo object derived
from the header of the VCF file itself, at some point the ordering of
the seqlevels between the header and the rest of the VCF file differs
due to the smaller subset of chromosomes in the VCF file.
This can be easily fixed by replacing the line:
if (length(newsi) > length(oldsi)) {
within the .scanVcfToVCF() function in methods-readVcf.R, by
if (length(newsi) >= length(oldsi)) {
this is happening both in release and devel. i'm pasting below my
sessionInfo() for the release.
let me know if you think this fix is feasible or i'm wrongly using the
function readVcf(). i'm basically trying to use readVcf() without having
to figure out the appropriate value for the argument 'genome', i.e.,
without knowing beforehand what version of the genome was used to
produce the VCF file.
thanks!!
robert.
sessionInfo()
R version 3.1.1 Patched (2014-10-13 r66751)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
[5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
[7] LC_PAPER=en_US.UTF8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices
[6] utils datasets methods base
other attached packages:
[1] VariantAnnotation_1.12.0 Rsamtools_1.18.0
[3] Biostrings_2.34.0 XVector_0.6.0
[5] GenomicRanges_1.18.0 GenomeInfoDb_1.2.0
[7] IRanges_2.0.0 S4Vectors_0.4.0
[9] BiocGenerics_0.12.0 vimcom_1.0-0
[11] setwidth_1.0-3 colorout_1.0-3
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.0 base64enc_0.1-2
[3] BatchJobs_1.4 BBmisc_1.7
[5] Biobase_2.26.0 BiocParallel_1.0.0
[7] biomaRt_2.22.0 bitops_1.0-6
[9] brew_1.0-6 BSgenome_1.34.0
[11] checkmate_1.4 codetools_0.2-9
[13] DBI_0.3.1 digest_0.6.4
[15] fail_1.2 foreach_1.4.2
[17] GenomicAlignments_1.2.0 GenomicFeatures_1.18.0
[19] iterators_1.0.7 RCurl_1.95-4.3
[21] RSQLite_0.11.4 rtracklayer_1.26.0
[23] sendmailR_1.2-1 stringr_0.6.2
[25] tools_3.1.1 XML_3.98-1.1
[27] zlibbioc_1.12.0
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel