Hi, I discovered the following problem in the geno() elements of the VCF object when I load multiple regions from a .vcf file:
## Correct > tmp = readVcf(system.file("extdata", "chr22.vcf.gz", > package="VariantAnnotation"), "hg19")[1:10] > geno(tmp)$GT HG00096 HG00097 HG00099 HG00100 HG00101 rs7410291 "0|0" "0|0" "1|0" "0|0" "0|0" rs147922003 "0|0" "0|0" "0|0" "0|0" "0|0" rs114143073 "0|0" "0|0" "0|0" "0|0" "0|0" rs141778433 "0|0" "0|0" "0|0" "0|0" "0|0" rs182170314 "0|0" "0|0" "0|0" "0|0" "0|0" rs115145310 "0|0" "0|0" "0|0" "0|0" "0|0" rs186769856 "0|0" "0|0" "0|0" "0|0" "0|0" rs77627744 "0|0" "0|0" "0|0" "0|0" "0|0" rs193230365 "0|0" "0|0" "0|0" "0|0" "0|0" rs9627788 "0|0" "0|0" "1|0" "0|0" "0|0" ## Wrong: Load regions separately using ScanVcfParam(which= ) > tmp2 = readVcf(system.file("extdata", "chr22.vcf.gz", > package="VariantAnnotation"), "hg19", param=ScanVcfParam(which=rowData(tmp))) > geno(tmp2)$GT HG00096 HG00097 HG00099 HG00100 HG00101 rs7410291 "0|0" "0|0" "0|0" "0|0" "0|0" rs147922003 "0|0" "0|0" "0|0" "0|0" "0|0" rs114143073 "1|0" "0|0" "0|0" "0|0" "0|0" rs141778433 "0|0" "0|0" "0|0" "0|0" "0|0" rs182170314 "0|0" "0|0" "0|0" "0|0" "0|0" rs115145310 "0|0" "0|0" "0|0" "0|0" "0|0" rs186769856 "0|0" "0|0" "0|0" "0|0" "0|0" rs77627744 "0|0" "0|0" "0|0" "0|0" "1|0" rs193230365 "0|0" "0|0" "0|0" "0|0" "0|0" rs9627788 "0|0" "0|0" "0|0" "0|0" "0|0" Note how the elements have been added by column instead of by row. The following solved the issue for me: $svn diff VariantAnnotation/ Index: VariantAnnotation/R/AllUtilities.R =================================================================== --- VariantAnnotation/R/AllUtilities.R (revision 73920) +++ VariantAnnotation/R/AllUtilities.R (working copy) @@ -62,8 +62,8 @@ cmb } else { trans <- lapply(elt, t) - cmb <- matrix(t(do.call(c, trans)), - length(lst[["rowData"]]), d[2]) + cmb <- matrix(do.call(c, trans), + length(lst[["rowData"]]), d[2], byrow=TRUE) cmb } }) My suspicion is that the transposition t() on line 65 didn't do the desired job, as do.call() returned a vector.. I don't fully understand the code, so it may be worth double checking. For example, I didn't check the case when the individual geno entries contain more than a single value for each variant and sample. The same problem also occurred for VariantAnnotation_1.4.9. Best wishes, Moritz P.S. > sessionInfo() R Under development (unstable) (2013-02-05 r61843) Platform: x86_64-apple-darwin11.4.2 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] VariantAnnotation_1.5.41 Rsamtools_1.11.20 Biostrings_2.27.11 [4] biomaRt_2.15.0 GenomicRanges_1.11.34 IRanges_1.17.35 [7] BiocGenerics_0.5.6 BiocInstaller_1.9.6 RColorBrewer_1.0-5 loaded via a namespace (and not attached): [1] AnnotationDbi_1.21.12 Biobase_2.19.2 bitops_1.0-5 [4] BSgenome_1.27.1 DBI_0.2-5 GenomicFeatures_1.11.13 [7] RCurl_1.95-3 RSQLite_0.11.2 rtracklayer_1.19.9 [10] stats4_3.0.0 tools_3.0.0 XML_3.95-0.1 [13] zlibbioc_1.5.0 --- Moritz Gerstung Postdoctoral Fellow Cancer Genome Project Wellcome Trust Sanger Institute Wellcome Trust Genome Campus Hinxton, Cambridgeshire CB10 1SA UK -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel