you're right, i was just thinking about b37 (w/o MT) and hg19 but one can have the same seqlevelsStyle with two different builds such as hg18 and hg19. Well then, the solution i was using below,
genome(vcf) <- genome(txdb)[intersect(names(genome(vcf)), names(genome(txdb)))] was not that bad, but it's a bit cumbersome to write, something like what you propose below looks better to me. I also wonder why each chromosome should have a genome build (which forces me to intersect in the line before), could not we assume that all chromosomes come from the same build? On 10/25/14 12:56 AM, Michael Lawrence wrote: > Not sure I understand. Setting the seqlevelsStyle cannot change the > genome build, so the two Seqinfos will remain incompatible in that > way. I think what you want is to be able to say "let's consider these > two genome builds to be the same", which seems reasonable after > dropping chrM. I was proposing that this could be done with something > like: > > genome(vcf, rectifySeqlevels=TRUE) <- genome(txdb) > > For convenience, we might want that argument to be TRUE by default, > but that would change current behavior. > > > On Fri, Oct 24, 2014 at 3:41 PM, Robert Castelo > <robert.cast...@upf.edu <mailto:robert.cast...@upf.edu>> wrote: > > hi Michael, > > if we assume that a seqname style does not imply a specific genome > build, then i'd say that the error below about having incompatible > genomes should not pop up because sequence styles have been > already matched, right? > > > > On 10/24/14 10:22 PM, Michael Lawrence wrote: >> I don't think a seqname style implies a specific genome build. >> But the inverse might make sense. Given a genome build >> identifier, we could check for consistent naming. Perhaps an >> option on "genome<-" could support this? >> >> >> >> On Fri, Oct 24, 2014 at 11:52 AM, Valerie Obenchain >> <voben...@fhcrc.org <mailto:voben...@fhcrc.org>> wrote: >> >> This is a good question. I'm not sure we want >> seqlevelsStyle() to also alter the genome value. I think it's >> a reasonable request but I'd like to open it up to >> discussion. I've cc'd a few others for input. >> >> Valerie >> >> >> >> On 10/24/14 09:05, Robert Castelo wrote: >> >> hi Valerie, >> >> thanks for the quick fix and updating the documentation, >> i have a >> further question about the seqinfo slot and particularly >> the use of >> seqlevelsStyle(). Let me illustrate it with an example again: >> >> >> ============== >> library(VariantAnnotation) >> library(TxDb.Hsapiens.UCSC.hg19.knownGene) >> >> ## read again the same VCF file >> fl <- file.path(system.file("extdata", >> package="VariantFiltering"), >> "CEUtrio.vcf.bgz") >> vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl))) >> >> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene >> >> ## select the standard chromosomes >> vcf <- keepStandardChromosomes(vcf) >> >> ## since the input VCF file had NCBI style, let's match >> ## the style of the TxDb annotations >> seqlevelsStyle(vcf) <- seqlevelsStyle(txdb) >> >> ## drop the mitochondrial chromosome (b/c of the >> different lengths >> ## between b37 and hg19 >> vcf <- dropSeqlevels(vcf, "chrM") >> >> ## try to annotate the location of the variants. it >> prompts an >> ## error because the 'genome' slot of the Seqinfo object >> still >> ## has b37 after running seqlevelsStyle >> vcf_annot <- locateVariants(vcf, txdb, AllVariants()) >> Error in mergeNamedAtomicVectors(genome(x), genome(y), what = >> c("sequence", : >> sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, >> chr8, chr9, >> chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, >> chr18, chr19, >> chr20, chr21, chr22, chrX, chrY, chrM have incompatible >> genomes: >> - in 'x': b37, b37, b37, b37, b37, b37, b37, b37, b37, >> b37, b37, b37, >> b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, >> b37, b37 >> - in 'y': hg19, hg19, hg19, hg19, hg19, hg19, hg19, >> hg19, hg19, hg19, >> hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, >> hg19, hg19, hg19, >> hg19, hg19, hg19 >> >> ## this can be fixed by setting the 'genome' slot to the >> values of >> ## the TxDb object >> genome(vcf) <- genome(txdb)[intersect(names(genome(vcf)), >> names(genome(txdb)))] >> >> ## now this works >> vcf_annot <- locateVariants(vcf, txdb, AllVariants()) >> ================= >> >> so my question is, should not seqlevelsStyle() also >> change the 'genome' >> slot of the Seqinfo object in the updated object? >> >> if not, would the solution be updating the 'genome' slot >> in the way i >> did it? >> >> thanks! >> robert. >> >> >> >> On 10/23/2014 11:14 PM, Valerie Obenchain wrote: >> >> Hi Robert, >> >> Thanks for the bug report and reproducible example. >> Now fixed in release >> 1.12.2 and devel 1.13.4. >> >> I've also updated the docs to better explain how the >> Seqinfo objects are >> propagated / merged when supplied as 'genome'. >> >> Valerie >> >> >> On 10/23/2014 06:45 AM, Robert Castelo wrote: >> >> hi there, >> >> in my package VariantFiltering i have an example >> VCF file from a Hapmap >> CEU trio including three chromosomes only to >> illustrate its vignette. >> i've come across a problem with the function >> readVcf() in >> VariantAnnotation that may be specific of the >> situation of a VCF file >> not having all chromosomes, but which it will be >> great for me if this >> could be addressed. >> >> the problem is reproduced as follows: >> >> =========================== >> library(VariantAnnotation) >> >> fl <- file.path(system.file("extdata", >> package="VariantFiltering"), >> "CEUtrio.vcf.bgz") >> >> vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl))) >> Error in GenomeInfoDb:::makeNewSeqnames(x, >> new2old = new2old, >> seqlevels(value)) : >> when 'new2old' is NULL, the first elements in the >> supplied 'seqlevels' must be identical to >> 'seqlevels(x)' >> ==================== >> >> this is caused because although i'm providing the >> Seqinfo object derived >> from the header of the VCF file itself, at some >> point the ordering of >> the seqlevels between the header and the rest of >> the VCF file differs >> due to the smaller subset of chromosomes in the >> VCF file. >> >> This can be easily fixed by replacing the line: >> >> if (length(newsi) > length(oldsi)) { >> >> within the .scanVcfToVCF() function in >> methods-readVcf.R, by >> >> if (length(newsi) >= length(oldsi)) { >> >> this is happening both in release and devel. i'm >> pasting below my >> sessionInfo() for the release. >> >> let me know if you think this fix is feasible or >> i'm wrongly using the >> function readVcf(). i'm basically trying to use >> readVcf() without having >> to figure out the appropriate value for the >> argument 'genome', i.e., >> without knowing beforehand what version of the >> genome was used to >> produce the VCF file. >> >> thanks!! >> robert. >> >> >> sessionInfo() >> R version 3.1.1 Patched (2014-10-13 r66751) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 >> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 >> [7] LC_PAPER=en_US.UTF8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] stats4 parallel stats graphics grDevices >> [6] utils datasets methods base >> >> other attached packages: >> [1] VariantAnnotation_1.12.0 Rsamtools_1.18.0 >> [3] Biostrings_2.34.0 XVector_0.6.0 >> [5] GenomicRanges_1.18.0 GenomeInfoDb_1.2.0 >> [7] IRanges_2.0.0 S4Vectors_0.4.0 >> [9] BiocGenerics_0.12.0 vimcom_1.0-0 >> [11] setwidth_1.0-3 colorout_1.0-3 >> >> loaded via a namespace (and not attached): >> [1] AnnotationDbi_1.28.0 base64enc_0.1-2 >> [3] BatchJobs_1.4 BBmisc_1.7 >> [5] Biobase_2.26.0 BiocParallel_1.0.0 >> [7] biomaRt_2.22.0 bitops_1.0-6 >> [9] brew_1.0-6 BSgenome_1.34.0 >> [11] checkmate_1.4 codetools_0.2-9 >> [13] DBI_0.3.1 digest_0.6.4 >> [15] fail_1.2 foreach_1.4.2 >> [17] GenomicAlignments_1.2.0 GenomicFeatures_1.18.0 >> [19] iterators_1.0.7 RCurl_1.95-4.3 >> [21] RSQLite_0.11.4 rtracklayer_1.26.0 >> [23] sendmailR_1.2-1 stringr_0.6.2 >> [25] tools_3.1.1 XML_3.98-1.1 >> [27] zlibbioc_1.12.0 >> >> _______________________________________________ >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.org> mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> >> >> >> >> > > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel