hi Michael, if we assume that a seqname style does not imply a specific genome build, then i'd say that the error below about having incompatible genomes should not pop up because sequence styles have been already matched, right?
On 10/24/14 10:22 PM, Michael Lawrence wrote: > I don't think a seqname style implies a specific genome build. But the > inverse might make sense. Given a genome build identifier, we could > check for consistent naming. Perhaps an option on "genome<-" could > support this? > > > > On Fri, Oct 24, 2014 at 11:52 AM, Valerie Obenchain > <voben...@fhcrc.org <mailto:voben...@fhcrc.org>> wrote: > > This is a good question. I'm not sure we want seqlevelsStyle() to > also alter the genome value. I think it's a reasonable request but > I'd like to open it up to discussion. I've cc'd a few others for > input. > > Valerie > > > > On 10/24/14 09:05, Robert Castelo wrote: > > hi Valerie, > > thanks for the quick fix and updating the documentation, i have a > further question about the seqinfo slot and particularly the > use of > seqlevelsStyle(). Let me illustrate it with an example again: > > > ============== > library(VariantAnnotation) > library(TxDb.Hsapiens.UCSC.hg19.knownGene) > > ## read again the same VCF file > fl <- file.path(system.file("extdata", > package="VariantFiltering"), > "CEUtrio.vcf.bgz") > vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl))) > > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene > > ## select the standard chromosomes > vcf <- keepStandardChromosomes(vcf) > > ## since the input VCF file had NCBI style, let's match > ## the style of the TxDb annotations > seqlevelsStyle(vcf) <- seqlevelsStyle(txdb) > > ## drop the mitochondrial chromosome (b/c of the different lengths > ## between b37 and hg19 > vcf <- dropSeqlevels(vcf, "chrM") > > ## try to annotate the location of the variants. it prompts an > ## error because the 'genome' slot of the Seqinfo object still > ## has b37 after running seqlevelsStyle > vcf_annot <- locateVariants(vcf, txdb, AllVariants()) > Error in mergeNamedAtomicVectors(genome(x), genome(y), what = > c("sequence", : > sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, > chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, > chr19, > chr20, chr21, chr22, chrX, chrY, chrM have incompatible genomes: > - in 'x': b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, > b37, b37, > b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37, b37 > - in 'y': hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, > hg19, hg19, > hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, hg19, > hg19, hg19, > hg19, hg19, hg19 > > ## this can be fixed by setting the 'genome' slot to the values of > ## the TxDb object > genome(vcf) <- genome(txdb)[intersect(names(genome(vcf)), > names(genome(txdb)))] > > ## now this works > vcf_annot <- locateVariants(vcf, txdb, AllVariants()) > ================= > > so my question is, should not seqlevelsStyle() also change the > 'genome' > slot of the Seqinfo object in the updated object? > > if not, would the solution be updating the 'genome' slot in > the way i > did it? > > thanks! > robert. > > > > On 10/23/2014 11:14 PM, Valerie Obenchain wrote: > > Hi Robert, > > Thanks for the bug report and reproducible example. Now > fixed in release > 1.12.2 and devel 1.13.4. > > I've also updated the docs to better explain how the > Seqinfo objects are > propagated / merged when supplied as 'genome'. > > Valerie > > > On 10/23/2014 06:45 AM, Robert Castelo wrote: > > hi there, > > in my package VariantFiltering i have an example VCF > file from a Hapmap > CEU trio including three chromosomes only to > illustrate its vignette. > i've come across a problem with the function readVcf() in > VariantAnnotation that may be specific of the > situation of a VCF file > not having all chromosomes, but which it will be great > for me if this > could be addressed. > > the problem is reproduced as follows: > > =========================== > library(VariantAnnotation) > > fl <- file.path(system.file("extdata", > package="VariantFiltering"), > "CEUtrio.vcf.bgz") > > vcf <- readVcf(fl, seqinfo(scanVcfHeader(fl))) > Error in GenomeInfoDb:::makeNewSeqnames(x, new2old = > new2old, > seqlevels(value)) : > when 'new2old' is NULL, the first elements in the > supplied 'seqlevels' must be identical to 'seqlevels(x)' > ==================== > > this is caused because although i'm providing the > Seqinfo object derived > from the header of the VCF file itself, at some point > the ordering of > the seqlevels between the header and the rest of the > VCF file differs > due to the smaller subset of chromosomes in the VCF file. > > This can be easily fixed by replacing the line: > > if (length(newsi) > length(oldsi)) { > > within the .scanVcfToVCF() function in > methods-readVcf.R, by > > if (length(newsi) >= length(oldsi)) { > > this is happening both in release and devel. i'm > pasting below my > sessionInfo() for the release. > > let me know if you think this fix is feasible or i'm > wrongly using the > function readVcf(). i'm basically trying to use > readVcf() without having > to figure out the appropriate value for the argument > 'genome', i.e., > without knowing beforehand what version of the genome > was used to > produce the VCF file. > > thanks!! > robert. > > > sessionInfo() > R version 3.1.1 Patched (2014-10-13 r66751) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8 > [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 > [7] LC_PAPER=en_US.UTF8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats4 parallel stats graphics grDevices > [6] utils datasets methods base > > other attached packages: > [1] VariantAnnotation_1.12.0 Rsamtools_1.18.0 > [3] Biostrings_2.34.0 XVector_0.6.0 > [5] GenomicRanges_1.18.0 GenomeInfoDb_1.2.0 > [7] IRanges_2.0.0 S4Vectors_0.4.0 > [9] BiocGenerics_0.12.0 vimcom_1.0-0 > [11] setwidth_1.0-3 colorout_1.0-3 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.28.0 base64enc_0.1-2 > [3] BatchJobs_1.4 BBmisc_1.7 > [5] Biobase_2.26.0 BiocParallel_1.0.0 > [7] biomaRt_2.22.0 bitops_1.0-6 > [9] brew_1.0-6 BSgenome_1.34.0 > [11] checkmate_1.4 codetools_0.2-9 > [13] DBI_0.3.1 digest_0.6.4 > [15] fail_1.2 foreach_1.4.2 > [17] GenomicAlignments_1.2.0 GenomicFeatures_1.18.0 > [19] iterators_1.0.7 RCurl_1.95-4.3 > [21] RSQLite_0.11.4 rtracklayer_1.26.0 > [23] sendmailR_1.2-1 stringr_0.6.2 > [25] tools_3.1.1 XML_3.98-1.1 > [27] zlibbioc_1.12.0 > > _______________________________________________ > Bioc-devel@r-project.org > <mailto:Bioc-devel@r-project.org> mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > > > > > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel