Hi Valerie, Thank you for the quick fix - it works like a charm!
Michael On 03/10/2013 02:45 AM, Valerie Obenchain wrote: > Hi Michael, > > Thanks for reporting the bug and sending a reproducible example. Sorry > it took a few days to get to this. > > A fix has been checked into versions 1.4.12 (release) and 1.5.44 (devel). > > Valerie > > > On 03/07/13 05:43, Michael Stadler wrote: >> Here is the second attachment ("variants.vcf") that was missing from the >> first message. >> >> Michael >> >> On 03/07/2013 02:40 PM, Michael Stadler wrote: >>> Dear Valerie and colleagues, >>> >>> I have recently started using the VariantAnnotation package, which is a >>> huge asset for much of my work with sequence variants - thank you! >>> >>> I have a question regarding the following example (it makes use of two >>> small text files, "annot.gtf" and "variants.vcf", attached to this >>> message): >>> >>> library(VariantAnnotation) >>> library(GenomicFeatures) >>> library(BSgenome.Celegans.UCSC.ce6) >>> >>> # build a TranscriptDb from "annot.gtf" >>> chrominfo <- data.frame(chrom=as.character(seqnames(Celegans)), >>> length=seqlengths(Celegans), >>> is_circular=FALSE) >>> txdb <- makeTranscriptDbFromGFF(file="annot.gtf", >>> format="gtf", >>> chrominfo=chrominfo, >>> dataSource="WormBase v. WS190", >>> species="Caenorhabditis elegans") >>> txdb >>> #TranscriptDb object: >>> #| Db type: TranscriptDb >>> #| Supporting package: GenomicFeatures >>> #| Data source: WormBase v. WS190 >>> #| Organism: Caenorhabditis elegans >>> #| miRBase build ID: NA >>> #| transcript_nrow: 7 >>> #| exon_nrow: 42 >>> #| cds_nrow: 42 >>> #| Db created by: GenomicFeatures package from Bioconductor >>> #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) >>> #| GenomicFeatures version at creation time: 1.11.14 >>> #| RSQLite version at creation time: 0.11.2 >>> #| DBSCHEMAVERSION: 1.0 >>> >>> # read in sequence variants >>> vcf <- readVcf("variants.vcf", genome="ce6") >>> dim(vcf) >>> #[1] 3 1 >>> >>> # predict the impact on coding sequences >>> aa <- predictCoding(query=vcf, subject=txdb, seqSource=Celegans) >>> length(aa) >>> #[1] 7 >>> >>> >>> # My question is related to the impact of variant "chrV:15822727": >>> # This SNV is overlapping two transcripts (same base, plus strand), >>> # yet the reported VARCODON values are different (GAT -> TAT/AAT): >>> aa[names(aa)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")] >>> >>> >>> #GRanges with 2 ranges and 5 metadata columns: >>> # seqnames ranges strand | >>> # <Rle> <IRanges> <Rle> | >>> # chrV:15822727 chrV [15822727, 15822727] + | >>> # chrV:15822727 chrV [15822727, 15822727] + | >>> # REF ALT varAllele >>> # <DNAStringSet> <DNAStringSetList> <DNAStringSet> >>> # chrV:15822727 G A A >>> # chrV:15822727 G A A >>> # REFCODON VARCODON >>> # <DNAStringSet> <DNAStringSet> >>> # chrV:15822727 GAT TAT >>> # chrV:15822727 GAT AAT >>> # --- >>> # seqlengths: >>> # chrI chrV >>> # NA NA >>> >>> >>> # interestingly, when altering the annotation or the set of variants, >>> # this contradictions disapears (GAT -> AAT): >>> vcf2 <- vcf[-1] # remove the first SNV >>> aa2 <- predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) >>> length(aa2) >>> #[1] 5 >>> aa2[names(aa2)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")] >>> >>> >>> #GRanges with 2 ranges and 5 metadata columns: >>> # seqnames ranges strand | >>> # <Rle> <IRanges> <Rle> | >>> # chrV:15822727 chrV [15822727, 15822727] + | >>> # chrV:15822727 chrV [15822727, 15822727] + | >>> # REF ALT varAllele >>> # <DNAStringSet> <DNAStringSetList> <DNAStringSet> >>> # chrV:15822727 G A A >>> # chrV:15822727 G A A >>> # REFCODON VARCODON >>> # <DNAStringSet> <DNAStringSet> >>> # chrV:15822727 GAT AAT >>> # chrV:15822727 GAT AAT >>> # --- >>> # seqlengths: >>> # chrI chrV >>> # NA NA >>> >>> Do you know why this could be the case? >>> Regards, >>> Michael >>> >>> >>> Here is my envirnoment: >>> sessionInfo() >>> R Under development (unstable) (2013-02-25 r62061) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets >>> [7] methods base >>> >>> other attached packages: >>> [1] BSgenome.Celegans.UCSC.ce6_1.3.19 >>> [2] BSgenome_1.27.1 >>> [3] GenomicFeatures_1.11.14 >>> [4] AnnotationDbi_1.21.12 >>> [5] Biobase_2.19.3 >>> [6] VariantAnnotation_1.5.42 >>> [7] Rsamtools_1.11.21 >>> [8] Biostrings_2.27.11 >>> [9] GenomicRanges_1.11.35 >>> [10] IRanges_1.17.35 >>> [11] BiocGenerics_0.5.6 >>> [12] RColorBrewer_1.0-5 >>> >>> loaded via a namespace (and not attached): >>> [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2 >>> [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5 >>> [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0 >>> [10] zlibbioc_1.5.0 >>> >>> >>> >>> _______________________________________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>> >> >> >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> > _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel