Here is the second attachment ("variants.vcf") that was missing from the first message.
Michael On 03/07/2013 02:40 PM, Michael Stadler wrote: > Dear Valerie and colleagues, > > I have recently started using the VariantAnnotation package, which is a > huge asset for much of my work with sequence variants - thank you! > > I have a question regarding the following example (it makes use of two > small text files, "annot.gtf" and "variants.vcf", attached to this message): > > library(VariantAnnotation) > library(GenomicFeatures) > library(BSgenome.Celegans.UCSC.ce6) > > # build a TranscriptDb from "annot.gtf" > chrominfo <- data.frame(chrom=as.character(seqnames(Celegans)), > length=seqlengths(Celegans), > is_circular=FALSE) > txdb <- makeTranscriptDbFromGFF(file="annot.gtf", > format="gtf", > chrominfo=chrominfo, > dataSource="WormBase v. WS190", > species="Caenorhabditis elegans") > txdb > #TranscriptDb object: > #| Db type: TranscriptDb > #| Supporting package: GenomicFeatures > #| Data source: WormBase v. WS190 > #| Organism: Caenorhabditis elegans > #| miRBase build ID: NA > #| transcript_nrow: 7 > #| exon_nrow: 42 > #| cds_nrow: 42 > #| Db created by: GenomicFeatures package from Bioconductor > #| Creation time: 2013-03-07 14:21:08 +0100 (Thu, 07 Mar 2013) > #| GenomicFeatures version at creation time: 1.11.14 > #| RSQLite version at creation time: 0.11.2 > #| DBSCHEMAVERSION: 1.0 > > # read in sequence variants > vcf <- readVcf("variants.vcf", genome="ce6") > dim(vcf) > #[1] 3 1 > > # predict the impact on coding sequences > aa <- predictCoding(query=vcf, subject=txdb, seqSource=Celegans) > length(aa) > #[1] 7 > > > # My question is related to the impact of variant "chrV:15822727": > # This SNV is overlapping two transcripts (same base, plus strand), > # yet the reported VARCODON values are different (GAT -> TAT/AAT): > aa[names(aa)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")] > > #GRanges with 2 ranges and 5 metadata columns: > # seqnames ranges strand | > # <Rle> <IRanges> <Rle> | > # chrV:15822727 chrV [15822727, 15822727] + | > # chrV:15822727 chrV [15822727, 15822727] + | > # REF ALT varAllele > # <DNAStringSet> <DNAStringSetList> <DNAStringSet> > # chrV:15822727 G A A > # chrV:15822727 G A A > # REFCODON VARCODON > # <DNAStringSet> <DNAStringSet> > # chrV:15822727 GAT TAT > # chrV:15822727 GAT AAT > # --- > # seqlengths: > # chrI chrV > # NA NA > > > # interestingly, when altering the annotation or the set of variants, > # this contradictions disapears (GAT -> AAT): > vcf2 <- vcf[-1] # remove the first SNV > aa2 <- predictCoding(query=vcf2, subject=txdb, seqSource=Celegans) > length(aa2) > #[1] 5 > aa2[names(aa2)=="chrV:15822727",c("REF","ALT","varAllele","REFCODON","VARCODON")] > > #GRanges with 2 ranges and 5 metadata columns: > # seqnames ranges strand | > # <Rle> <IRanges> <Rle> | > # chrV:15822727 chrV [15822727, 15822727] + | > # chrV:15822727 chrV [15822727, 15822727] + | > # REF ALT varAllele > # <DNAStringSet> <DNAStringSetList> <DNAStringSet> > # chrV:15822727 G A A > # chrV:15822727 G A A > # REFCODON VARCODON > # <DNAStringSet> <DNAStringSet> > # chrV:15822727 GAT AAT > # chrV:15822727 GAT AAT > # --- > # seqlengths: > # chrI chrV > # NA NA > > Do you know why this could be the case? > Regards, > Michael > > > Here is my envirnoment: > sessionInfo() > R Under development (unstable) (2013-02-25 r62061) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets > [7] methods base > > other attached packages: > [1] BSgenome.Celegans.UCSC.ce6_1.3.19 > [2] BSgenome_1.27.1 > [3] GenomicFeatures_1.11.14 > [4] AnnotationDbi_1.21.12 > [5] Biobase_2.19.3 > [6] VariantAnnotation_1.5.42 > [7] Rsamtools_1.11.21 > [8] Biostrings_2.27.11 > [9] GenomicRanges_1.11.35 > [10] IRanges_1.17.35 > [11] BiocGenerics_0.5.6 > [12] RColorBrewer_1.0-5 > > loaded via a namespace (and not attached): > [1] DBI_0.2-5 RCurl_1.95-4.1 RSQLite_0.11.2 > [4] XML_3.95-0.1 biomaRt_2.15.0 bitops_1.0-5 > [7] rtracklayer_1.19.9 stats4_3.0.0 tools_3.0.0 > [10] zlibbioc_1.5.0 > > > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel > -- -------------------------------------------- Michael Stadler, PhD Head of Computational Biology Friedrich Miescher Institute Basel (Switzerland) Phone : +41 61 697 6492 Fax : +41 61 697 3976 Mail : michael.stad...@fmi.ch --------------------------------------------
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel