that's great Hervé and Sonali! thanks for the quick response.
best, m On Thu, Jun 18, 2015 at 10:07 AM, Sonali Arora <sar...@fredhutch.org> wrote: > Hi Michael, Herve, > > > On 6/17/2015 9:43 PM, Hervé Pagès wrote: >> >> Hi Michael, >> >> On 06/17/2015 12:35 AM, Michael Love wrote: >>> >>> Background: >>> >>> With previous approaches that I would recommend to users for building >>> txdb along the way of making count tables, it was desirable that the >>> GTF release information would *automatically* be passed into the >>> metadata of the rowRanges of the SummarizedExperiment. >>> >>> for example, in parathyroidSE, using makeTxDbFromBiomart: >>> >>> ... >>> BioMart database version : chr "ENSEMBL GENES 72 (SANGER UK)" >>> BioMart dataset : chr "hsapiens_gene_ensembl" >>> BioMart dataset description : chr "Homo sapiens genes (GRCh37.p11)" >>> >>> or, alternatively, using makeTxDbFromGFF, it would be present in the >>> name of the GTF file: >>> >>> ... >>> Data source: chr "Homo_sapiens.GRCh37.75.gtf" >>> >>> >>> Question: >>> >>> I'm now interested in switching over to using GTF files available >>> through AnnotationHub, and wondering how we can maintain this >>> automatic propagation of metadata. >>> >>> here's some example code: >>> >>> library(GenomicFeatures) >>> library(AnnotationHub) >>> ah <- AnnotationHub() >>> z <- query(ah, c("Ensembl","gtf","Caenorhabditis elegans","release-80")) >>> stopifnot(length(z) == 1) >>> z$title >>> >>> [1] "Caenorhabditis_elegans.WBcel235.80.gtf" >>> >>> gtf <- ah[[names(z)]] >>> metadata(gtf) >>> >>> $AnnotationHubName >>> [1] "AH47045" >> >> >> A first (and hopefully easy) improvement would be that the GRanges and >> other objects on AnnotationHub had more information in their metadata(). > > > The GRanges from AnnotationHub now include more information in their > metadata() > slot. This is added in devel (2.1.27). > > library(AnnotationHub) > ah = AnnotationHub() > packageVersion('AnnotationHub') > gtf <- query(ah, c('gtf', 'homo sapiens', '77', 'ensembl')) > gr <- gtf[[1]] > gr > metadata(gr) > >> metadata(gr) > $AnnotationHubName > [1] "AH28812" > > $`File Name` > [1] "Homo_sapiens.GRCh38.77.gtf.gz" > > $`Data Source` > [1] > "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz" > > $Provider > [1] "Ensembl" > > $Organism > [1] "Homo sapiens" > > $`Taxonomy ID` > [1] 9606 > > > Only some of the fields have been added here, but note that all others can > be accessed with - > > mcols(ah["AH47045"]) > > or > > ah[metadata(gr)$AnnotationHubName] > > > Thanks, > Sonali. > >> >> In the mean time, here is a workaround (in 2 steps): >> >> 1) Add some useful metadata to 'gtf' (grabbed from the AnnotationHub >> object): >> >> addMetadataFromAnnotationHub <- function(x, ah) >> { >> stopifnot(length(ah) == 1L) >> metadata0 <- list( >> `Data source`=ah$sourceurl, >> `Provider`=ah$dataprovider, >> `Organism`=ah$species, >> `Taxonomy ID`=ah$taxonomyid >> ) >> metadata(x) <- c(metadata0, metadata(x)) >> x >> } >> >> gtf <- addMetadataFromAnnotationHub(gtf, z) >> >> metadata(gtf) >> # $`Data source` >> # [1] >> "ftp://ftp.ensembl.org/pub/release-80/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.80.gtf.gz" >> # >> # $Provider >> # [1] "Ensembl" >> # >> # $Organism >> # [1] "Caenorhabditis elegans" >> # >> # $`Taxonomy ID` >> # [1] 6239 >> # >> # $AnnotationHubName >> # [1] "AH47045" >> >> 2) Pass the metadata to makeTxDbFromGRanges(): >> >> txdb <- makeTxDbFromGRanges(gtf, >> metadata=data.frame( >> name=names(metadata(gtf)), >> value=as.character(metadata(gtf)))) >> txdb >> # TxDb object: >> # Db type: TxDb >> # Supporting package: GenomicFeatures >> # Data source: >> ftp://ftp.ensembl.org/pub/release-80/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.80.gtf.gz >> # Provider: Ensembl >> # Organism: Caenorhabditis elegans >> # Taxonomy ID: 6239 >> # AnnotationHubName: AH47045 >> # Genome: WBcel235 >> # transcript_nrow: 57834 >> # exon_nrow: 173506 >> # cds_nrow: 131562 >> # Db created by: GenomicFeatures package from Bioconductor >> # Creation time: 2015-06-17 12:24:51 -0700 (Wed, 17 Jun 2015) >> # GenomicFeatures version at creation time: 1.21.13 >> # RSQLite version at creation time: 1.0.0 >> # DBSCHEMAVERSION: 1.1 >> >> organism(txdb) >> # [1] "Caenorhabditis elegans" >> taxonomyId(txdb) >> # [1] 6239 >> >> Step 2) should be made easier because the metadata is already in 'gtf' >> so there is no reason why the user would need to pass it again thru the >> 'metadata' argument. I'll made that change to makeTxDbFromGRanges(). >> >> H. >> >>> >>> txdb <- makeTxDbFromGRanges(gtf) >>> metadata(txdb) >>> >>> name >>> 1 Db type >>> 2 Supporting package >>> 3 Genome >>> 4 transcript_nrow >>> 5 exon_nrow >>> 6 cds_nrow >>> 7 Db created by >>> 8 Creation time >>> 9 GenomicFeatures version at creation time >>> 10 RSQLite version at creation time >>> 11 DBSCHEMAVERSION >>> value >>> 1 TxDb >>> 2 GenomicFeatures >>> 3 WBcel235 >>> 4 57834 >>> 5 173506 >>> 6 131562 >>> 7 GenomicFeatures package from Bioconductor >>> 8 2015-06-17 09:31:10 +0200 (Wed, 17 Jun 2015) >>> 9 1.21.13 >>> 10 1.0.0 >>> 11 1.1 >>> >>> >>> sessionInfo() >>> >>> R Under development (unstable) (2015-04-29 r68278) >>> Platform: x86_64-apple-darwin12.5.0 (64-bit) >>> Running under: OS X 10.10.3 (Yosemite) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats4 parallel stats graphics grDevices datasets utils >>> [8] methods base >>> >>> other attached packages: >>> [1] AnnotationHub_2.1.26 GenomicFeatures_1.21.13 >>> AnnotationDbi_1.31.16 >>> [4] Biobase_2.29.1 GenomicRanges_1.21.15 GenomeInfoDb_1.5.7 >>> [7] IRanges_2.3.11 S4Vectors_0.7.5 BiocGenerics_0.15.2 >>> [10] testthat_0.10.0 knitr_1.10 BiocInstaller_1.19.6 >>> >>> loaded via a namespace (and not attached): >>> [1] Rcpp_0.11.6 compiler_3.3.0 >>> [3] futile.logger_1.4.1 XVector_0.9.1 >>> [5] bitops_1.0-6 futile.options_1.0.0 >>> [7] tools_3.3.0 zlibbioc_1.15.0 >>> [9] biomaRt_2.25.1 digest_0.6.8 >>> [11] RSQLite_1.0.0 memoise_0.2.1 >>> [13] shiny_0.12.1 DBI_0.3.1 >>> [15] rtracklayer_1.29.10 httr_0.6.1 >>> [17] stringr_1.0.0 Biostrings_2.37.2 >>> [19] R6_2.0.1 XML_3.98-1.2 >>> [21] BiocParallel_1.3.26 lambda.r_1.1.7 >>> [23] magrittr_1.5 Rsamtools_1.21.8 >>> [25] htmltools_0.2.6 GenomicAlignments_1.5.9 >>> [27] SummarizedExperiment_0.1.5 xtable_1.7-4 >>> [29] mime_0.3 interactiveDisplayBase_1.7.0 >>> [31] httpuv_1.3.2 stringi_0.4-1 >>> [33] RCurl_1.95-4.6 crayon_1.3.0 >>> >>> _______________________________________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>> >> > > -- > Thanks and Regards, > Sonali > > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel