Dear Johannes,

I am going to update the GRanges object (from Ensembl's GTF files) with the necessary information. Once that's done, I'll post back here.
Thanks,
Sonali.

On 4/10/2015 12:18 PM, Rainer Johannes wrote:
> dear Sonali, Herve,
>
>> On 10 Apr 2015, at 19:59, Hervé Pagès <hpa...@fredhutch.org> wrote:
>>
>> Hi Johannes, Sonali,
>>
>> On 04/10/2015 09:40 AM, Arora, Sonali wrote:
>>> Hi Rainer,
>>>
>>> Just to be clear - what do you want to be available from AnnotationHub()
>>> in the end?
>>>
>>> Currently the GTF files from Ensembl are already present inside the
>>> AnnotationHub
>>>
>>> library(AnnotationHub)
>>> ah = AnnotationHub()
>>> gtf <- query(ah, "GTF")
>>> gtf <- query(gtf, "Ensembl")
>>> gtf[1]
>>> gtf[[1]] # returned to you as GenomicRanges object.
>>>
>>> - why not get the GTF files directly from AnnotationHub instead of
>>> getting them from the ftp site? Then you can make your EnsDb classes
>>> from these GRanges.
>>> It will also make your recipe faster because you will not have to
>>> download the file and parse the object.
>>
>> A GRanges object is not the same as a GTF file and I guess Johannes
>> wants access to the GTF file. Are these GTF files available on
>> AnnotationHub?
>>
>
> yes, you're right. I wanted access to the GTF file and most likely
> understood the AnnotationHub idea wrong... my idea was to build a
> recipe that takes as input the GTF file (as the
> makeEnsemblGtfToGRanges) and generates from that the EnsDb SQLite
> database file. I thought that these SQLite files would be generated on
> the fly on the user's computer, but I guess that stuff is processed
> once and stored on your servers, right?
>
>
>> @Johannes - Here is one alternative: You could take a different approach
>> and implement some equivalent of makeTxDbFromGRanges() for EnsDb
>> objects.
>> So people could just do:
>>
>> library(ensembldb)
>> ensdb <- makeEnsDbFromGRanges(gtf[[1]])
>>
>> like they can do right now with makeTxDbFromGRanges():
>>
>> library(GenomicFeatures)
>> txdb <- makeTxDbFromGRanges(gtf[[1]])
>>
>> That way you don't need a recipe or try to add things to
>> AnnotationHub at all.
>>
>
> that's a good idea, I will implement that too. just want to make sure
> that I can get all data I'll need (also the genome build version,
> Ensembl version etc from the GRanges, most likely I have to guess that
> from the file name of the RData file).
>
>> @Sonali - These GRanges objects I get from AnnotationHub have no genome
>> information and their seqlevels are not sorted:
>>
>> > seqinfo(gtf[[1]])
>> Seqinfo object with 22 sequences from an unspecified genome; no
>> seqlengths:
>>   seqnames seqlengths isCircular genome
>>   X              <NA>       <NA>   <NA>
>>   9              <NA>       <NA>   <NA>
>>   8              <NA>       <NA>   <NA>
>>   7              <NA>       <NA>   <NA>
>>   6              <NA>       <NA>   <NA>
>>   ...             ...        ...    ...
>>   12             <NA>       <NA>   <NA>
>>   11             <NA>       <NA>   <NA>
>>   10             <NA>       <NA>   <NA>
>>   1              <NA>       <NA>   <NA>
>>   MT             <NA>       <NA>   <NA>
>>
>> I know it's easy enough to sort the seqlevels with sortSeqlevels() but
>> what about having these things done by the recipe instead?
>>
>
> I also have a suggestion there: what if you used also
> the fetchChromLengthsFromEnsembl from the GenomicFeatures package? the
> GTF files are anyway from Ensembl, so getting the seqinfo from there
> would make sense... and I wouldn't have to fetch it separately to
> build the EnsDb.
>
> thanks!
> jo
>
>> Thanks,
>> H.
>>
>>
>>>
>>> Thanks,
>>> Sonali.
>>>
>>>
>>> On 4/9/2015 11:14 PM, Rainer Johannes wrote:
>>>> dear all,
>>>>
>>>> I have added a recipe to the AnnotationHubData to provide EnsDb
>>>> classes (from my ensembldb package) based on GTF files from Ensembl.
>>>> Now, after adding the recipe to the AnnotationHubData package and
>>>> installing it (following the vignettes from the AnnotationHub and
>>>> AnnotationHubData) I called
>>>>
>>>> updateResources(AnnotationHubRoot=getWd(), BiocVersion=biocVersion(),
>>>>                 preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
>>>>                 metadataOnly=TRUE)
>>>>
>>>> and got the output:
>>>>
>>>> Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
>>>> Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
>>>> Anolis_carolinensis.AnoCar2.0.78.gtf.gz
>>>> Astyanax_mexicanus.AstMex102.78.gtf.gz
>>>> Bos_taurus.UMD3.1.78.gtf.gz
>>>> Caenorhabditis_elegans.WBcel235.78.gtf.gz
>>>> Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
>>>> Canis_familiaris.CanFam3.1.78.gtf.gz
>>>> Cavia_porcellus.cavPor3.78.gtf.gz
>>>> Chlorocebus_sabaeus.ChlSab1.1.78.gtf.gz
>>>> Choloepus_hoffmanni.choHof1.78.gtf.gz
>>>> Ciona_intestinalis.KH.78.gtf.gz
>>>> Ciona_savignyi.CSAV2.0.78.gtf.gz
>>>> Danio_rerio.Zv9.78.gtf.gz
>>>> Dasypus_novemcinctus.Dasnov3.0.78.gtf.gz
>>>> Dipodomys_ordii.dipOrd1.78.gtf.gz
>>>> Drosophila_melanogaster.BDGP5.78.gtf.gz
>>>> Error in function (type, msg, asError = TRUE) : Access denied: 530
>>>>
>>>> I guess that must be related to the Ensembl ftp? Is anybody else
>>>> experiencing this error?
>>>>
>>>> cheers, jo
>>>>
>>>>
>>>> my session info:
>>>>
>>>> > sessionInfo()
>>>> R Under development (unstable) (2015-03-04 r67940)
>>>> Platform: x86_64-apple-darwin14.3.0/x86_64 (64-bit)
>>>> Running under: OS X 10.10.3 (Yosemite)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel  stats4    stats     graphics  grDevices utils     datasets
>>>> [8] methods   base
>>>>
>>>> other attached packages:
>>>> [1] AnnotationHubData_0.0.205 futile.logger_1.4
>>>> [3] AnnotationHub_1.99.81     GenomicRanges_1.19.52
>>>> [5] GenomeInfoDb_1.3.16       IRanges_2.1.43
>>>> [7] S4Vectors_0.5.22          BiocGenerics_0.13.11
>>>>
>>>> loaded via a namespace (and not attached):
>>>>  [1] Rcpp_0.11.5                  BiocInstaller_1.17.7
>>>>  [3] XVector_0.7.4                futile.options_1.0.0
>>>>  [5] GenomicFeatures_1.19.37      bitops_1.0-6
>>>>  [7] tools_3.2.0                  zlibbioc_1.13.3
>>>>  [9] biomaRt_2.23.5               digest_0.6.8
>>>> [11] BSgenome_1.35.20             jsonlite_0.9.15
>>>> [13] RSQLite_1.0.0                shiny_0.11.1
>>>> [15] DBI_0.3.1                    rtracklayer_1.27.11
>>>> [17] httr_0.6.1                   stringr_0.6.2
>>>> [19] Biostrings_2.35.12           Biobase_2.27.3
>>>> [21] R6_2.0.1                     AnnotationDbi_1.29.21
>>>> [23] XML_3.98-1.1                 BiocParallel_1.1.24
>>>> [25] RJSONIO_1.3-0                ensembldb_0.99.15
>>>> [27] lambda.r_1.1.7               Rsamtools_1.19.50
>>>> [29] htmltools_0.2.6              GenomicAlignments_1.3.34
>>>> [31] AnnotationForge_1.9.7        mime_0.3
>>>> [33] interactiveDisplayBase_1.5.6 xtable_1.7-4
>>>> [35] httpuv_1.3.2                 RCurl_1.95-4.5
>>>> [37] VariantAnnotation_1.13.47
>>>> _______________________________________________
>>>> Bioc-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpa...@fredhutch.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319