I'm going to suggest a use case that may motivate this type of development.
Up to 2010 or so, data packages generally made sense. You have about 100-500MB of serialized or pre-serialized stuff. Installing it in an R package is unpleasant from a resource-consumption perspective, but it works: you can use data/extdata and get programmatic access to the data, along with documentation and checkability.

More recently, it is easy to come across data resources that we'd like to have package-like control over and access to, but for which installing a package makes no sense. The volume is too big, you want to work with the resource using non-R tools from time to time, and you don't want to move the data.

We should have a protocol for "packaging" data without installing it. A digest of the raw data resource should be computed and kept in the registry. A registered file can be part of a package that can be checked and installed, but the data themselves do not move. Genomic data in S3 buckets provide a basic use case. The digest is recomputed whenever we start working with the registry/package, to verify that we are working with the intended artifact.

On Tue, Mar 11, 2014 at 11:11 AM, Gabriel Becker <gmbec...@ucdavis.edu> wrote:

> Would it be better to let the user (registerer) specify a function, which could be a simple class constructor or something more complex in cases where that would be useful?
>
> This could allow the concept to generalize to other things, such as databases that might need some startup machinery called before they are actually useful to the user.
>
> This would also deal with Michael's point about package/where, since functions have their own "where" information. Unless I'm missing some other intent for specifying a specific package?
>
> ~G
>
> On Tue, Mar 11, 2014 at 5:59 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
> > rtracklayer essentially has this, although registration is implicit through extension of RTLFile or RsamtoolsFile, and the extension is taken from the class name.
> > There is a BigWigFile, corresponding to ".bigwig", and that is extended by BWFile to support the ".bw" extension. The expectation is that other packages would extend RTLFile to implicitly register handlers. I'm not sure there is a use case for generalization, but this proposal makes registration more explicit, which is probably a good thing. rtracklayer was just piggybacking on S4 registration.
> >
> > I'm a little bit confused by the use of Lists rather than individual File objects. Are you also proposing that all RTLFiles would need a corresponding List, and that there would need to be an RTLFileList method for the various generics?
> >
> > It may not be necessary to specify the package name. There should be an environment (where) argument that defaults to topenv(parent.frame()), and that should suffice.
> >
> > Michael
> >
> > On Mon, Mar 10, 2014 at 8:46 PM, Valerie Obenchain <voben...@fhcrc.org> wrote:
> >
> > > Hi all,
> > >
> > > I'm soliciting feedback on the idea of a general file 'registry' that would identify file types by their extensions. This is similar in spirit to FileForFormat() in rtracklayer, but is a more general abstraction that could be used across packages. The goal is to allow a user to supply only file name(s) to a method instead of first creating an object of a 'File' class such as BamFile, FaFile, BigWigFile, etc.
> > >
> > > A first attempt at this is in the GenomicFileViews package (https://github.com/Bioconductor/GenomicFileViews). A registry (lookup) is created as an environment at load time:
> > >
> > >     .fileTypeRegistry <- new.env(parent = emptyenv())
> > >
> > > Files are registered with an information triplet consisting of class, package and a regular expression that identifies the extension.
> > > In GenomicFileViews we register FaFileList, BamFileList and BigWigFileList, but any 'File' class that has a constructor of the same name can be registered.
> > >
> > >     .onLoad <- function(libname, pkgname)
> > >     {
> > >         registerFileType("FaFileList", "Rsamtools", "\\.fa$")
> > >         registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
> > >         registerFileType("BamFileList", "Rsamtools", "\\.bam$")
> > >         registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
> > >     }
> > >
> > > The makeFileType() helper creates an object of the appropriate class. This function is used behind the scenes to do the lookup and coerce to the correct 'File' class.
> > >
> > >     > makeFileType(c("foo.bam", "bar.bam"))
> > >     BamFileList of length 2
> > >     names(2): foo.bam bar.bam
> > >
> > > New types can be added at any time with registerFileType():
> > >
> > >     registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
> > >
> > > Thoughts:
> > >
> > > (1) If this sounds generally useful, where should it live? rtracklayer, GenomicFileViews or elsewhere? Alternatively, it could be its own lightweight package (FileRegister) that creates the registry and provides the helpers. It would be up to the authors of packages that depend on FileRegister to register their own file types at load time.
> > >
> > > (2) To avoid potential ambiguities, maybe searching should be by regex and package name. Still a work in progress.
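A minimal base-R sketch of how the triplet registry described above might work, assuming the registry environment holds three parallel character vectors (class, package, regex). The bodies of registerFileType() and makeFileType() are hypothetical reconstructions, not the GenomicFileViews source, and lookupFileType() is a hypothetical helper added for illustration.

```r
## Hypothetical sketch of an extension-based file-type registry built from
## (class, package, regex) triplets; not the GenomicFileViews source.
.fileTypeRegistry <- new.env(parent = emptyenv())
.fileTypeRegistry$class <- character()
.fileTypeRegistry$package <- character()
.fileTypeRegistry$regex <- character()

## Append one triplet; environments have reference semantics, so the
## registry is modified in place.
registerFileType <- function(class, package, regex)
{
    reg <- .fileTypeRegistry
    reg$class <- c(reg$class, class)
    reg$package <- c(reg$package, package)
    reg$regex <- c(reg$regex, regex)
    invisible(NULL)
}

## Hypothetical helper: for each file, take the first registered regex it
## matches, then require that all files map to a single class.
lookupFileType <- function(files)
{
    reg <- .fileTypeRegistry
    idx <- vapply(files, function(f) {
        m <- which(vapply(reg$regex, grepl, logical(1), x = f))
        if (length(m)) as.integer(m[1L]) else NA_integer_
    }, integer(1))
    if (anyNA(idx))
        stop("no registered file type for: ",
             paste(files[is.na(idx)], collapse = ", "))
    cls <- unique(reg$class[idx])
    if (length(cls) != 1L)
        stop("files map to more than one class: ", paste(cls, collapse = ", "))
    list(class = cls, package = reg$package[idx[[1L]]])
}

## Call the same-named constructor exported by the registered package
## (that package must be installed for this step to work).
makeFileType <- function(files)
{
    type <- lookupFileType(files)
    ctor <- getExportedValue(type$package, type$class)
    ctor(files)
}

## The registrations from the .onLoad() example above.
registerFileType("FaFileList", "Rsamtools", "\\.fa$")
registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
registerFileType("BamFileList", "Rsamtools", "\\.bam$")
registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")

lookupFileType(c("foo.bam", "bar.bam"))$class  # "BamFileList"
```

With first-match semantics, overlapping regexes are resolved by registration order, which is one form of the ambiguity raised in point (2). Registering the constructor function itself, as Gabriel suggests, would replace both the (class, package) pair and the getExportedValue() step, since a function already records the environment it was defined in.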
> > > Valerie
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis
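The digest-based protocol proposed at the top of the thread (register a digest of the raw data, leave the data where it is, recompute the digest before use) could be sketched in base R as follows. This assumes MD5 via tools::md5sum() is an acceptable digest and that the data are reachable as local file paths; md5sum() reads local files only, so objects in S3 buckets would first have to be fetched or mounted by other means. The names dataRegistry, registerDataResource() and verifyDataResource() are hypothetical.

```r
## Hypothetical sketch of "packaging" data in place: record a digest of each
## raw data file in a registry without copying or moving the file.
dataRegistry <- new.env(parent = emptyenv())

registerDataResource <- function(id, path)
{
    path <- normalizePath(path, mustWork = TRUE)
    ## tools::md5sum() returns a character vector named by file path
    assign(id, list(path = path, md5 = unname(tools::md5sum(path))),
           envir = dataRegistry)
    invisible(id)
}

## Recompute the digest before starting work; TRUE only if the file on disk
## is still the registered artifact (md5sum() gives NA for a missing file).
verifyDataResource <- function(id)
{
    entry <- get(id, envir = dataRegistry, inherits = FALSE)
    current <- unname(tools::md5sum(entry$path))
    !is.na(current) && identical(current, entry$md5)
}

## Usage with a throwaway file standing in for a large data resource
f <- tempfile(fileext = ".txt")
writeLines("chr1\t100\t200", f)
registerDataResource("example", f)
verifyDataResource("example")   # TRUE
writeLines("chr1\t100\t201", f)  # the artifact changes on disk
verifyDataResource("example")   # FALSE
```

Only the path and digest live in the registry, so a package could ship the registry entry and be checked and installed while the data stay put. For very large resources, hashing every byte on each use may be slow; that cost trade-off is left open here.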