Except for the checksum, the existing File classes should already support this: the package provides a dataset via data() that is just the serialized File object (path). One could create a FileWithChecksum class that decorates a File object with a checksum. Any attempt to read the file is intercepted by the decorator, which verifies the checksum and then delegates to the wrapped File object.
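A rough sketch of that decorator, to make the idea concrete. Everything here is an assumption for illustration: the minimal File class stands in for an existing class such as an RTLFile subclass, readFile() is a hypothetical generic standing in for whatever actually reads the file, and MD5 via tools::md5sum() is just one possible digest.

```r
library(methods)

# Hypothetical stand-in for an existing File class (e.g. an RTLFile subclass).
setClass("File", slots = c(path = "character"))

# Hypothetical generic standing in for the real reader (e.g. import()).
setGeneric("readFile", function(x, ...) standardGeneric("readFile"))
setMethod("readFile", "File", function(x, ...) readLines(x@path))

# Decorator: a File plus the checksum recorded when it was registered.
setClass("FileWithChecksum", slots = c(file = "File", checksum = "character"))

# Constructor of the same name; by default the checksum is computed now.
FileWithChecksum <- function(file,
                             checksum = unname(tools::md5sum(file@path))) {
    new("FileWithChecksum", file = file, checksum = checksum)
}

# Intercept the read: verify the checksum, then delegate to the wrapped File.
setMethod("readFile", "FileWithChecksum", function(x, ...) {
    current <- unname(tools::md5sum(x@file@path))
    if (is.na(current) || !identical(current, x@checksum))
        stop("checksum mismatch for ", x@file@path)
    readFile(x@file, ...)
})

# Usage on a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("chr1", "chr2"), path)
fwc <- FileWithChecksum(new("File", path = path))
readFile(fwc)                  # checksum matches, so the read is delegated
writeLines("modified", path)   # the file changes after registration
try(readFile(fwc))             # now stops with a checksum mismatch
```

For very large files the digest would probably need to be cached or computed lazily rather than on every read, but that is a separate design question.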
Michael

On Tue, Mar 11, 2014 at 8:53 AM, Vincent Carey <st...@channing.harvard.edu> wrote:

> I'm going to suggest a use case that may motivate this type of development.
>
> Up to 2010 or so, data packages generally made sense. You have about
> 100-500MB of serialized or pre-serialized stuff. Installing it in an R
> package is unpleasant from a resource consumption perspective but it works,
> you can use data/extdata and work with data with programmatic access,
> documentation and checkability.
>
> More recently, it is easy to come across data resources that we'd like to
> have package-like control over/access to, but installing such packages
> makes no sense. The volume is too big, and you want to work with the
> resource with non-R tools as well from time to time. You don't want to
> move the data.
>
> We should have a protocol for "packaging" data without installing it. A
> digest of the raw data resource should be computed and kept in the
> registry. A registered file can be part of a package that can be checked
> and installed, but the data themselves do not move. Genomic data in S3
> buckets should provide a basic use case.
>
> The digest is recomputed whenever we want to start working with the
> registry/package to verify that we are working with the intended artifact.
>
>
> On Tue, Mar 11, 2014 at 11:11 AM, Gabriel Becker <gmbec...@ucdavis.edu> wrote:
>
>> Would it be better to let the user (registerer) specify a function, which
>> could be a simple class constructor or something more complex in cases
>> where that would be useful?
>>
>> This could allow the concept to generalize to other things, such as
>> databases that might need some startup machinery called before they are
>> actually useful to the user.
>>
>> This would also deal with Michael's point about package/where since
>> functions have their own "where" information. Unless I'm missing some
>> other intent for specifying a specific package?
>>
>> ~G
>>
>>
>> On Tue, Mar 11, 2014 at 5:59 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>>
>> > rtracklayer essentially has this, although registration is implicit
>> > through extension of RTLFile or RsamtoolsFile, and the extension is
>> > taken from the class name. There is a BigWigFile, corresponding to
>> > ".bigwig", and that is extended by BWFile to support the ".bw"
>> > extension. The expectation is that other packages would extend RTLFile
>> > to implicitly register handlers. I'm not sure there is a use case for
>> > generalization, but this proposal makes registration more explicit,
>> > which is probably a good thing. rtracklayer was just piggybacking on
>> > S4 registration.
>> >
>> > I'm a little bit confused by the use of Lists rather than individual
>> > File objects. Are you also proposing that all RTLFiles would need a
>> > corresponding List, and that there would need to be an RTLFileList
>> > method for the various generics?
>> >
>> > It may not be necessary to specify the package name. There should be an
>> > environment (where) argument that defaults to topenv(parent.frame()),
>> > and that should suffice.
>> >
>> > Michael
>> >
>> >
>> > On Mon, Mar 10, 2014 at 8:46 PM, Valerie Obenchain <voben...@fhcrc.org> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I'm soliciting feedback on the idea of a general file 'registry' that
>> > > would identify file types by their extensions. This is similar in
>> > > spirit to FileForFormat() in rtracklayer but a more general
>> > > abstraction that could be used across packages. The goal is to allow
>> > > a user to supply only file name(s) to a method instead of first
>> > > creating a 'File' class such as BamFile, FaFile, BigWigFile etc.
>> > >
>> > > A first attempt at this is in the GenomicFileViews package (
>> > > https://github.com/Bioconductor/GenomicFileViews).
>> > > A registry (lookup) is created as an environment at load time:
>> > >
>> > >     .fileTypeRegistry <- new.env(parent=emptyenv())
>> > >
>> > > Files are registered with an information triplet consisting of class,
>> > > package and regular expression to identify the extension. In
>> > > GenomicFileViews we register FaFileList, BamFileList and
>> > > BigWigFileList but any 'File' class can be registered that has a
>> > > constructor of the same name.
>> > >
>> > >     .onLoad <- function(libname, pkgname)
>> > >     {
>> > >         registerFileType("FaFileList", "Rsamtools", "\\.fa$")
>> > >         registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
>> > >         registerFileType("BamFileList", "Rsamtools", "\\.bam$")
>> > >         registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
>> > >     }
>> > >
>> > > The makeFileType() helper creates the appropriate class. This
>> > > function is used behind the scenes to do the lookup and coerce to the
>> > > correct 'File' class.
>> > >
>> > >     > makeFileType(c("foo.bam", "bar.bam"))
>> > >     BamFileList of length 2
>> > >     names(2): foo.bam bar.bam
>> > >
>> > > New types can be added at any time with registerFileType():
>> > >
>> > >     registerFileType(NewClass, NewPackage, "\\.NewExtension$")
>> > >
>> > >
>> > > Thoughts:
>> > >
>> > > (1) If this sounds generally useful where should it live?
>> > > rtracklayer, GenomicFileViews or other? Alternatively it could be its
>> > > own lightweight package (FileRegister) that creates the registry and
>> > > provides the helpers. It would be up to the package authors that
>> > > depend on FileRegister to register their own file types at load time.
>> > >
>> > > (2) To avoid potential ambiguities maybe searching should be by regex
>> > > and package name. Still a work in progress.
>> > >
>> > >
>> > > Valerie
>> > >
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioc-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >
>>
>>
>> --
>> Gabriel Becker
>> Graduate Student
>> Statistics Department
>> University of California, Davis