I'm going to suggest a use case that may motivate this type of development.

Up to 2010 or so, data packages generally made sense.  You had about
100-500MB of serialized or pre-serialized data.  Installing it in an R
package is unpleasant from a resource consumption perspective, but it
works: you can use data/extdata and work with the data with programmatic
access, documentation and checkability.

More recently, it is easy to come across data resources that we'd like to
have package-like control over/access to, but installing such packages
makes no sense.  The volume is too big, and you want to work with the
resource with non-R tools as well from time to time.  You don't want to
move the data.

We should have a protocol for "packaging" data without installing it.  A
digest of the raw data resource should be computed and kept in the
registry.  A registered file can be part of a package that can be checked
and installed, but the data themselves do not move.  Genomic data in S3
buckets should provide a basic use case.

The digest is recomputed whenever we want to start working with the
registry/package to verify that we are working with the intended artifact.
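
A minimal sketch of that register-then-verify cycle, assuming a local file
path and base R's tools::md5sum().  The names .dataRegistry,
registerResource() and verifyResource() are hypothetical, not part of any
existing package, and a remote resource such as an S3 object would need a
different way to obtain its digest.

```r
# Hypothetical sketch: these names are illustrative, not an existing API.
.dataRegistry <- new.env(parent = emptyenv())

# Record where a resource lives and the MD5 digest it had at registration.
# The data themselves are not copied or moved.
registerResource <- function(name, path) {
    stopifnot(file.exists(path))
    assign(name,
           list(path = path, md5 = unname(tools::md5sum(path))),
           envir = .dataRegistry)
    invisible(name)
}

# Recompute the digest and compare it with the registered one.
verifyResource <- function(name) {
    entry <- get(name, envir = .dataRegistry)
    identical(unname(tools::md5sum(entry$path)), entry$md5)
}

# Usage: a small temporary file stands in for a large external resource.
f <- tempfile(fileext = ".txt")
writeLines(c("chr1\t100\t200", "chr2\t300\t400"), f)
registerResource("toy", f)
ok_before <- verifyResource("toy")   # file unchanged since registration
cat("chr3\t500\t600\n", file = f, append = TRUE)
ok_after <- verifyResource("toy")    # file modified after registration
```

Only the path and digest are stored, so a package carrying such a registry
could be checked and installed without the data being installed with it.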


On Tue, Mar 11, 2014 at 11:11 AM, Gabriel Becker <gmbec...@ucdavis.edu> wrote:

> Would it be better to let the user (registerer) specify a function, which
> could be a simple class constructor or something more complex in cases
> where that would be useful?
>
> This could allow the concept to generalize to other things, such as
> databases that might need some startup machinery called before they are
> actually useful to the user.
>
> This would also deal with Michael's point about package/where since
> functions have their own "where" information. Unless I'm missing some other
> intent for specifying a specific package?
>
> ~G
>
>
> On Tue, Mar 11, 2014 at 5:59 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
> > rtracklayer essentially has this, although registration is implicit
> > through extension of RTLFile or RsamtoolsFile, and the extension is
> > taken from the class name. There is a BigWigFile, corresponding to
> > ".bigwig", and that is extended by BWFile to support the ".bw"
> > extension. The expectation is that other packages would extend RTLFile
> > to implicitly register handlers.  I'm not sure there is a use case for
> > generalization, but this proposal makes registration more explicit,
> > which is probably a good thing. rtracklayer was just piggybacking on S4
> > registration.
> >
> > I'm a little bit confused by the use of Lists rather than individual File
> > objects. Are you also proposing that all RTLFiles would need a
> > corresponding List, and that there would need to be an RTLFileList method
> > for the various generics?
> >
> > It may not be necessary to specify the package name. There should be an
> > environment (where) argument that defaults to topenv(parent.frame()), and
> > that should suffice.
> >
> > Michael
> >
> >
> > On Mon, Mar 10, 2014 at 8:46 PM, Valerie Obenchain <voben...@fhcrc.org> wrote:
> >
> > > Hi all,
> > >
> > > I'm soliciting feedback on the idea of a general file 'registry' that
> > > would identify file types by their extensions. This is similar in
> > > spirit to FileForFormat() in rtracklayer but a more general
> > > abstraction that could be used across packages. The goal is to allow a
> > > user to supply only file name(s) to a method instead of first creating
> > > a 'File' class such as BamFile, FaFile, BigWigFile etc.
> > >
> > > A first attempt at this is in the GenomicFileViews package (
> > > https://github.com/Bioconductor/GenomicFileViews). A registry (lookup)
> > > is created as an environment at load time:
> > >
> > > .fileTypeRegistry <- new.env(parent=emptyenv())
> > >
> > > Files are registered with an information triplet consisting of class,
> > > package and regular expression to identify the extension. In
> > > GenomicFileViews we register FaFileList, BamFileList and BigWigFileList
> > > but any 'File' class can be registered that has a constructor of the
> > > same name.
> > >
> > > .onLoad <- function(libname, pkgname)
> > > {
> > >     registerFileType("FaFileList", "Rsamtools", "\\.fa$")
> > >     registerFileType("FaFileList", "Rsamtools", "\\.fasta$")
> > >     registerFileType("BamFileList", "Rsamtools", "\\.bam$")
> > >     registerFileType("BigWigFileList", "rtracklayer", "\\.bw$")
> > > }
> > >
> > > The makeFileType() helper creates the appropriate class. This function
> > > is used behind the scenes to do the lookup and coerce to the correct
> > > 'File' class.
> > >
> > > > makeFileType(c("foo.bam", "bar.bam"))
> > > BamFileList of length 2
> > > names(2): foo.bam bar.bam
> > >
> > > New types can be added at any time with registerFileType():
> > >
> > > registerFileType("NewClass", "NewPackage", "\\.NewExtension$")
> > >
> > >
> > > Thoughts:
> > >
> > > (1) If this sounds generally useful where should it live? rtracklayer,
> > > GenomicFileViews or other? Alternatively it could be its own
> > > lightweight package (FileRegister) that creates the registry and
> > > provides the helpers. It would be up to the package authors that
> > > depend on FileRegister to register their own file types at load time.
> > >
> > > (2) To avoid potential ambiguities maybe searching should be by regex
> > > and package name. Still a work in progress.
> > >
> > >
> > > Valerie
> > >
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis
>
