We do similar testing (mostly upstream of package building) for GEOmetadb and SRAdb. I have been thinking of this problem as "integration testing" rather than "unit testing".
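Within testthat, one way to keep the two kinds of tests apart is to leave the fast unit tests unconditional and gate the slow integration tests behind an explicit opt-in. This can be combined with `testthat::skip_on_bioc()`, which Peter Hickey suggests further down the thread. The following is a minimal sketch under stated assumptions, not an established convention. The environment variable `RUN_INTEGRATION_TESTS` and the helper `run_integration()` are hypothetical names chosen for illustration. `skip_if_not()` and `skip_on_bioc()` are existing testthat functions. In a package, tests like these would normally live under `tests/testthat/`.

```r
library(testthat)

# Hypothetical opt-in switch: integration tests run only when the
# environment variable RUN_INTEGRATION_TESTS is set to "true"
# (case-insensitive). The variable is an assumption for illustration;
# neither testthat nor the Bioconductor build system defines it.
run_integration <- function() {
  tolower(Sys.getenv("RUN_INTEGRATION_TESTS", unset = "")) == "true"
}

test_that("a small, focused unit behaves as expected", {
  # Fast unit test: always runs, including on the build machines.
  expect_equal(rev(c("a", "b", "c")), c("c", "b", "a"))
})

test_that("full download and assembly work end to end", {
  # Skip unless the developer has explicitly opted in ...
  skip_if_not(run_integration(), "integration tests not requested")
  # ... and never run on the Bioconductor build machines.
  skip_on_bioc()
  # Placeholder for the expensive end-to-end check; a real test would
  # download the data and inspect the assembled object here.
  expect_true(TRUE)
})
```

To include the long tests, a developer would set `Sys.setenv(RUN_INTEGRATION_TESTS = "true")` before running, for example, `devtools::test()`. Because the switch defaults to off, a check run without the variable sees only the unit tests. The `skip_on_bioc()` call is a second safeguard in case the variable happens to be set on a build machine.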
The difference is discussed here: https://stackoverflow.com/questions/5357601/whats-the-difference-between-unit-tests-and-integration-tests

The build system as it exists is great for unit testing, but not so much for integration tests. Workflow-based "testing" might fall under the integration testing definition. The unit testing frameworks in R (testthat, for example) can be applied to integration testing, but I think it is worth keeping the two types of testing somewhat separate for the reasons pointed out in the Stack Overflow discussion.

Sean

On Thu, Oct 26, 2017 at 11:41 PM, Levi Waldron <lwaldron.resea...@gmail.com> wrote:
> The specific cases I had in mind were curatedMetagenomicData and curatedTCGAData. I was thinking that the entire databases should be downloaded and syntax-checked at some point, because there could be problems either with the remote data files or with how the convenience downloading functions process them. But they're big downloads, so it's slow and means downloading hundreds of GB.
>
> Regular HTTP size and timestamp checks would be lightweight and good (it sounds doable, even if I don't know exactly how yet). But still, in curatedTCGAData especially, the post-download processing is complicated: converting tables to SummarizedExperiment and RaggedExperiment, mapping the columns in each omics data type to the clinical & pathological data, assembling a MultiAssayExperiment containing any combination of omics types for one or more cancer types, and adding metadata (sorry for the shameless plug for the forthcoming curatedTCGAData...). But anyway, I'm not sure how to know whether the objects will be assembled correctly without testing them in normal use situations.
>
> To avoid waste, this is actually a case where a workflow-type check would make sense: run it only if a size, a timestamp, or a downloader/assembler file has changed, not after documentation or other changes unrelated to the download & assembly.
>
> On Thu, Oct 26, 2017 at 5:11 AM, Martin Morgan <martin.mor...@roswellpark.org> wrote:
>
> > There is currently some capacity in the build system to support 'extended' builds.
> >
> > One possibility would be to provide facilities for packages to 'opt in' to a distinct 'extended' build, with a (weekly?) build report. One could also just increase the timeouts of the current builds.
> >
> > I think there is considerable value in imposing relatively severe time and space limitations on packages. A lot of R code is very poorly written, and the limits force the developer to confront that; admittedly, a common response is not to write better R code. The unit test concept is really about highly focused tests on modular software; my own 'long' tests have in retrospect often been misguided attempts to throw the kitchen sink at code and hope that it covers things, rather than to decompose complicated functions into testable units that can then be assembled with some degree of confidence. Some of the most challenging code to test involves web services; probably the approach is not to perform numerous queries but to verify that the service is responsive and providing a version that your package supports, with non-web queries validating conformance to the version. Often build times are dominated by vignettes analyzing 'real' data; these are probably more suited to ExperimentData packages, where there are already more liberal space and time limits, and where the extended computation time does not undermine the pedagogical value of easily reproduced vignette code.
> >
> > I wonder how many people would opt in to an extended build. That wasn't, for instance, what Levi asked about at the start of the thread.
> >
> > Martin
> >
> > On 10/25/2017 01:05 PM, Vincent Carey wrote:
> >
> >> What about some more hardware to improve throughput?
> >> I think complicating the test-driving software is less desirable, although perhaps it is just a day-of-week check somewhere. I can imagine that it fails on Wednesday but then passes on Thursday and the developer ignores the event... The failure has to become sticky. I vote for more hardware and a uniform and stringent testing protocol.
> >>
> >> If there is no grant money for hardware, maybe we have to look for more commercial sponsorship. This part of the project should not be pinching pennies.
> >>
> >> On Wed, Oct 25, 2017 at 12:56 PM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
> >>
> >>> I think we need to think about this in the long term. Can we add support for these major tests in the build system, perhaps not every day, but every week or month? The alternative, that it is up to the developer, is not great, I think. We should still advocate for people writing quicker tests, but there are some things which just take time. The advantage of the build system is that the code gets tested on the three official platforms, with the official setup.
> >>>
> >>> Best,
> >>> Kasper
> >>>
> >>> On Wed, Oct 25, 2017 at 11:27 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
> >>>
> >>>> Looks like BiocCodeTools should start checking whether people are using that and at least make a NOTE of it.
> >>>>
> >>>> On Tue, Oct 24, 2017 at 8:17 PM, Peter Hickey <peter.hic...@gmail.com> wrote:
> >>>>
> >>>>> A partial answer if you are using the 'testthat' framework: you can use `testthat::skip_on_bioc()` to specify that a test should be skipped if it is running on the BioC build machines. The test will otherwise be run (e.g., during local development). There are some other `testthat::skip*()` functions that may also be useful.
> >>>>> Cheers, > >>>>> Pete > >>>>> > >>>>> On Wed, 25 Oct 2017 at 12:47 Levi Waldron < > lwaldron.resea...@gmail.com > >>>>> > >>>> > >>>> wrote: > >>>>> > >>>>> Any thoughts about how to implement optional or "extra" unit tests, > >>>>>> > >>>>> that > >>>> > >>>>> are too resource intensive to be part of the Bioconductor daily > >>>>>> > >>>>> builds, > >>> > >>>> but > >>>>> > >>>>>> that should be run once in a while, say with major updates? > >>>>>> > >>>>>> [[alternative HTML version deleted]] > >>>>>> > >>>>>> _______________________________________________ > >>>>>> Bioc-devel@r-project.org mailing list > >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel > >>>>>> > >>>>>> > >>>>> [[alternative HTML version deleted]] > >>>>> > >>>>> _______________________________________________ > >>>>> Bioc-devel@r-project.org mailing list > >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel > >>>>> > >>>>> > >>>> [[alternative HTML version deleted]] > >>>> > >>>> _______________________________________________ > >>>> Bioc-devel@r-project.org mailing list > >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel > >>>> > >>>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioc-devel@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel > >>> > >>> > >> [[alternative HTML version deleted]] > >> > >> _______________________________________________ > >> Bioc-devel@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel > >> > >> > > > > This email message may contain legally privileged and/or...{{dropped:2}} > > > > > > _______________________________________________ > > Bioc-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > > > > > -- > Levi Waldron > http://www.waldronlab.org > Assistant Professor of Biostatistics CUNY School of Public Health > US: +1 646-364-9616 Skype: > levi.waldron > > [[alternative HTML 
version deleted]] > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel > -- Sean Davis, MD, PhD Center for Cancer Research National Cancer Institute National Institutes of Health Bethesda, MD 20892 https://seandavi.github.io/ https://twitter.com/seandavis12 [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel