Re: [Bioc-devel] Controlling vignette compilation order

Pages, Herve Tue, 18 Dec 2018 08:52:55 -0800

Hi Aaron,

Right now 'R CMD build' evaluates all vignettes in the same R session. 
Personally I see this as an undesirable feature and hope that it will 
change in the future. One problem with this is that when a vignette hits the 
max DLL limit, breaking it down into smaller vignettes doesn't help, 
because the DLLs loaded by the earlier vignettes stay loaded in the session. 
Another problem is that sometimes using 'R CMD Stangle && source()' does 
not reproduce a bug triggered by 'R CMD build'. I can spend a lot of 
time scratching my head over this until I finally realize that I first 
have to evaluate one of the other vignettes in order to reproduce the bug.
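
To reproduce that kind of cross-vignette interaction by hand, one option is 
to source the tangled scripts one after another in a single session. The 
following is only a minimal sketch: source_in_order() is a hypothetical 
helper (not part of R or Bioconductor), and the two stand-in scripts are 
written to a temporary directory with file names borrowed from the examples 
in the quoted thread.

```r
# Hypothetical helper: source tangled vignette scripts in one R session,
# in the order given, so that state left behind by earlier scripts
# (objects, loaded DLLs) is still present when later scripts run.
source_in_order <- function(scripts, envir = globalenv()) {
  for (s in scripts) {
    # Report which script is running and how many DLLs are loaded so far.
    message("Sourcing ", basename(s),
            " (", length(getLoadedDLLs()), " DLLs loaded)")
    source(s, local = envir, echo = FALSE)
  }
  invisible(envir)
}

# Toy demonstration: two stand-in scripts, the second of which only works
# if the first one has already been evaluated in the same environment.
tmp <- tempfile("tangled-")
dir.create(tmp)
writeLines("shared <- 42", file.path(tmp, "work-1-reads.R"))
writeLines("result <- shared + 1", file.path(tmp, "work-2-umi.R"))

scripts <- sort(list.files(tmp, pattern = "\\.R$", full.names = TRUE))
env <- source_in_order(scripts, envir = new.env())
get("result", envir = env)  # 43
```

Sourcing only "work-2-umi.R" into a fresh environment would fail because 
'shared' is missing, which mirrors the situation where a bug shows up only 
after another vignette has been evaluated first.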


On this note I wish 'R CMD build' would show progress by printing the 
name of the vignette it's currently evaluating (like 'R CMD check' does 
during the 'checking running R code from vignettes' step). It should be an 
easy improvement and it would already help a lot.

That being said I'm also sympathetic to your use case where sometimes a 
big monolithic vignette needs to be broken down into smaller units. I 
don't know of any way to control the order of evaluation other than 
using a Makefile for that though.
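
For what it's worth, the lazy approach suggested in the quoted thread below 
would make the order irrelevant rather than controlling it. A rough sketch, 
under loud assumptions: get_resource(), the 'generators' list and the 
resource names are all hypothetical, and plain lists stand in for the 
SingleCellExperiment objects. Nothing here is an existing 'R CMD build' 
mechanism.

```r
# Session-level cache for generated resources (hypothetical).
.cache <- new.env(parent = emptyenv())

# Return a resource, generating it only if it is not cached yet.
get_resource <- function(name, generators) {
  if (!exists(name, envir = .cache, inherits = FALSE)) {
    message("Generating '", name, "'")
    assign(name, generators[[name]](), envir = .cache)
  }
  get(name, envir = .cache, inherits = FALSE)
}

# Each generator builds one resource; a generator that needs another
# resource asks for it, so dependencies are resolved on demand.
generators <- list(
  sce_reads = function() {
    # Stand-in for the object produced by the first vignette.
    list(dataset = "reads", steps = c("normalization", "clustering"))
  },
  sce_extra = function() {
    base <- get_resource("sce_reads", generators)
    c(base, list(extra = TRUE))
  }
)

# Whichever vignette runs first triggers the generation; later calls
# in the same session reuse the cached object.
res <- get_resource("sce_extra", generators)
names(res)
```

The cost is the one Aaron points out: a vignette that asks for 'sce_extra' 
first ends up running all the steps behind 'sce_reads' as well.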

H.


On 12/18/18 06:58, Aaron Lun wrote:
> @Michael In this case, the resource produced by vignette X is a 
> SingleCellExperiment object containing the results of various processing 
> steps (normalization, clustering, etc.) described in that vignette.
>
> I can imagine a lazy evaluation model for this, but it wouldn’t be pretty. If 
> I had another vignette Y that depended on the SCE produced by vignette X, I 
> would need Y to execute all of the steps in X if X hadn’t already been run 
> before Y. This gets us into the territory of Makefile-like dependencies, 
> which seems even more complicated than simply specifying a compilation order.
>
> You might ask why X and Y are split into two separate vignettes. The use of 
> different vignettes is motivated by the complexity of the workflows:
>
> - Vignette 1 demonstrates core processing steps for one read-based 
> single-cell RNAseq dataset.
> - Vignette 2 demonstrates (slightly different) core steps for a UMI-based 
> dataset.
> - … and so on for a bunch of other core steps for different types of data.
> - Vignette 6 demonstrates extra optional steps for the two SCEs produced by 
> vignettes 1 & 3.
> - … and so on for a bunch of other optional steps.
>
> The separation between core and optional steps into separate documents is 
> desirable. From a pedagogical perspective, I would very much like to get the 
> reader through all the core steps before even considering the extra steps, 
> which would just be confusing if presented so early on. Previously, 
> everything was in a single document, which was difficult to read (for users) 
> and to debug (for me), especially because I had to use contrived variable 
> names to avoid clashes between different sections of the workflow that did 
> similar things.
>
> @Martin I’ve been using BiocFileCache for all of the online resources that 
> are used in the workflow. However, this is only for my (and the reader’s) 
> convenience. I use a local cache rather than the system default, to ensure 
> that the downloaded files are removed after package build. This is 
> intentional as it forces the package builder to try to re-download resources 
> when compiling the vignette, thus ensuring the validity of the URLs. For a 
> similar reason, I would prefer not to cache the result objects for use in 
> different R sessions. I could imagine caching the result objects for use by a 
> different vignette in the same build session, but this gets back to the 
> problem of ensuring that the result object is generated by one vignette 
> before it is needed by another vignette.
>
> -A
>
>> On 18 Dec 2018, at 14:14, Martin Morgan <mtmorgan.b...@gmail.com> wrote:
>>
>> Also perhaps using BiocFileCache so that the result object is only generated 
>> once, then cached for future (different session) use.
>>
>> On 12/18/18, 8:35 AM, "Bioc-devel on behalf of Michael Lawrence" 
>> <bioc-devel-boun...@r-project.org on behalf of lawrence.mich...@gene.com> 
>> wrote:
>>
>>     I would recommend against dependencies across vignettes. Ideally someone
>>     can pick up a vignette and execute the code independently of any other
>>     documentation. Perhaps you could move the code generating those shared
>>     resources to the package. They could behave lazily, only generating the
>>     resource if necessary, otherwise reusing it. That would also make it easy
>>     for people to write their own documents using those resources.
>>
>>     Michael
>>
>>     On Tue, Dec 18, 2018 at 5:22 AM Aaron Lun <
>>     infinite.monkeys.with.keyboa...@gmail.com> wrote:
>>
>>> In a number of my workflow packages (e.g., simpleSingleCell), I rely on a
>>> specific compilation order for my vignettes. This is because some vignettes
>>> set up resources or objects that are to be used by later vignettes.
>>>
>>> From what I understand, vignettes are compiled in alphanumeric ordering of
>>> their file names. As such, I give my vignettes fairly structured names,
>>> e.g., “work-1-reads.Rmd”, “work-2-umi.Rmd” and so on.
>>>
>>> However, it becomes rather annoying when I want to add a new vignette in
>>> the middle somewhere. This results in some unnatural numberings, e.g.,
>>> “work-0”, “3b”, which are ugly and unintuitive. This is relevant as
>>> BiocStyle::Biocpkg() links between vignettes require you to use the
>>> destination vignette’s file name; so difficult names complicate linking,
>>> especially if the names continually change to reflect new orderings.
>>>
>>> Is there an easier way to control vignette compilation order? WRE provides
>>> no (obvious) guidance, so I would like to know what non-standard hacks are
>>> known to work on the build machines. I can imagine something dirty whereby
>>> one “reference” vignette contains code to “rmarkdown::render” all other
>>> vignettes in the specified order… ugh.
>>>
>>> -A
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=HK4GXFTI9jQmnGIwuZCIG3W6Mv_gfilqE0XppSWaO2I&s=AFfRo761pnzXCPY6EnVmNDZZ_Qg7oN8anptEHNVL4l0&e=
>>>
>>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
