On 23/08/2021 16:35, Fabricio de Almeida wrote:
Hi, Hervé.
Thank you for making this clear to me. I will try to think of an optimal
solution for this. The issue here is that my package works as the
pipeline itself, similarly to how ORFik works.
Out of curiosity, I just checked how ORFik and KnowSeq handle this
situation:
* for STAR, for instance, ORFik simply comments out the call to the function
that runs STAR in @examples
(https://github.com/Roleren/ORFik/blob/master/R/STAR.R). Quite a
hacky solution to avoid the overuse of \donttest{}.
* KnowSeq includes a function to download all external software
(https://github.com/CasedUgr/KnowSeq/blob/75d5d9f526f5b4ac561455a46884fe0a1860ffa0/R/sraToFastq.R),
and it includes \donttest{} in some functions.
I will see if I can include \donttest{} in as many functions with
external dependencies as I can and add some other dependencies in
SystemRequirements to satisfy the 80% testable code in @examples.
We discourage this approach because it generally hurts reproducibility
and reliability of the software. It's unfortunate that other packages
are doing this.
A better approach is to make sure that all the steps in your pipeline
are automatically tested on a regular basis, even if that means that we
must install more things on the build machines. As long as these things
are easy to install (e.g. a simple 'apt-get install mafft' on Ubuntu) we
should be fine. Things might be a little bit more complicated on other
platforms, in which case you may need to consider disabling some
examples and/or tests on these platforms. But that should be the last
resort.
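
As a rough illustration of the "make sure the tools are installed" step, here is a minimal sketch of a check that could be run on a machine before building. The helper name `missing_tools` is hypothetical, and the executable names (`STAR`, `salmon`, `mafft`) are assumptions about how these tools are installed; SRAtoolkit in particular ships several binaries and is left out.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each given tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        # 'command -v' is the POSIX way to test whether a command can be found.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Executable names are assumptions; adjust to what the package actually calls.
missing=$(missing_tools STAR salmon mafft)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing external tools:" $missing
else
    echo "All external tools found."
fi
```

On Ubuntu, a tool reported as missing could then be installed with something like 'apt-get install mafft', as mentioned above.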
Hope this makes sense.
Thanks,
H.
Best,
=========================
Fabrício de Almeida Silva
Undergraduate degree in Biological Sciences (UENF)
MSc. candidate in Plant Biotechnology (PGBV/UENF - RJ/Brazil)
Laboratório de Química e Função de Proteínas e Peptídeos
(LQFPP/CBB/UENF - RJ/Brazil)
Personal website: https://almeidasilvaf.github.io
------------------------------------------------------------------------
*From:* Hervé Pagès <hpages.on.git...@gmail.com>
*Sent:* Monday, 23 August 2021 16:57
*To:* Fabricio de Almeida <fabricio_almeidasi...@hotmail.com>;
bioc-devel@r-project.org <bioc-devel@r-project.org>
*Subject:* Re: [Bioc-devel] External dependencies and reproducibility in
all platforms
Hi Fabricio,
If your package requires external software/libraries/tools in order to
pass 'R CMD build' and 'R CMD check', then please list them in the
SystemRequirements field of your DESCRIPTION file. In addition, we
kindly ask you to provide an INSTALL file in the top-level folder of
your package source tree that documents how to install these external
deps on all the supported platforms.
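
For illustration only, the corresponding DESCRIPTION entry might look like the line below. The tool names are the ones mentioned in this thread; the field is free-form text, so the exact wording and any version constraints are up to the package author.

```
SystemRequirements: SRA Toolkit, STAR, salmon
```

The INSTALL file would then describe, per platform, how each of these can be installed.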
BTW I'm not sure that KnowSeq or ORFik have external system
requirements. I don't see that they have a SystemRequirements field.
Only openPrimeR has one but it's not clear to me that the package
actually needs all the things listed there, e.g. MAFFT is
listed but we don't have it on the build machines.
FWIW most packages avoid having to depend on external tools like
SRAtoolkit, STAR or salmon by assuming that this step of the analysis
was already taken care of, and by focusing on the downstream analysis.
These packages often include the output of the upstream analysis as a
small dataset and start from there.
Hope this helps,
Best,
H.
On 23/08/2021 07:10, Fabricio de Almeida wrote:
Dear Bioc developers,
I am writing a package that has external dependencies, and I'd like to
know what the best practices are for submitting this kind of package to
Bioconductor.
The external dependencies are standard RNA-seq analysis tools, such as
SRAtoolkit, STAR and salmon. I have seen other Bioc packages with external
dependencies, such as KnowSeq
(https://bioconductor.org/packages/release/bioc/html/KnowSeq.html),
ORFik
(https://www.bioconductor.org/packages/release/bioc/html/ORFik.html),
and openPrimeR
(https://bioconductor.org/packages/release/bioc/html/openPrimeR.html),
but it is not clear how they handle the dependencies in the Bioconductor
build system.
I have a conda environment containing all the dependencies + R 4.1.0, which
works fine. However, conda is not the best option, as some dependencies may not
be available on all operating systems, particularly Windows.
Perhaps a Docker container with the dependencies in an Ubuntu OS would ensure
reproducibility in all platforms, but what should I do for the package to pass
all checks in the Bioc build system?
Any help is appreciated.
Best,
=========================
Fabrício de Almeida Silva
Undergraduate degree in Biological Sciences (UENF)
MSc. candidate in Plant Biotechnology (PGBV/UENF - RJ/Brazil)
Laboratório de Química e Função de Proteínas e Peptídeos (LQFPP/CBB/UENF -
RJ/Brazil)
Personal website: https://almeidasilvaf.github.io
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com