Thank you for the suggestions, Hervé. Indeed, the best thing to do is to document everything. I was considering using {basilisk} or {herper} to keep a conda environment for functions that depend on external software, but I think they are made for Python code only via reticulate.
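
One low-tech way to "document everything" is to record which external tools a given machine actually provides, for example in the INSTALL notes or in a CI log. The POSIX shell sketch below assumes that the tools are called by the executable names shown. SRA Toolkit ships several binaries, and `fasterq-dump` is used here only as a stand-in. `report_tools` is a hypothetical helper and is not part of R, Bioconductor, or any existing tool. It only inspects the local PATH, so it cannot tell which software is installed on the Bioconductor build machines.

```shell
#!/bin/sh
# Hypothetical helper (not part of R, Bioconductor, or any existing tool):
# report which external command-line tools are on the current machine's PATH.
# It says nothing about what the Bioconductor build machines provide.
report_tools() {
  missing=0
  for tool in "$@"; do
    # command -v prints the resolved path and succeeds if the tool exists.
    if path=$(command -v "$tool" 2>/dev/null); then
      printf '%s\t%s\n' "$tool" "$path"
    else
      printf '%s\tNOT FOUND\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  # Return status is the number of missing tools (0 when all are found).
  return "$missing"
}
```

For example, `report_tools STAR salmon fasterq-dump mafft` prints one tab-separated line per tool (its name, then its path or NOT FOUND). It returns the number of missing tools, so a CI step can fail whenever any of them is absent.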
Is there a fast way to see all software installed in the Bioc build system?

Best,

=========================

Fabrício de Almeida Silva
Undergraduate degree in Biological Sciences (UENF)
MSc. candidate in Plant Biotechnology (PGBV/UENF - RJ/Brazil)
Laboratório de Química e Função de Proteínas e Peptídeos (LQFPP/CBB/UENF - RJ/Brazil)
Personal website: https://almeidasilvaf.github.io

________________________________
From: Hervé Pagès <hpages.on.git...@gmail.com>
Sent: Monday, August 23, 2021 18:53
To: Fabricio de Almeida <fabricio_almeidasi...@hotmail.com>; bioc-devel@r-project.org <bioc-devel@r-project.org>
Subject: Re: [Bioc-devel] External dependencies and reproducibility in all platforms

On 23/08/2021 16:35, Fabricio de Almeida wrote:
> Hi, Hervé.
>
> Thank you for making this clear to me. I will try to think of an optimal
> solution for this. The issue here is that my package works as the
> pipeline itself, similarly to how ORFik works.
>
> Out of curiosity, I just checked how ORFik and KnowSeq handle this
> situation:
>
> * for STAR, for instance, ORFik simply comments out the function that runs
> STAR in @examples
> (https://github.com/Roleren/ORFik/blob/master/R/STAR.R). Quite a
> hacky solution to avoid the overuse of \donttest{}.
> * KnowSeq includes a function to download all external software
> (https://github.com/CasedUgr/KnowSeq/blob/75d5d9f526f5b4ac561455a46884fe0a1860ffa0/R/sraToFastq.R),
> and it includes \donttest{} in some functions.
>
> I will see if I can include \donttest{} in as many functions with
> external dependencies as I can and add some other dependencies in
> SystemRequirements to satisfy the 80% testable code in @examples.

We discourage this approach because it generally hurts reproducibility and reliability of the software.
It's unfortunate that other packages are doing this. A better approach is to make sure that all the steps in your pipeline are automatically tested on a regular basis, even if that means that we must install more things on the build machines. As long as these things are easy to install (e.g. a simple 'apt-get install mafft' on Ubuntu) we should be fine. Things might be a little bit more complicated on other platforms, in which case you may need to consider disabling some examples and/or tests on these platforms. But that should be the last resort.

Hope this makes sense.

Thanks,
H.

>
> Best,
>
> ------------------------------------------------------------------------
> *From:* Hervé Pagès <hpages.on.git...@gmail.com>
> *Sent:* Monday, August 23, 2021 16:57
> *To:* Fabricio de Almeida <fabricio_almeidasi...@hotmail.com>;
> bioc-devel@r-project.org <bioc-devel@r-project.org>
> *Subject:* Re: [Bioc-devel] External dependencies and reproducibility in
> all platforms
>
> Hi Fabricio,
>
> If your package requires external software/libraries/tools in order to
> pass 'R CMD build' and 'R CMD check', then please list them in the
> SystemRequirements field of your DESCRIPTION file. In addition, we
> kindly ask you to provide an INSTALL file in the top-level folder of
> your package source tree that documents how to install these external
> deps on all the supported platforms.
>
> BTW I'm not sure that KnowSeq or ORFik have external system
> requirements. I don't see that they have a SystemRequirements field.
> Only openPrimeR has one but it's not clear to me that the package
> actually needs all the things listed there, e.g. MAFFT is
> listed but we don't have it on the build machines.
>
> FWIW most packages avoid having to depend on external tools like
> SRAtoolkit, STAR or salmon by assuming that this step of the analysis
> was already taken care of, and by focusing on the downstream analysis.
> These packages often include the output of the upstream analysis as a
> small dataset and start from there.
>
> Hope this helps,
>
> Best,
> H.
>
>
> On 23/08/2021 07:10, Fabricio de Almeida wrote:
>> Dear Bioc developers,
>>
>> I am writing a package that contains external dependencies, and I'd like to
>> know what the best practices are for submitting this kind of package to
>> Bioconductor.
>>
>> The external dependencies are standard RNA-seq analysis tools, such as
>> SRAtoolkit, STAR and salmon. I have seen other Bioc packages with external
>> dependencies, such as KnowSeq
>> (https://bioconductor.org/packages/release/bioc/html/KnowSeq.html),
>> ORFik
>> (https://www.bioconductor.org/packages/release/bioc/html/ORFik.html),
>> and openPrimeR
>> (https://bioconductor.org/packages/release/bioc/html/openPrimeR.html),
>> but it is not clear how they handle the dependencies in the Bioconductor
>> build system.
>>
>> I have a conda environment containing all the dependencies + R 4.1.0, which
>> works fine. However, conda is not the best option, as some dependencies may
>> not be available on every OS, particularly Windows.
>>
>> Perhaps a Docker container with the dependencies on Ubuntu would
>> ensure reproducibility on all platforms, but what should I do for the
>> package to pass all checks in the Bioc build system?
>>
>> Any help is appreciated.
>>
>> Best,
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.git...@gmail.com
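
Hervé's advice in this thread is to keep every pipeline step tested, and to disable examples or tests only as a last resort on platforms where a tool cannot reasonably be installed. The snippet below is a minimal sketch of such a guard, written in POSIX shell for illustration. `run_if_available` is a hypothetical name, and in an R package the equivalent check would normally live in the example or test code itself.

```shell
#!/bin/sh
# Hypothetical helper (not an existing R or Bioconductor function): run a
# pipeline step only when the external tool it needs is on PATH. When the
# tool is missing, print a message to stderr and return 0 so the step is
# skipped on that platform instead of failing.
run_if_available() {
  tool=$1
  shift
  if command -v "$tool" >/dev/null 2>&1; then
    # Tool found: run the step and pass its exit status through unchanged.
    "$@"
  else
    echo "SKIP: '$tool' not found on PATH; step not run" >&2
    return 0
  fi
}

# Example (assumed invocation): only attempt a salmon step where salmon exists.
# run_if_available salmon salmon --version
```

Returning 0 for a missing tool makes the step count as skipped rather than failed. When the tool is present, the step's own exit status is passed through unchanged, so genuine failures still surface.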