Personally, I wouldn't trust remotes to get the Bioconductor repositories, and 
hence the dependency graph, correct. I say this mostly because of how hard it 
was to get BiocManager to pick the right repositories during each phase of the 
Bioconductor release cycle, not to diss the remotes package.

I'd grab ONLY the immediate dependencies of the new package from the 
DESCRIPTION file (does remotes have a function to do this? I'd trust it to do a 
better job than my hack, but I'd double check it):

  dcf = read.dcf("DESCRIPTION", c("Depends", "Imports", "LinkingTo",
                                  "Enhances", "Suggests"))
  deps = unlist(strsplit(dcf, ",[[:space:]]*"))
  deps = sub(" .*", "", deps)                    # no version info
  deps = setdiff(deps, c("R", NA, rownames(installed.packages(priority = "high"))))
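
A slightly more defensive sketch of the same idea, wrapped in a function so it 
can be tried on a throwaway DESCRIPTION file. The names direct_deps() and 
'exclude' are hypothetical (not from remotes or BiocManager), and the example 
package metadata is made up purely for illustration.

```r
## Hypothetical helper: packages named directly in a DESCRIPTION file.
direct_deps <- function(path = "DESCRIPTION",
                        fields = c("Depends", "Imports", "LinkingTo",
                                   "Enhances", "Suggests"),
                        exclude = character()) {
    dcf <- read.dcf(path, fields = fields)      # 1 x length(fields) matrix
    deps <- unlist(strsplit(dcf, ",", fixed = TRUE))
    deps <- trimws(sub("\\(.*", "", deps))      # drop "(>= x.y.z)" and whitespace
    deps <- deps[!is.na(deps) & nzchar(deps)]   # absent fields come back as NA
    setdiff(unique(deps), c("R", exclude))
}

## Made-up DESCRIPTION for illustration only.
desc <- tempfile()
writeLines(c(
    "Package: newpkg",
    "Version: 0.99.0",
    "Depends: R (>= 4.0), methods",
    "Imports: GenomicFeatures (>= 1.40.0),",
    "    tibble",
    "Suggests: TxDb.Hsapiens.UCSC.hg19.knownGene, testthat"
), desc)

## Base and recommended packages ship with R, so they are excluded.
base_pkgs <- rownames(installed.packages(priority = "high"))
deps <- direct_deps(desc, exclude = base_pkgs)
print(deps)
```

The result is a plain character vector, so it can be passed to 
BiocManager::install() or install.packages() unchanged.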

I'd then do the builds with

  BiocManager::install(deps)

If that failed and I wanted to 'peel back' a layer of responsibility to get 
closer to a minimal reproducible example, I'd do

  install.packages(deps, repos = BiocManager::repositories())

I believe (maybe you can confirm?) that this fails in your case. 

BiocManager::repositories() is just a named character vector -- in devel it is 
currently

  > dput(BiocManager::repositories())
  c(BioCsoft = "https://bioconductor.org/packages/3.11/bioc",
    BioCann = "https://bioconductor.org/packages/3.11/data/annotation",
    BioCexp = "https://bioconductor.org/packages/3.11/data/experiment",
    BioCworkflows = "https://bioconductor.org/packages/3.11/workflows",
    CRAN = "https://cran.rstudio.com")

So I conclude that the problem is actually IN BASE R, and that the fix is in 
the incredibly complicated logic of install.packages.

I guess I would further simplify the problem by eliminating from 'deps' the 
packages that do not contribute to the problem. So I wonder what a minimal 
'deps' looks like? This would be much more helpful for understanding the 
problem than many thousands of lines of output from CI.
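
One way to search for a minimal 'deps' is to drop one package at a time and 
keep the smaller set whenever the failure persists. The sketch below assumes a 
user-supplied predicate fails(pkgs) that returns TRUE when installing 'pkgs' 
into a fresh library reproduces the problem; minimize_deps() and toy_fails() 
are hypothetical names, and the toy predicate is only a stand-in so the loop 
can be run without installing anything.

```r
## Hypothetical helper: greedy one-at-a-time reduction of a failing set.
minimize_deps <- function(deps, fails) {
    stopifnot(fails(deps))                  # the full set must reproduce the failure
    keep <- deps
    for (pkg in deps) {
        candidate <- setdiff(keep, pkg)
        ## Drop 'pkg' only if the failure is still seen without it.
        if (length(candidate) > 0L && fails(candidate))
            keep <- candidate
    }
    keep
}

## Toy stand-in: pretend the failure needs both 'foo' and 'baz' to be requested.
toy_fails <- function(pkgs) all(c("foo", "baz") %in% pkgs)

minimal <- minimize_deps(c("foo", "bar", "baz", "qux"), toy_fails)
print(minimal)   # "foo" "baz"
```

With a real predicate each trial is a full install attempt, so this is slow, 
but it only needs one trial per package in 'deps'.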

This might help to come up with a simple example to demonstrate Charlotte's 
conclusion that the source packages are installed in the wrong order. And that 
might also lead to a difference between parallel (options(Ncpus = 8), for 
example) versus serial (options(Ncpus = NULL)) builds...
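
A rough sketch of how the serial and parallel runs could be compared side by 
side, assuming each attempt goes into its own empty library. try_install() is 
a hypothetical helper, not part of base R or BiocManager; the 'installer' 
argument exists only so a different install function can be swapped in.

```r
## Hypothetical helper: run an installer into a fresh library under a given
## Ncpus setting, collecting warnings instead of letting them scroll past.
try_install <- function(deps, ncpus = NULL, repos = getOption("repos"),
                        installer = utils::install.packages) {
    lib <- tempfile("lib_")
    dir.create(lib)
    old <- options(Ncpus = ncpus)           # NULL unsets the option (serial)
    on.exit(options(old), add = TRUE)       # restore the previous setting
    msgs <- character()
    withCallingHandlers(
        installer(deps, lib = lib, repos = repos),
        warning = function(w) {
            msgs <<- c(msgs, conditionMessage(w))
            invokeRestart("muffleWarning")
        })
    list(lib = lib,
         installed = rownames(utils::installed.packages(lib.loc = lib)),
         msgs = msgs)
}

## Not run here (needs network access):
## serial   <- try_install(deps, ncpus = NULL, repos = BiocManager::repositories())
## parallel <- try_install(deps, ncpus = 8,    repos = BiocManager::repositories())
## setdiff(serial$installed, parallel$installed)
```

install.packages() reports failed installs as warnings, which is why the 
sketch collects warning messages rather than relying on an error.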

Thanks for your exhaustive work on this!

Martin

On 4/27/20, 2:03 AM, "Leonardo Collado Torres" <lcollado...@gmail.com> wrote:

    EDIT:  I found a general solution! (workaround?) I had written a
    response, but I had an idea, tested it and a few hours later I'm
    finishing this email. It does work... although not exactly as I
    intended it to.



    ---

    Thanks Martin for looking into this =)

    I'll respond to your question about making things complicated for myself.


    ## General scenario

    The general scenario is, I have a package say `newpkg`. `newpkg` has
    dependencies (imports, suggests and/or depends) on Bioconductor
    packages. I want to test that `newpkg` passes R CMD build, check &
    BiocCheck. To do so, we need all the dependencies of `newpkg`
    available including the "suggests" ones.

    If `newpkg` was already available from Bioconductor, I could install
    it using BiocManager::install("newpkg"). But that's not necessarily
    the case.**

    One could install the dependencies for `newpkg` manually, using
    BiocManager::install(), remotes, and/or install.packages(). But then,
    you need to adapt the code again for `newerpkg`, `oldpkg`, etc.

    Currently, either through remotes::install_deps() or through
    remotes::dev_package_deps(dependencies = TRUE) directly (the first
    calls the second,
    https://github.com/r-lib/remotes/blob/5b3da5f852772159cc2a580888e10536a43e9199/R/install.R#L193),
    Charlotte and I are getting the list of packages that `newpkg` depends
    on, then either installing them through remotes or BiocManager. This
    is failing for both of us, though in theory (as far as I know) either
    should work. Is this something that could be fixed? I don't know.++


    ## GitHub Actions

    Ultimately in my case, I'm trying to build a GitHub Actions workflow
    that will work for any package with Bioconductor dependencies. I'm
    nearly there, it's just this last issue about the source-only BioC
    packages (annotation, experiment, workflow). I've been doing this
    since last week and through this process I discovered some issues with
    my own packages that were masked in the Bioconductor build machines.
    Many other packages are already installed in the Bioconductor build
    machines and on my laptop, so I hadn't noticed some missing "suggests"
    dependencies on some of my packages. For example, see
    https://github.com/leekgroup/recount/commit/f3bdb77d789f1a8364f16b71bd344fd23ecbfda5.


    ## Some possibilities to explore

    Maybe what we need is some other code to process the DESCRIPTION file
    of `newpkg` and extract the list of packages explicitly mentioned in
    it, removing those that are base packages (say that leaves 10
    packages). We would then pass only those direct dependencies to
    BiocManager::install(), instead of all the packages listed in the
    DESCRIPTION plus their dependencies (what you can get from
    remotes::dev_package_deps(), say 100 packages). However, I suspect
    that it won't work either, because I'm expecting (maybe incorrectly)
    that BiocManager::install() figures out the right order in which to
    install either the short or the long list of packages, and this is
    currently failing for the long list.

    Another option might involve figuring out from the full list of
    dependencies (remotes::dev_package_deps(dependencies = TRUE) ), which
    ones are available only through source (maybe those available only
    through repos BioCann, BioCexp, BioCworkflows from
    BiocManager::repositories() ) and install those first, then install
    the remaining packages that exist in the BioCsoft and CRAN
    repositories. Maybe something like:

    ## This doesn't work since BiocManager::install() doesn't allow using
    ## the `repos` argument
    deps <- remotes::dev_package_deps(dependencies = TRUE)
    BiocManager::install(deps$package[deps$diff != 0],
        repos = BiocManager::repositories()[c('BioCann', 'BioCexp', 'BioCworkflows')])
    BiocManager::install(deps$package[deps$diff != 0])

    ## This also doesn't work since all CRAN deps are missing at this point
    remotes::install_deps(
        repos = BiocManager::repositories()[c('BioCann', 'BioCexp', 'BioCworkflows')])
    remotes::install_deps()


    ## But the above led me to a solution at
    https://github.com/leekgroup/derfinderPlot/blob/8695cbee49a01d1d297042232a1593e6c94f1b41/.github/workflows/check-bioc.yml#L139-L165.
    That is, install packages in waves: first the CRAN ones, then the BioC
    source-only ones, then the BioC software ones. Doing the installation
    in this order worked for several of my packages (as many as I could
    test tonight).


    message(paste('****', Sys.time(), 'installing BiocManager ****'))
    remotes::install_cran("BiocManager")

    message(paste('****', Sys.time(), 'installing CRAN dependencies ****'))
    remotes::install_deps(repos = BiocManager::repositories()['CRAN'])

    message(paste('****', Sys.time(),
        'installing BioC source-only dependencies ****'))
    remotes::install_deps(
        repos = BiocManager::repositories()[c('BioCann', 'BioCexp', 'BioCworkflows')])

    message(paste('****', Sys.time(),
        'installing remaining BioC dependencies ****'))
    deps <- remotes::dev_package_deps(dependencies = TRUE,
        repos = BiocManager::repositories())
    BiocManager::install(deps$package[deps$diff != 0])


    I added those messages so I could find these steps on the logs more
    easily and it works for Bioconductor's devel docker, macOS and Windows
    using R 4.0 and BioC 3.11.

    Here are the links to one log file (Windows):

    1. BiocManager:
    https://github.com/leekgroup/derfinderPlot/runs/621120165?check_suite_focus=true#step:12:40
    2. CRAN deps:
    https://github.com/leekgroup/derfinderPlot/runs/621120165?check_suite_focus=true#step:12:43
    (though hm... it does install many BioC ones, not sure why)
    3. The BioC source-only deps:
    https://github.com/leekgroup/derfinderPlot/runs/621120165?check_suite_focus=true#step:12:1219
    (hm... doesn't install anything)
    4. BioC remaining deps:
    https://github.com/leekgroup/derfinderPlot/runs/621120165?check_suite_focus=true#step:12:1222
    This is where TxDb.Hsapiens.UCSC.hg19.knownGene gets installed;
    GenomeInfoDbData and tibble are available for GenomicFeatures at this
    point, so no errors pop up. This step also installs a few other CRAN
    deps; I'm not sure why they weren't installed before.


    Best,
    Leo

    ** Even if it was, you might not want to actually install the package
    `newpkg` from Bioconductor/CRAN since you likely want to test the very
    latest version of `newpkg` and avoid any false negative errors where
    everything seems to work, but your code is really just checking the
    latest release version (bioc-release or bioc-devel for BioC packages)
    instead of your development version.

    ++ Maybe it could be fixed by adding an explicit dependency on
    GenomicFeatures to both GenomeInfoDbData and tibble, though I'm not
    sure. But it seems like fixing the order in which packages are
    installed is the more general problem.







    On Sun, Apr 26, 2020 at 5:53 PM Martin Morgan <mtmorgan.b...@gmail.com> wrote:
    >
    > I spent a bit of time not understanding why you were being so
    > complicated -- BiocManager::install() finds all CRAN / Bioc
    > dependencies, there's no need to use remotes at all and for debugging
    > purposes it just seemed (still seems?) like you were making trouble
    > for yourself.
    >
    > But eventually... I created a fake CRAN-style repository
    >
    > $ tree my_repo/
    > my_repo/
    > ├── bin
    > │   └── macosx
    > │       └── contrib
    > │           └── 4.0
    > │               └── PACKAGES
    > └── src
    >     └── contrib
    >         └── PACKAGES
    >
    > The plain-text PACKAGES file is an index of the packages that are
    > supposed to be available. So under the 'bin' tree I have
    >
    > ---
    > Package: foo
    > Version: 1.0.0
    > NeedsCompilation: true
    >
    > Package: bar
    > Version: 1.0.0
    > Depends: foo
    >
    >
    > Package: baz
    > Version: 1.0.0
    > Depends: bar
    > ---
    >
    > baz depends on bar depends on foo, and binary versions are all at 1.0.0
    >
    > Under the src tree I have
    >
    > ---
    > Package: foo
    > Version: 1.0.1
    > NeedsCompilation: true
    >
    > Package: bar
    > Version: 1.0.0
    > Depends: foo
    >
    >
    > Package: baz
    > Version: 1.0.0
    > Depends: bar
    > ---
    > with a more recent src for foo at version 1.0.1. I guess this is
    > (almost) the situation with GenomeInfoDbData / tibble.
    >
    > In an R session I have
    >
    > > available.packages(repos="file:///tmp/my_repo/")
    >     Package Version Priority Depends Imports LinkingTo Suggests Enhances
    > foo "foo"   "1.0.1" NA       NA      NA      NA        NA       NA
    > bar "bar"   "1.0.0" NA       "foo"   NA      NA        NA       NA
    > baz "baz"   "1.0.0" NA       "bar"   NA      NA        NA       NA
    >     License License_is_FOSS License_restricts_use OS_type Archs MD5sum
    > foo NA      NA              NA                    NA      NA    NA
    > bar NA      NA              NA                    NA      NA    NA
    > baz NA      NA              NA                    NA      NA    NA
    >     NeedsCompilation File Repository
    > foo "true"           NA   "file:///tmp/my_repo/src/contrib"
    > bar NA               NA   "file:///tmp/my_repo/src/contrib"
    > baz NA               NA   "file:///tmp/my_repo/src/contrib"
    >
    > I'll try to 'install' baz; it'll fail because there are no packages to
    > install, but it's still informative...
    >
    > > install.packages("baz", repos = "file:///tmp/my_repo")
    > Installing package into '/Users/ma38727/Library/R/4.0/Bioc/3.11/library'
    > (as 'lib' is unspecified)
    > also installing the dependencies 'foo', 'bar'
    >
    >
    >   There is a binary version available but the source version is later:
    >     binary source needs_compilation
    > foo  1.0.0  1.0.1              TRUE
    >
    > Do you want to install from sources the package which needs compilation? (Yes/no/cancel) yes
    > Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
    >   package 'bar' does not exist on the local repository
    > Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
    >   package 'baz' does not exist on the local repository
    > installing the source package 'foo'
    >
    > Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
    >   package 'foo' does not exist on the local repository
    >
    > Note the order of downloads -- binaries first, then source as you
    > found! (actually, this would 'work' because the binaries are installed
    > without any test load, but in more complicated situations...)
    >
    > On the other hand, if I answer 'no' to install the more recent source
    > packages I get
    >
    >   There is a binary version available but the source version is later:
    >     binary source needs_compilation
    > foo  1.0.0  1.0.1              TRUE
    > Do you want to install from sources the package which needs compilation? (Yes/no/cancel) no
    > Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
    >   package 'foo' does not exist on the local repository
    > Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
    >   package 'bar' does not exist on the local repository
    > Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
    >   package 'baz' does not exist on the local repository
    >
    > installing in the order required for dependencies.
    >
    > If I remove baz from the source repository, I get a similar order of
    > events, with an additional prompt about installing 'baz' from source.
    >
    > I don't actually see, from the 'Binary packages' section of
    > ?install.packages, how to get R to respond 'no' to the prompt to install
    > the more recent source package foo, but still install the source-only
    > package 'baz'...
    >
    > Of course this is transient, occurring only when there is a more recent
    > source than binary; my own installation of TxDb on macOS found a binary
    > tibble as current as the source, and went without problem.
    >
    > Martin
    >
    > On 4/26/20, 4:48 PM, "Leonardo Collado Torres" <lcollado...@gmail.com> wrote:
    >
    >     Hi everyone,
    >
    >     Charlotte, thank you very much! I didn't know about that issue on
    >     `remotes` and the fix attempts. Thank you for the info Martin!
    >
    >     However, I have to report that it doesn't seem like switching from
    >     remotes::install_deps() to BiocManager::install() fixes the issue. I
    >     updated my GitHub Actions workflow to obtain the list of dependencies
    >     using remotes, but install them with BiocManager::install() instead of
    >     remotes::install_deps(). You can see this at
    >     https://github.com/leekgroup/derfinderPlot/blob/ea58939ac6bf13cae7d26951732914d96b5f7d07/.github/workflows/check-bioc.yml#L139-L149
    >     although I include the relevant lines of code below:
    >
    >     ## Locate the package dependencies
    >     deps <- remotes::dev_package_deps(dependencies = TRUE)
    >
    >     ## Install any that need to be updated using BiocManager to avoid
    >     ## the issues described at
    >     ## https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016675.html
    >     ## https://github.com/r-lib/remotes/issues/296
    >     remotes::install_cran("BiocManager")
    >     BiocManager::install(deps$package[deps$diff != 0])
    >
    >
    >     This still leads to TxDb.Hsapiens.UCSC.hg19.knownGene failing to
    >     install because GenomeInfoDbData is not available on both macOS and
    >     Windows (again, this doesn't fail on the Bioconductor devel docker).
    >     Here's for example the error on Windows
    >     https://github.com/leekgroup/derfinderPlot/runs/620055131?check_suite_focus=true#step:12:1077.
    >     Immediately after, GenomeInfoDbData does get installed
    >     https://github.com/leekgroup/derfinderPlot/runs/620055131?check_suite_focus=true#step:12:1100
    >     and after it, tibble
    >     https://github.com/leekgroup/derfinderPlot/runs/620055131?check_suite_focus=true#step:12:1174.
    >
    >     Likely this issue only happens on Windows and macOS because of the
    >     availability of some packages in source form and others in binary
    >     form, unlike only using source versions in the Bioconductor docker
    >     run. However, maybe I need some other code to get all the
    >     dependencies of a given package in a different order, though I was
    >     hoping that BiocManager::install() would find the right order for me
    >     as it seems to try to do so already.
    >
    >     Charlotte linked to
    >     https://github.com/r-lib/remotes/commit/88f302fe53864e4f27fc7b3897718fea9a8b1fa9.
    >     So maybe there's still something else to try to fix in remotes and/or
    >     BiocManager instead of the DESCRIPTION files of other packages like I
    >     initially thought of in this thread and in
    >     https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016671.html.
    >
    >     Best,
    >     Leo
    >
    >
    >
    >     On Sun, Apr 26, 2020 at 10:30 AM Martin Morgan <mtmorgan.b...@gmail.com> wrote:
    >     >
    >     > Thanks Charlotte for the detective work.
    >     >
    >     >
    >     >
    >     > Annotation packages (TxDb, org, BSgenome, and GenomeInfoDbData, for
    >     > instance) are distributed only as source – this was a decision made
    >     > quite a while (years) ago, to save disk space (some of these packages
    >     > are large, and hosting macOS and Windows binaries in addition to
    >     > source would triple disk space requirements) and on the rationale
    >     > that the packages do not have C-level source code so users do not
    >     > need RTools or XCode (etc) to install from ‘source’. So in this
    >     > context, and in the face of a buggy remotes package and installation
    >     > of Bioconductor packages through non-standard approaches
    >     > (BiocManager::install() for CRAN and Bioconductor packages and their
    >     > dependencies uses base R commands only), I guess the behavior you
    >     > document is really an (ongoing?) bug in the remotes package?
    >     >
    >     >
    >     >
    >     > Over the years the distribution of source-only annotation packages
    >     > has caused problems, in particular when (usually Windows) users have
    >     > temporary or library paths with spaces or non-ASCII characters. I
    >     > believe that this upstream bug (in R’s handling of Windows paths) has
    >     > been fixed in the 4.0.0 release, but the details are quite
    >     > complicated and I have not been able to follow the discussion fully.
    >     >
    >     >
    >     >
    >     > Martin
    >     >
    >     >
    >     >
    >     > From: Charlotte Soneson <charlottesone...@gmail.com>
    >     > Date: Sunday, April 26, 2020 at 5:32 AM
    >     > To: Martin Morgan <mtmorgan.b...@gmail.com>
    >     > Cc: Leonardo Collado Torres <lcollado...@gmail.com>, Bioc-devel <bioc-devel@r-project.org>
    >     > Subject: Re: [Bioc-devel] GenomicFeatures and/or TxDb.Hsapiens.UCSC.hg19.knownGene issue: missing tibble
    >     >
    >     >
    >     >
    >     > Hi Leo, Martin,
    >     >
    >     >
    >     >
    >     > it looks like this is related to an issue with the remotes package:
    >     > https://github.com/r-lib/remotes/issues/296. It gets the installation
    >     > order wrong, and tries to install source packages before binaries.
    >     > This can be a problem with GenomeInfoDbData (which I think doesn’t
    >     > have a binary, and which it looks like Leo is installing manually).
    >     > The TxDb package also doesn’t seem to be available as a binary
    >     > package, and currently the source package for tibble is newer than
    >     > the Windows binary.
    >     >
    >     >
    >     >
    >     > According to the issue above, it should have been fixed in remotes
    >     > v2.1.1
    >     > (https://github.com/r-lib/remotes/commit/88f302fe53864e4f27fc7b3897718fea9a8b1fa9).
    >     > To try things out, I set up a minimal package with the only
    >     > dependency being TxDb.Hsapiens.UCSC.hg19.knownGene
    >     > (https://github.com/csoneson/testpkg), and checked it with GitHub
    >     > Actions on macOS and Windows. It fails in both cases, since it’s
    >     > trying to install TxDb.Hsapiens.UCSC.hg19.knownGene first (e.g.
    >     > https://github.com/csoneson/testpkg/runs/619407291?check_suite_focus=true#step:7:533).
    >     > If I depend instead on GenomicFeatures, everything builds fine (here
    >     > we have a binary). It is using remotes v2.1.1 though, so perhaps this
    >     > needs to be investigated further.
    >     >
    >     >
    >     >
    >     > Charlotte
    >     >
    >     >
    >     >
    >     > On 25 Apr 2020, at 22:20, Martin Morgan <mtmorgan.b...@gmail.com> wrote:
    >     >
    >     >
    >     >
    >     > tibble is not a direct dependency of TxDb*.
    >     >
    >     >
    >     > db = available.packages(repos = BiocManager::repositories())
    >     > deps = tools::package_dependencies("TxDb.Hsapiens.UCSC.hg19.knownGene", db)
    >     > deps
    >     >
    >     > $TxDb.Hsapiens.UCSC.hg19.knownGene
    >     > [1] "GenomicFeatures" "AnnotationDbi"
    >     >
    >     > but it is an indirect dependency
    >     >
    >     >
    >     > deps = tools::package_dependencies("TxDb.Hsapiens.UCSC.hg19.knownGene", db,
    >     >     recursive=TRUE)
    >     > "tibble" %in% unlist(deps)
    >     >
    >     > [1] TRUE
    >     >
    >     > I did
    >     >
    >     >  deps1 = tools::package_dependencies("TxDb.Hsapiens.UCSC.hg19.knownGene", db,
    >     >      recursive=TRUE)
    >     >
    >     >  deps2 = tools::package_dependencies("tibble", db, recursive=TRUE,
    >     >      reverse=TRUE)
    >     >
    >     >  intersect(unlist(deps1), unlist(deps2))
    >     >  ## [1] "GenomicFeatures" "biomaRt"         "BiocFileCache"   "dbplyr"
    >     >  ## [5] "dplyr"
    >     >
    >     > I believe R checked only the immediate dependencies, found all of
    >     > those for TxDb* and GenomicFeatures available, and didn’t check
    >     > further. I speculate that you removed tibble, or installed one of the
    >     > packages in the above list, without satisfying the dependencies for
    >     > that package. Or perhaps what the message is really trying to say is
    >     > that it failed to load tibble (because it was installed in a previous
    >     > version of the R toolchain?)
    >     >
    >     > It would be interesting to debug this further on your system, to
    >     > understand the problem for other users.
    >     >
    >     > Martin
    >     >
    >     > On 4/25/20, 2:48 PM, "Bioc-devel on behalf of Leonardo Collado Torres"
    >     > <bioc-devel-boun...@r-project.org on behalf of lcollado...@gmail.com> wrote:
    >     >
    >     >    Hi Bioc-devel,
    >     >
    >     >    I think that there's a potential issue with either GenomicFeatures,
    >     >    TxDb.Hsapiens.UCSC.hg19.knownGene or an upstream package.
    >     >
    >     >
    >     >    On a fresh R 4.0 Windows installation with BioC 3.11, I get the
    >     >    following error message when installing
    >     >    TxDb.Hsapiens.UCSC.hg19.knownGene as shown at
    >     >    https://github.com/leekgroup/derfinderPlot/runs/618370463?check_suite_focus=true#step:13:1225.
    >     >
    >     >
    >     >    2020-04-25T18:32:26.0765748Z * installing *source* package 'TxDb.Hsapiens.UCSC.hg19.knownGene' ...
    >     >    2020-04-25T18:32:26.0769789Z ** using staged installation
    >     >    2020-04-25T18:32:26.1001400Z ** R
    >     >    2020-04-25T18:32:26.1044734Z ** inst
    >     >    2020-04-25T18:32:26.2061605Z ** byte-compile and prepare package for lazy loading
    >     >    2020-04-25T18:32:30.7296724Z ##[error]Error: package or namespace load failed for 'GenomicFeatures' in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
    >     >    2020-04-25T18:32:30.7305615Z ERROR: lazy loading failed for package 'TxDb.Hsapiens.UCSC.hg19.knownGene'
    >     >    2020-04-25T18:32:30.7306686Z * removing 'D:/a/_temp/Library/TxDb.Hsapiens.UCSC.hg19.knownGene'
    >     >    2020-04-25T18:32:30.7307196Z  there is no package called 'tibble'
    >     >    2020-04-25T18:32:30.7310561Z ##[error]Error: package 'GenomicFeatures' could not be loaded
    >     >    2020-04-25T18:32:30.7311805Z Execution halted
    >     >
    >     >    From looking at the bioc-devel landing pages for both GenomicFeatures
    >     >    and TxDb.Hsapiens.UCSC.hg19.knownGene, I see that tibble is not listed
    >     >    as a dependency for either package.
    >     >
    >     >    Best,
    >     >    Leo
    >     >
    >     >    _______________________________________________
    >     >    Bioc-devel@r-project.org mailing list
    >     >    https://stat.ethz.ch/mailman/listinfo/bioc-devel