It would be nice to collect the ideas in this thread into some
recommendations. The themes I am thinking of are "how developers can make
their packages robust to loss of external packages" and "how can the Bioc
ecosystem best deal with departures of packages from itself and from
CRAN?" A good and well-adopted solution to the first one makes the second
one moot.
Two CRAN-related events I know of that required some effort are the
(temporary) loss of ashr and the (recent) archiving of Seurat.
On Sat, Feb 8, 2020 at 12:02 PM Martin Morgan <mtmorgan.b...@gmail.com> wrote:
I find it quite interesting to identify formal strategies for removing
dependencies, but also a little outside my domain of expertise. This code
library(tools)
library(dplyr)
## repository database (as used elsewhere in this thread)
db <- available.packages(repos = BiocManager::repositories())
## non-base packages the user requires for GenomicScores
deps <- package_dependencies("GenomicScores", db, recursive = TRUE)[[1]]
deps <- intersect(deps, rownames(db))
## only need the 'universe' of GenomicScores dependencies
db1 <- db[c("GenomicScores", deps), ]
## sub-graph of packages between each dependency and GenomicScores
revdeps <- package_dependencies(deps, db1, recursive = TRUE, reverse = TRUE)
## n_remove: number of packages in the universe that depend, directly or
## indirectly, on each dependency
tibble(
    package = names(revdeps),
    n_remove = lengths(revdeps)
) %>%
    arrange(n_remove)
produces a tibble
# A tibble: 106 x 2
   package           n_remove
   <chr>                <int>
 1 BSgenome                 1
 2 AnnotationHub            1
 3 shinyjs                  1
 4 DT                       1
 5 shinycustomloader        1
 6 data.table               1
 7 shinythemes              1
 8 rtracklayer              2
 9 BiocFileCache            2
10 BiocManager              2
# … with 96 more rows
shows me, via n_remove, that I can remove the dependency on AnnotationHub
by removing the dependency on just one package (AnnotationHub!), but to
remove BiocFileCache I'd also have to remove another package
(AnnotationHub, I'd guess). So this provides some measure of the ease with
which a package can be removed.
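To make n_remove concrete, here is a minimal base-R sketch of what the recursive reverse lookup plus lengths() counts: for each package, how many packages in the universe depend on it directly or indirectly. It runs on a hypothetical toy graph (not real packages), and reverse_edges(), reachable_from() and n_remove() are illustrative helpers, not part of tools.

```r
## Hypothetical toy dependency graph (not real packages): each element
## lists the direct dependencies of one package.
deps <- list(
    root = c("a", "b", "c"),
    a = c("d", "e"),
    b = c("c", "e"),
    c = "f",
    d = character(), e = character(), f = character()
)

## Invert the edges: for each package, the packages that directly depend on it.
reverse_edges <- function(deps) {
    rdeps <- setNames(vector("list", length(deps)), names(deps))
    for (p in names(deps))
        for (d in deps[[p]])
            rdeps[[d]] <- c(rdeps[[d]], p)
    rdeps
}

## All packages reachable from `start` (excluding `start` itself).
reachable_from <- function(graph, start) {
    seen <- character()
    frontier <- graph[[start]]
    while (length(frontier)) {
        seen <- union(seen, frontier)
        frontier <- setdiff(unlist(graph[frontier], use.names = FALSE), seen)
    }
    seen
}

## n_remove: number of packages that (recursively) depend on each package.
n_remove <- function(deps, pkgs) {
    rdeps <- reverse_edges(deps)
    vapply(pkgs, function(p) length(reachable_from(rdeps, p)), integer(1))
}

n_remove(deps, c("a", "b", "c", "d", "e", "f"))
## returns c(a = 1L, b = 1L, c = 2L, d = 2L, e = 3L, f = 3L)
```

As in the real table, a package with n_remove of 1 is needed only by the root, so dropping it means touching a single dependency.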
I'd like a 'benefit' column, too -- if I were to remove AnnotationHub, how
many additional packages would I also be able to remove, because they are
present only to satisfy the dependency on AnnotationHub? More generally,
perhaps there is a dependency of AnnotationHub that is only used by
AnnotationHub and BSgenome, so removing AnnotationHub as a dependency would
make it easier to remove BSgenome, etc. I guess this is a graph
optimization problem.
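A simple first step toward such a 'benefit' column can be sketched as a reachability comparison. Assume a named list of direct dependencies, like the non-recursive tools::package_dependencies() result built elsewhere in this thread. For each direct dependency d of the root, compare the recursive dependency set with and without the edge root -> d. The packages that drop out, d itself included when nothing else needs it, are what removing d would free. This does not solve the broader optimization problem. The helpers reachable() and benefit() and the toy graph are hypothetical.

```r
## Hypothetical toy dependency graph (not real packages).
deps <- list(
    root = c("a", "b", "c"),
    a = c("d", "e"),
    b = c("c", "e"),
    c = "f",
    d = character(), e = character(), f = character()
)

## Packages reachable from `root`, pretending the direct dependencies in
## `drop` have been removed from `root`.
reachable <- function(deps, root, drop = character()) {
    frontier <- setdiff(deps[[root]], drop)
    seen <- character()
    while (length(frontier)) {
        seen <- union(seen, frontier)
        frontier <- setdiff(unlist(deps[frontier], use.names = FALSE), seen)
    }
    seen
}

## For each direct dependency d of `root`: how many packages leave the
## recursive dependency set if d is dropped (d itself included).
benefit <- function(deps, root) {
    all_deps <- reachable(deps, root)
    vapply(deps[[root]], function(d) {
        still_needed <- reachable(deps, root, drop = d)
        length(setdiff(all_deps, still_needed))
    }, integer(1))
}

benefit(deps, "root")
## returns c(a = 2L, b = 1L, c = 0L)
```

Here dropping "a" also frees "d", dropping "b" frees only "b", and dropping "c" frees nothing because "b" still needs it.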
Probably also worth mentioning the itdepends package
(https://github.com/r-lib/itdepends), which I think tries primarily to
determine the relationship between package dependencies and lines of code,
which seems like complementary information.
Martin
On 2/6/20, 12:29 PM, "Robert Castelo" <robert.cast...@upf.edu> wrote:
true, i was just searching for the shortest path. we can instead search for
all simple paths (i.e., paths without repeating vertices), and there are
five routes from "GenomicScores" to "Matrix":
igraph::all_simple_paths(igraph::igraph.from.graphNEL(g),
from="GenomicScores", to="Matrix", mode="out")
[[1]]
+ 7/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment DelayedArray
[7] Matrix
[[2]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores BSgenome rtracklayer
[4] GenomicAlignments SummarizedExperiment Matrix
[[3]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores DT crosstalk ggplot2 mgcv
[6] Matrix
[[4]]
+ 6/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment DelayedArray Matrix
[[5]]
+ 5/117 vertices, named, from 04133ec:
[1] GenomicScores rtracklayer GenomicAlignments
[4] SummarizedExperiment Matrix
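For readers without igraph, the same idea can be sketched as a plain recursive depth-first search over a named list of direct dependencies. This is not igraph's implementation. The toy graph and the all_dep_paths() helper are hypothetical, and the number of paths can grow exponentially on dense graphs.

```r
## Hypothetical toy dependency graph (not real packages).
deps <- list(
    root = c("a", "b", "c"),
    a = c("d", "e"),
    b = c("c", "e"),
    c = "f",
    d = character(), e = character(), f = character()
)

## Enumerate all simple paths from `from` to `to` by depth-first search;
## `path` holds the vertices visited so far on the current route.
all_dep_paths <- function(deps, from, to, path = from) {
    if (from == to) return(list(path))
    paths <- list()
    for (w in deps[[from]]) {
        if (w %in% path) next  # a simple path never revisits a vertex
        paths <- c(paths, all_dep_paths(deps, w, to, c(path, w)))
    }
    paths
}

all_dep_paths(deps, "root", "f")
## returns list(c("root", "b", "c", "f"), c("root", "c", "f"))
```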
this is interesting, because it means that if i wanted to get rid of the
"Matrix" dependency i'd need to get rid not only of the "rtracklayer"
dependency but also of "BSgenome" and "DT".
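The reasoning above can be checked mechanically. A package leaves the dependency set only when every direct dependency from which it is reachable is dropped. A minimal sketch of that check follows, using a hypothetical leads_to() helper on a toy graph. With real data, leads_to(deps, "GenomicScores", "Matrix") could be applied to the non-recursive tools::package_dependencies() list built elsewhere in this thread, and would be expected to name the direct dependencies listed here.

```r
## Hypothetical toy dependency graph (not real packages).
deps <- list(
    root = c("a", "b", "c"),
    a = c("d", "e"),
    b = c("c", "e"),
    c = "f",
    d = character(), e = character(), f = character()
)

## Direct dependencies of `root` from which `target` can be reached; all of
## them must be removed before `target` leaves the dependency set.
leads_to <- function(deps, root, target) {
    reaches <- function(start) {
        seen <- character()
        frontier <- start
        while (length(frontier)) {
            if (target %in% frontier) return(TRUE)
            seen <- union(seen, frontier)
            frontier <- setdiff(unlist(deps[frontier], use.names = FALSE), seen)
        }
        FALSE
    }
    Filter(reaches, deps[[root]])
}

leads_to(deps, "root", "f")
## returns c("b", "c")
```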
robert.
On 2/6/20 5:41 PM, Martin Morgan wrote:
> Excellent! I think there are other, independent paths from your
> immediate dependencies to Matrix...
>
> RBGL::sp.between(g, start="DT", finish="Matrix", detail=TRUE)[[1]]$path_detail
> [1] "DT" "crosstalk" "ggplot2" "mgcv" "Matrix"
>
> ??
>
> Martin
>
> On 2/6/20, 10:47 AM, "Robert Castelo" <robert.cast...@upf.edu> wrote:
>
> hi Martin,
>
> thanks for the hint!! i wasn't aware of 'tools::package_dependencies()';
> adding a bit of graph sorcery i get the result i was looking for:
>
> repos <- BiocManager::repositories()[c(1,5)]
> repos
> BioCsoft
> "https://bioconductor.org/packages/3.11/bioc"
> CRAN
> "https://cran.rstudio.com"
>
> db <- available.packages(repos=repos)
>
> ## all recursive (strong) dependencies of GenomicScores
> deps <- tools::package_dependencies("GenomicScores", db,
>     recursive=TRUE)[[1]]
>
> ## direct dependencies of GenomicScores and of each of its dependencies
> deps <- tools::package_dependencies(c("GenomicScores", deps), db)
>
> ## directed graph with an edge from each package to its dependencies
> g <- graph::graphNEL(nodes=names(deps), edgeL=deps, edgemode="directed")
>
> RBGL::sp.between(g, start="GenomicScores", finish="Matrix",
>     detail=TRUE)[[1]]$path_detail
> [1] "GenomicScores" "rtracklayer" "GenomicAlignments"
> [4] "SummarizedExperiment" "Matrix"
>
> so, it was the rtracklayer dependency that leads to Matrix through
> GenomicAlignments and SummarizedExperiment.
>
> maybe the BioC package 'pkgDepTools' should be deprecated if its
> functionality is part of 'tools' and it does not even work as fast or as
> correctly as 'tools'.
>
> cheers,
>
> robert.
>
>
> On 2/6/20 2:51 PM, Martin Morgan wrote:
> > The first thing is to get the correct repositories
> >
> > repos = BiocManager::repositories()
> >
> > (maybe trim the experiment and annotation repos from this). I also
> > tried pkgDepTools::makeDepGraph() but it took so long that I moved
> > on... it has an option 'keep.builtin' which might include Matrix.
> >
> > There is also BiocPkgTools::buildPkgDependencyDataFrame() & friends,
> > but this seems to build dependencies within a single repository...
> >
> > The building block for a solution is `tools::package_dependencies()`,
> > and I can confirm that "Matrix" _is_ a dependency
> >
> > db = available.packages(repos = BiocManager::repositories())
> > ## recursive (forward) dependencies of GenomicScores
> > revdeps <- tools::package_dependencies("GenomicScores", db, recursive = TRUE)
> > "Matrix" %in% revdeps[[1]]
> > ## [1] TRUE
> >
> > so I'll leave the clever recursive or graph-based algorithm up to you,
> > to report back to the mailing list?
> >
> > For what it's worth I think the last time this came up Martin Maechler
> > pointed to a function in base R (probably the tools package) that
> > implements this, too...?
> >
> > Martin Morgan
> >
> > On 2/6/20, 6:40 AM, "Bioc-devel on behalf of Robert Castelo" <bioc-devel-boun...@r-project.org on behalf of robert.cast...@upf.edu> wrote:
> >
> > hi,
> >
> > when i load the package 'GenomicScores' in a clean session i see through
> > 'sessionInfo()' that the package 'Matrix' is listed under "loaded via a
> > namespace (and not attached)".
> >
> > i'd like to know which dependency of 'GenomicScores' ends up requiring
> > the package 'Matrix'.
> >
> > i've tried using the package 'pkgDepTools' without success, because the
> > dependency graph does not list any path from 'GenomicScores' to 'Matrix'.
> >
> > i've been manually browsing the Bioc website and, unless i've overlooked
> > something, the only association with 'Matrix' i could find is that
> > 'S4Vectors' and 'GenomicRanges', which are required by 'GenomicScores',
> > list 'Matrix' in the 'Suggests' field, but my understanding is that
> > suggested packages are not required and should not be loaded.
> >
> > so, is there any way in which i can figure out which of the
> > 'GenomicScores' dependencies leads to loading the package 'Matrix'?
> >
> > here are the Depends, Imports and Suggests fields from 'GenomicScores':
> >
> > Depends: R (>= 3.5), S4Vectors (>= 0.7.21), GenomicRanges, methods,
> >     BiocGenerics (>= 0.13.8)
> > Imports: utils, XML, Biobase, IRanges (>= 2.3.23), Biostrings,
> >     BSgenome, GenomeInfoDb, AnnotationHub, shiny, shinyjs,
> >     DT, shinycustomloader, rtracklayer, data.table, shinythemes
> > Suggests: BiocStyle, knitr, rmarkdown, BSgenome.Hsapiens.UCSC.hg19,
> >     phastCons100way.UCSC.hg19, MafDb.1Kgenomes.phase1.hs37d5,
> >     SNPlocs.Hsapiens.dbSNP144.GRCh37, VariantAnnotation,
> >     TxDb.Hsapiens.UCSC.hg19.knownGene, gwascat, RColorBrewer
> >
> > and here is the session information from a fresh R-devel session after
> > loading the package 'GenomicScores':
> >
> > R Under development (unstable) (2020-01-29 r77745)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: CentOS Linux 7 (Core)
> >
> > Matrix products: default
> > BLAS: /opt/R/R-devel/lib64/R/lib/libRblas.so
> > LAPACK: /opt/R/R-devel/lib64/R/lib/libRlapack.so
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
> > [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
> > [7] LC_PAPER=en_US.UTF8 LC_NAME=C
> > [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel stats4 stats graphics grDevices utils datasets
> > [8] methods base
> >
> > other attached packages:
> > [1] GenomicScores_1.11.4 GenomicRanges_1.39.2 GenomeInfoDb_1.23.10
> > [4] IRanges_2.21.3 S4Vectors_0.25.12 BiocGenerics_0.33.0
> > [7] colorout_1.2-2
> >
> > loaded via a namespace (and not attached):
> > [1] Rcpp_1.0.3 lattice_0.20-38
> > [3] shinycustomloader_0.9.0 Rsamtools_2.3.3
> > [5] Biostrings_2.55.4 assertthat_0.2.1
> > [7] digest_0.6.23 mime_0.9
> > [9] BiocFileCache_1.11.4 R6_2.4.1
> > [11] RSQLite_2.2.0 httr_1.4.1
> > [13] pillar_1.4.3 zlibbioc_1.33.1
> > [15] rlang_0.4.4 curl_4.3
> > [17] data.table_1.12.8 blob_1.2.1
> > [19] DT_0.12 Matrix_1.2-18
> > [21] shinythemes_1.1.2 shinyjs_1.1
> > [23] BiocParallel_1.21.2 AnnotationHub_2.19.7
> > [25] htmlwidgets_1.5.1 RCurl_1.98-1.1
> > [27] bit_1.1-15.1 shiny_1.4.0
> > [29] DelayedArray_0.13.3 compiler_4.0.0
> > [31] httpuv_1.5.2 rtracklayer_1.47.0
> > [33] pkgconfig_2.0.3 htmltools_0.4.0
> > [35] tidyselect_1.0.0 SummarizedExperiment_1.17.1
> > [37] tibble_2.1.3 GenomeInfoDbData_1.2.2
> > [39] interactiveDisplayBase_1.25.0 matrixStats_0.55.0
> > [41] XML_3.99-0.3 crayon_1.3.4
> > [43] dplyr_0.8.4 dbplyr_1.4.2
> > [45] later_1.0.0 GenomicAlignments_1.23.1
> > [47] bitops_1.0-6 rappdirs_0.3.1
> > [49] grid_4.0.0 xtable_1.8-4
> > [51] DBI_1.1.0 magrittr_1.5
> > [53] XVector_0.27.0 promises_1.1.0
> > [55] vctrs_0.2.2 tools_4.0.0
> > [57] bit64_0.9-7 BSgenome_1.55.3
> > [59] Biobase_2.47.2 glue_1.3.1
> > [61] purrr_0.3.3 BiocVersion_3.11.1
> > [63] fastmap_1.0.1 yaml_2.2.1
> > [65] AnnotationDbi_1.49.1 BiocManager_1.30.10
> > [67] memoise_1.1.0
> >
> >
> >
> > thanks!!
> >
> > robert.
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
>
> --
> Robert Castelo, PhD
> Associate Professor
> Dept. of Experimental and Health Sciences
> Universitat Pompeu Fabra (UPF)
> Barcelona Biomedical Research Park (PRBB)
> Dr Aiguader 88
> E-08003 Barcelona, Spain
> telf: +34.933.160.514
> fax: +34.933.160.550
>
>