Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

Martin Morgan Fri, 19 Jan 2018 11:13:52 -0800

On 01/19/2018 12:37 PM, Gabe Becker wrote:

IT seems like you could also force a copy of the reference object via
<dbobject>$copy() and then force a refresh of the conn slot by assigning a
new db connection into it.


I'm having trouble confirming that this would work, however, because I
actually can't reproduce the error. The naive way works for me on my mac
laptop (which is running an old R and Bioconductor) and on the linux
cluster I have access to (running Bioc 3.6):


(cluster)

getSymbol <- function ( x ) {


+ return( AnnotationDbi::mget( x , hgu95av2SYMBOL ) )

+ }


pass the data base connection to the function

getSymbol <- function ( x, db )
    ## olde schoole
    AnnotationDbi::mget(x, db)
    ## AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")

and arrange for the general case, i.e., distinct processes with dataserialized between them


> cl = parallel::makePSOCKcluster(2)
> parLapply(cl, x, getSymbol, hgu95av2SYMBOL)
Error in checkForRemoteErrors(val) :
  2 nodes produced errors; first error: external pointer is not valid

(getSymbol would fail as originally written in the serial case, sincethe workers would not have access to hgu95av2SYMBOL


The workaround is to open the connection on the node, e.g.,

getSymbol <- function ( x, dbname ) {
    nmspc <- loadNamespace(dbname)
    db <- get(dbname, nmspc)
    AnnotationDbi::mapIds(db, x, "SYMBOL", "PROBEID")
}

lapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db")
bplapply(x, getSymbol, "hgu95av2.db", BPPARAM = SnowParam())

Martin

x <- list( "36090_at" , "38785_at" )

mclapply( x , getSymbol )


[[1]]

[[1]]$`36090_at`

[1] "TBL2"



[[2]]

[[2]]$`38785_at`

[1] "MUC1"

sessionInfo()


R version 3.4.3 (2017-11-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)


Matrix products: default

BLAS:
/gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRblas.so

LAPACK:
/gnet/is2/p01/apps/R/3.4.3-20171201-current/x86_64-linux-2.6-rhel6/lib64/R/lib/libRlapack.so


locale:

  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C

  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8

  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8

  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C

  [9] LC_ADDRESS=C               LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets

[8] methods   base


other attached packages:

[1] hgu95av2.db_3.2.3    org.Hs.eg.db_3.5.0   AnnotationDbi_1.40.0

[4] IRanges_2.12.0       S4Vectors_0.16.0     Biobase_2.38.0

[7] BiocGenerics_0.24.0


loaded via a namespace (and not attached):

  [1] Rcpp_0.12.14    digest_0.6.14   DBI_0.7         RSQLite_2.0

  [5] pillar_1.1.0    rlang_0.1.6     blob_1.1.0      bit64_0.9-8

  [9] bit_1.1-13      compiler_3.4.3  pkgconfig_2.0.1 memoise_1.1.0

[13] tibble_1.4.1



~G

On Fri, Jan 19, 2018 at 9:23 AM, Vincent Carey <st...@channing.harvard.edu>
wrote:

good question

some of the discussion on

http://sqlite.1065341.n5.nabble.com/Parallel-access-to-
read-only-in-memory-database-td91814.html

seems relevant.

converting the relatively small annotation package content to pure R
read-only tables on the master before parallelizing
might be very simple?

On Fri, Jan 19, 2018 at 11:43 AM, Ludwig Geistlinger <
ludwig.geistlin...@sph.cuny.edu> wrote:

Hi,

Within a package I am developing, I would like to enable parallel probe

to

gene mapping for a compendium of microarray datasets.

This accordingly makes use of annotation packages such as hgu133a.db,
which in turn connect to the SQLite database via AnnotationDbi.

When running in multi-core mode (i.e. using a MulticoreParam with
BiocParallel) using more than 2 cores, this causes the error:

database disk image is malformed


In a very similar problem:

https://support.bioconductor.org/p/38541/

Adi Tarca and Dan Tenenbaum identified and resolved this problem by
ensuring that each process has its own unique database connection, i.e.
AnnotationDbi is not loaded before sending the job to the workers.

This solution was easily realized as this analysis was carried out within
a script and not a package.

However, within my package, AnnotationDbi is loaded as a dependency of my
package's imports.

How to resolve this here?
I am not sure whether I perfectly understand the underlying mechanisms,
but is there a way to make my workers load their own version of
AnnotationDbi instead of using the one of the parent process?
Or am I supposed to unload all packages depending on AnnotationDbi, and
AnnotationDbi itself, before sending the job to the workers (and reload

all

of them after the job has finished?)

Thanks a lot,
Ludwig



--
Dr. Ludwig Geistlinger
CUNY School of Public Health

         [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


         [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



This email message may contain legally privileged and/or...{{dropped:2}}

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] BiocParallel and AnnotationDbi: database disk image is malformed

Reply via email to