Can you supply the results of sessionInfo() please, and the full bam call that causes this.
best,
Simon (mgcv maintainer)

On 15/03/2019 09:09, Frank van Berkum wrote:
> Dear Community,
>
> In our current research we are trying to fit Generalized Additive Models to a
> large dataset. We are using the package mgcv in R.
>
> Our dataset contains about 22 million records with fewer than 20 risk factors
> for each observation, so in our case n>>p. The dataset covers the period 2006
> until 2011, and we analyse both the complete dataset and datasets in which we
> leave out a single year. The latter part is done to analyse robustness of the
> results. We understand k-fold cross validation may seem more appropriate, but
> our approach is closer to what is done in practice (how will one additional
> year of information affect your estimates?).
>
> We use the function bam as advocated in Wood et al. (2017), and we apply the
> following options: bam(..., discrete=TRUE, chunk.size=10000, gc.level=1). We
> run these analyses on a computer cluster (see
> https://userinfo.surfsara.nl/systems/lisa/description for details), and the
> job is allocated to a node within the computer cluster. A node has at least
> 16 cores and 64 GB of memory.
>
> We had expected 64 GB of memory to be sufficient for these analyses,
> especially since the bam function is built specifically for large datasets.
> However, when applying this function to the different datasets described
> above with different regression specifications (different risk factors
> included in the linear predictor), we sometimes obtain errors of the
> following form.
>
> Error in XWyd(G$Xd, w, z, G$kd, G$ks, G$ts, G$dt, G$v, G$qc, G$drop, ar.stop, :
>
> 'Calloc' could not allocate memory (22624897 of 8 bytes)
>
> Calls: fnEstimateModel_bam -> bam -> bgam.fitd -> XWyd
>
> Execution halted
>
> Warning message:
>
> system call failed: Cannot allocate memory
>
> Error in Xbd(G$Xd, coef, G$kd, G$ks, G$ts, G$dt, G$v, G$qc, G$drop) :
>
> 'Calloc' could not allocate memory (18590685 of 8 bytes)
>
> Calls: fnEstimateModel_bam -> bam -> bgam.fitd -> Xbd
>
> Execution halted
>
> Warning message:
>
> system call failed: Cannot allocate memory
>
> Error: cannot allocate vector of size 1.7 Gb
>
> Timing stopped at: 2 0.556 4.831
>
> Error in system.time(oo <- .C(C_XWXd0, XWX = as.double(rep(0, (pt + nt)^2)), :
>
> 'Calloc' could not allocate memory (55315650 of 24 bytes)
>
> Calls: fnEstimateModel_bam -> bam -> bgam.fitd -> XWXd -> system.time -> .C
>
> Timing stopped at: 1.056 1.396 2.459
>
> Execution halted
>
> Warning message:
>
> system call failed: Cannot allocate memory
>
> The errors seem to arise at different stages in the optimization process. We
> have analysed whether these errors disappear if different settings are used
> (different chunk.size, different gc.level), but this does not resolve our
> problem. Also, the errors occur on different datasets when using different
> settings, and even with the same settings an error that occurred on
> dataset X in one run does not necessarily occur on dataset X in a
> different run. When using the discrete=TRUE option, optimization can be
> parallelized, but we have chosen not to employ this feature to ensure
> memory does not have to be shared between parallel processes.
>
> Naturally I cannot share our dataset with you, which makes the problem
> difficult to analyse. However, based on your collective knowledge, could you
> point us to where the problem may occur?
> Is it something within the C-code
> used within the package (as the last error seems to indicate), or is it
> related to the computer cluster?
>
> Any help or insights are much appreciated.
>
> Kind regards,
>
> Frank
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Simon Wood, School of Mathematics, University of Bristol, BS8 1TW UK
https://people.maths.bris.ac.uk/~sw15190/
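
For anyone reading the thread later, the sizes in the 'Calloc' messages quoted above can be turned into actual request sizes with a few lines of base R. This is only a rough sketch: calloc_request is a hypothetical helper written for this note (it is not part of mgcv or base R), and it assumes the "(<count> of <size> bytes)" wording shown in the quoted errors. The three requests work out to roughly 181 MB, 149 MB and 1.33 GB, all far below the 64 GB of the node. The first count is also close to the stated number of records (about 22 million), which would be consistent with one double-precision value per record, though that is an inference rather than something confirmed in the thread.

```r
# Hypothetical helper (not part of mgcv or base R): extract the
# "(<count> of <size> bytes)" part of a 'Calloc' error message and
# compute the total size of the failed request.
calloc_request <- function(msg) {
  m <- regmatches(msg, regexec("\\(([0-9]+) of ([0-9]+) bytes\\)", msg))[[1]]
  if (length(m) != 3) stop("no '(<count> of <size> bytes)' pattern found")
  count <- as.numeric(m[2])  # number of elements requested
  size  <- as.numeric(m[3])  # bytes per element
  bytes <- count * size
  c(count = count, size = size, bytes = bytes, MiB = bytes / 1024^2)
}

# The three messages as quoted in the original post.
msgs <- c(
  "'Calloc' could not allocate memory (22624897 of 8 bytes)",
  "'Calloc' could not allocate memory (18590685 of 8 bytes)",
  "'Calloc' could not allocate memory (55315650 of 24 bytes)"
)

# One row per message, columns: count, size, bytes, MiB.
requests <- t(sapply(msgs, calloc_request, USE.NAMES = FALSE))
print(round(requests, 1))
```

Since each individual request is small relative to the node's memory, the figures by themselves do not show where the memory went; they only indicate that little was left (or permitted to the job) at the moment each call was made.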