[Rd] R crashes when using huge data sets with character string variables
When working with a huge data set with character string variables, I
experienced that various commands make R crash. When I run R in a
Linux/bash console, R terminates with the message "Killed". When I use
RStudio, I get the message "R Session Aborted. R encountered a fatal
error. The session was terminated. Start New Session". If an object in
the R workspace needs too much memory, I would expect that R would not
crash but instead issue an error message such as "Error: cannot
allocate vector of size ...". A minimal reproducible example (at least
on my computer) is:

nObs <- 1e9

date <- paste( round( runif( nObs, 1981, 2015 ) ),
               round( runif( nObs, 1, 12 ) ),
               round( runif( nObs, 1, 31 ) ), sep = "-" )

Is this a bug or a feature of R?

Some information about my R version, OS, etc.:

R> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3

/Arne

--
Arne Henningsen
http://www.arne-henningsen.name
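For a rough sense of the memory involved, a back-of-envelope estimate
(assuming a 64-bit build; all sizes are approximate):

    # each of the three runif() results: 1e9 doubles x 8 bytes  = ~8 GB
    # each round() call makes a full copy of its input          = ~8 GB more
    # paste() coerces each argument to character; each such
    # vector, and the final result, holds 1e9 eight-byte
    # pointers into R's shared string cache                     = ~8 GB each
    # peak usage is therefore several tens of GB, at which
    # point Linux's OOM killer may terminate the process
    # ("Killed") before R can raise
    # "Error: cannot allocate vector of size ..."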
Re: [Rd] R crashes when using huge data sets with character string variables
On Windows you can use memory.limit. For limiting memory use under
Linux, see:

https://stackoverflow.com/questions/12582793/limiting-memory-usage-in-r-under-linux

Not sure how much that helps.

On 12/12/20 6:19 PM, Arne Henningsen wrote:
> When working with a huge data set with character string variables, I
> experienced that various commands make R crash. When I run R in a
> Linux/bash console, R terminates with the message "Killed". [...]
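For concreteness, a minimal sketch of using memory.limit() on Windows
(it is Windows-only, and became a stub in later R versions; the
16000 MB value is only an illustration):

    memory.limit()              # query the current limit, in MB
    memory.limit(size = 16000)  # request a higher limit, here ~16 GB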
Re: [Rd] R crashes when using huge data sets with character string variables
> On Saturday, December 12, 2020, 6:33:33 PM EST, Ben Bolker wrote:
>
> On Windows you can use memory.limit. For limiting memory use under
> Linux, see:
>
> https://stackoverflow.com/questions/12582793/limiting-memory-usage-in-r-under-linux
>
> Not sure how much that helps.
>
>> On 12/12/20 6:19 PM, Arne Henningsen wrote:
>> When working with a huge data set with character string variables, I
>> experienced that various commands make R crash. [...]
>>
>> nObs <- 1e9
>>
>> date <- paste( round( runif( nObs, 1981, 2015 ) ),
>>                round( runif( nObs, 1, 12 ) ),
>>                round( runif( nObs, 1, 31 ) ), sep = "-" )
>>
>> Is this a bug or a feature of R?

On OS X I see:

> nObs <- 1e9
> date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs, 1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" )
Error: vector memory exhausted (limit reached?)
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

That is what I would expect. I don't doubt the error you've seen; I'm
just providing a data point for whoever ends up looking into this
further.

Best,

Brodie.
Re: [Rd] [External] R crashes when using huge data sets with character string variables
If R is receiving a kill signal there is nothing it can do about it.

I am guessing you are running into a memory over-commit issue in your OS:

https://en.wikipedia.org/wiki/Memory_overcommitment
https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/

If you have to run this close to your physical memory limits, you might
try using your shell's facility (ulimit for bash, limit for some
others) to limit process memory/virtual memory use to your available
physical memory. You can also try setting the R_MAX_VSIZE environment
variable mentioned in ?Memory; note that this only affects the R heap,
not malloc() done elsewhere.

Best,

luke

On Sat, 12 Dec 2020, Arne Henningsen wrote:
> When working with a huge data set with character string variables, I
> experienced that various commands make R crash. When I run R in a
> Linux/bash console, R terminates with the message "Killed". [...]

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                    Phone: 319-335-3386
Department of Statistics and          Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                    email: luke-tier...@uiowa.edu
Iowa City, IA 52242                   WWW:   http://www.stat.uiowa.edu
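For concreteness, a minimal sketch of both of these suggestions under
Linux/bash; the 16 GB cap is only an illustration and should be set to
the machine's actual physical RAM:

    # cap the shell's (and its children's) virtual memory before
    # starting R; ulimit -v takes a value in kB, so ~16 GB here --
    # allocations beyond the cap then fail with an R error instead
    # of triggering the OOM killer
    ulimit -v 16000000
    R

    # alternatively, cap only the R heap via the environment variable
    # documented in ?Memory (this does not cover malloc() calls made
    # outside the R heap)
    R_MAX_VSIZE=16Gb R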
Re: [Rd] [External] R crashes when using huge data sets with character string variables
On 12 December 2020 at 21:26, luke-tier...@uiowa.edu wrote:
| If you have to run this close to your physical memory limits, you might
| try using your shell's facility (ulimit for bash, limit for some
| others) to limit process memory/virtual memory use to your available
| physical memory. [...]

Similarly, as it is Linux, you could (easily) add virtual memory via a
swapfile (see 'man 8 mkswap' and 'man 8 swapon'). But even then, I
expect this to be slow -- 1e9 is a lot.

I have 32 GB and ample swap (which is rarely used, but a safety net).
When I use your code with nObs <- 1e8 it ends up using about 6 GB,
which poses no problem, but it already takes 3 1/2 minutes:

> nObs <- 1e8
> system.time(date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs, 1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" ))
   user  system elapsed
203.723   1.779 205.528
>

You may want to play with the nObs value to see exactly where it breaks
on your box.

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
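Following up on the swapfile suggestion, a minimal sketch of setting
one up on Ubuntu (run as root; the 16 GB size is only an illustration,
and on some filesystems dd must be used instead of fallocate):

    fallocate -l 16G /swapfile   # reserve a 16 GB file
    chmod 600 /swapfile          # restrict access to root
    mkswap /swapfile             # format it as swap space
    swapon /swapfile             # enable it immediately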