>>>>> "GA" == Gad Abraham <[EMAIL PROTECTED]>
>>>>>     on Sat, 17 May 2008 21:12:41 +1000 writes:
GA> Joram Posma wrote:
>> Dear all,
>>
>> I have a few questions regarding the 64-bit version of R and the
>> cache memory R uses.
>>
>> -----------------------------------
>> Computer & software info:
>>
>> OS: kUbuntu Feisty Fawn 7.04 (64-bit)
>> Processor: AMD Opteron 64-bit
>> R: version 2.7.0 (64-bit)
>> Cache memory: currently 16 GB (was 2 GB)
>> Outcome of 'limit' command in shell: cputime unlimited, filesize
>> unlimited, datasize unlimited, stacksize 8192 kbytes, coredumpsize
>> 0 kbytes, memoryuse unlimited, vmemoryuse unlimited, descriptors
>> 1024, memorylocked unlimited, maxproc unlimited
>> -----------------------------------
>>
>> a. We have recently upgraded the cache memory from 2 to 16 GB.
>> However, we have noticed that somehow R still swaps memory when
>> datasets exceeding 2 GB in size are used. An indication that R
>> uses approx. 2 GB of cache memory is that sometimes R also kills
>> the session when datasets > 2 GB are loaded. How/where can we see
>> how much cache memory R uses (since memory.size and memory.limit
>> are only for Windows, and to us those might be what we need)?

Use object.size(.) to see what the size of your large data object
really is.  Otherwise,

  > gc()
           used (Mb) gc trigger (Mb) max used (Mb)
  Ncells 155605  8.4     350000 18.7   350000 18.7
  Vcells 155621  1.2    2006827 15.4  2156058 16.5

is typically useful, but we often also use 'top' (on Linux, which you
have as well) to monitor the R process.  Gad already mentioned this.

>> Could this be caused by the limit of the stack size (we are not
>> exactly sure what the stack size is either)?
>> b. And how can we increase the cache memory used by R to 14 or
>> even 16 GB (which might be tricky when running other programs,
>> but still)?
>>
>> So in general: how can we get R to use the full memory capacity
>> of the computer?

GA> The term "cache memory" is something entirely different to what
GA> you're referring to --- you're talking about RAM.

Yes, indeed.
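As a concrete sketch of those two tools (the object 'x' here is
hypothetical, just something large enough to show up in the numbers):

```r
## Hypothetical large object: one million doubles, roughly 8 MB.
x <- rnorm(1e6)

## How big is this one object?
print(object.size(x), units = "Mb")

## What has R used overall?  'max used' is the high-water mark:
gc()

## gc(reset = TRUE) resets the 'max used' columns, so the peak
## footprint of a single computation can be measured afterwards:
gc(reset = TRUE)
y <- x + 1
gc()   # 'max used' now reflects only the work since the reset
```

Watching 'top' in a terminal while such a computation runs gives the
same picture from the operating-system side.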
GA> Anyway, under Linux R will take all the RAM it can get, and if
GA> you're running a 64-bit OS on a 64-bit CPU then it should
GA> definitely be able to use more than 2 GB of RAM.

Definitely, and it does; I've tried up to 20 or so GB on a platform
very similar to yours.

HOWEVER, single R 'objects' are potentially limited in size earlier.
Do read help("Memory-limits"), which has all (?) pertinent info, and
among other things contains

     There are also limits on individual objects.  On all versions
     of R, the maximum length (number of elements) of a vector is
     2^31 - 1 ~ 2*10^9, as lengths are stored as signed integers.
     In addition, the storage space cannot exceed the address limit,
     and if you try to exceed that limit, the error message begins
     'cannot allocate vector of length'.  The number of characters
     in a character string is in theory only limited by the address
     space.

If you now do the arithmetic: a numeric vector needs 8 bytes per
entry (plus ~ 40 bytes of overhead, it seems, currently on 64-bit
Linux), so the maximal numeric vector would need

  > (2^31 - 1) * 8 + 40
  [1] 17179869216

bytes, which is 2^14 = 16384 MBytes (using the 1 MB = 2^20 bytes
definition), i.e. 16 GB.  The maximal integer/logical vector would be
half that size in bytes, i.e. ~ 8 GB.  The *practical* maximal size
is often a bit smaller, and note that you typically should have
around 5 to 10 times as much RAM as your largest object, because of
copying.

GA> To see the memory usage, use the utility "top" in the
GA> console/terminal.
GA> One thing to note: a dataset of 2GB on disk may take much more
GA> than 2GB of RAM when loaded into R, due to the overhead of the
GA> metadata and the fact that pointers are 64-bit long as well.

Exactly!

The 'sfsmisc' package (from CRAN) contains some Unix-only utilities,
of which Sys.ps() can be handy: it gives similar information to
'top', but from inside R.
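If you want to redo that arithmetic in R itself (a sketch; the ~ 40
byte per-object overhead is the rough figure quoted above, not a
fixed constant):

```r
max.len <- 2^31 - 1            # maximal vector length (signed int)

## Maximal numeric (double) vector: 8 bytes per element + overhead
bytes <- max.len * 8 + 40
bytes                          # 17179869216
bytes / 2^20                   # ~ 16384 MB  (1 MB = 2^20 bytes)
bytes / 2^30                   # ~ 16 GB     (1 GB = 2^30 bytes)

## Integer/logical vectors use 4 bytes per element, hence half:
(max.len * 4 + 40) / 2^30      # ~ 8 GB
```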
Use Sys.ps(fields = "ALL") to see how much cpu/memory
process-specific info you can get; the default, Sys.ps() as used
below, uses just a few fields, but notably one of memory footprint.
A memory-only version of Sys.ps() is Sys.sizes() {not used in the
example below}; all from 'sfsmisc'.

Here is a small excerpt of an R session (using R-devel, i.e. 2.8.0
unstable) on one of our 32 GB AMD Opteron 64-bit systems (running
Linux, Redhat Enterprise, but it could be Debian/Ubuntu as well):

  ## empty, newly started R session; a couple of MBs .. :
  > gc()
           used (Mb) gc trigger (Mb) max used (Mb)
  Ncells 155605  8.4     350000 18.7   350000 18.7
  Vcells 155621  1.2    2006827 15.4  2156058 16.5
  > x <- rep(pi, 2^28)
  > gc()
                used   (Mb) gc trigger   (Mb)  max used   (Mb)
  Ncells      155606    8.4     350000   18.7    350000   18.7
  Vcells   268591076 2049.2  564122996 4304.0 537026523 4097.2
  ## Aha: we now use 2 GB and have used 4 GB intermediately {also
  ## visible from 'top'}
  > x <- rep(x, 2)  ## double the object size
  > gc()
                used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells      155607    8.4     350000    18.7     350000    18.7
  Vcells   537026532 4097.2 1409694707 10755.2 1342332915 10241.2
  ## Yes, 4 GB now, with a max. of 10.2 GB used during object
  ## construction
  > system.time(x <- x + 1)
     user  system elapsed
    3.335   2.603   5.939
  ## Now this shows the footprint *before* garbage collection:
  > sfsmisc::Sys.ps()
        pid      pcpu       time       vsz  comm
     "6451"     "3.4" "00:00:52" "8440924"   "R"
  > gc()
                used   (Mb) gc trigger    (Mb)   max used    (Mb)
  Ncells      157099    8.4     350000    18.7     350000    18.7
  Vcells   537027103 4097.2 1409694707 10755.2 1342332915 10241.2
  ## after GC, we are back to 4 GB :
  > sfsmisc::Sys.ps()
        pid      pcpu       time       vsz  comm
     "6451"     "3.4" "00:00:52" "4246616"   "R"

-----------------

And 'top' (or other such tools) confirms that the machine never
swapped.  {I wouldn't notice easily, as I am sitting at home, and the
computer runs in a vault down there at ETH ;-)}

So you see, a 4 GB object was not a problem on this machine.

Martin Maechler, ETH Zurich

GA> --
GA> Gad Abraham
GA> Dept.
GA> CSSE and NICTA
GA> The University of Melbourne
GA> Parkville 3010, Victoria, Australia
GA> email: [EMAIL PROTECTED]
GA> web: http://www.csse.unimelb.edu.au/~gabraham

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.