Folks: I asked this question some time ago and found what appeared (at first) to be the best solution, but I'm now running into a new problem. At first, using ff as Jens suggested seemed to work:

# outdata_ncells = the number of rows * number of columns * number of bands in an image:
out <- ff(vmode="double", length=outdata_ncells, filename=filename)
finalizer(out) <- "close"
close(out)

This worked fine until I attempted to set length to a VERY large number: outdata_ncells = 17711913600. This would create a file of 131.964 GB (8 bytes per double). Big, but not obscenely so (and certainly not larger than the filesystem can handle). However, length appears to be restricted by .Machine$integer.max (I'm on a 64-bit Windows box):

> .Machine$integer.max
[1] 2147483647

Any suggestions on how to solve this problem for much larger file sizes?

--j

On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg <j...@illinois.edu> wrote:

> Thanks, all! I'll try these out. I'm trying to work up something that is
> platform independent (if possible) for use with mmap. I'll do some tests
> on these suggestions and see which works best. I'll try to report back in a
> few days. Cheers!
>
> --j
>
> 2012/5/3 "Jens Oehlschlägel" <jens.oehlschlae...@truecluster.com>
>
>> Jonathan,
>>
>> On some filesystems (e.g. NTFS, see below) it is possible to create
>> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
>> actually writing initial values.
>> Package 'ff' does this automatically and also allows accessing the file
>> in parallel. Check the example below and see how creating a big file is
>> immediate.
>>
>> Jens Oehlschlägel
>>
>> > library(ff)
>> > library(snowfall)
>> > ncpus <- 2
>> > n <- 1e8
>> > system.time(
>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>> + )
>>    user  system elapsed
>>    0.01    0.00    0.02
>> > # check finalizer, with an explicit filename we should have a 'close' finalizer
>> > finalizer(x)
>> [1] "close"
>> > # if not, set it to 'close' in order to not let slaves delete x on slave shutdown
>> > finalizer(x) <- "close"
>> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
>> R Version: R version 2.15.0 (2012-03-30)
>>
>> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
>>
>> > sfLibrary(ff)
>> Library ff loaded.
>> Library ff loaded in cluster.
>>
>> Warning message:
>> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts = TRUE, :
>> 'keep.source' is deprecated and will be ignored
>> > sfExport("x") # note: do not export the same ff multiple times
>> > # explicitly opening avoids a gc problem
>> > # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but
>> > # prevents OS write storms when the file is larger than RAM
>> > sfClusterEval(open(x, caching="mmeachflush"))
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > system.time(
>> + sfLapply( chunk(x, length=ncpus), function(i){
>> + x[i] <- runif(sum(i))
>> + invisible()
>> + })
>> + )
>>    user  system elapsed
>>    0.00    0.00   30.78
>> > system.time(
>> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i], c(0.05, 0.95)) )
>> + )
>>    user  system elapsed
>>    0.00    0.00    4.38
>> > # for completeness
>> > sfClusterEval(close(x))
>> [[1]]
>> [1] TRUE
>>
>> [[2]]
>> [1] TRUE
>>
>> > csummary(s)
>>              5%  95%
>> Min.    0.04998 0.95
>> 1st Qu. 0.04999 0.95
>> Median  0.05001 0.95
>> Mean    0.05001 0.95
>> 3rd Qu. 0.05002 0.95
>> Max.    0.05003 0.95
>> > # stop slaves
>> > sfStop()
>>
>> Stopping cluster
>>
>> > # with the close finalizer we are responsible for deleting the file
>> > # explicitly (unless we want to keep it)
>> > delete(x)
>> [1] TRUE
>> > # remove R-side metadata
>> > rm(x)
>> > # truly free memory
>> > gc()
>>
>> *Sent:* Thursday, May 3, 2012 at 00:23
>> *From:* "Jonathan Greenberg" <j...@illinois.edu>
>> *To:* r-help <r-help@r-project.org>, r-sig-...@r-project.org
>> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on disk?
>>
>> R-helpers:
>>
>> What would be the absolute fastest way to make a large "empty" file (e.g.
>> filled with all zeroes) on disk, given a byte size and a given number
>> of empty values? I know I can use writeBin, but the "object" in
>> this case may be far too large to store in main memory. I'm asking because
>> I'm going to use this file in conjunction with mmap to do parallel writes
>> to this file. Say, I want to create a blank file of 10,000 floating point
>> numbers.
>>
>> Thanks!
>>
>> --j
>>
>> --
>> Jonathan A. Greenberg, PhD
>> Assistant Professor
>> Department of Geography and Geographic Information Science
>> University of Illinois at Urbana-Champaign
>> 607 South Mathews Avenue, MC 150
>> Urbana, IL 61801
>> Phone: 415-763-5476
>> AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
>> http://www.geog.illinois.edu/people/JonathanGreenberg.html

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 217-300-1924
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html
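
For reference, the numbers in the message above can be checked with a little base R arithmetic. The sketch below also shows one possible workaround, which is only an assumption and not something ff is known to offer: plan the data as several pieces that each stay below .Machine$integer.max, so that each piece could in principle be backed by its own file. The helper names plan_chunks and locate are hypothetical, and nothing here calls ff.

```r
# Sketch only: base R arithmetic, no ff calls. All helper names are hypothetical.
n_cells <- 17711913600          # rows * columns * bands, from the post
bytes_per_double <- 8           # a double occupies 8 bytes

size_gib <- n_cells * bytes_per_double / 1024^3   # file size in units of 2^30 bytes
exceeds_limit <- n_cells > .Machine$integer.max   # does the length overflow an integer?

# Hypothetical helper: split `total` elements into pieces that each
# stay within the integer length limit.
plan_chunks <- function(total, max_len = .Machine$integer.max) {
  max_len <- as.numeric(max_len)   # work in doubles to avoid integer overflow
  n_chunks <- ceiling(total / max_len)
  start <- (seq_len(n_chunks) - 1) * max_len + 1
  end <- pmin(start + max_len - 1, total)
  data.frame(chunk = seq_len(n_chunks), start = start, end = end,
             length = end - start + 1)
}

# Hypothetical helper: map a global 1-based index to (chunk, offset within chunk).
locate <- function(i, max_len = .Machine$integer.max) {
  max_len <- as.numeric(max_len)
  c(chunk = (i - 1) %/% max_len + 1, offset = (i - 1) %% max_len + 1)
}

plan <- plan_chunks(n_cells)
print(plan)
```

Doubles represent integers exactly up to 2^53, so the index arithmetic stays exact at this scale. Whether one file per piece is practical for parallel mmap-style writes is untested here.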