Try the patched version. It may be the same problem I had with a large database when using gsub().
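In the meantime, a minimal sketch of the same "\\w+" tokenization using only base R (gregexpr() and regmatches()) instead of strapply(), processed in chunks. The sample vector d, the helper tokenize_chunk() and the value of chunk_size are assumptions made for illustration, and whether this avoids the Calloc error on a 58 MB input is untested.

```r
# Hypothetical stand-in for the real 158,908-element character vector d.
d <- c("This is a real sentence.", "x1 q zz 9", "Another short one")

# Tokenize one chunk with base R only: gregexpr() locates every "\\w+" match
# and regmatches() extracts them, giving one character vector per element.
tokenize_chunk <- function(x) regmatches(x, gregexpr("\\w+", x, perl = TRUE))

# Work through d in chunks; chunk_size is an arbitrary assumption, not a tuned value.
chunk_size <- 2
chunks <- split(d, ceiling(seq_along(d) / chunk_size))
tokens <- unlist(lapply(chunks, tokenize_chunk), recursive = FALSE, use.names = FALSE)

# Per-element statistics of the kind mentioned in the thread.
mean_len <- sapply(tokens, function(tok) mean(nchar(tok)))
max_len  <- sapply(tokens, function(tok) max(nchar(tok)))
```

Elements with no match at all yield an empty token vector, so mean_len is NaN and max_len is -Inf (with a warning) for them; these would need to be filtered out before comparing sentences.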
HTH

On Tue, 03-11-2009 at 20:31 +0100, Richard R. Liu wrote:
> I apologize for not being clear. d is a character vector of length
> 158908. Each element in the vector has been designated by sentDetect
> (package: openNLP) as a sentence. Some of these are really
> sentences. Others are merely groups of meaningless characters
> separated by white space. strapply is a function in the package
> gsubfn. It applies to each element of the first argument the regular
> expression (second argument). Every match is then sent to the
> designated function (third argument, in my case missing, hence the
> identity function). Thus, with strapply I am simply performing a
> white-space tokenization of each sentence. I am doing this in the
> hope of being able to distinguish true sentences from false ones on
> the basis of mean length of token, maximum length of token, or similar.
>
> Richard R. Liu
> Dittingerstr. 33
> CH-4053 Basel
> Switzerland
>
> Tel.: +41 61 331 10 47
> Email: richard....@pueo-owl.ch
>
> On Nov 3, 2009, at 18:30 , Uwe Ligges wrote:
>
> > richard....@pueo-owl.ch wrote:
> >> I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think
> >> this is a Mac-specific problem.
> >> I have a very large (158,908 possible sentences, ca. 58 MB) plain
> >> text document d which I am
> >> trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am
> >> encountering the following error:
> >
> > What is strapply() and what is d?
> >
> > Uwe Ligges
> >
> >> Error in base::gsub(pattern, rs, x, ...) :
> >> Calloc could not allocate (-1398215180 of 1) memory
> >> This happens regardless of whether I run in 32- or 64-bit mode. The
> >> machine has 8 GB of RAM, so
> >> I can hardly believe that RAM is a problem.
> >> Thanks,
> >> Richard
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.