On Behalf Of Richard R. Liu
Sent: Tuesday, November 03, 2009 11:32 AM
To: Uwe Ligges
Cc: r-help@r-project.org
Subject: Re: [R] R 2.10.0: Error in gsub/calloc
I apologize for not being clear. d is a character vector of length
158908. Each element in the vector has been designated by sentDetect
(package: openN
I am using gsubfn 0.5-0. When I do not specify perl = TRUE I now get
the following error on the same document:
Error in structure(.External("dotTcl", ..., PACKAGE = "tcltk"), class = "tclObj") :
  [tcl] bad index "1e+05": must be integer?[+-]integer? or end?[+-]integer?.
Regards,
Richard
r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Richard R. Liu
Sent: Tuesday, November 03, 2009 3:00 PM
To: Kenneth Roy Cabrera Torres
Cc: r-help@r-project.org; Uwe Ligges
Subject: Re: [R] R 2.10.0: Error in gsub/calloc
Kenneth,
Thanks for the hint. I downloa
Note that you don't need perl = T since by default strapply uses tcl
regular expressions and they support \w. What happens if you omit the
perl = T?
Also, please specify the version of gsubfn you are using, and if it's
not the latest, then try it with the latest version.
On Tue, Nov 3, 2009 at 11:0
es utils datasets methods base
loaded via a namespace (and not attached):
[1] tcltk_2.10.0
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Richard R. L
Kenneth,
Thanks for the hint. I downloaded and installed the latest patch, but
to no avail. I can reproduce the error on a single sentence, the
longest in the document. It contains 743,393 characters. It isn't a
true sentence, but since it is more than three standard deviations
longer
works, it should be way faster than strapply() and should not have
any memory allocation issues either.
HTH.
Bert Gunter
Genentech Nonclinical Biostatistics
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Richard R. Liu
Sent: Tuesday
I apologize for not being clear. d is a character vector of length
158908. Each element in the vector has been designated by sentDetect
(package: openNLP) as a sentence. Some of these are really
sentences. Others are merely groups of meaningless characters
separated by white space. str
Try the patch version...
Maybe it is the same problem I had with a large
database when using gsub().
HTH
On Tue, 03-11-2009 at 20:31 +0100, Richard R. Liu wrote:
> I apologize for not being clear. d is a character vector of length
> 158908. Each element in the vector has been designated by s
richard@pueo-owl.ch wrote:
I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this
is a Mac-specific problem.
I have a very large (158,908 possible sentences, ca. 58 MB) plain text
document d which I am
trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am
encounte
I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this
is a Mac-specific problem.
I have a very large (158,908 possible sentences, ca. 58 MB) plain text
document d which I am
trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am
encountering the following error:
Error in