Thanks for lending a helping hand.
I put together a self-contained example. It relies on two functions, one of which simply applies the other at every position of the sequence. I am trying to implement the so-called Lempel-Ziv entropy estimator. The idea is to choose a position i along a sequence x (standing for a time series) and find the length of the shortest substring starting at i which has never occurred before i. Below is the R snippet; it requires an input file (a simple text file) which you can download from

http://dl.dropbox.com/u/5685598/time_series25_.dat

What puzzles me is that the list is not really long (fewer than 2000 entries), and I have never run into this problem before, even with longer lists.
Many thanks

Lorenzo

######################################


# Lempel-Ziv entropy estimate for the whole sequence x
total_entropy_lz <- function(x){
  if (length(x)==1){
    print("sequence too short")
    return("error")
  } else{
    n <- length(x)
    prefactor <- 1/(n*log(n)/log(2))
    n_seq <- seq(n)
    entropy_list <- n_seq
    # match length at every position of the sequence
    for (i in n_seq){
      entropy_list[i] <- entropy_lz(x,i)
    }
  }
  total_entropy <- 1/(prefactor*sum(entropy_list))
  return(total_entropy)
}

# Length of the shortest sequence starting at i which never occurred before i
entropy_lz <- function(x,i){
  # R parses 1:i-1 as (1:i)-1, i.e. 0:(i-1); index 0 is dropped,
  # so this selects the first i-1 elements (none when i is 1)
  past <- x[1:i-1]
  go_on <- 1
  count_len <- 0
  past_string <- paste(past, collapse="#")
  while (go_on>0){
    new_seq <- x[i:(i+count_len)]
    fut_string <- paste(new_seq, collapse="#")
    count_len <- count_len+1
    # by default grepl() interprets fut_string as a regular expression
    if (grepl(fut_string,past_string)!=1){
      go_on <- -1
    }
  }
  return(count_len)
}

x <- scan("time_series25_.dat", what="")

S <- total_entropy_lz(x)
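
One possible workaround, offered as a sketch rather than a confirmed fix: grepl() accepts fixed = TRUE, which makes it search for the pattern as a literal substring instead of compiling it as a regular expression. The variant below (the name entropy_lz_fixed is made up for illustration) also wraps both strings in "#" so that a token such as "2" cannot match inside "12", and it stops at the end of the series instead of indexing past it. Both of these are assumptions on my side, and they can change the counts slightly compared with the snippet above.

```r
# Hypothetical variant of entropy_lz(); the name is not from the original script.
entropy_lz_fixed <- function(x, i) {
  n <- length(x)
  # seq_len(i - 1) is empty when i == 1, so the past is just "##" there
  past_string <- paste0("#", paste(x[seq_len(i - 1)], collapse = "#"), "#")
  count_len <- 0
  repeat {
    # assumption: never read past the end of the series
    last <- min(i + count_len, n)
    fut_string <- paste0("#", paste(x[i:last], collapse = "#"), "#")
    count_len <- count_len + 1
    # fixed = TRUE: literal substring search, no regular expression is compiled
    seen <- grepl(fut_string, past_string, fixed = TRUE)
    if (!seen || last == n) break
  }
  count_len
}

# Toy series: at position 3, "a" and "a b" were seen before, "a b c" was not
toy <- c("a", "b", "a", "b", "c")
toy_lengths <- sapply(seq_along(toy), function(i) entropy_lz_fixed(toy, i))
print(toy_lengths)  # 1 1 3 2 1
```

Inside total_entropy_lz() the call entropy_lz(x,i) could be swapped for entropy_lz_fixed(x, i) to try this out on the real data.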






On 10/08/2010 07:30 PM, jim holtman wrote:
Please be more specific: how long is the string, and what is the pattern you
are matching against?  It sounds like you might have a complex pattern
that, in trying to match the string, is doing a lot of backtracking.
There is an O'Reilly book, Mastering Regular Expressions, that might help
you understand what is happening.  If you can provide a better example than
just the error message, it would be helpful.
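
Both questions can be answered with a few lines of R before the pattern ever reaches grepl(). The helper below is a hypothetical diagnostic (describe_pattern is not a function from the script or from any package), and the toy pattern only imitates the shape of the one in the error message. It reports the size of the pattern and whether it contains characters that are special in regular expressions; it does not by itself show why the compilation fails.

```r
# Hypothetical helper: summarise a pattern before handing it to grepl()
describe_pattern <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  # characters with a special meaning in extended regular expressions
  meta <- c(".", "*", "+", "?", "^", "$", "(", ")", "[", "]", "{", "}", "|", "\\")
  list(
    n_chars      = nchar(pattern),
    has_metachar = any(chars %in% meta)
  )
}

# Toy pattern: one 7-character token repeated 500 times, joined with "#"
pattern <- paste(rep("12653a6", 500), collapse = "#")
info <- describe_pattern(pattern)
print(info$n_chars)       # 500 * 7 characters + 499 separators = 3999
print(info$has_metachar)  # FALSE: only letters, digits and "#"
```

A long pattern without any special characters would suggest that its sheer size, and not its structure, is what matters here.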

On Fri, Oct 8, 2010 at 1:11 PM, Lorenzo Isella <lorenzo.ise...@gmail.com> wrote:
Dear All,
I am experiencing some problems with a script of mine.
It crashes with this message

Error in grepl(fut_string, past_string) :
  invalid regular expression
'12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12653a6#12
Calls: entropy_estimate_hash ->  total_entropy_lz ->  entropy_lz ->  grepl
In addition: Warning message:
In grepl(fut_string, past_string) : regcomp error:  'Out of memory'
Execution halted

To make a long story short, I use some functions which eventually call grepl
on very long strings to check whether a certain substring is part of a
longer string.
The script itself works (it never crashes when I run it on a smaller
dataset), and the problem does not seem to be RAM: I have several GB on my
machine and memory consumption never shoots up, so the machine never resorts
to swap.
So, though I am not an expert, it looks like the problem is some limitation
of grepl or of R's memory management.
Any idea how I could tackle this problem, or how I could profile my code to
fix it? It really seems to me that I have to find a way to let R process
longer strings.
Any suggestion is appreciated.
Cheers

Lorenzo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




