On Sunday, 2011-07-31, at 23:32 -0500, R. Michael Weylandt wrote:

> Glad to help -- I haven't taken a look at Dennis' solution (which may be far
> better than mine), but if you do want to keep going down the path outlined
> below you might consider the following:
I will try Dennis' solution right away, but I looked at your suggestions first. Thank you very much.

> Instead of throwing away a simulation if something starts negative, why not
> just multiply the entire sample by -1: that lets you still use the sample
> and saves you some computations: of course you'll have to remember to adjust
> your final results accordingly.

That is a nice suggestion. For a symmetric random walk this is indeed possible and equivalent to looking at when the walk first hits zero.

> This might avoid the loop:
>
> x = ## Whatever x is.
> xLag = c(0, x[-length(x)])  # 'lag' x by 1 step.
> which.max((x >= 0) & (xLag < 0)) + 1  # Depending on how you've decided
>                                       # to count things, this +1 may be
>                                       # extraneous.
>
> The inner expression sets a 0 except where there is a switch from negative
> to positive and a one there: the which.max function returns the location of
> the first maximum, which is the first 1, in the vector. If you are
> guaranteed the run starts negative, then the location of the first positive
> should give you the length of the negative run.

That is the same idea as Bill's [1]. The problem is that when the walk never returns to zero in a sample, `which.max()` of an all-FALSE vector returns 1 [2]. That is no problem, though, as long as we do not have to worry about a walk starting with a positive value; and the `+1` can be omitted when we count the epochs of first hitting 0 instead of how long the walk stayed negative, which is always one less. Additionally, the check `(x >= 0) & (xLag < 0)` is redundant when we know the walk starts with a negative value; `(x >= 0)` should be good enough in this case.
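A minimal sketch (my own, not from the original mail) illustrating the two points above: `which.max()` on an all-FALSE logical vector returns 1, and for a walk already known to start negative the plain `(x >= 0)` test alone finds the first hit of zero:

```r
## Build a short +/-1 random walk and flip it so it starts negative.
set.seed(42)
x <- cumsum(sample(c(-1L, 1L), 20, replace = TRUE))
if (x[1] > 0) x <- -x

## which.max(logical) returns the index of the first TRUE ...
first_hit <- which.max(x >= 0)   # epoch at which the walk first reaches 0

## ... but beware: if no element is TRUE, it still returns 1,
## indistinguishable from a genuine hit at epoch 1.
which.max(c(FALSE, FALSE, FALSE))  # returns 1

## A safer pattern is to check that a hit actually occurred:
if (any(x >= 0)) first_hit else NA_integer_
```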
> This all gives you,
>
> f4 <- function(n = 100000,       # number of simulations
>                length = 100000)  # length of iterated sum
> {
>     R = matrix(sample(c(-1L,1L), length*n, replace=T), nrow=n)
>     R = apply(R, 1, cumsum)
>     R[R[,1]==(1),] = -1 * R[R[,1]==(-1),]  # If the first element in the
>                                            # row is positive, flip the
>                                            # entire row
>     fTemp <- function(x) {
>         xLag = c(0, x[-length(x)])
>         return(which.max((x>=0) & (xLag < 0)) + 1)
>     }
>     countNegative = apply(R, 2, fTemp)
>     tabulate(as.vector(countNegative), length)
> }

The flipping line above seems to index rows, but after the `apply()` above the random walks are in the columns. I think the following is correct:

R[, R[1,]==1] = -1 * R[, R[1,]==1]

> That just crashed my computer though, so I wouldn't recommend it for large
> n, length.

Welcome to my world. I would never have thought that simulating random walks with a length of, say, a million would create that much data and push common desktop systems with, say, 4 GB of RAM to their limits.

> Instead, you can help a little by combining the lagging and the &
> all in one.
>
> f4 <- function(n = 100000, length = 100000)
> {
>     R = matrix(sample(c(-1L,1L), length*n, replace=T), nrow=n)
>     R = apply(R, 1, cumsum)
>     R[R[,1]==(1),] = -1 * R[R[,1]==(-1),]  # If the first element in the
>                                            # row is positive, flip the
>                                            # entire row
>     R = (cbind(rep(0,NROW(R)), R) < 0) & (cbind(R, rep(0,NROW(R))) >= 0)
>     countNegative = apply(R, 1, which.max) + 1
>     return(tabulate(as.vector(countNegative), length))
> }

I left that one out because, as written above, the check can be shortened.

> Of course, this is all starting to approach a very specific question that
> could actually be approached much more efficiently if it's your end goal
> (though I think I remember from your first email a different end goal):

That is true. But learning some optimization techniques on a simple example is much appreciated and will hopefully help me later on with the iterated random walk cases.
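A small sketch (variable names are mine, not from the mail) showing why the flip has to be done column-wise: `apply(R, 1, cumsum)` returns its per-row results as *columns*, i.e. it transposes, so each random walk ends up in a column of the result:

```r
## Two walks of length 3, one per row of R.
R <- matrix(c(-1L, 1L, 1L, -1L, 1L, -1L), nrow = 2)
W <- apply(R, 1, cumsum)        # 3 x 2: walk i is now the column W[, i]
stopifnot(all(dim(W) == c(3L, 2L)))

## Flip every walk whose first step is +1 so that all walks start negative:
flip <- W[1, ] == 1L
W[, flip] <- -W[, flip]
stopifnot(all(W[1, ] == -1L))   # every walk now starts at -1
```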
> We can use the symmetry and "restart"ability of RW to do the following:
>
> x = cumsum(sample(c(-1L,1L), BIGNUMBER, replace=T))
> D = diff(which(x == 0))

Nice!

> This will give you a vector of how long x stays positive or negative at a
> time. Thinking through some simple translations lets you see that this set
> has the same distribution as how long a RW that starts negative stays
> negative.

I have to write those translations down. On first sight, though, we again need to handle the case where the walk stays negative the whole time. `D` then has length 0 and we have to count that as a walk longer than `BIGNUMBER`.

> Again, this is only good for answering a very specific question
> about random walks and may not be useful if you have other more complicated
> questions in sight.

Just testing for 0 will not be enough for the iterated cases, since an iterated random walk can go from negative to non-negative without being zero at that time/epoch.

I implemented all your suggestions and got the following.
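A hedged sketch of the "restart at zero" idea on a tiny hand-made walk: prepending the start at 0 (as `f6` below does with `c(0, x)`) and differencing the positions of successive zeros yields the length of each excursion away from 0:

```r
## A walk with three excursions: down for 4 steps, up for 2, down for 2.
x <- cumsum(c(-1L, -1L, 1L, 1L, 1L, -1L, -1L, 1L))
# x is -1 -2 -1  0  1  0 -1  0

## Differences between successive zeros (including the implicit start at 0)
## are the excursion lengths.
D <- diff(which(c(0L, x) == 0L))
D  # 4 2 2
```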
-------- 8< -------- code -------- >8 --------
f4 <- function(n = 100000,       # number of simulations
               length = 100000)  # length of iterated sum
{
    R = matrix(sample(c(-1L,1L), length*n, replace=T), nrow=n)
    R = apply(R, 1, cumsum)  ## this applies cumsum `row-wise' to R and will
                             ## make your life INFINITELY better
    fTemp <- function(x) {
        if (x[1] >= 0) {
            return(1)
        }
        for (i in 1:(length - 1)) {  # note the parentheses: 1:length-1
                                     # would start at 0
            if (x[i] < 0 && x[i + 1] >= 0) {
                return(as.integer(i/2 + 2))
            }
        }
    }
    countNegative = apply(R, 2, fTemp)
    tabulate(as.vector(countNegative), length)
}

f5 <- function(n = 100000,       # number of simulations
               length = 100000)  # length of iterated sum
{
    R = matrix(sample(c(-1L,1L), length*n, replace=T), nrow=n)
    R = apply(R, 1, cumsum)
    R[, R[1,]==1] = -1 * R[, R[1,]==1]  # If the first element of a walk
                                        # (column) is positive, flip it
    R <- R >= 0
    countNegative = apply(R, 2, which.max)
    tabulate(as.vector(countNegative), length)
}

f6 <- function(n = 100000,       # number of simulations
               length = 100000)  # length of iterated sum
{
    x = cumsum(sample(c(-1L,1L), length*n, replace=T))
    D = diff(which(c(0, x) == 0))
    tabulate(D, max(D))
}
-------- 8< -------- code -------- >8 --------

The timings differ quite a lot, which is expected, though.

> # f1 uses only for loops but does only half the calculations
> # and does not yet flip random walks starting with a positive value.
> set.seed(1) ; system.time( z1 <- f1(300, 1e5) )
   user  system elapsed
  2.700   0.008   2.729
> # f1 adapted with flips
> set.seed(1) ; system.time( z1f <- f1withflip(300, 1e5) )
   user  system elapsed
  4.457   0.004   4.475
> set.seed(1) ; system.time( z4 <- f4(300, 1e5) )
   user  system elapsed
  8.033   0.380   8.739
> set.seed(1) ; system.time( z5 <- f5(300, 1e5) )
   user  system elapsed
  9.640   0.812  10.588
> set.seed(1) ; system.time( z6 <- f6(300, 1e5) )
   user  system elapsed
  4.208   0.328   4.606

So `f6` seems to be the most efficient right now and is even slightly faster than `f1` with the for loops.
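As a quick sanity check (my own sketch, not part of the original timings), one can verify on a single walk that the epoch `f5` finds via `which.max` coincides with the first return to zero that `f6` counts, assuming the walk has been flipped to start negative:

```r
## For a +/-1 walk that starts negative, the first index with x >= 0 is
## necessarily the first index with x == 0, because the walk cannot jump
## from -1 to +1 in a single step.
set.seed(7)
x <- cumsum(sample(c(-1L, 1L), 1000, replace = TRUE))
if (x[1] > 0) x <- -x             # flip so the walk starts negative
if (any(x == 0)) {
  stopifnot(which.max(x >= 0) == which(x == 0)[1])
}
```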
But we have to keep in mind that the functions operate on different data sets, even though `set.seed(1)` is used, and that `f6` treats the problem quite differently. One other thought: besides reusing the walks that start with a positive term by flipping them, we can probably also take the backward/reverse walk (the dual problem). I will try that too.

Thank you very much,

Paul

[1] https://stat.ethz.ch/pipermail/r-help/2011-July/285015.html
[2] https://stat.ethz.ch/pipermail/r-help/2011-July/285396.html
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.