> If I use a nested ifelse statement in a loop it takes me 13 
> minutes to get an answer on just 50,000 rows. 
> ...
> ifelse(strataID[i+1]==strataID[i], y<-x[i+1], y<-x[i-1]))

Maybe take a closer look at the ifelse help page and its examples?

First, ifelse is intended to be vectorized. If you nest it in a loop, you're 
effectively nesting a loop inside a loop. And by putting ifelse inside ifelse, 
you've done that twice. And then you've run the loops on vectors of length one, 
so 'twas all in vain...
Second, the two things after the condition in ifelse are not instructions, they 
are arguments to the function. Putting y<-something in as an argument means 
'(promise to) store something in a variable called y, and then pass y to the 
function'. You probably didn't mean that.
Third, ifelse returns a vector of the results; you're not using the return 
value for anything.
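
To make those three points concrete, here is a small sketch on a made-up vector. The names v, res, res2 and tmp are purely illustrative and not from the original post.

```r
v <- c(1, -2, 3, -4)

# ifelse() is vectorized: one call handles every element, no loop needed,
# and the result only exists if you keep the return value.
res <- ifelse(v > 0, "pos", "neg")
print(res)

# The yes/no arguments are values, not instructions. An assignment used as
# an argument is evaluated for its value, and the variable it creates is
# just a side effect holding the whole vector.
res2 <- ifelse(v > 0, tmp <- v * 10, tmp <- v * -1)
print(res2)  # element-wise choice between the two argument values
print(tmp)   # whatever the last evaluated assignment stored, not a per-element choice
```

So the assignment inside the call does not do the choosing; only the value returned by ifelse() reflects the condition.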

For a single 'if' that takes some action, you want 'if' and 'else' 
_separately_, not 'ifelse'. So if you must do this with a loop, it would look 
more like

y <- length(x)  # length() already returns a numeric value
# Stop one short, because x[i+1] won't be there for the last i.
# The parentheses matter: 1:length(x)-1 would run from 0 to length(x)-1.
for (i in 1:(length(x)-1)) {
        if (!is.na(x[i])) y[i] <- x[i]
        # I changed the second x index because I can't see why it differed
        # from the strataID index
        if (strataID[i+1] == strataID[i]) y <- x[i+1] else y <- x[i]
        # or, using the fact that 'if' also returns something:
        # y <- if (strataID[i+1] == strataID[i]) x[i+1] else x[i]
}

Finally, if you don't preallocate y at the length you want, R will have to move 
the whole of y to a new memory location with one more space every time you 
append something to it. There's a section on that in The R Inferno. Growing an 
object that way is a really good way of slowing R down.
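
A rough way to see this for yourself is to time the two approaches side by side. This is only a sketch: grow() and prealloc() are hypothetical helper names, n is an arbitrary size, and how big the gap is will depend on your R version and machine.

```r
n <- 50000  # illustrative size only

grow <- function(n) {
  y <- numeric(0)            # starts empty
  for (k in 1:n) y[k] <- k   # y is extended by one element on each pass
  y
}

prealloc <- function(n) {
  y <- numeric(n)            # allocated once at its full length
  for (k in 1:n) y[k] <- k   # only fills in existing slots
  y
}

print(system.time(a <- grow(n)))
print(system.time(b <- prealloc(n)))
print(identical(a, b))       # both approaches build the same vector
```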

So let's try something else.

# a nice long strata identifier with some matches likely
strataID <- sample(letters[1:3], 2000000, replace=TRUE)
x <- rnorm(2000000)  # some random numbers
# a few NAs now in x, though it does take a few seconds
# for the 2 million observations
x <- ifelse(x < -2, NA, x)

i <- 1:(length(x)-1)  # a long indexing vector with space for the last x[i+1]
# That puts all the NAs in the right place in y, allocates y and
# happens to put all the current values of x into y too.
y <- x
system.time( y[i] <- ifelse(strataID[i+1]==strataID[i], x[i+1], x[i]) )
# Does the whole loop and stores it in the 'right' places in y,
# though it will foul up those NAs because of your x indexing.
# Incidentally, it doesn't change the last y either.
# On my allegedly 2GHz machine the system.time result was 2.87 seconds
# for the 2 million 'rows'.
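
If even that is too slow, one possible variation (my own sketch, not something tested on the 2 million rows) is to drop ifelse altogether and use plain indexing. Since y starts out as a copy of x, only the positions whose next stratum matches need overwriting. The names sid, xs, y1, y2 and same are hypothetical, and the data set is kept small so the two results can be compared.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
n <- 1000
sid <- sample(letters[1:3], n, replace = TRUE)
xs <- rnorm(n)
xs[xs < -2] <- NA        # a few NAs, as in the example above

i <- 1:(n - 1)

# the ifelse() approach from above
y1 <- xs
y1[i] <- ifelse(sid[i + 1] == sid[i], xs[i + 1], xs[i])

# index-only approach: overwrite just the positions where the next stratum
# is the same; everything else already holds xs[i]
y2 <- xs
same <- which(sid[i + 1] == sid[i])
y2[same] <- xs[same + 1]

print(identical(y1, y2))  # check that the two approaches agree
```

Like the ifelse version, this leaves the last element untouched and copies NAs across from x[i+1], so it changes the speed and not the logic.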


# Incidentally, a look at what we ended up with:
data.frame(s=strataID, y=y)[1:30,]
# says you probably aren't getting anything useful from the exercise other
# than a feel for what can go wrong with loops.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
