Dear Dennis and Steve,
Am Sonntag, den 31.07.2011, 23:32 -0400 schrieb Steve Lianoglou: […] > How about trying to write the of this `f4` function below using the > rcpp/inline combo. The C/C++ you will need to write looks to be quite > trivial, let's change f4 to accept an x argument as a vector: > > I've defined f4 in the same way as Dennis did: > > > f4 <- function() > > { > > x <- sample(c(-1L,1L), 1) > > > > if (x >= 0 ) {return(1)} else { > > csum <- x > > len <- 1 > > while(csum < 0) { > > csum <- csum + sample(c(-1, 1), 1) > > len <- len + 1 > > } } > > len > > } > > Now, let's do some inline/c++ mojo: > > library(inline) > inc <- " > #include <stdio.h> > #include <stdlib.h> > #include <time.h> > " > > fxx <-cxxfunction(includes=inc, plugin="Rcpp", body=" > int len = 1; > int x = ((rand() % 2 ) == 0) ? 1 : -1; > int csum = x; > > while (csum < 0) { > x = ((rand() % 2 ) == 0) ? 1 : -1; > len++; > csum = csum + x; > } > > return wrap(len); > ") > > Assuming I've faithfully translated this into c++, the timings aren't > all that comparable. > > Doing 500 replicates with the pure R version: > > set.seed(123) > system.time(out <- replicate(500, f4())) > user system elapsed > 31.525 0.120 32.510 > > Doing 10,000 replicates using the fxx function doesn't even break a sweat: > > system.time(outxx <- replicate(10000, fxx())) > user system elapsed > 0.371 0.001 0.373 > > range(out) > [1] 1 1994308 > > range(outxx) > [1] 1 11909394 thank you very much for your suggestions. This is indeed a nice speed. 1. I first had that implemented in FORTRAN (and Python) too, but turned to R for two reasons. First I wanted to use also other distributions later on and thought that it would be easier with R and that R would have that implemented as fast as possible. Secondly I thought that R would also operate faster having the right vectorization and using `csum()`. But I guess it is difficult to find a good model to use the advantages of R. Especially looking at `top` when running this example CPU is used 100 % but memory only 40 MB from 2 GB. So if one could use another data structure maybe the calculations could be done on more walks at once. 2. It is indeed possible that the walk never returns to zero, so I should make sure, that I abort the while loop after a certain length. 3. Looking at the data types I am wondering if some integer overflow(?) could happen. I could make the length variable unsigned I suppose [1]. But still `csum` could go from `-len` to 0 and for the normal random walk unsigned should not be a problem too besides that the logic/checks have to be adapted. For integrated random walks, `ccsum += csum`, `ccsum` would go from -(ccsum**2)/2 up to 0. So later on I should use probably the 64 bit data type (unsigned) `long` for `ccsum`, `csum` and `length` to avoid those problems. Memory does not seem to be a problem. Also I need to add an additional check for the height and length in the while loop like the following. (csum < 0) && (csum > -ULONG_MAX) && (len =< ULONG_MAX) So I came up with the following and to use unsigned I only consider that the random walk stays positive instead of negative. -------- 8< -------- code -------- >8 -------- library(inline) inc <- " #include <climits> #include <stdio.h> #include <stdlib.h> #include <time.h> " f9 <-cxxfunction(includes=inc, plugin="Rcpp", body=" unsigned long len = 1; if ((rand() % 2 ) == 0) { return wrap(len); } unsigned long x = 1; for (unsigned long csum = x; csum > 0; csum = ((rand() % 2 ) == 0) ? csum + 1: csum - 1) { len++; if ((csum == ULONG_MAX) && (len == ULONG_MAX)) { return wrap(len); } } return wrap(len); ") -------- 8< -------- code -------- >8 -------- I do not know if the compiler would have optimized it that way anyway and if there is any difference (besides the overflow checks). > set.seed(1); system.time( z9_1 <- replicate(1000, f9()) ) User System verstrichen 0.076 0.004 0.084 > range(z9_1) [1] 1 1449034 > length(z9_1) [1] 1000 Thanks, Paul [1] https://secure.wikimedia.org/wikipedia/en/wiki/Integer_(computer_science)#Common_integral_data_types
signature.asc
Description: This is a digitally signed message part
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.