Without more study, I can only give some general pointers.

The as.vector() in X1 <- as.vector(coord[1]) is almost certainly not needed,
because coord[1] is already a length-one numeric vector. It only adds a little
to your execution time.
Converting the output of func() to a one row matrix is almost certainly not 
needed. Just return c(res1, res2).

Your data frame appears to be entirely numeric, in which case you never need a
data frame at all; a numeric matrix is enough.

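Try

  t(apply(tab, 1, func, a = 40, b = 5, c = 1))

instead of the split() / map() / do.call("rbind") pipeline. apply() returns one
column per row of tab, so t() turns the result back into n rows by 2 columns.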


Your function can be simplified to

func <- function(coord, a, b, c) {
  X1 <- coord[1]
  Y1 <- coord[2]
  X2 <- coord[3]
  Y2 <- coord[4]

  res1 <- mean(c((X1 - a):(X1 - 1), (Y1 + 1):(Y1 + 40)))
  res2 <- mean(c((X2 - a):(X2 - 1), (Y2 + 1):(Y2 + 40)))

  # scale by b only when c is nonzero
  if (c == 0) c(res1, res2) else c(res1, res2) * b
}
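Putting the simplified function together with the apply() call suggested
above, a self-contained sketch might look like the following. The 1000-row
matrix is only an illustrative stand-in for the real data; it is built
directly with matrix() rather than through a data frame.

```r
# Sketch: simplified func() applied row by row with apply().
func <- function(coord, a, b, c) {
  X1 <- coord[1]
  Y1 <- coord[2]
  X2 <- coord[3]
  Y2 <- coord[4]
  res1 <- mean(c((X1 - a):(X1 - 1), (Y1 + 1):(Y1 + 40)))
  res2 <- mean(c((X2 - a):(X2 - 1), (Y2 + 1):(Y2 + 40)))
  # scale by b only when c is nonzero
  if (c == 0) c(res1, res2) else c(res1, res2) * b
}

set.seed(1)
n <- 1000   # small illustrative size, not the real 1e7 rows
# build the numeric matrix directly; no data frame needed
tab <- matrix(sample(1:100, 4 * n, replace = TRUE), ncol = 4,
              dimnames = list(NULL, c("x1", "y1", "x2", "y2")))

# apply() gives a 2 x n matrix (one column per row of tab); t() makes it n x 2
test <- t(apply(tab, 1, func, a = 40, b = 5, c = 1))
dim(test)   # 1000 rows, 2 columns
```

Returning a plain vector from func() lets apply() assemble the result matrix
itself, so no do.call("rbind", ...) step is needed.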

I suspect you can operate on whole columns of the matrix at once, without
looping (which both the apply() method and the split/rbind method do, in
effect), and if so it will be much faster. But I can't say for sure without
more study.
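One possible way to operate on the entire matrix is sketched below, assuming
a is a positive whole number (as in the example, a = 40). Then
(X - a):(X - 1) has exactly a terms summing to a*X - a*(a + 1)/2, and
(Y + 1):(Y + 40) has 40 terms summing to 40*Y + 820, so each mean has a
closed form that can be computed for whole columns at once. With a = 40 it
reduces to (X + Y)/2. Note that this replaces the body of func() rather than
keeping it. The name func_vec is hypothetical, and the row-wise func() is
repeated only so the two results can be compared. No timings are claimed
here; system.time() on the real data would show the actual difference.

```r
# Vectorized sketch. Assumption: a is a positive whole number, so that
# (X - a):(X - 1) has exactly a terms.
#   sum((X - a):(X - 1)) = a*X - a*(a + 1)/2
#   sum((Y + 1):(Y + 40)) = 40*Y + 820
func_vec <- function(tab, a, b, c) {
  # mean of the a + 40 integers, computed for entire columns at once
  seq_mean <- function(X, Y) {
    (a * X - a * (a + 1) / 2 + 40 * Y + 820) / (a + 40)
  }
  res <- cbind(seq_mean(tab[, 1], tab[, 2]),
               seq_mean(tab[, 3], tab[, 4]))
  # same scaling rule as func(): multiply by b only when c is nonzero
  if (c == 0) res else res * b
}

# Row-wise version, used here only as a reference for checking
func <- function(coord, a, b, c) {
  res1 <- mean(c((coord[1] - a):(coord[1] - 1), (coord[2] + 1):(coord[2] + 40)))
  res2 <- mean(c((coord[3] - a):(coord[3] - 1), (coord[4] + 1):(coord[4] + 40)))
  if (c == 0) c(res1, res2) else c(res1, res2) * b
}

set.seed(1)
tab <- matrix(sample(1:100, 4 * 1000, replace = TRUE), ncol = 4)
# should agree with the apply() result up to floating-point tolerance
all.equal(func_vec(tab, a = 40, b = 5, c = 1),
          t(apply(tab, 1, func, a = 40, b = 5, c = 1)))
```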

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 11/1/18, 12:35 PM, "R-help on behalf of Nelly Reduan" 
<r-help-boun...@r-project.org on behalf of nell.r...@hotmail.fr> wrote:

    Hello,
    
    I have an input data frame with multiple rows, and I want to apply a
    function to each row. The input data frame has 1,000,000+ rows. How can I
    speed up my code? I would like to keep the function "func".
    
    Here is a reproducible example with a simple function:
    
        library(tictoc)
        library(dplyr)
        library(purrr)   # map() comes from purrr, not dplyr
    
    func <- function(coord, a, b, c){
    
          X1 <- as.vector(coord[1])
          Y1 <- as.vector(coord[2])
          X2 <- as.vector(coord[3])
          Y2 <- as.vector(coord[4])
    
          if(c == 0) {
    
            res1 <- mean(c((X1 - a) : (X1 - 1), (Y1 + 1) : (Y1 + 40)))
            res2 <- mean(c((X2 - a) : (X2 - 1), (Y2 + 1) : (Y2 + 40)))
            res <- matrix(c(res1, res2), ncol=2, nrow=1)
    
          } else {
    
            res1 <- mean(c((X1 - a) : (X1 - 1), (Y1 + 1) : (Y1 + 40)))*b
            res2 <- mean(c((X2 - a) : (X2 - 1), (Y2 + 1) : (Y2 + 40)))*b
            res <- matrix(c(res1, res2), ncol=2, nrow=1)
    
          }
    
          return(res)
        }
    
        ## Apply the function
        set.seed(1)
        n = 10000000
        tab <- as.matrix(data.frame(x1 = sample(1:100, n, replace = TRUE),
                                    y1 = sample(1:100, n, replace = TRUE),
                                    x2 = sample(1:100, n, replace = TRUE),
                                    y2 = sample(1:100, n, replace = TRUE)))
    
    
      tic("test 1")
      test <- tab %>%
        split(1:nrow(tab)) %>%
        map(~ func(.x, 40, 5, 1)) %>%
        do.call("rbind", .)
      toc()
    
    test 1: 599.2 sec elapsed
    
    Thanks very much for your time
    Have a nice day
    Nell
    
    
    ______________________________________________
    R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
    
