Hi Joris,

The amount from a month ago is normally a single value from another row.
But I used 'sum<-sum + dataset[i,22]' because I would also like to reuse the
code for other tables. In some tables the value for last month can be the sum
of values from several rows.
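
A minimal sketch of how such a sum over several rows could be computed without nested loops. It assumes a data frame with the hypothetical column names year, month, service and amount; the real dataset addresses its columns by position and also matches on department, which would simply be one more key. The idea is to total the amounts per service and month, shift those totals one month forward, and merge them back onto the rows:

```r
# Toy data modelled on the example table; the column names are assumptions.
dataset <- data.frame(
  year    = c(2009, 2009, 2009, 2009),
  month   = c(9, 9, 8, 7),
  service = c("A", "B", "A", "A"),
  amount  = c(120, 80, 40, 50),
  stringsAsFactors = FALSE
)

# Put months on one continuous scale so that January follows December.
dataset$month_index <- dataset$year * 12 + dataset$month

# Total per service and month; this also covers months spread over several rows.
totals <- aggregate(amount ~ service + month_index, data = dataset, FUN = sum)

# Shift each total one month forward so it lines up with the following month.
totals$month_index <- totals$month_index + 1
names(totals)[names(totals) == "amount"] <- "amount_lastmonth"

# Left join: rows that have no previous month get NA.
result <- merge(dataset, totals, by = c("service", "month_index"), all.x = TRUE)
result <- result[order(-result$month_index, result$service), ]
print(result)
```

With the toy data, service A gets 40 for month 9 and 50 for month 8, and rows without a previous month get NA. Each step works on whole columns, so nothing is compared row by row.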

Thank you for your time.
Greetings,

Ian

-----Original message-----
From: joris meys [mailto:jorism...@gmail.com]
Sent: Monday 19 October 2009 16:12
To: Ian Willems
CC: r-help@r-project.org
Subject: Re: [R] how to get rid of 2 for-loops and optimize runtime

Hi Ian,

First of all, take a look at the functions sapply, mapply, lapply,
tapply, ...: they are usually a more compact, and often more efficient,
way of implementing loops.
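
As a small illustration, assuming toy vectors named amount and month (hypothetical, not the real dataset), tapply computes a grouped sum that would otherwise need an explicit loop:

```r
# Toy vectors; the names are assumptions, not columns of the real dataset.
amount <- c(120, 80, 40, 50)
month  <- c(9, 9, 8, 7)

# Explicit loop: one subset-and-sum per distinct month.
loop_totals <- numeric(0)
for (m in unique(month)) {
  loop_totals[as.character(m)] <- sum(amount[month == m])
}

# Same totals with tapply: split amount by month and sum each group.
tapply_totals <- tapply(amount, month, sum)
print(tapply_totals)
```

Both versions give the same totals per month; the tapply call just states the grouping in one line.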

Second, could you elaborate a bit further on the data set: is the amount
from a month ago one value from another row, or the sum of
all values in the previous month? I saw in your example dataset that
the last month has 2 rows, but couldn't figure out whether that's a
typo or really means something. That information is necessary to
optimize your code. 129s is indeed far too long for such a simple operation.

Cheers
Joris

On Mon, Oct 19, 2009 at 3:49 PM, Ian Willems
<ian.will...@uz.kuleuven.ac.be> wrote:
> Short: get rid of the loops I use and optimize runtime
>
> Dear all,
>
> I want to calculate, for each row, the amount from the previous month. I use a
> matrix with 2100 rows and 22 columns (which is still a very small matrix; the
> number of rows in other matrices can easily be more than 100000).
>
> Table before
> Year  month  quarter  yearmonth  Service  ...  Amount
> 2009  9      Q3       092009     A        ...  120
> 2009  9      Q3       092009     B        ...   80
> 2009  8      Q3       082009     A        ...   40
> 2009  7      Q3       072009     A        ...   50
>
> The result I want
> Year  month  quarter  yearmonth  Service  ...  Amount  amount_lastmonth
> 2009  9      Q3       092009     A        ...  120     40
> 2009  9      Q3       092009     B        ...   80     ...
> 2009  8      Q3       082009     A        ...   40     50
> 2009  7      Q3       072009     A        ...   50     ...
>
> The table is not exactly the same, but it gives a good idea of what I have
> and what I want.
>
> The code I have written (see below) does what I want, but it is very
> slow. It takes 129s for 400 rows, and the time quadruples each
> time I double the number of rows.
> I'm new to programming in R, but I found that you can use Rprof and
> summaryRprof to analyse your code (output below).
> However, I don't really understand the output.
> I guess I need code that runs in linear time, and I need to get rid of the 2
> for loops.
> Can someone help me, or tell me what else I can do to optimize my runtime?
>
> I use R 2.9.2
> Windows XP Service Pack 3
>
> Thank you in advance
>
> Best regards,
>
> Willems Ian
>
>
> *****************************
> dataset[,5]= month
> dataset[,3]= year
> dataset[,22]= amount
> dataset[,14]= servicetype
> dataset[,18]= department
>
> [CODE]
> # for each row j of the matrix, check whether each row i has ..
> for (j in 1:Number_rows) {
>   sum <- 0
>   for (i in 1:Number_rows) {
>     if (dataset[j,14] == dataset[i,14]) {            # .. the same service type
>       if (dataset[j,18] == dataset[i,18]) {          # .. the same department
>         if (dataset[j,5] == "1") {                   # if month = 1, month ago is 12 and year is year - 1
>           if ("12" == dataset[i,5]) {
>             if ((dataset[j,3] - 1) == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         } else {
>           if ((dataset[j,5] - 1) == dataset[i,5]) {  # if month != 1, month ago is month - 1
>             if (dataset[j,3] == dataset[i,3]) {
>               sum <- sum + dataset[i,22]
>             }
>           }
>         }
>       }
>     }
>   }
>   # sum now holds the amount of the month ago for row j
> }
> [/CODE]
>
>> summaryRprof()
> $by.self
>               self.time self.pct total.time total.pct
> [.data.frame       33.92  26.2    80.90      62.5
> NextMethod         12.68  9.8     12.68       9.8
> [.factor            8.60  6.6      18.36      14.2
> Ops.factor          8.10  6.3      40.08      31.0
> sort.int            6.82  5.3      13.70      10.6
> [                   6.70  5.2      85.44      66.0
> names               6.54  5.1       6.54       5.1
> length              5.66  4.4       5.66       4.4
> ==                  5.04  3.9      44.92      34.7
> levels              4.80  3.7       5.56       4.3
> is.na               4.24  3.3       4.24       3.3
> dim                 3.66  2.8       3.66       2.8
> switch              3.60  2.8       3.80       2.9
> vector              2.68  2.1       8.02       6.2
> inherits            1.90  1.5       1.90       1.5
> any                 1.68  1.3       1.68       1.3
> noNA.levels         1.46  1.1       7.84       6.1
> .Call               1.40  1.1       1.40       1.1
> !                   1.26  1.0       1.26       1.0
> attr<-              1.06  0.8       1.06       0.8
> .subset             1.00  0.8       1.00       0.8
> class<-             0.82  0.6       0.82       0.6
> !=                  0.80  0.6       0.80       0.6
> levels.default      0.68  0.5       0.76       0.6
> all                 0.62  0.5       0.62       0.5
> <                   0.54  0.4       0.54       0.4
> -                   0.48  0.4       0.48       0.4
> is.factor           0.44  0.3       2.34       1.8
> .subset2            0.38  0.3       0.38       0.3
> attr                0.36  0.3       0.36       0.3
> is.character        0.28  0.2       0.28       0.2
> is.null             0.28  0.2       0.28       0.2
> |                   0.26  0.2       0.26       0.2
> oldClass<-          0.20  0.2       0.20       0.2
> is.atomic           0.16  0.1       0.16       0.1
> nzchar              0.10  0.1       0.10       0.1
> is.numeric          0.06  0.0       0.06       0.0
> oldClass            0.06  0.0       0.06       0.0
> (                   0.04  0.0       0.04       0.0
> [.data              0.02  0.0       0.02       0.0
>
> $by.total
>               total.time total.pct self.time self.pct
> [                   85.44  66.0      6.70      5.2
> [.data.frame        80.90  62.5     33.92     26.2
> ==                  44.92  34.7      5.04      3.9
> Ops.factor          40.08  31.0      8.10      6.3
> [.factor            18.36  14.2      8.60      6.6
> sort.int            13.70  10.6      6.82      5.3
> NextMethod          12.68  9.8     12.68      9.8
> vector               8.02  6.2      2.68      2.1
> noNA.levels          7.84  6.1      1.46      1.1
> names                6.54  5.1      6.54      5.1
> length               5.66  4.4      5.66      4.4
> levels               5.56  4.3      4.80      3.7
> is.na                4.24  3.3      4.24      3.3
> switch               3.80  2.9      3.60      2.8
> dim                  3.66  2.8      3.66      2.8
> is.factor            2.34  1.8      0.44      0.3
> inherits             1.90  1.5      1.90      1.5
> any                  1.68  1.3      1.68      1.3
> .Call                1.40  1.1      1.40      1.1
> !                    1.26  1.0      1.26      1.0
> attr<-               1.06  0.8      1.06      0.8
> .subset              1.00  0.8      1.00      0.8
> class<-              0.82  0.6      0.82      0.6
> !=                   0.80  0.6      0.80      0.6
> levels.default       0.76  0.6      0.68      0.5
> all                  0.62  0.5      0.62      0.5
> <                    0.54  0.4      0.54      0.4
> -                    0.48  0.4      0.48      0.4
> .subset2             0.38  0.3      0.38      0.3
> attr                 0.36  0.3      0.36      0.3
> is.character         0.28  0.2      0.28      0.2
> is.null              0.28  0.2      0.28      0.2
> |                    0.26  0.2      0.26      0.2
> oldClass<-           0.20  0.2      0.20      0.2
> is.atomic            0.16  0.1      0.16      0.1
> nzchar               0.10  0.1      0.10      0.1
> is.numeric           0.06  0.0      0.06      0.0
> oldClass             0.06  0.0      0.06      0.0
> (                    0.04  0.0      0.04      0.0
> [.data               0.02  0.0      0.02      0.0
>
> $sampling.time
> [1] 129.38
>
> Warning message:
> In readLines(filename, n = chunksize) :
>  incomplete final line found on 'Rprof.out'
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
