Re: [R] Improving loop performance

jim holtman Tue, 11 May 2010 12:08:04 -0700

Most of the time in your loop is spent indexing into the dataframe.  Pull the
columns you need out into plain vectors first, and then loop over those:


> x <- read.table(textConnection("       Row.ID AgilentProbe GeneSymbol GeneID Exons AgilentStart first.geneid first.exon last.geneid last.exon
+ 8     1348 A_23_P116898        A2M      2    34      9112685   TRUE   TRUE   TRUE   TRUE
+ 62   19410  A_23_P95594       NAT1      9     4     18124656   TRUE   TRUE   TRUE   TRUE
+ 39   10323  A_23_P31798       NAT2     10     2     18302422   TRUE   TRUE   TRUE   TRUE
+ 21    5353 A_23_P162918   SERPINA3     12     5     94150936   TRUE   TRUE  FALSE  FALSE
+ 22    9999 A_23_P162913   SERPINA3     12     5     94150800  FALSE  FALSE  FALSE  FALSE
+ 98   29990 A_32_P151937   SERPINA3     12     5     94150720  FALSE  FALSE  FALSE   TRUE
+ 33    9516   A_23_P2920   SERPINA3     12     7     94158435  FALSE   TRUE  FALSE   TRUE
+ 96   29595 A_32_P124727   SERPINA3     12     8     94160018  FALSE   TRUE   TRUE   TRUE
+ 57   18176  A_23_P80570      AADAC     13     5    153028473   TRUE   TRUE   TRUE   TRUE
+ 46   16139  A_23_P56529       AAMP     14     9    218838396   TRUE   TRUE   TRUE   TRUE
+ 18    4438 A_23_P152527      AANAT     15     7     71976911   TRUE   TRUE   TRUE   TRUE
+ 69   21321 A_24_P172990       AARS     16    18     68845436   TRUE   TRUE   TRUE   TRUE
+ 82   24747 A_24_P330684       ABAT     18    17      8780872   TRUE   TRUE  FALSE   TRUE"), header=TRUE, as.is=TRUE)
> closeAllConnections()
> # extract the columns once; the loop then works on plain vectors
> p1 <- x$AgilentProbe
> test <- x$first.exon
> for (i in seq_along(p1)){
+     if (!test[i]) p1[i] <- paste(p1[i-1], p1[i], sep=',')
+ }
> cbind(p1)
      p1
 [1,] "A_23_P116898"
 [2,] "A_23_P95594"
 [3,] "A_23_P31798"
 [4,] "A_23_P162918"
 [5,] "A_23_P162918,A_23_P162913"
 [6,] "A_23_P162918,A_23_P162913,A_32_P151937"
 [7,] "A_23_P2920"
 [8,] "A_32_P124727"
 [9,] "A_23_P80570"
[10,] "A_23_P56529"
[11,] "A_23_P152527"
[12,] "A_24_P172990"
[13,] "A_24_P330684"
>
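To check the diagnosis above on your own machine, here is a rough timing sketch. The data frame is synthetic (the probe names and the roughly 70/30 split of first.exon are made up); only the column names AP and first.exon come from Mark's post. loop_df() mirrors the original per-row dataframe indexing, loop_vec() mirrors the vector loop above, and cum_join() is a hypothetical loop-free helper using a technique not proposed in this thread: cumsum() numbers the runs, and Reduce(accumulate = TRUE) inside ave() builds the running comma-joined string within each run. Treat it as a sketch, not a tuned solution.

```r
# Synthetic stand-in for aga2 (hypothetical values; only the column
# names AP and first.exon are taken from the original post)
set.seed(42)
n <- 5000
aga2 <- data.frame(AP = sprintf("probe%05d", seq_len(n)),
                   first.exon = runif(n) > 0.3,
                   stringsAsFactors = FALSE)
aga2$first.exon[1] <- TRUE  # first row must start a run (no p1[0] lookup)

# 1. Original style: index into the data frame on every iteration
loop_df <- function(df) {
  p1 <- character(nrow(df))
  for (i in seq_len(nrow(df))) {
    if (df$first.exon[i]) {
      p1[i] <- df[i, "AP"]
    } else {
      p1[i] <- paste(p1[i - 1], df[i, "AP"], sep = ",")
    }
  }
  p1
}

# 2. Vector loop: pull the columns out once, then loop over vectors
loop_vec <- function(df) {
  p1 <- df$AP
  test <- df$first.exon
  for (i in seq_along(p1)) {
    if (!test[i]) p1[i] <- paste(p1[i - 1], p1[i], sep = ",")
  }
  p1
}

# 3. Hypothetical loop-free helper: cumsum(start) labels each run,
#    Reduce(accumulate = TRUE) gives the running joined string per run,
#    and ave() writes the results back in the original row order
cum_join <- function(x, start) {
  ave(x, cumsum(start), FUN = function(v)
    Reduce(function(a, b) paste(a, b, sep = ","), v, accumulate = TRUE))
}

print(system.time(r1 <- loop_df(aga2)))
print(system.time(r2 <- loop_vec(aga2)))
print(system.time(r3 <- cum_join(aga2$AP, aga2$first.exon)))
stopifnot(identical(r1, r2), identical(r2, r3))
```

The final stopifnot() only checks that all three approaches produce the same strings. Whether cum_join() beats the plain vector loop depends on how many runs there are and how long they are, so time it on the real data rather than assume.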



On Tue, May 11, 2010 at 12:33 PM, jim holtman <jholt...@gmail.com> wrote:

> Instead of looping over each row, try the following:
>
> p1 <- as.character(aga2$AP)
> # skew by one: pair each row with the previous row's value
> # (this joins only the immediately preceding row, not the whole run)
> p1 <- ifelse(aga2$first.exon, p1, paste(c("", head(p1, -1)), p1, sep=','))
>
> ags <- as.character(aga2$AS)
> ags <- ifelse(aga2$first.exon, ags, paste(c("", head(ags, -1)), ags, sep=','))
>
> On Tue, May 11, 2010 at 12:17 PM, Mark Lamias <mlam...@yahoo.com> wrote:
>
>> R-users,
>>
>> I have the following piece of code, which I am trying to run on a dataframe
>> (aga2) with about half a million records.  While the code works, it is
>> extremely slow.  I've read in the R-help archives that I should
>> preallocate the p1 and ags vectors, which I have done, but this doesn't
>> seem to improve speed much.  Could anyone advise me on how to speed this
>> up?
>>
>>
>> p1 <- character(nrow(aga2))
>> ags <- character(nrow(aga2))
>> for (i in seq_len(nrow(aga2)))
>> {
>>  if (aga2$first.exon[i])
>>  {
>>   p1[i]<-as.character(aga2[i, "AP"])
>>   ags[i]<-as.character(aga2[i, "AS"])
>>
>>  }
>>  else
>>  {
>>   p1[i]<-paste(p1[i-1], aga2[i, "AP"], sep=",")
>>   ags[i]<-paste(ags[i-1], aga2[i, "AS"], sep=",")
>>  }
>> }
>>
>> Thanks.
>>
>> --Mark Lamias
>>
>>
>>
>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
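The quoted ifelse() suggestion pairs each row only with the row directly before it, so it matches the cumulative loop only when no run (a first.exon == TRUE row plus the FALSE rows after it) is longer than two rows. In the 13-row example at the top of this message, the SERPINA3 run at rows 4 to 6 has three rows, so the two approaches would differ at row 6. A minimal sketch of the one-step version plus a quick check of the longest run; lag_join() and max_run() are hypothetical helper names, not functions from the thread or from base R:

```r
# Hypothetical helper: one-step "lag" join, as in the ifelse() idea.
# Each non-start element is joined with its immediate predecessor only.
lag_join <- function(x, start) {
  prev <- c("", head(x, -1))   # prev[i] is x[i - 1]
  ifelse(start, x, paste(prev, x, sep = ","))
}

# Hypothetical helper: length of the longest run. lag_join() reproduces
# the cumulative loop only when this is at most 2.
max_run <- function(start) max(tabulate(cumsum(start)))

x <- c("a", "b", "c", "d")
start <- c(TRUE, FALSE, FALSE, TRUE)
lag_join(x, start)  # returns c("a", "a,b", "b,c", "d"); the loop gives "a,b,c" third
max_run(start)      # returns 3
```

If max_run() on the real first.exon column returns 2 or less, the fully vectorized ifelse() version is safe to use; otherwise a loop over plain vectors (or a per-run approach) is needed.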




