Paul,

My interpretation is that you are trying to assign a new bin number to a 
row every time the variable chrom changes and every time the variable 
chromStart increases by 115341 or more from the previous row.  Is that 
right?  If so, you don't need a loop at all.  Check out the code below.  
I made a couple of changes to the all.tf7 example data frame so that it 
would have two changes in bin number, one based on the chrom variable and 
one based on the chromStart variable.

Jean

all.tf7 <- data.frame(
        chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"), 
        chromStart = c(10089, 10132, 10133, 10148, 210382, 216132), 
        chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352), 
        name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", 
                "ZBTB33", "CTCF"), 
        cumsum = c(10089, 20221, 30354, 40502, 50884, 67016), 
        bin = rep(NA, 6)
        )

# assign a new bin every time chrom changes and every time chromStart
# changes by 115341 or more
L <- nrow(all.tf7)
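# chrom of the previous row (as.character() guards against chrom being a factor)
chrom <- as.character(all.tf7$chrom)
prev.chrom <- c(NA, chrom[-L])
# change in chromStart from the previous row
delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
# a new bin starts at row 1, at a chrom change, or at a jump of 115341 or more
new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341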
all.tf7$bin <- cumsum(new.bin)
all.tf7
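
Since the stated aim is to split the data frame by bin, here is a minimal 
sketch of that last step using split().  It assumes the bins come out as 
1, 1, 2, 2, 3, 3 for the example data above, and it rebuilds a cut-down 
copy of the data frame so that it runs on its own.  The name by.bin is 
only for illustration.

```r
# Cut-down copy of the example data with the bins already assigned
all.tf7 <- data.frame(
        chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
        chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
        bin = c(1, 1, 2, 2, 3, 3)
        )

# split() returns a named list with one data frame per bin
by.bin <- split(all.tf7, all.tf7$bin)
names(by.bin)
sapply(by.bin, nrow)
```

Each element of by.bin is an ordinary data frame, so by.bin[["2"]] gives 
the rows in bin 2.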


pguilha <paul.guilha...@gmail.com> wrote on 07/02/2012 06:25:13 AM:

> Hello all,
> 
> I have written a for loop to act on a dataframe with close to 3 million
> rows and 6 columns and I would like to pass it to apply() to speed the
> process up (I let the loop run for 2 days before stopping it and it had
> only gone through 200,000 rows) but I am really struggling to find a way
> to pass the arguments. Below are the loop and the head of the dataframe I
> am working on.
> Any hints would be much appreciated, thank you! (I have searched for this
> but could not find any other posts doing quite what I want)
> Paul
> 
> x<-as.numeric(all.tf7[1,2])
> for (i in 2:nrow(all.tf7)) {
>   if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
>     all.tf7[i,6]<-all.tf7[i-1,6]
>   else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>     x<-as.numeric(all.tf7[i,2]) }
>   else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>     x<-as.numeric(all.tf7[i,2]) } 
> }
> 
> # the aim here is to attribute a bin number to each row so that I can then
> # split the dataframe according to those bins.
> 
> 
> chrom  chromStart  chromEnd  name          cumsum  bin
> chr1        10089     10309  ZBTB33         10089    1
> chr1        10132     10536  TAF7_(SQ-8)    20221    1
> chr1        10133     10362  Pol2-4H8       30354    1
> chr1        10148     10418  MafF_(M8194)   40502    1
> chr1        10382     10578  ZBTB33         50884    1
> chr1        16132     16352  CTCF           67016    1
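
One note on the quoted loop: it measures chromStart against x, the start 
of the first row in the current bin, not against the previous row.  If 
that is what you want, each bin depends on the one before it, so the 
cumsum() approach gives different bins whenever several small steps add up 
to 115341 or more.  Below is a rough sketch that keeps the loop but runs 
it over plain vectors, which is usually far faster than indexing a data 
frame row by row.  The function assign.bins and its arguments are made up 
for illustration, and I have not timed it on 3 million rows.

```r
# Hypothetical helper (not from the original post): assigns bins the way the
# quoted loop does, measuring distance from the first row of the current bin.
assign.bins <- function(chrom, start, width = 115341) {
        chrom <- as.character(chrom)
        n <- length(start)
        bin <- integer(n)
        if (n == 0) return(bin)
        bin[1] <- 1L
        anchor <- start[1]      # chromStart of the first row in the current bin
        for (i in seq_len(n)[-1]) {
                if (chrom[i] == chrom[i - 1] && start[i] - anchor < width) {
                        bin[i] <- bin[i - 1]
                } else {
                        # chrom changed or the bin is full: start a new bin
                        bin[i] <- bin[i - 1] + 1L
                        anchor <- start[i]
                }
        }
        bin
}

# Made-up data: each step is 60000, but the third row is 120000 from the first
chrom <- rep("chr1", 4)
start <- c(0, 60000, 120000, 180000)
assign.bins(chrom, start)                 # bins 1 1 2 2
cumsum(c(TRUE, diff(start) >= 115341))    # row-to-row rule: bins 1 1 1 1
```

On a data frame this would be called as 
all.tf7$bin <- assign.bins(all.tf7$chrom, all.tf7$chromStart).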

