Paul,

My interpretation is that you are trying to assign a new bin number to a row every time the variable chrom changes and every time the variable chromStart increases by 115341 or more from the previous row. Is that right? If so, you don't need a loop at all. Check out the code below.

I made a couple of changes to the all.tf7 example data frame so that it would have two changes in bin number, one based on the chrom variable and one based on the chromStart variable.
Jean

all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)", "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
  bin = rep(NA, 6)
)

# assign a new bin every time chrom changes and every time
# chromStart changes by 115341 or more relative to the previous row
L <- nrow(all.tf7)
# compare chrom as character so this also works if chrom is a factor
chrom <- as.character(all.tf7$chrom)
prev.chrom <- c(NA, chrom[-L])
delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
# the first row always starts a bin, because TRUE | NA is TRUE
new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7

pguilha <paul.guilha...@gmail.com> wrote on 07/02/2012 06:25:13 AM:

> Hello all,
>
> I have written a for loop to act on a dataframe with close to 3 million rows
> and 6 columns and I would like to pass it to apply() to speed the process up
> (I let the loop run for 2 days before stopping it and it had only gone
> through 200,000 rows) but I am really struggling to find a way to pass the
> arguments. Below are the loop and the head of the dataframe I am working on.
> Any hints would be much appreciated, thank you! (I have searched for this
> but could not find any other posts doing quite what I want)
> Paul
>
> x<-as.numeric(all.tf7[1,2])
> for (i in 2:nrow(all.tf7)) {
> if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
> all.tf7[i,6]<-all.tf7[i-1,6]
> else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
> all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> x<-as.numeric(all.tf7[i,2]) }
> else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
> all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> x<-as.numeric(all.tf7[i,2]) }
> }
>
> #the aim here is to attribute a bin number to each row so that I can then
> split the dataframe according to those bins.
>
>
> chrom chromStart chromEnd name         cumsum bin
> chr1       10089    10309 ZBTB33        10089   1
> chr1       10132    10536 TAF7_(SQ-8)   20221   1
> chr1       10133    10362 Pol2-4H8      30354   1
> chr1       10148    10418 MafF_(M8194)  40502   1
> chr1       10382    10578 ZBTB33        50884   1
> chr1       16132    16352 CTCF          67016   1
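
On the stated aim of splitting the data frame by bin: once the bin column is filled in, base R's split() should be enough. The following is only a minimal sketch. The data frame df is a cut-down stand-in (chrom and chromStart values from the modified example, with the bin values typed in by hand), and the names df and pieces are just illustrative.

```r
# Stand-in data: bins are entered by hand here purely for illustration
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  bin = c(1, 1, 2, 2, 3, 3),
  stringsAsFactors = FALSE
)

# split() returns a named list holding one data frame per distinct bin value
pieces <- split(df, df$bin)

names(pieces)         # "1" "2" "3"
sapply(pieces, nrow)  # number of rows in each bin
pieces[["3"]]         # the rows assigned to bin 3
```

Each element of pieces can then be processed separately, for example with lapply().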