Paul,

Are you submitting the exact code that I included in my previous e-mail? When I submit that code, I get this ...

  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
3  chr2      10133    10362     Pol2-4H8  30354   2
4  chr2      10148    10418 MafF_(M8194)  40502   2
5  chr2     210382   210578       ZBTB33  50884   3
6  chr2     216132   216352         CTCF  67016   3

Jean

Paul Guilhamon <paul.guilha...@gmail.com> wrote on 07/02/2012 08:59:00 AM:

> Thanks for your reply Jean,
>
> I think your interpretation is correct but when I run your code I end
> up with the below dataframe and obviously the bins created there don't
> correspond to a chromStart change of 115341:
>
>   chrom chromStart chromEnd         name cumsum bin
> 1  chr1      10089    10309       ZBTB33  10089   1
> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
> 3  chr2      10133    10362     Pol2-4H8  30354   3
> 4  chr2      10148    10418 MafF_(M8194)  40502   4
> 5  chr2     210382   210578       ZBTB33  50884   5
> 6  chr2     216132   216352         CTCF  67016   6
>
> the first two rows should have the same bin number (same chrom,
> <115341 diff), then rows 3&4 should be in another bin (different chrom
> from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
> but >115341 difference between row 4 and row 5).
>
> it seems the new.bin line of your code isn't quite doing what it
> should but I can't pinpoint the error there...
> Paul
>
>
> On 2 July 2012 14:19, Jean V Adams <jvad...@usgs.gov> wrote:
> > Paul,
> >
> > My interpretation is that you are trying to assign a new bin number to a row
> > every time the variable chrom changes and every time the variable chromStart
> > changes by 115341 or more. Is that right? If so, you don't need a loop at
> > all. Check out the code below. I made a couple changes to the all.tf7
> > example data frame so that it would have two changes in bin number, one
> > based on the chrom variable and one based on the chromStart variable.
> >
> > Jean
> >
> > all.tf7 <- data.frame(
> >   chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
> >   chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
> >   chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
> >   name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
> >     "ZBTB33", "CTCF"),
> >   cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
> >   bin = rep(NA, 6)
> > )
> >
> > # assign a new bin every time chrom changes and every time chromStart
> > # changes by 115341 or more
> > L <- nrow(all.tf7)
> > prev.chrom <- c(NA, all.tf7$chrom[-L])
> > delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
> > new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
> >   delta.start >= 115341
> > all.tf7$bin <- cumsum(new.bin)
> > all.tf7
> >
> >
> > pguilha <paul.guilha...@gmail.com> wrote on 07/02/2012 06:25:13 AM:
> >
> >> Hello all,
> >>
> >> I have written a for loop to act on a dataframe with close to 3million rows
> >> and 6 columns and I would like to pass it to apply() to speed the process up
> >> (I let the loop run for 2 days before stopping it and it had only gone
> >> through 200,000 rows) but I am really struggling to find a way to pass the
> >> arguments. Below are the loop and the head of the dataframe I am working on.
> >> Any hints would be much appreciated, thank you!
> >> (I have searched for this but could not find any other posts doing quite
> >> what I want)
> >> Paul
> >>
> >> x<-as.numeric(all.tf7[1,2])
> >> for (i in 2:nrow(all.tf7)) {
> >>   if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
> >>     all.tf7[i,6]<-all.tf7[i-1,6]
> >>   else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
> >>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> >>     x<-as.numeric(all.tf7[i,2]) }
> >>   else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
> >>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> >>     x<-as.numeric(all.tf7[i,2]) }
> >> }
> >>
> >> #the aim here is to attribute a bin number to each row so that I can then
> >> split the dataframe according to those bins.
> >>
> >>
> >> chrom chromStart chromEnd         name cumsum bin
> >>  chr1      10089    10309       ZBTB33  10089   1
> >>  chr1      10132    10536  TAF7_(SQ-8)  20221   1
> >>  chr1      10133    10362     Pol2-4H8  30354   1
> >>  chr1      10148    10418 MafF_(M8194)  40502   1
> >>  chr1      10382    10578       ZBTB33  50884   1
> >>  chr1      16132    16352         CTCF  67016   1

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
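
A note on the differing results above. One possible explanation, not confirmed anywhere in the thread, is that chrom was stored as a factor in Paul's session (the data.frame() default before R 4.0.0) and as character in Jean's. In that case c(NA, all.tf7$chrom[-L]) yields the integer factor codes rather than the labels, so the comparison all.tf7$chrom != prev.chrom is TRUE on every row and every row starts a new bin. The sketch below assumes that setup by building chrom as a factor on purpose, then compares the labels via as.character(); only the two columns the binning needs are included.

```r
# Example data from the thread, reduced to the two columns used for binning.
# chrom is made a factor deliberately; this is an assumption about Paul's
# session, not something stated in the thread.
all.tf7 <- data.frame(
  chrom = factor(c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132)
)

L <- nrow(all.tf7)

# Compare chromosome labels as character strings, not factor codes.
chrom <- as.character(all.tf7$chrom)
prev.chrom <- c(NA, chrom[-L])

# Difference in chromStart between each row and the row before it.
delta.start <- c(NA, diff(all.tf7$chromStart))

# A row opens a new bin if it is the first row, if chrom changed,
# or if chromStart jumped by 115341 or more since the previous row.
new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7
```

With the example data this gives bins 1 1 2 2 3 3, matching Jean's output. Note that this vectorised version, like Jean's, measures each jump from the previous row, whereas the original loop measures it from the first row of the current bin (the variable x), so the two can disagree on data where starts drift gradually past 115341 within a bin.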