Jean,

It's crazy, I'm still getting 1, 2, 3, 4, 5, 6 in the bin column.

Also (this is an unrelated problem, I think), unless I've misunderstood it, your code will only create a new bin if the difference between chromStart at row i and row i-1 is >= 115341. What I want is for a new bin to be created each time the difference between chromStart at row i and row i-j is >= 115341, where 'i-j' is the first row of the current (most recent) bin. I'm not sure if I'm being clear: chromStart values are coordinates along a chromosome, so I basically want to cut each chromosome into sections/bins of approximately 115341.
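The rule described above (start a new bin when chrom changes, or when chromStart is at least 115341 past the first row of the current bin) could be sketched as follows. This is only an illustration under assumptions: the function name `assign_bins`, its `width` argument, and the toy coordinates are made up here, and the rows are assumed to be sorted by chrom and then chromStart. Because each bin depends on where the previous one started, the sketch keeps a loop, but it runs over plain vectors rather than indexing the data frame row by row, which is usually much faster.

```r
# Hypothetical helper (not from the thread): anchored binning.
# A new bin starts when chrom changes, or when the current start is
# at least `width` past the start of the first row of the current bin.
assign_bins <- function(chrom, start, width = 115341) {
  chrom <- as.character(chrom)   # avoid factor comparisons
  n <- length(start)
  bin <- integer(n)
  if (n == 0L) return(bin)
  bin[1] <- 1L
  anchor <- start[1]             # chromStart of the first row of the current bin
  for (i in seq_len(n)[-1]) {
    if (chrom[i] != chrom[i - 1] || start[i] - anchor >= width) {
      bin[i] <- bin[i - 1] + 1L
      anchor <- start[i]
    } else {
      bin[i] <- bin[i - 1]
    }
  }
  bin
}

# Toy coordinates (made up): no gap between consecutive rows reaches 115341,
# but rows 3 and 5 are each at least 115341 past the start of their bin.
chrom <- rep("chr1", 5)
start <- c(0, 100000, 120000, 230000, 240000)
bins <- assign_bins(chrom, start)
print(bins)  # [1] 1 1 2 2 3
```

With a data frame like all.tf7 this would be called as `all.tf7$bin <- assign_bins(all.tf7$chrom, all.tf7$chromStart)`.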
thanks again for all your efforts with this, they're much appreciated!

Paul

On 2 July 2012 19:36, Jean V Adams [via R] <ml-node+s789695n4635185...@n4.nabble.com> wrote:
> Paul,
>
> Try this (I changed some of the object names, but the meat of the code is
> the same):
>
> df <- data.frame(
>   chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>   chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>   chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>   name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>     "ZBTB33", "CTCF"),
>   cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
> )
>
> # assign a new bin every time chrom changes and every time chromStart
> # changes by 115341 or more
> L <- nrow(df)
> prev.chrom <- c(NA, df$chrom[-L])
> delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
> new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom | delta.start >= 115341
> df$bin <- cumsum(new.bin)
> df
>
>
> pguilha <[hidden email]> wrote on 07/02/2012 10:23:36 AM:
>
>> Jean, that's exactly what it should be, but yes I copied and pasted
>> from your email so I don't see how I could have introduced an error in
>> there....
>> paul
>>
>> On 2 July 2012 15:57, Jean V Adams [via R] <[hidden email]> wrote:
>> > Paul,
>> >
>> > Are you submitting the exact code that I included in my previous e-mail?
>> > When I submit that code, I get this ...
>> >
>> >   chrom chromStart chromEnd         name cumsum bin
>> > 1  chr1      10089    10309       ZBTB33  10089   1
>> > 2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>> > 3  chr2      10133    10362     Pol2-4H8  30354   2
>> > 4  chr2      10148    10418 MafF_(M8194)  40502   2
>> > 5  chr2     210382   210578       ZBTB33  50884   3
>> > 6  chr2     216132   216352         CTCF  67016   3
>> >
>> > Jean
>> >
>> >
>> > Paul Guilhamon <[hidden email]> wrote on 07/02/2012 08:59:00 AM:
>> >
>> >> Thanks for your reply Jean,
>> >>
>> >> I think your interpretation is correct but when I run your code I end
>> >> up with the below dataframe and obviously the bins created there don't
>> >> correspond to a chromStart change of 115341:
>> >>
>> >>   chrom chromStart chromEnd         name cumsum bin
>> >> 1  chr1      10089    10309       ZBTB33  10089   1
>> >> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
>> >> 3  chr2      10133    10362     Pol2-4H8  30354   3
>> >> 4  chr2      10148    10418 MafF_(M8194)  40502   4
>> >> 5  chr2     210382   210578       ZBTB33  50884   5
>> >> 6  chr2     216132   216352         CTCF  67016   6
>> >>
>> >> the first two rows should have the same bin number (same chrom,
>> >> <115341 diff), then rows 3&4 should be in another bin (different chrom
>> >> from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
>> >> but >115341 difference between row 4 and row 5).
>> >>
>> >> it seems the new.bin line of your code isn't quite doing what it
>> >> should but I can't pinpoint the error there...
>> >> Paul
>> >>
>> >>
>> >> On 2 July 2012 14:19, Jean V Adams <[hidden email]> wrote:
>> >> > Paul,
>> >> >
>> >> > My interpretation is that you are trying to assign a new bin number to
>> >> > a row every time the variable chrom changes and every time the variable
>> >> > chromStart changes by 115341 or more. Is that right? If so, you don't
>> >> > need a loop at all. Check out the code below. I made a couple changes
>> >> > to the all.tf7 example data frame so that it would have two changes in
>> >> > bin number, one based on the chrom variable and one based on the
>> >> > chromStart variable.
>> >> >
>> >> > Jean
>> >> >
>> >> > all.tf7 <- data.frame(
>> >> >   chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>> >> >   chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>> >> >   chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>> >> >   name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>> >> >     "ZBTB33", "CTCF"),
>> >> >   cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
>> >> >   bin = rep(NA, 6)
>> >> > )
>> >> >
>> >> > # assign a new bin every time chrom changes and every time chromStart
>> >> > # changes by 115341 or more
>> >> > L <- nrow(all.tf7)
>> >> > prev.chrom <- c(NA, all.tf7$chrom[-L])
>> >> > delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
>> >> > new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
>> >> >   delta.start >= 115341
>> >> > all.tf7$bin <- cumsum(new.bin)
>> >> > all.tf7
>> >> >
>> >> >
>> >> > pguilha <[hidden email]> wrote on 07/02/2012 06:25:13 AM:
>> >> >
>> >> >> Hello all,
>> >> >>
>> >> >> I have written a for loop to act on a dataframe with close to 3million
>> >> >> rows and 6 columns and I would like to pass it to apply() to speed the
>> >> >> process up (I let the loop run for 2 days before stopping it and it had
>> >> >> only gone through 200,000 rows) but I am really struggling to find a
>> >> >> way to pass the arguments. Below are the loop and the head of the
>> >> >> dataframe I am working on.
>> >> >> Any hints would be much appreciated, thank you! (I have searched for
>> >> >> this but could not find any other posts doing quite what I want)
>> >> >> Paul
>> >> >>
>> >> >> x<-as.numeric(all.tf7[1,2])
>> >> >> for (i in 2:nrow(all.tf7)) {
>> >> >>   if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
>> >> >>     all.tf7[i,6]<-all.tf7[i-1,6]
>> >> >>   else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
>> >> >>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>> >> >>     x<-as.numeric(all.tf7[i,2]) }
>> >> >>   else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
>> >> >>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>> >> >>     x<-as.numeric(all.tf7[i,2]) }
>> >> >> }
>> >> >>
>> >> >> #the aim here is to attribute a bin number to each row so that I can
>> >> >> #then split the dataframe according to those bins.
>> >> >>
>> >> >>
>> >> >> chrom chromStart chromEnd         name cumsum bin
>> >> >>  chr1      10089    10309       ZBTB33  10089   1
>> >> >>  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>> >> >>  chr1      10133    10362     Pol2-4H8  30354   1
>> >> >>  chr1      10148    10418 MafF_(M8194)  40502   1
>> >> >>  chr1      10382    10578       ZBTB33  50884   1
>> >> >>  chr1      16132    16352         CTCF  67016   1
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/apply-with-multiple-conditions-tp4635098p4635189.html
Sent from the R help mailing list archive at Nabble.com.
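One possible explanation for the 1, 2, 3, 4, 5, 6 result reported in this thread, offered as an assumption since the thread itself does not confirm it: if chrom is stored as a factor (the default for strings passed to data.frame() in R versions before 4.0.0), then `c(NA, df$chrom[-L])` drops the factor labels and returns the underlying integer codes. Comparing the factor to those integers compares labels such as "chr1" with "1", which never match, so every row is flagged as a new bin. The sketch below reproduces both outcomes; the wrapper name `bins_from` is hypothetical, and `stringsAsFactors = TRUE` is set explicitly so the behaviour does not depend on the R version.

```r
# Example data from the thread, with chrom deliberately stored as a factor.
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  stringsAsFactors = TRUE
)

# Hypothetical wrapper around the vectorised steps quoted in the thread.
bins_from <- function(chrom, start, width = 115341) {
  L <- length(start)
  prev.chrom <- c(NA, chrom[-L])   # for a factor this yields integer codes
  delta.start <- c(NA, start[-1] - start[-L])
  new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= width
  cumsum(new.bin)
}

# Factor input: labels are compared with integer codes, so every row differs.
bins_factor <- bins_from(df$chrom, df$chromStart)
print(bins_factor)  # [1] 1 2 3 4 5 6

# Character input: the comparison behaves as intended.
bins_char <- bins_from(as.character(df$chrom), df$chromStart)
print(bins_char)    # [1] 1 1 2 2 3 3
```

If this is the cause, converting the column once with `df$chrom <- as.character(df$chrom)` before running the code would be enough.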