Thank you both very much! The factor issue was indeed solved by your modifications, and Jean, that last bit of code does exactly what I need. Perfect! Thanks again,
Paul
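For anyone reading the resolution first, the factor issue can be reproduced in a few lines. This is a minimal sketch rather than code from the thread: it assumes chrom is stored as a factor (the data.frame() default for string columns in the R versions current in 2012), and the four-row vector and the names prev.codes, chrom.chr, bad.bin and good.bin are made up for illustration.

```r
# Toy data (hypothetical, shortened from the thread's example); chrom is a factor.
chrom <- factor(c("chr1", "chr1", "chr2", "chr2"))
L <- length(chrom)

# c() starting with NA drops the factor attributes, leaving integer codes.
prev.codes <- c(NA, chrom[-L])

# Factor labels get compared with integer codes, so every row looks like a change.
bad.bin <- cumsum(is.na(prev.codes) | chrom != prev.codes)

# Comparing character labels on both sides avoids the problem.
chrom.chr <- as.character(chrom)
prev.labels <- c(NA, chrom.chr[-L])
good.bin <- cumsum(is.na(prev.labels) | chrom.chr != prev.labels)

print(bad.bin)   # a new bin on every row
print(good.bin)  # a new bin only when chrom changes
```

Indexing levels(df$chrom) with the codes, as Rui and Jean do, is the same idea seen from the other side: it turns the codes back into labels before comparing.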
On 2 July 2012 21:34, Jean V Adams [via R] <ml-node+s789695n4635200...@n4.nabble.com> wrote:
> Thanks for the intrusion!  I have
>         options(stringsAsFactors=FALSE)
> and Paul probably doesn't, so he saw factors where I saw characters.
>
> Paul,
>
> I saw your other note ... try this code
>
> L <- nrow(df)
> # assign a new bin every time chrom changes
> prev.chrom <- c(NA, df$chrom[-L])
> bin1 <- cumsum(is.na(prev.chrom) | df$chrom != levels(df$chrom)[prev.chrom])
>
> # subtract the minimum chromStart from each bin
> min.start <- tapply(df$chromStart, bin1, min, na.rm=TRUE)[bin1]
>
> # split bins further if chromStart >= 115341 + min.start
> bin2 <- floor((df$chromStart - min.start) / 115341)
>
> # combine the two bins into one
> df$bin <- interaction(bin1, bin2)
>
> df
>
>
> Jean
>
>
>
> Rui Barradas <[hidden email]> wrote on 07/02/2012 02:24:43 PM:
>
>> Hello,
>>
>> Sorry to intrude, but I think it's a factor issue.
>> Try changing the disjunction to (in multiline edit)
>>
>>
>> new.bin <- is.na(prev.chrom) |
>>      df$chrom != levels(df$chrom)[prev.chrom] |
>>      delta.start >= 115341
>>
>> It should work, now.
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 02-07-2012 20:03, pguilha wrote:
>> > Jean,
>> > It's crazy, I'm still getting 1,2,3,4,5,6 in the bin column.....
>> > Also (this is an unrelated problem I think), unless I've misunderstood
>> > it, I think your code will only create a new bin if the difference
>> > between chromStart at i and i-1 position is >=115341....What I want is
>> > for a new bin to be created each time the difference between
>> > chromStart at i and i-j is >=115341, where 'i-j' corresponds to the
>> > first row of the last bin....I'm not sure if I'm being
>> > clear...chromStart values correspond to coordinates along a chromosome
>> > so I want to basically cut up each chromosome into sections/bins of
>> > approximately 115341...
>> >
>> > thanks again for all your efforts with this, they're much appreciated!
>> > Paul
>> >
>> > On 2 July 2012 19:36, Jean V Adams [via R]
>> > <[hidden email]> wrote:
>> >> Paul,
>> >>
>> >> Try this (I changed some of the object names, but the meat of the code is
>> >> the same):
>> >>
>> >> df <- data.frame(
>> >>         chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>> >>         chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>> >>         chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>> >>         name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>> >>                 "ZBTB33", "CTCF"),
>> >>         cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
>> >>         )
>> >>
>> >> # assign a new bin every time chrom changes and every time chromStart
>> >> # changes by 115341 or more
>> >> L <- nrow(df)
>> >> prev.chrom <- c(NA, df$chrom[-L])
>> >> delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
>> >> new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom | delta.start >= 115341
>> >> df$bin <- cumsum(new.bin)
>> >> df
>> >>
>> >>
>> >> pguilha <[hidden email]> wrote on 07/02/2012 10:23:36 AM:
>> >>
>> >>> Jean, that's exactly what it should be, but yes I copied and pasted
>> >>> from your email so I don't see how I could have introduced an error in
>> >>> there....
>> >>> paul
>> >>>
>> >>> On 2 July 2012 15:57, Jean V Adams [via R]
>> >>> <[hidden email]> wrote:
>> >>>> Paul,
>> >>>>
>> >>>> Are you submitting the exact code that I included in my previous e-mail?
>> >>>> When I submit that code, I get this ...
>> >>>>
>> >>>>   chrom chromStart chromEnd         name cumsum bin
>> >>>> 1  chr1      10089    10309       ZBTB33  10089   1
>> >>>> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>> >>>> 3  chr2      10133    10362     Pol2-4H8  30354   2
>> >>>> 4  chr2      10148    10418 MafF_(M8194)  40502   2
>> >>>> 5  chr2     210382   210578       ZBTB33  50884   3
>> >>>> 6  chr2     216132   216352         CTCF  67016   3
>> >>>>
>> >>>> Jean
>> >>>>
>> >>>>
>> >>>> Paul Guilhamon <[hidden email]> wrote on 07/02/2012 08:59:00 AM:
>> >>>>
>> >>>>> Thanks for your reply Jean,
>> >>>>>
>> >>>>> I think your interpretation is correct but when I run your code I end
>> >>>>> up with the below dataframe and obviously the bins created there don't
>> >>>>> correspond to a chromStart change of 115341:
>> >>>>>
>> >>>>>   chrom chromStart chromEnd         name cumsum bin
>> >>>>> 1  chr1      10089    10309       ZBTB33  10089   1
>> >>>>> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
>> >>>>> 3  chr2      10133    10362     Pol2-4H8  30354   3
>> >>>>> 4  chr2      10148    10418 MafF_(M8194)  40502   4
>> >>>>> 5  chr2     210382   210578       ZBTB33  50884   5
>> >>>>> 6  chr2     216132   216352         CTCF  67016   6
>> >>>>>
>> >>>>> the first two rows should have the same bin number (same chrom,
>> >>>>> <115341 diff), then rows 3&4 should be in another bin (different chrom
>> >>>>> from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
>> >>>>> but >115341 difference between row 4 and row 5).
>> >>>>>
>> >>>>> it seems the new.bin line of your code isn't quite doing what it
>> >>>>> should but I can't pinpoint the error there...
>> >>>>> Paul
>> >>>>>
>> >>>>>
>> >>>>> On 2 July 2012 14:19, Jean V Adams <[hidden email]> wrote:
>> >>>>>> Paul,
>> >>>>>>
>> >>>>>> My interpretation is that you are trying to assign a new bin number to
>> >>>>>> a row every time the variable chrom changes and every time the variable
>> >>>>>> chromStart changes by 115341 or more.  Is that right?  If so, you don't
>> >>>>>> need a loop at all.  Check out the code below.  I made a couple changes
>> >>>>>> to the all.tf7 example data frame so that it would have two changes in
>> >>>>>> bin number, one based on the chrom variable and one based on the
>> >>>>>> chromStart variable.
>> >>>>>>
>> >>>>>> Jean
>> >>>>>>
>> >>>>>> all.tf7 <- data.frame(
>> >>>>>>         chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>> >>>>>>         chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>> >>>>>>         chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>> >>>>>>         name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>> >>>>>>                 "ZBTB33", "CTCF"),
>> >>>>>>         cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
>> >>>>>>         bin = rep(NA, 6)
>> >>>>>>         )
>> >>>>>>
>> >>>>>> # assign a new bin every time chrom changes and every time chromStart
>> >>>>>> # changes by 115341 or more
>> >>>>>> L <- nrow(all.tf7)
>> >>>>>> prev.chrom <- c(NA, all.tf7$chrom[-L])
>> >>>>>> delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
>> >>>>>> new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom | delta.start >= 115341
>> >>>>>> all.tf7$bin <- cumsum(new.bin)
>> >>>>>> all.tf7
>> >>>>>>
>> >>>>>>
>> >>>>>> pguilha <[hidden email]> wrote on 07/02/2012 06:25:13 AM:
>> >>>>>>
>> >>>>>>> Hello all,
>> >>>>>>>
>> >>>>>>> I have written a for loop to act on a dataframe with close to 3 million
>> >>>>>>> rows and 6 columns and I would like to pass it to apply() to speed the
>> >>>>>>> process up (I let the loop run for 2 days before stopping it and it had
>> >>>>>>> only gone through 200,000 rows) but I am really struggling to find a way
>> >>>>>>> to pass the arguments. Below are the loop and the head of the dataframe
>> >>>>>>> I am working on.
>> >>>>>>> Any hints would be much appreciated, thank you! (I have searched for
>> >>>>>>> this but could not find any other posts doing quite what I want)
>> >>>>>>> Paul
>> >>>>>>>
>> >>>>>>> x<-as.numeric(all.tf7[1,2])
>> >>>>>>> for (i in 2:nrow(all.tf7)) {
>> >>>>>>>   if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
>> >>>>>>>     all.tf7[i,6]<-all.tf7[i-1,6]
>> >>>>>>>   else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
>> >>>>>>>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>> >>>>>>>     x<-as.numeric(all.tf7[i,2]) }
>> >>>>>>>   else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
>> >>>>>>>     all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>> >>>>>>>     x<-as.numeric(all.tf7[i,2]) }
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> #the aim here is to attribute a bin number to each row so that I can
>> >>>>>>> #then split the dataframe according to those bins.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> chrom   chromStart  chromEnd  name          cumsum  bin
>> >>>>>>> chr1    10089       10309     ZBTB33        10089   1
>> >>>>>>> chr1    10132       10536     TAF7_(SQ-8)   20221   1
>> >>>>>>> chr1    10133       10362     Pol2-4H8      30354   1
>> >>>>>>> chr1    10148       10418     MafF_(M8194)  40502   1
>> >>>>>>> chr1    10382       10578     ZBTB33        50884   1
>> >>>>>>> chr1    16132       16352     CTCF          67016   1
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/apply-with-multiple-conditions-tp4635098p4635212.html
Sent from the R help mailing list archive at Nabble.com.
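For readers who want to run the final approach end to end, here is a self-contained sketch of the two-stage binning from Jean's last message. It is an adaptation, not the exact code from the thread: it compares as.character(df$chrom), so it should behave the same whether chrom is a factor or a character column, and the names bin.width, key and bin.id are introduced here for illustration. match() is swapped in for interaction() so that the result is a plain run of consecutive integers.

```r
# Example data from the thread; chrom is deliberately stored as a factor.
df <- data.frame(
  chrom      = factor(c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd   = c(10309, 10536, 10362, 10418, 210578, 216352)
)
bin.width <- 115341  # window size used throughout the thread

# Stage 1: start a new bin every time chrom changes (compare labels, not codes).
chrom <- as.character(df$chrom)
L <- nrow(df)
prev.chrom <- c(NA, chrom[-L])
bin1 <- cumsum(is.na(prev.chrom) | chrom != prev.chrom)

# Smallest chromStart within each stage-1 bin, repeated for every row.
min.start <- as.vector(tapply(df$chromStart, bin1, min)[as.character(bin1)])

# Stage 2: split each stage-1 bin into windows of bin.width from its minimum.
bin2 <- floor((df$chromStart - min.start) / bin.width)

# Combine both stages into one consecutive integer id per bin.
key <- paste(bin1, bin2, sep = ".")
df$bin.id <- match(key, unique(key))

print(df)
```

On the six-row example this gives bin.id values 1, 1, 2, 2, 3, 3, matching the table Jean posted earlier in the thread. Note that the windows are measured from the minimum chromStart of each run of chrom, so the bins are fixed-width sections rather than the "reset at the first row of the current bin" rule in the original loop; the two can differ on data with gaps.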