Thank you both very much! The factor issue was indeed solved by your
modifications, and Jean, that last bit of code does exactly what I
need. Perfect!
Thanks again,
paul

On 2 July 2012 21:34, Jean V Adams [via R]
<ml-node+s789695n4635200...@n4.nabble.com> wrote:
> Thanks for the intrusion!  I have
>         options(stringsAsFactors=FALSE)
> and Paul probably doesn't, so he saw factors where I saw characters.
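
A minimal sketch of the factor issue described above, for readers of the archive. It is not code from the thread: the toy data frame and the names prev.codes, wrong and right are made up for illustration, and chrom is forced to be a factor with stringsAsFactors=TRUE.

```r
# Toy data (hypothetical); chrom is deliberately stored as a factor.
df <- data.frame(chrom = c("chr1", "chr1", "chr2", "chr2"),
                 stringsAsFactors = TRUE)
L <- nrow(df)

# c() starting with NA drops the factor class and keeps the integer codes.
prev.codes <- c(NA, df$chrom[-L])

# Comparing the factor with those codes compares labels such as "chr1"
# with "1", so every row looks like a change of chrom.
wrong <- is.na(prev.codes) | df$chrom != prev.codes
cumsum(wrong)   # 1 2 3 4

# Converting to character first compares label with label.
chrom <- as.character(df$chrom)
prev.chrom <- c(NA, chrom[-L])
right <- is.na(prev.chrom) | chrom != prev.chrom
cumsum(right)   # 1 1 2 2
```

With stringsAsFactors=FALSE the first comparison already works, which would explain why the same code gave different bins on two machines.
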
>
> Paul,
>
> I saw your other note ... try this code
>
> L <- nrow(df)
> # assign a new bin every time chrom changes
> prev.chrom <- c(NA, df$chrom[-L])
> bin1 <- cumsum(is.na(prev.chrom) | df$chrom != levels(df$chrom)[prev.chrom])
>
> # find the minimum chromStart of each bin (subtracted in the next step)
> min.start <- tapply(df$chromStart, bin1, min, na.rm=TRUE)[bin1]
>
> # split bins further if chromStart >= 115341 + min.start
> bin2 <- floor((df$chromStart - min.start) / 115341)
>
> # combine the two bins into one
> df$bin <- interaction(bin1, bin2)
>
> df
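
A self-contained variant of the code above, offered only as a sketch. It assumes the example data from earlier in the thread and converts chrom with as.character(), so it should behave the same whether chrom is a factor or a character vector. The variable width and the recoding of the combined bin to consecutive integers are additions that are not in the original code.

```r
# Example data from earlier in the thread (two columns only).
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  stringsAsFactors = FALSE
)
width <- 115341  # bin width used throughout the thread

chrom <- as.character(df$chrom)
L <- nrow(df)

# First-level bin: a new bin every time chrom changes.
prev.chrom <- c(NA, chrom[-L])
bin1 <- cumsum(is.na(prev.chrom) | chrom != prev.chrom)

# Smallest chromStart within each first-level bin, repeated for every row.
min.start <- tapply(df$chromStart, bin1, min, na.rm = TRUE)[as.character(bin1)]

# Second-level bin: fixed windows of `width`, counted from that minimum.
bin2 <- floor((df$chromStart - min.start) / width)

# Combine both levels and recode as consecutive integers.
df$bin <- as.integer(interaction(bin1, bin2, drop = TRUE, lex.order = TRUE))
df$bin  # 1 1 2 2 3 3
```

Note that bin2 uses fixed windows measured from the smallest chromStart of each run of chrom, so a window can be empty and a bin never restarts at the first row that crosses the threshold.
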
>
>
> Jean
>
>
>
> Rui Barradas <[hidden email]> wrote on 07/02/2012 02:24:43 PM:
>
>> Hello,
>>
>> Sorry to intrude, but I think it's a factor issue.
>> Try changing the disjunction to the following (split over several lines):
>>
>>
>> new.bin <- is.na(prev.chrom) |
>>       df$chrom != levels(df$chrom)[prev.chrom] |
>>       delta.start >= 115341
>>
>> It should work now.
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 02-07-2012 20:03, pguilha escreveu:
>> > Jean,
>> > It's crazy, I'm still getting 1,2,3,4,5,6 in the bin column.
>> > Also (this is an unrelated problem, I think), unless I've misunderstood
>> > it, your code will only create a new bin if the difference
>> > between chromStart at positions i and i-1 is >=115341. What I want is
>> > for a new bin to be created each time the difference between
>> > chromStart at i and i-j is >=115341, where 'i-j' corresponds to the
>> > first row of the last bin. I'm not sure if I'm being clear:
>> > chromStart values correspond to coordinates along a chromosome,
>> > so I basically want to cut up each chromosome into sections/bins of
>> > approximately 115341.
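
The rule described above, where each bin is measured from its own first row, depends on the previous bin and so does not reduce to a simple cumsum(). One possible sketch, not from the thread, keeps the loop but runs it on plain vectors instead of data frame cells, which is likely to be much faster than assigning into all.tf7[i,6] row by row. The function name assign_bins and the variable anchor are hypothetical.

```r
# Sketch: start a new bin when chrom changes, or when chromStart is at
# least `width` beyond the first row of the current bin.
assign_bins <- function(chrom, start, width = 115341) {
  chrom <- as.character(chrom)  # works for factor or character input
  n <- length(start)
  bin <- integer(n)
  if (n == 0L) return(bin)
  bin[1] <- 1L
  anchor <- start[1]            # chromStart of the current bin's first row
  for (i in seq_len(n)[-1]) {
    if (chrom[i] != chrom[i - 1] || start[i] - anchor >= width) {
      bin[i] <- bin[i - 1] + 1L
      anchor <- start[i]        # the new bin is measured from this row
    } else {
      bin[i] <- bin[i - 1]
    }
  }
  bin
}

# Example with the data used elsewhere in the thread.
chrom <- c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")
start <- c(10089, 10132, 10133, 10148, 210382, 216132)
assign_bins(chrom, start)  # 1 1 2 2 3 3
```

The result could then be stored with something like all.tf7$bin <- assign_bins(all.tf7$chrom, all.tf7$chromStart), assuming the rows are already sorted by chrom and chromStart.
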
>> >
>> > thanks again for all your efforts with this, they're much appreciated!
>> > Paul
>> >
>> > On 2 July 2012 19:36, Jean V Adams [via R]
>> > <[hidden email]> wrote:
>> >> Paul,
>> >>
>> >> Try this (I changed some of the object names, but the meat of the code is
>> >> the same):
>> >>
>> >> df <- data.frame(
>> >>          chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>> >>          chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>> >>          chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>> >>          name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>> >>                   "ZBTB33", "CTCF"),
>> >>          cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
>> >>          )
>> >>
>> >> # assign a new bin every time chrom changes and every time chromStart
>> >> # changes by 115341 or more
>> >> L <- nrow(df)
>> >> prev.chrom <- c(NA, df$chrom[-L])
>> >> delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
>> >> new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom |
>> >>       delta.start >= 115341
>> >> df$bin <- cumsum(new.bin)
>> >> df
>> >>
>> >>
>> >> pguilha <[hidden email]> wrote on 07/02/2012 10:23:36 AM:
>> >>
>> >>> Jean, that's exactly what it should be, but yes I copied and pasted
>> >>> from your email so I don't see how I could have introduced an error in
>> >>> there....
>> >>> paul
>> >>>
>> >>> On 2 July 2012 15:57, Jean V Adams [via R]
>> >>> <[hidden email]> wrote:
>> >>>> Paul,
>> >>>>
>> >>>> Are you submitting the exact code that I included in my previous e-mail?
>> >>>> When I submit that code, I get this ...
>> >>>>
>> >>>>    chrom chromStart chromEnd         name cumsum bin
>> >>>> 1  chr1      10089    10309       ZBTB33  10089   1
>> >>>> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>> >>>> 3  chr2      10133    10362     Pol2-4H8  30354   2
>> >>>> 4  chr2      10148    10418 MafF_(M8194)  40502   2
>> >>>> 5  chr2     210382   210578       ZBTB33  50884   3
>> >>>> 6  chr2     216132   216352         CTCF  67016   3
>> >>>>
>> >>>> Jean
>> >>>>
>> >>>>
>> >>>> Paul Guilhamon <[hidden email]> wrote on 07/02/2012 08:59:00 AM:
>> >>>>
>> >>>>> Thanks for your reply Jean,
>> >>>>>
>> >>>>> I think your interpretation is correct but when I run your code I end
>> >>>>> up with the below dataframe and obviously the bins created there don't
>> >>>>> correspond to a chromStart change of 115341:
>> >>>>>
>> >>>>>    chrom chromStart chromEnd         name cumsum bin
>> >>>>> 1  chr1      10089    10309       ZBTB33  10089   1
>> >>>>> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
>> >>>>> 3  chr2      10133    10362     Pol2-4H8  30354   3
>> >>>>> 4  chr2      10148    10418 MafF_(M8194)  40502   4
>> >>>>> 5  chr2     210382   210578       ZBTB33  50884   5
>> >>>>> 6  chr2     216132   216352         CTCF  67016   6
>> >>>>>
>> >>>>> the first two rows should have the same bin number (same chrom,
>> >>>>> <115341 diff), then rows 3&4 should be in another bin (different chrom
>> >>>>> from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
>> >>>>> but >115341 difference between row 4 and row 5).
>> >>>>>
>> >>>>> it seems the new.bin line of your code isn't quite doing what it
>> >>>>> should but I can't pinpoint the error there...
>> >>>>> Paul
>> >>>>>
>> >>>>>
>> >>>>> On 2 July 2012 14:19, Jean V Adams <[hidden email]> wrote:
>> >>>>>> Paul,
>> >>>>>>
>> >>>>>> My interpretation is that you are trying to assign a new bin number to a
>> >>>>>> row every time the variable chrom changes and every time the variable
>> >>>>>> chromStart changes by 115341 or more.  Is that right?  If so, you don't
>> >>>>>> need a loop at all.  Check out the code below.  I made a couple changes
>> >>>>>> to the all.tf7 example data frame so that it would have two changes in
>> >>>>>> bin number, one based on the chrom variable and one based on the
>> >>>>>> chromStart variable.
>> >>>>>>
>> >>>>>> Jean
>> >>>>>>
>> >>>>>> all.tf7 <- data.frame(
>> >>>>>>          chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>> >>>>>>          chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>> >>>>>>          chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>> >>>>>>          name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>> >>>>>>                   "ZBTB33", "CTCF"),
>> >>>>>>          cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
>> >>>>>>          bin = rep(NA, 6)
>> >>>>>>          )
>> >>>>>>
>> >>>>>> # assign a new bin every time chrom changes and every time chromStart
>> >>>>>> # changes by 115341 or more
>> >>>>>> L <- nrow(all.tf7)
>> >>>>>> prev.chrom <- c(NA, all.tf7$chrom[-L])
>> >>>>>> delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
>> >>>>>> new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
>> >>>>>>       delta.start >= 115341
>> >>>>>> all.tf7$bin <- cumsum(new.bin)
>> >>>>>> all.tf7
>> >>>>>>
>> >>>>>>
>> >>>>>> pguilha <[hidden email]> wrote on 07/02/2012 06:25:13 AM:
>> >>>>>>
>> >>>>>>> Hello all,
>> >>>>>>>
>> >>>>>>> I have written a for loop to act on a dataframe with close to 3 million
>> >>>>>>> rows and 6 columns and I would like to pass it to apply() to speed the
>> >>>>>>> process up (I let the loop run for 2 days before stopping it and it had
>> >>>>>>> only gone through 200,000 rows) but I am really struggling to find a way
>> >>>>>>> to pass the arguments. Below are the loop and the head of the dataframe
>> >>>>>>> I am working on.
>> >>>>>>> Any hints would be much appreciated, thank you! (I have searched for this
>> >>>>>>> but could not find any other posts doing quite what I want)
>> >>>>>>> Paul
>> >>>>>>>
>> >>>>>>> x<-as.numeric(all.tf7[1,2])
>> >>>>>>> for (i in 2:nrow(all.tf7)) {
>> >>>>>>>    if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
>> >>>>>>>      all.tf7[i,6]<-all.tf7[i-1,6]
>> >>>>>>>    else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
>> >>>>>>>      all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>> >>>>>>>      x<-as.numeric(all.tf7[i,2]) }
>> >>>>>>>    else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
>> >>>>>>>      all.tf7[i,6]<-(all.tf7[i-1,6]+1)
>> >>>>>>>      x<-as.numeric(all.tf7[i,2]) }
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> # the aim here is to attribute a bin number to each row so that I can
>> >>>>>>> # then split the dataframe according to those bins.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> chrom chromStart chromEnd         name cumsum bin
>> >>>>>>> chr1       10089    10309       ZBTB33  10089   1
>> >>>>>>> chr1       10132    10536  TAF7_(SQ-8)  20221   1
>> >>>>>>> chr1       10133    10362     Pol2-4H8  30354   1
>> >>>>>>> chr1       10148    10418 MafF_(M8194)  40502   1
>> >>>>>>> chr1       10382    10578       ZBTB33  50884   1
>> >>>>>>> chr1       16132    16352         CTCF  67016   1
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


--
View this message in context: 
http://r.789695.n4.nabble.com/apply-with-multiple-conditions-tp4635098p4635212.html
Sent from the R help mailing list archive at Nabble.com.