Jean,
It's crazy, I'm still getting 1, 2, 3, 4, 5, 6 in the bin column...
Also (this is an unrelated problem, I think), unless I've misunderstood
it, your code will only create a new bin if the difference between
chromStart at positions i and i-1 is >= 115341. What I want is for a
new bin to be created each time the difference between chromStart at i
and at i-j is >= 115341, where i-j is the first row of the current
(most recent) bin. I'm not sure if I'm being clear: chromStart values
are coordinates along a chromosome, so I basically want to cut each
chromosome into sections/bins of approximately 115341 bp.

Thanks again for all your efforts with this; they're much appreciated!
Paul
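A minimal sketch of the binning Paul describes, assuming rows are sorted by chrom and then chromStart. A new bin opens when chrom changes or when chromStart is at least 115341 past the first row of the current bin. Because each bin's start depends on where the previous bin started, this is awkward to vectorize with cumsum(). Running the loop over plain vectors, rather than assigning into the data frame with all.tf7[i, 6] <- ..., should still be much faster. The helper name assign_bins and its width argument are hypothetical, not from the thread.

```r
# Hypothetical helper: bins anchored at the first row of each bin.
# Assumes rows are sorted by chrom, then chromStart.
assign_bins <- function(chrom, start, width = 115341) {
  chrom <- as.character(chrom)  # compare labels, never factor codes
  n <- length(start)
  bin <- integer(n)
  if (n == 0) return(bin)
  bin[1] <- 1L
  anchor <- start[1]  # chromStart of the first row of the current bin
  for (i in seq_len(n)[-1]) {
    if (chrom[i] != chrom[i - 1] || start[i] - anchor >= width) {
      bin[i] <- bin[i - 1] + 1L  # open a new bin anchored at row i
      anchor <- start[i]
    } else {
      bin[i] <- bin[i - 1]       # stay in the current bin
    }
  }
  bin
}

all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132)
)
all.tf7$bin <- assign_bins(all.tf7$chrom, all.tf7$chromStart)
all.tf7$bin
```

Unlike a rule based on consecutive differences, three rows at 0, 60000 and 120000 on one chromosome get bins 1, 1, 2 here, because the third row is measured against the first row of its bin.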

On 2 July 2012 19:36, Jean V Adams [via R]
<ml-node+s789695n4635185...@n4.nabble.com> wrote:
> Paul,
>
> Try this (I changed some of the object names, but the meat of the code is
> the same):
>
> df <- data.frame(
>         chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>         chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>         chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>         name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
> "ZBTB33", "CTCF"),
>         cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
>         )
>
> # assign a new bin every time chrom changes and every time chromStart
> # changes by 115341 or more
> L <- nrow(df)
> prev.chrom <- c(NA, df$chrom[-L])
> delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
> new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom |
>         delta.start >= 115341
> df$bin <- cumsum(new.bin)
> df
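One likely cause of the 1, 2, 3, 4, 5, 6 result: in R versions before 4.0.0, data.frame() turns character columns into factors by default (stringsAsFactors = TRUE). c(NA, df$chrom[-L]) then keeps only the factor's integer codes, so df$chrom != prev.chrom compares labels such as "chr1" with codes such as "1" and is TRUE on every row. The sketch below reproduces this by making chrom a factor explicitly, then avoids it by converting chrom to character before comparing.

```r
# chrom is made a factor explicitly to mimic the pre-4.0.0 default
df <- data.frame(
  chrom = factor(c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132)
)
L <- nrow(df)
delta.start <- c(NA, diff(df$chromStart))

# c(NA, <factor>) drops the factor class and keeps the integer codes,
# so the label-vs-code comparison is TRUE on every row
prev.bad <- c(NA, df$chrom[-L])
bad <- cumsum(is.na(prev.bad) | df$chrom != prev.bad | delta.start >= 115341)
bad  # 1 2 3 4 5 6

# Comparing character labels gives the intended bins
chrom <- as.character(df$chrom)
prev.chrom <- c(NA, chrom[-L])
new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341
df$bin <- cumsum(new.bin)
df$bin  # 1 1 2 2 3 3
```

Creating the data frame with stringsAsFactors = FALSE would have the same effect as the as.character() call.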
>
>
> pguilha <[hidden email]> wrote on 07/02/2012 10:23:36 AM:
>
>> Jean, that's exactly what it should be, but yes, I copied and pasted
>> it from your email, so I don't see how I could have introduced an
>> error there...
>> Paul
>>
>> On 2 July 2012 15:57, Jean V Adams [via R]
>> <[hidden email]> wrote:
>> > Paul,
>> >
>> > Are you submitting the exact code that I included in my previous
>> > e-mail?
>> > When I submit that code, I get this ...
>> >
>> >   chrom chromStart chromEnd         name cumsum bin
>> > 1  chr1      10089    10309       ZBTB33  10089   1
>> > 2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>> > 3  chr2      10133    10362     Pol2-4H8  30354   2
>> > 4  chr2      10148    10418 MafF_(M8194)  40502   2
>> > 5  chr2     210382   210578       ZBTB33  50884   3
>> > 6  chr2     216132   216352         CTCF  67016   3
>> >
>> > Jean
>> >
>> >
>> > Paul Guilhamon <[hidden email]> wrote on 07/02/2012 08:59:00 AM:
>> >
>> >> Thanks for your reply Jean,
>> >>
>> >> I think your interpretation is correct, but when I run your code I end
>> >> up with the dataframe below, and obviously the bins created there don't
>> >> correspond to a chromStart change of 115341:
>> >>
>> >>   chrom chromStart chromEnd         name cumsum bin
>> >> 1  chr1      10089    10309       ZBTB33  10089   1
>> >> 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
>> >> 3  chr2      10133    10362     Pol2-4H8  30354   3
>> >> 4  chr2      10148    10418 MafF_(M8194)  40502   4
>> >> 5  chr2     210382   210578       ZBTB33  50884   5
>> >> 6  chr2     216132   216352         CTCF  67016   6
>> >>
>> >> The first two rows should have the same bin number (same chrom,
>> >> <115341 diff), then rows 3&4 should be in another bin (different chrom
>> >> from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
>> >> but >115341 difference between row 4 and row 5).
>> >>
>> >> it seems the new.bin line of your code isn't quite doing what it
>> >> should but I can't pinpoint the error there...
>> >> Paul
>> >>
>> >>
>> >> On 2 July 2012 14:19, Jean V Adams <[hidden email]> wrote:
>> >> > Paul,
>> >> >
>> >> > My interpretation is that you are trying to assign a new bin number to
>> >> > a row every time the variable chrom changes and every time the variable
>> >> > chromStart changes by 115341 or more.  Is that right?  If so, you don't
>> >> > need a loop at all.  Check out the code below.  I made a couple of
>> >> > changes to the all.tf7 example data frame so that it would have two
>> >> > changes in bin number, one based on the chrom variable and one based
>> >> > on the chromStart variable.
>> >> >
>> >> > Jean
>> >> >
>> >> > all.tf7 <- data.frame(
>> >> >         chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
>> >> >         chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
>> >> >         chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
>> >> >         name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
>> >> >                  "ZBTB33", "CTCF"),
>> >> >         cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
>> >> >         bin = rep(NA, 6)
>> >> >         )
>> >> >
>> >> > # assign a new bin every time chrom changes and every time chromStart
>> >> > # changes by 115341 or more
>> >> > L <- nrow(all.tf7)
>> >> > prev.chrom <- c(NA, all.tf7$chrom[-L])
>> >> > delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
>> >> > new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
>> >> >         delta.start >= 115341
>> >> > all.tf7$bin <- cumsum(new.bin)
>> >> > all.tf7
>> >> >
>> >> >
>> >> > pguilha <[hidden email]> wrote on 07/02/2012 06:25:13 AM:
>> >> >
>> >> >> Hello all,
>> >> >>
>> >> >> I have written a for loop to act on a dataframe with close to
>> >> >> 3 million rows and 6 columns, and I would like to pass it to apply()
>> >> >> to speed the process up (I let the loop run for 2 days before
>> >> >> stopping it, and it had only gone through 200,000 rows), but I am
>> >> >> really struggling to find a way to pass the arguments. Below are the
>> >> >> loop and the head of the dataframe I am working on.
>> >> >> Any hints would be much appreciated, thank you! (I have searched for
>> >> >> this but could not find any other posts doing quite what I want.)
>> >> >> Paul
>> >> >>
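>> >> >> # x holds chromStart of the first row of the current bin
>> >> >> # (assumes column 6, the bin column, already holds 1 for row 1)
>> >> >> x <- as.numeric(all.tf7[1, 2])
>> >> >> for (i in 2:nrow(all.tf7)) {
>> >> >>   if (all.tf7[i, 1] == all.tf7[i-1, 1] && (all.tf7[i, 2] - x) < 115341) {
>> >> >>     # same chromosome, within 115341 of the bin start: same bin
>> >> >>     all.tf7[i, 6] <- all.tf7[i-1, 6]
>> >> >>   } else if (all.tf7[i, 1] == all.tf7[i-1, 1] && (all.tf7[i, 2] - x) >= 115341) {
>> >> >>     # same chromosome, 115341 or more past the bin start: new bin
>> >> >>     all.tf7[i, 6] <- all.tf7[i-1, 6] + 1
>> >> >>     x <- as.numeric(all.tf7[i, 2])
>> >> >>   } else if (all.tf7[i, 1] != all.tf7[i-1, 1]) {
>> >> >>     # chromosome changed: new bin
>> >> >>     all.tf7[i, 6] <- all.tf7[i-1, 6] + 1
>> >> >>     x <- as.numeric(all.tf7[i, 2])
>> >> >>   }
>> >> >> }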
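If bins only need to be roughly 115341 long, a fully vectorized alternative is to use fixed windows along each chromosome, where window k covers chromStart values from k * 115341 up to (k + 1) * 115341. This is a different rule from the loop above and was not proposed in the thread. A sketch, assuming rows are sorted by chrom and then chromStart:

```r
# Fixed-width windows per chromosome (not the anchored rule of the loop)
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  stringsAsFactors = FALSE  # keep chrom as character in any R version
)
width <- 115341

# Index of the fixed window each row falls in (0, 1, 2, ...)
window <- floor(df$chromStart / width)

# New bin on the first row, on every chrom change, and whenever the
# window index changes within a chromosome
L <- nrow(df)
new.bin <- c(TRUE, df$chrom[-1] != df$chrom[-L] | window[-1] != window[-L])
df$bin <- cumsum(new.bin)
df
```

The trade-off: two rows only a few bases apart land in different bins if they straddle a window boundary, and bins are numbered consecutively over occupied windows only.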
>> >> >>
>> >> >> # the aim here is to attribute a bin number to each row so that I can
>> >> >> # then split the dataframe according to those bins.
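Once every row has a bin number, base R's split() returns a list with one data frame per bin. A small sketch using the bins 1, 1, 2, 2, 3, 3 that the thread expects for the six-row example data:

```r
# Example rows with the bin numbers expected in the thread
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  bin = c(1L, 1L, 2L, 2L, 3L, 3L),
  stringsAsFactors = FALSE
)

# One data frame per bin; list names are the bin numbers
by.bin <- split(df, df$bin)
names(by.bin)          # "1" "2" "3"
sapply(by.bin, nrow)   # 2 rows in each bin
```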
>> >> >>
>> >> >>
>> >> >> chrom chromStart chromEnd         name cumsum bin
>> >> >>  chr1      10089    10309       ZBTB33  10089   1
>> >> >>  chr1      10132    10536  TAF7_(SQ-8)  20221   1
>> >> >>  chr1      10133    10362     Pol2-4H8  30354   1
>> >> >>  chr1      10148    10418 MafF_(M8194)  40502   1
>> >> >>  chr1      10382    10578       ZBTB33  50884   1
>> >> >>  chr1      16132    16352         CTCF  67016   1
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


--
View this message in context: 
http://r.789695.n4.nabble.com/apply-with-multiple-conditions-tp4635098p4635189.html
Sent from the R help mailing list archive at Nabble.com.