Hello,
Sorry to intrude, but I think it's a factor issue.
Try changing the disjunction to the following (split over several lines):
new.bin <- is.na(prev.chrom) |
df$chrom != levels(df$chrom)[prev.chrom] |
delta.start >= 115341
It should work now.
Hope this helps,
Rui Barradas
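For context, a minimal sketch of why the comparison misbehaves when chrom is a factor (the default for string columns in data.frame() before R 4.0.0): c(NA, df$chrom[-L]) keeps only the integer codes, so labels such as "chr1" end up being compared with codes such as 1 and never match. One way to make the test independent of how the column is stored is to convert to character first. The as.character() step and the cut-down example data are assumptions of this sketch, not part of the code posted in the thread.

```r
# Hypothetical example data; chrom is deliberately stored as a factor.
df <- data.frame(
  chrom = factor(c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132)
)

L <- nrow(df)
# Work on the labels rather than the factor codes (assumption of this sketch).
chrom <- as.character(df$chrom)
prev.chrom <- c(NA, chrom[-L])
delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])

# First row, change of chrom, or a jump of 115341 or more starts a new bin.
new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341
df$bin <- cumsum(new.bin)
df
```

With this version the result should not depend on whether chrom arrives as a factor or as a character vector.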
On 02-07-2012 20:03, pguilha wrote:
Jean,
It's crazy, I'm still getting 1,2,3,4,5,6 in the bin column.
Also (this is an unrelated problem, I think), unless I've misunderstood
it, your code will only create a new bin if the difference between
chromStart at positions i and i-1 is >=115341. What I want is for a new
bin to be created each time the difference between chromStart at i and
i-j is >=115341, where 'i-j' corresponds to the first row of the last
bin. I'm not sure if I'm being clear: chromStart values correspond to
coordinates along a chromosome, so I basically want to cut up each
chromosome into sections/bins of approximately 115341.
Thanks again for all your efforts with this, they're much appreciated!
Paul
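The rule described above (start a new bin when chrom changes, or when chromStart is 115341 or more beyond the first row of the current bin) depends on where the previous bin started, so it is not a simple row-to-row difference. A minimal sketch, assuming the rows are already sorted by chrom and chromStart; bin_by_anchor is a hypothetical helper name, not something from the thread. It loops over plain vectors rather than indexing the data frame row by row, which is usually much faster than the original loop.

```r
# Hypothetical helper: bin numbers anchored on the first row of each bin.
bin_by_anchor <- function(chrom, start, width = 115341) {
  chrom <- as.character(chrom)   # avoid factor-code comparisons
  n <- length(start)
  bin <- integer(n)
  if (n == 0L) return(bin)
  bin[1] <- 1L
  anchor <- start[1]             # chromStart of the first row of the current bin
  for (i in seq_len(n)[-1]) {
    if (chrom[i] != chrom[i - 1] || start[i] - anchor >= width) {
      bin[i] <- bin[i - 1] + 1L  # open a new bin and move the anchor
      anchor <- start[i]
    } else {
      bin[i] <- bin[i - 1]       # stay in the current bin
    }
  }
  bin
}

# Made-up coordinates: consecutive gaps are all 60000 (< 115341), but the
# distance from the start of the bin still crosses the threshold.
bin_by_anchor(rep("chr1", 5), c(0, 60000, 120000, 180000, 240000))
```

Usage would be along the lines of df$bin <- bin_by_anchor(df$chrom, df$chromStart).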
On 2 July 2012 19:36, Jean V Adams [via R]
<ml-node+s789695n4635185...@n4.nabble.com> wrote:
Paul,
Try this (I changed some of the object names, but the meat of the code is
the same):
df <- data.frame(
chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
"ZBTB33", "CTCF"),
cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
)
# assign a new bin every time chrom changes and every time chromStart
# changes by 115341 or more
L <- nrow(df)
prev.chrom <- c(NA, df$chrom[-L])
delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom |
  delta.start >= 115341
df$bin <- cumsum(new.bin)
df
pguilha <[hidden email]> wrote on 07/02/2012 10:23:36 AM:
Jean, that's exactly what it should be, but yes I copied and pasted
from your email so I don't see how I could have introduced an error in
there....
paul
On 2 July 2012 15:57, Jean V Adams [via R]
<[hidden email]> wrote:
Paul,
Are you submitting the exact code that I included in my previous e-mail?
When I submit that code, I get this ...
  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
3  chr2      10133    10362     Pol2-4H8  30354   2
4  chr2      10148    10418 MafF_(M8194)  40502   2
5  chr2     210382   210578       ZBTB33  50884   3
6  chr2     216132   216352         CTCF  67016   3
Jean
Paul Guilhamon <[hidden email]> wrote on 07/02/2012 08:59:00 AM:
Thanks for your reply Jean,
I think your interpretation is correct but when I run your code I end
up with the below dataframe and obviously the bins created there don't
correspond to a chromStart change of 115341:
  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
3  chr2      10133    10362     Pol2-4H8  30354   3
4  chr2      10148    10418 MafF_(M8194)  40502   4
5  chr2     210382   210578       ZBTB33  50884   5
6  chr2     216132   216352         CTCF  67016   6
The first two rows should have the same bin number (same chrom,
<115341 diff), then rows 3&4 should be in another bin (different chrom
from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
but >115341 difference between row 4 and row 5).
It seems the new.bin line of your code isn't quite doing what it
should but I can't pinpoint the error there...
Paul
On 2 July 2012 14:19, Jean V Adams <[hidden email]> wrote:
Paul,
My interpretation is that you are trying to assign a new bin number to
a row every time the variable chrom changes and every time the variable
chromStart changes by 115341 or more. Is that right? If so, you don't
need a loop at all. Check out the code below. I made a couple changes to
the all.tf7 example data frame so that it would have two changes in bin
number, one based on the chrom variable and one based on the chromStart
variable.
Jean
all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
    "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
  bin = rep(NA, 6)
)
# assign a new bin every time chrom changes and every time chromStart
# changes by 115341 or more
L <- nrow(all.tf7)
prev.chrom <- c(NA, all.tf7$chrom[-L])
delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
  delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7
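The loop-free idea comes down to flagging the rows that start a new bin and taking a running count of those flags. A minimal sketch of the same consecutive-difference rule written with diff(), assuming chrom is held as a character vector; the standalone vector names are illustrative only.

```r
# Hypothetical data in the same shape as the example data frame.
chrom <- c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")
chromStart <- c(10089, 10132, 10133, 10148, 210382, 216132)
n <- length(chrom)

# TRUE where a row starts a new bin: always the first row, then any row
# whose chrom differs from the previous row or whose chromStart jumps
# by 115341 or more relative to the previous row.
new.bin <- c(TRUE, chrom[-1] != chrom[-n] | diff(chromStart) >= 115341)

# The running count of TRUE values is the bin number.
bin <- cumsum(new.bin)
bin
```

Writing the first flag as an explicit TRUE avoids carrying NA values through the comparison.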
pguilha <[hidden email]> wrote on 07/02/2012 06:25:13 AM:
Hello all,
I have written a for loop to act on a dataframe with close to 3 million
rows and 6 columns and I would like to pass it to apply() to speed the
process up (I let the loop run for 2 days before stopping it and it had
only gone through 200,000 rows) but I am really struggling to find a way
to pass the arguments. Below are the loop and the head of the dataframe
I am working on.
Any hints would be much appreciated, thank you! (I have searched for
this but could not find any other posts doing quite what I want)
Paul
x <- as.numeric(all.tf7[1,2])
for (i in 2:nrow(all.tf7)) {
  if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
    all.tf7[i,6] <- all.tf7[i-1,6]
  else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
    all.tf7[i,6] <- (all.tf7[i-1,6]+1)
    x <- as.numeric(all.tf7[i,2])
  }
  else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
    all.tf7[i,6] <- (all.tf7[i-1,6]+1)
    x <- as.numeric(all.tf7[i,2])
  }
}
# the aim here is to attribute a bin number to each row so that I can then
# split the dataframe according to those bins.
chrom chromStart chromEnd         name cumsum bin
 chr1      10089    10309       ZBTB33  10089   1
 chr1      10132    10536  TAF7_(SQ-8)  20221   1
 chr1      10133    10362     Pol2-4H8  30354   1
 chr1      10148    10418 MafF_(M8194)  40502   1
 chr1      10382    10578       ZBTB33  50884   1
 chr1      16132    16352         CTCF  67016   1
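Once every row has a bin number, the splitting step mentioned in the comment above can be done with base R's split(). A minimal sketch on made-up data; the object name pieces is hypothetical.

```r
# Hypothetical data frame that already carries a bin column.
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  bin = c(1, 1, 2, 2, 3, 3)
)

# split() returns a named list with one data frame per distinct bin value.
pieces <- split(df, df$bin)
names(pieces)
sapply(pieces, nrow)
```

Each element, for example pieces[["2"]], can then be processed on its own or passed to lapply().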
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
View this message in context:
http://r.789695.n4.nabble.com/apply-with-multiple-conditions-tp4635098p4635189.html
Sent from the R help mailing list archive at Nabble.com.