Thank you very much! It works perfectly now. I even extended it so that
it can be applied to the whole dataset:
data <- read.delim("mhc_data.txt", stringsAsFactors=FALSE)
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}
output <- matrix(ncol=(ncol(data)-1), nrow=nrow(data)/2)
sim <- rep(0, nrow(data)/2)
for (y in 2:(ncol(data))) {
  for (x in 1:(nrow(data)/2)) {
    a <- data[(2*x-1),y] # odd rows
    b <- data[(2*x),y]   # even rows
    sim[x] <- (lettermatch(a,b))
  }
  output[,y-1] <- sim
}
colnames(output) <- c(names(data[2:length(names(data))]))
rownames(output) <- c(1:(nrow(data)/2))
output
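The double loop above could probably also be written without explicit index bookkeeping. The following is only a sketch under assumptions: the inline text stands in for the first four rows of the data file, and the names odd, even and output2 are made up for illustration. lettermatch is the function from this thread.

```r
# lettermatch as defined in this thread
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by = "Var1")
  sum(apply(tb[-1], 1, min))
}

# Assumption: a tiny inline stand-in for the data file (first four rows only)
txt <- "Individuals\tSeq1\tSeq2
C1\tGGGG\tAATT
M1\tGGGG\tAAAA
C2\tGGGG\tAATT
M2\tAGGG\tAACT"
data <- read.delim(text = txt, stringsAsFactors = FALSE)

odd  <- seq(1, nrow(data), by = 2)   # rows C1, C2, ...
even <- seq(2, nrow(data), by = 2)   # rows M1, M2, ...

# For every sequence column, compare each odd row with the following even row
output2 <- sapply(data[-1], function(col)
  mapply(lettermatch, col[odd], col[even], USE.NAMES = FALSE))
output2
```

sapply keeps the column names of the data frame, so the colnames() step is not needed in this variant.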
Laetitia
On 12.01.2010 at 18:31, Peter Ehlers wrote:
Laetitia,
I was just responding to your comment that "R complains
about a syntax error". But I realize now that "2x" would
probably cause an "unexpected symbol" error.
Here's what I get when I run your loop; what do you get?
for (x in 1:(nrow(dat)-1)) {
+ a <- as.character(dat[(2x-1),1])
Error: unexpected symbol in:
"for (x in 1:(nrow(dat)-1)) {
a <- as.character(dat[(2x"
b <- as.character(dat[(2x),1])
Error: unexpected symbol in " b <- as.character(dat[(2x"
lettermatch(a,b)
Error in strsplit(a, "") : object 'a' not found
}
Error: unexpected '}' in "}"
and here's what I get when I fix the obvious syntax
error:
for (x in 1:(nrow(dat)-1)) {
+ a <- as.character(dat[(2*x-1),1])
+ b <- as.character(dat[(2*x),1])
+ lettermatch(a,b)
+ }
Error in fix.by(by.x, x) : 'by' must specify valid column(s)
That leaves two problems:
1) you're looking at the wrong column in dat[,1]; that
should be dat[,2], etc.
2) that error message indicates that your index variable (x)
reaches invalid values: with x running up to nrow(dat)-1,
2*x goes past the last row of dat.
Try this:
for (x in 1:(nrow(dat)/2)) {
  a <- dat[(2*x-1),2] # odd rows
  b <- dat[(2*x),2]   # even rows
  print(lettermatch(a,b))
}
You don't need the as.character() if you have character data.
Always do a str(dat) before you do any analysis.
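A minimal sketch of that check, assuming a small inline stand-in for the attached file (only the first four rows; the variable txt is made up for illustration):

```r
# Assumption: inline text replaces the real file so the example is self-contained
txt <- "Individuals\tSeq1\tSeq2
C1\tGGGG\tAATT
M1\tGGGG\tAAAA
C2\tGGGG\tAATT
M2\tAGGG\tAACT"
dat <- read.delim(text = txt, stringsAsFactors = FALSE)

str(dat)   # the sequence columns should be listed as chr, not Factor

# Two quick sanity checks before looping over pairs of rows
stopifnot(is.character(dat$Seq1))   # no as.character() needed
stopifnot(nrow(dat) %% 2 == 0)      # every C row has a matching M row
```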
-Peter Ehlers
Laetitia Schmid wrote:
Dear Peter,
thank you for the suggestion.
Unfortunately the star did not help. Did it work for you? For me it
seems incomplete somehow.
Laetitia
________________________________________
From: Peter Ehlers [ehl...@ucalgary.ca]
Sent: Tuesday, January 12, 2010 09:54 AM
To: Laetitia Schmid
Cc: Steve Lianoglou; r-help@r-project.org
Subject: Re: [R] apply a function down each column
See inline below.
Laetitia Schmid wrote:
Dear Steve,
my solution looks like it should work, but it does not.
I attached a text file with an extract of my data. Maybe you can try it
yourself. I want to compare C1 with M1, C2 with M2, C3 with M3, ... for
each column.
I do not really know what the problem is. R complains about a syntax error.
The function I am applying counts the letters that the two strings have
in common. Greg Hirson helped me to write it.
lettermatch <- function(a, b) {
  tb <- merge(as.data.frame(table(strsplit(a, ""))),
              as.data.frame(table(strsplit(b, ""))), by="Var1")
  sum(apply(tb[-1], 1, min))
}
For example for the second column I tried:
for (x in 1:(nrow(dat)-1)) {
a <- as.character(dat[(2x-1),1])
Shouldn't that be 2*x-1??
-Peter Ehlers
b <- as.character(dat[(2x),1])
lettermatch(a,b)
}
or
a <- as.character(dat[seq(1, nrow(dat), by=2),2])
b <- as.character(dat[seq(2, nrow(dat), by=2), 2])
all.results <- lettermatch(a,b)
With "dat<-read.delim("data_lgs.txt",stringsAsFactors=FALSE)" I can
drop the "as.character" calls in the code above.
Laetitia
Individuals Seq1 Seq2 Seq3 Seq4
C1 GGGG AATT CCGG CTTT
M1 GGGG AAAA GGGG GGGG
C2 GGGG AATT CCGG CTTT
M2 AGGG AACT CCGG CGTT
C3 AGGG AACT CCGG CGTT
M3 AGGG AACT CCGG CGTT
C4 GGGG AATT CCGG CCTT
M4 GGGG AAAT CGGG CTTT
C5 AGGG ACTT CCCG CTTT
M5 AGGG CTTT CCCC CCTT
C6 AGGG CTTT CCCC CCTT
M6 AAAG CCTT CCCC CTTT
C7 AAAG ACCC CCCG GTTT
M7 AAGG AACC CCGG TTTT
C8 GGGG AATT CCGG CCTT
M8 GGGG AATT CCGG CCTT
C9 GGGG AAAA GGGG TTTT
M9 GGGG AAAA GGGG TTTT
C11 AGGG AAAC CGGG GGTT
M11 GGGG AATT CCGG CCTT
On 11.01.2010 at 15:18, Steve Lianoglou wrote:
Hi,
On Mon, Jan 11, 2010 at 8:41 AM, Laetitia Schmid <laeti...@gmt.su.se> wrote:
Hello World,
I have a function that makes pairwise comparisons between two
strings. I would like to apply this function to my data (which
consists of columns with different strings) in the way that it
compares the first with the second entry, and then the third with the
fourth, and then the fifth with the sixth, and so on down each column...
So (2x-1) and (2x) would be the different entries to be compared!
dat= my data:
for the first column: compare dat[(2x-1),1] with dat[(2x),1] and x
would be 1:i, i=length(dat[,1])
I think the best way to do that is a loop:
a <- as.character(dat[(2x-1),1])
b <- as.character(dat[(2x),1])
for (i in 1:length(dat[,1]) my_function(a, b))
Can somebody help me to apply a function with a loop in the way I
want to a column?
It looks as if you've got it already, haven't you?
for (x in 1:(nrow(dat)-1)) {
a <- dat[(2x-1),1]
b <- dat[(2x), 1]
my_function(a,b)
}
Is there a specification of "tapply" for that?
I don't think so, but depending on what you want to do, the size of
your data, and the amount of RAM you have, it might be faster to
compare everything "at once" (assuming `my_function` can be
vectorized), for instance:
a <- dat[seq(1, nrow(dat), by=2),1]
b <- dat[seq(2, nrow(dat), by=2), 1]
all.results <- my_function(a,b)
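If my_function only accepts one pair of strings at a time, mapply can walk over the pairs instead. This is a sketch only: the my_function below is a hypothetical stand-in that counts the letters two strings share, and the one-column dat is made up for illustration.

```r
# Hypothetical stand-in for my_function: works on ONE pair of strings
my_function <- function(a, b) {
  ta <- table(strsplit(a, "")[[1]])
  tb <- table(strsplit(b, "")[[1]])
  common <- intersect(names(ta), names(tb))
  # for each shared letter, count it as often as it occurs in both strings
  sum(pmin(as.vector(ta[common]), as.vector(tb[common])))
}

# Assumption: a small made-up data frame with strings in column 1
dat <- data.frame(Seq1 = c("GGGG", "GGGG", "GGGG", "AGGG"),
                  stringsAsFactors = FALSE)

a <- dat[seq(1, nrow(dat), by = 2), 1]   # odd rows
b <- dat[seq(2, nrow(dat), by = 2), 1]   # even rows

# mapply calls my_function(a[1], b[1]), my_function(a[2], b[2]), ...
all.results <- mapply(my_function, a, b, USE.NAMES = FALSE)
all.results
```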
Also, as an aside, I see you keep calling "as.character" on your data
when you extract it from your data.frame. Is your data being converted
to factors? You can look to set stringsAsFactors=FALSE if this is the
case and you are reading in data using read.table/delim/etc (see:
?read.table)
Hope that helps,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Peter Ehlers
University of Calgary
403.202.3921