On May 10, 2011, at 9:49 AM, noxyp...@gmail.com wrote:

On Tue, May 10, 2011 at 3:09 PM, David Winsemius <dwinsem...@comcast.net> wrote:

On May 10, 2011, at 3:18 AM, noxyp...@gmail.com wrote:

On Fri, May 6, 2011 at 7:41 PM, David Winsemius <dwinsem...@comcast.net> wrote:

On May 6, 2011, at 11:35 AM, Pete Pete wrote:


Gabor Grothendieck wrote:

On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyp...@gmail.com> wrote:

Hi,
consider the following two dataframes:
x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
data1=data.frame(x1,x2)

y1=c("232","232","3454","3454","3455","342","13","13","13","13")
y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
data2=data.frame(y1,y2)

I need a new column x3 in dataframe data1 that is 1 if data2 has a row with y2 equal to "E1" for the matching key (x1 = y1), and 0 otherwise. The result of data1 should look like this:
    x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1

I think a SQL command could help me but I am too inexperienced with it to get there.


Try this:

library(sqldf)
sqldf("select x1, x2, max(y2 = 'E1') x3
       from data1 d1 left join data2 d2 on (x1 = y1)
       group by x1, x2 order by d1.rowid")

    x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1
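
For comparison, the same lookup can be sketched in base R without SQL. This is only a minimal sketch under the assumption that the keys are stored as character strings; the name `keys_with_E1` is illustrative and does not appear in the thread.

```r
# Same data as in the question, kept as character columns
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E1", "F3", "F5", "E1", "E2",
                           "H4", "F8", "G3", "E1", "H2"),
                    stringsAsFactors = FALSE)

# Keys in data2 that have at least one row with y2 == "E1"
keys_with_E1 <- data2$y1[data2$y2 == "E1"]

# 1 if the key in data1 is among them, 0 otherwise
data1$x3 <- as.integer(data1$x1 %in% keys_with_E1)
data1
```

Unlike the join, this never duplicates rows of data1, so no grouping step is needed.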


snipped Gabor's sig

That works really well, but I need to automate this a bit more. Consider the following example:

list1=c("A01","B04","A64","G84","F19")

x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
data1=data.frame(x1,x2)

y1=c("232","232","3454","3454","3455","342","13","13","13","13")
y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
data2=data.frame(y1,y2)

I now want to create a loop that adds a new binary variable to data1 for every value in list1. The result should look like this:
x1      x2      A01     B04     A64     G84     F19
232     1       0       1       0       0       0
3454    1       0       0       1       0       1
3455    1       0       0       0       0       0
342     0       0       0       0       0       0
13      0       1       0       0       1       1

Loops!?! We don't need no steenking loops!

xtb <-  with(data2, table(y1,y2))
cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )

       x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
232   232  1   0   0   1   1   0   0   0   0   0
3454 3454  1   0   1   0   0   0   1   0   0   0
3455 3455  1   0   0   0   0   1   0   0   0   0
342   342  0   0   0   0   0   0   0   0   0   1
13     13  0   1   0   0   0   0   1   1   1   0

I am guessing that you were too ... er, busy? ... to complete the table?
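
If only the codes in list1 are wanted, in that order and as 0/1 flags, one possible variation on the table() idea is to fix the factor levels before tabulating. This is a hedged sketch, not part of the original answer; `counts`, `flags` and `result` are illustrative names.

```r
list1 <- c("A01", "B04", "A64", "G84", "F19")
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E13", "B04", "F19", "A64", "E22",
                           "H44", "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)

# Setting the levels to list1 drops all other codes and fixes the column order;
# unclass() turns the table into a plain integer matrix
counts <- unclass(table(data2$y1, factor(data2$y2, levels = list1)))

# Reorder rows to follow data1 and turn counts into 0/1 flags
flags <- (counts[match(data1$x1, rownames(counts)), , drop = FALSE] > 0) + 0L
rownames(flags) <- NULL

result <- cbind(data1, as.data.frame(flags))
result
```

Codes that never occur in data2 still get a column of zeros, because the column set comes from list1 rather than from the data.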

--

David Winsemius, MD
West Hartford, CT



Thanks a lot! Pretty simple. I am so much used to SQLDF right now.

So how would you handle more complicated strings like these:
y1=c("232","232", "232", "3454","3454","3455","342","13","13","13","13")
y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35",
"F68","G84","F19","A01")
data2=data.frame(y1,y2)

where you want to extract, for instance, all occurrences of "A01" from the strings?

I think you need either to explain what you want in more words of the English language or to offer an example of the desired output. I suspect you did not want something as simple as this:

A01.instances <- grep("A01" , data2$y2)
A01.instances
[1]  2 11
data2[A01.instances, ]
    y1          y2
2  232 B04 A01 F19
11  13         A01

Or maybe you did?
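
Note that a plain grep() pattern matches substrings, so it would also hit a longer code such as "A01A". A small sketch of the difference, assuming codes are separated by spaces; the vector below and the names `substring_hits` and `token_hits` are illustrative only:

```r
y2 <- c("E133", "B04 A01A F194", "B04", "F19", "A642 G84 A05",
        "E223", "H44 C35", "F68", "G84", "F19", "A01")

# Substring match: "A01" is also found inside "A01A"
substring_hits <- grep("A01", y2, fixed = TRUE)

# Whole-code match: \\b requires a word boundary on both sides
token_hits <- grep("\\bA01\\b", y2)

substring_hits
token_hits
```

Which of the two is right depends on whether "A01A" should count as an instance of "A01".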

--
David Winsemius, MD
West Hartford, CT



No, that was not my intention. Consider the following example:

list1=c("A01","B04","A64","G84","F19") # My "substrings" to screen for in data2


x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
# Target dataframe where the 5 new binary variables (namely from list1) are added
data1=data.frame(x1,x2)


y1=c("232","232", "232", "3454","3454","3455","342","13","13","13","13")
y2=c("E133","B04 A01A F194","B04","F19","A642 G84 A05","E223","H44 C35",
"F68","G84","F19","A01")
data2=data.frame(y1,y2) # Dataframe to be screened by list1


Result should look like this:

x1      x2      A01     B04     A64     G84     F19
232     1       1       1       0       0       0
3454    1       0       0       1       0       1
3455    1       0       0       0       0       0
342     0       0       0       0       0       0
13      0       1       0       0       1       1

And how were we supposed to figure out that 3454/G84 was not supposed to be counted?

OK, let's assume you were just sloppy ... then build a new data.frame:

> data4 <- data.frame(y1 = rep(data3[,1],
+     sapply(strsplit(gsub("\\\n"," ",data3$y2), " "), length) ),
+     y2 = unlist(strsplit(gsub("\\\n"," ",data3$y2), " ") )
+ )
> data4
     y1  y2
1   232 E13
2   232 B04
3   232 A01
4   232 F19
5   232 B04
6  3454 F19
7  3454 A64
8  3454 G84
9  3454 A05
10 3455 E22
11  342 H44
12  342 C35
13   13 F68
14   13 G84
15   13 F19
16   13 A01
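
From a long table like data4, the requested flags can then be derived per code. The sketch below is one possible way to finish the job, not the poster's or David's code. It assumes that data3 holds the earlier example with codes "E13", "B04 A01 F19", and so on (data3 is never defined in the thread), that codes are separated by whitespace, and that only exact code matches should count.

```r
list1 <- c("A01", "B04", "A64", "G84", "F19")
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
# Assumed contents of data3 (taken from the earlier example in the thread)
data3 <- data.frame(y1 = c("232", "232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05",
                           "E22", "H44 C35", "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)

# One row per (key, code) pair
codes <- strsplit(data3$y2, "[[:space:]]+")
data4 <- data.frame(y1 = rep(data3$y1, lengths(codes)),
                    y2 = unlist(codes),
                    stringsAsFactors = FALSE)

# One 0/1 column per code in list1: is the key among the keys carrying that code?
flags <- sapply(list1, function(code) {
  as.integer(data1$x1 %in% data4$y1[data4$y2 == code])
})

result <- cbind(data1, flags)
result
```

With exact matching, 232 gets F19 = 1 and 3454 gets G84 = 1, which differs from the hand-written target table above, so the intended matching rule still needs to be pinned down.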



--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
