Re: [R] Creating binary variable depending on strings of two dataframes

David Winsemius Tue, 10 May 2011 07:07:31 -0700


On May 10, 2011, at 9:49 AM, noxyp...@gmail.com wrote:

On Tue, May 10, 2011 at 3:09 PM, David Winsemius <dwinsem...@comcast.net> wrote:

On May 10, 2011, at 3:18 AM, noxyp...@gmail.com wrote:
On Fri, May 6, 2011 at 7:41 PM, David Winsemius <dwinsem...@comcast.net> wrote:
On May 6, 2011, at 11:35 AM, Pete Pete wrote:
Gabor Grothendieck wrote:
On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyp...@gmail.com> wrote:
Hi,
consider the following two dataframes:
x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
data1=data.frame(x1,x2)

y1=c("232","232","3454","3454","3455","342","13","13","13","13")
y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
data2=data.frame(y1,y2)
I need a new column in dataframe data1 (x3) that is either 0 or 1, depending on whether the value "E1" occurs in y2 of data2 for the rows where x1 = y1. The result for data1 should look like this:
    x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1
I think an SQL command could help me, but I am too inexperienced with it to get there.


Try this:

library(sqldf)
sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2
       on (x1 = y1) group by x1, x2 order by d1.rowid")


x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1
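For comparison, here is a base R sketch of the same single indicator without sqldf. It is not part of Gabor's reply; it assumes the data1 and data2 from the question, built with stringsAsFactors = FALSE, and simply collects the ids in data2 that have at least one "E1" row, then tests each x1 against them with %in%.

```r
# Data from the original question; stringsAsFactors = FALSE is set explicitly
# because the default changed in R 4.0.0
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E1", "F3", "F5", "E1", "E2",
                           "H4", "F8", "G3", "E1", "H2"),
                    stringsAsFactors = FALSE)

# Ids in data2 that have at least one "E1" entry
ids_with_E1 <- data2$y1[data2$y2 == "E1"]

# 1 if x1 is among those ids, 0 otherwise; the row order of data1 is preserved
data1$x3 <- as.integer(data1$x1 %in% ids_with_E1)
data1
```

Because %in% only asks whether a match exists, an id with several "E1" rows still gets 1, which is the same effect as max() in the SQL version.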

snipped Gabor's sig

That works really well, but I need to automate this a bit more. Consider the following example:

list1=c("A01","B04","A64","G84","F19")

x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
data1=data.frame(x1,x2)

y1=c("232","232","3454","3454","3455","342","13","13","13","13")
y2=c("E13","B04","F19","A64","E22","H44","F68","G84","F19","A01")
data2=data.frame(y1,y2)
Now I want to create a loop that creates a new binary variable in data1 for every value in list1. The result should look like:
x1      x2      A01     B04     A64     G84     F19
232     1       0       1       0       0       0
3454    1       0       0       1       0       1
3455    1       0       0       0       0       0
342     0       0       0       0       0       0
13      0       1       0       0       1       1


Loops!?! We don't need no steenking loops!

xtb <-  with(data2, table(y1,y2))
cbind(data1, xtb[match(data1$x1, rownames(xtb)), ] )


       x1 x2 A01 A64 B04 E13 E22 F19 F68 G84 H44
232   232  1   0   0   1   1   0   0   0   0   0
3454 3454  1   0   1   0   0   0   1   0   0   0
3455 3455  1   0   0   0   0   1   0   0   0   0
342   342  0   0   0   0   0   0   0   0   0   1
13     13  0   1   0   0   0   0   1   1   1   0

I am guessing that you were too ... er, busy? ... to complete the table?
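The table() answer returns every code found in data2. If only the list1 codes are wanted, as 0/1 columns in list1 order, one possible refinement (a sketch, not from the thread) is to turn y2 into a factor with levels = list1, which drops all other codes before tabulating, and to turn y1 into a factor with levels taken from data1$x1, which gives one row per data1 id in data1 order even for ids that never occur in data2. This assumes the x1 values are unique.

```r
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E13", "B04", "F19", "A64", "E22",
                           "H44", "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)
list1 <- c("A01", "B04", "A64", "G84", "F19")

# Codes outside list1 become NA and are left out by table();
# rows follow data1$x1 (assumed unique), columns follow list1
xtb <- table(factor(data2$y1, levels = data1$x1),
             factor(data2$y2, levels = list1))

# unclass() drops the "table" class so cbind() sees a plain matrix;
# "> 0" turns counts into presence flags, "* 1L" makes them integer 0/1
ind <- (unclass(xtb) > 0) * 1L
rownames(ind) <- NULL
result <- cbind(data1, ind)
result
```

Stripping the "table" class matters: a data.frame built from a two-way table object is converted to long format, whereas a plain matrix keeps one column per code.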


--

David Winsemius, MD
West Hartford, CT


Thanks a lot! Pretty simple. I am just so used to sqldf right now.

So how would you handle more complicated strings like these:

y1=c("232","232", "232","3454","3454","3455","342","13","13","13","13")

y2=c("E13","B04 A01 F19","B04","F19","A64 G84 A05","E22","H44 C35","F68","G84","F19","A01")
data2=data.frame(y1,y2)

where you want to extract, for instance, every "A01" from the strings?


I think you need either to explain what you want in more words of the English language or to offer an example of the desired output. I suspect you did not want something as simple as this:

A01.instances <- grep("A01" , data2$y2)
A01.instances

[1]  2 11

data2[A01.instances, ]

 y1          y2
2  232 B04 A01 F19
11  13         A01

Or maybe you did?
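If the goal is to go straight from these space-separated strings to one indicator per code, one way (a sketch, not something posted in the thread) is to extend the grep() idea to every code in list1 with grepl(), then collapse the per-row hits to one row per id with rowsum(). It assumes the data2 from the preceding message with "H44 C35" on one line, and it assumes whole-word matching, which is an interpretation: the thread never settles whether a string such as "A01A" should count as "A01".

```r
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(y1 = c("232", "232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22",
                           "H44 C35", "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)
list1 <- c("A01", "B04", "A64", "G84", "F19")

# One logical column per code (sapply names the columns after list1):
# does this y2 string contain the code as a whole word?
# The \\b word boundaries stop, e.g., "F19" from matching inside "F194"
hits <- sapply(list1, function(code)
  grepl(paste0("\\b", code, "\\b"), data2$y2, perl = TRUE))

# Sum the hits per id; rowsum() groups the rows of the matrix by y1
counts <- rowsum(hits * 1L, data2$y1)

# Reorder to data1's ids and flag ids with at least one hit
ind <- (counts[match(data1$x1, rownames(counts)), , drop = FALSE] > 0) * 1L
ind[is.na(ind)] <- 0L  # ids absent from data2 would otherwise come out as NA
rownames(ind) <- NULL
result <- cbind(data1, ind)
result
```

With whole-word matching, 3454 gets G84 = 1 because "A64 G84 A05" contains G84. Dropping the \\b anchors gives plain substring matching instead.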

--
David Winsemius, MD
West Hartford, CT


No, that was not my intention. Consider the following example:

list1=c("A01","B04","A64","G84","F19") # My "substrings" to screen for in data2


x1=c("232","3454","3455","342","13")
x2=c("1","1","1","0","0")
data1=data.frame(x1,x2) # Target dataframe where the 5 new binary variables (namely from list1) are added
y1=c("232","232", "232","3454","3454","3455","342","13","13","13","13")
y2=c("E133","B04 A01A F194","B04","F19","A642 G84 A05","E223","H44 C35","F68","G84","F19","A01")
data2=data.frame(y1,y2) # Dataframe to be screened by list1


The result should look like this:

x1      x2      A01     B04     A64     G84     F19
232     1       1       1       0       0       0
3454    1       0       0       1       0       1
3455    1       0       0       0       0       0
342     0       0       0       0       0       0
13      0       1       0       0       1       1

And how were we supposed to figure out that 3454/G84 was not supposed to be counted?


OK, let's assume you were just sloppy ... Then build a new data.frame:

> data4 <- data.frame(y1 = rep(data3[, 1],
+                              sapply(strsplit(gsub("\\\n", " ", data3$y2), " "), length)),
+                     y2 = unlist(strsplit(gsub("\\\n", " ", data3$y2), " ")))

> data4
    y1  y2
1   232 E13
2   232 B04
3   232 A01
4   232 F19
5   232 B04
6  3454 F19
7  3454 A64
8  3454 G84
9  3454 A05
10 3455 E22
11  342 H44
12  342 C35
13   13 F68
14   13 G84
15   13 F19
16   13 A01
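Going from the long data4 format to the wide layout Pete asked for, the earlier table() idea applies directly. A sketch, assuming data3 is the data2 posted earlier in the thread (the thread never defines data3), with the wrapped "H44 C35" entry holding an embedded newline as the gsub() call implies. It splits on [[:space:]]+ instead of gsub() followed by a single-space split, which is a swapped-in simplification, and it uses lengths(), available since R 3.2.0.

```r
# Assumption: data3 is Pete's earlier data2, including the embedded newline
data3 <- data.frame(y1 = c("232", "232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E13", "B04 A01 F19", "B04", "F19", "A64 G84 A05", "E22",
                           "H44\nC35", "F68", "G84", "F19", "A01"),
                    stringsAsFactors = FALSE)
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"),
                    stringsAsFactors = FALSE)
list1 <- c("A01", "B04", "A64", "G84", "F19")

# Split each y2 string on any run of whitespace (spaces or the newline)
codes <- strsplit(data3$y2, "[[:space:]]+")

# Long format: one row per (id, code) pair, each id repeated once per code
data4 <- data.frame(y1 = rep(data3$y1, lengths(codes)),
                    y2 = unlist(codes),
                    stringsAsFactors = FALSE)

# Cross-tabulate only the list1 codes, one row per data1 id in data1 order
# (the factor levels assume the x1 values are unique)
xtb <- table(factor(data4$y1, levels = data1$x1),
             factor(data4$y2, levels = list1))
ind <- (unclass(xtb) > 0) * 1L  # counts -> 0/1 presence flags
rownames(ind) <- NULL
result <- cbind(data1, ind)
result
```

Since 232 lists B04 twice, its count is 2; the > 0 comparison still reduces it to a single 1.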



--
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
