On Oct 7, 2011, at 9:34 AM, jdanielnd wrote:
Hello,
I am having some problems using the 'merge' function. I'm not sure
I understand how it works.
What I want to do is:
1) Suppose I have a dataframe like:
height width
1 1.1 2.3
2 2.1 2.5
3 1.8 1.9
4 1.6 2.1
5 1.8 2.4
2) And I generate a second dataframe sampled from this one, like:
height width
1 1.1 2.3
3 1.8 1.9
5 1.8 2.4
3) Next, I add a new variable from this dataframe:
height width color
1 1.1 2.3 red
3 1.8 1.9 red
5 1.8 2.4 blue
4) So, I want to merge those dataframes, so that the new variable,
color, is bound to the first dataframe. Of course some cases won't
have a value for it, since I generated this variable in a smaller
dataframe. In those cases I want the value to be NA. The resulting
dataframe should be:
height width color
1 1.1 2.3 red
2 2.1 2.5 NA
3 1.8 1.9 red
4 1.6 2.1 NA
5 1.8 2.4 blue
I have written some code, but it's not working properly. The new
variable has its values mixed up, and they do not correspond to their
row.names.
# Generate the first dataframe
data1 <- data.frame(height=rnorm(20,3,0.2),width=rnorm(20,2,0.5))
# Sample a smaller dataframe from data1
data2 <- data1[sample(1:20,15,replace=F),]
# Generate the new variable
color <- sample(c("red","blue"),15,replace=T)
# Bind the new variable to data2
data2 <- cbind(data2, color)
# Merge data1 and data2$color by row.names, forcing the result to have
# the same rows as data1. Then build a new dataframe whose row names
# come from column 1, and sort it by the row names of data1.
data.frame(merge(data1, data2$color, by=0, all.x=T), row.names=1)[row.names(data1),]
I'm not sure what I am doing wrong.
I'm not sure what you want. You get the rownames with this:
> str( merge( data1, data2$color, by=0, all.x=T) )
'data.frame': 20 obs. of 4 variables:
$ Row.names:Class 'AsIs' chr [1:20] "1" "10" "11" "12" ...
$ height : num 3.02 2.9 2.93 2.87 2.95 ...
$ width : num 1.7 1.85 1.51 2.14 2.22 ...
$ y : Factor w/ 2 levels "blue","red": 1 2 1 2 1 1 1 NA NA NA ...
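If all you want is the original order then just re-sort. Row.names
comes back as character, so convert it to numeric before ordering:
newdat <- merge( data1, data2$color, by=0, all.x=T)
newdat[order(as.numeric(as.character(newdat$Row.names))), ]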
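A small sketch of how character row names sort, using made-up values
(the vector rn is hypothetical, not taken from the data above).
Ordering the strings directly is alphabetical, while converting to
numeric first gives 1, 2, 3, ...:

```r
# Hypothetical row names, stored as character the way merge() returns them
rn <- c("1", "10", "11", "2", "3")

# Ordering the strings directly sorts them alphabetically
by_chr <- rn[order(rn)]

# Converting to numeric first restores the numeric order
by_num <- rn[order(as.numeric(rn))]

print(by_chr)
print(by_num)
```
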
I checked to see if the Row.names were correct by also examining
merge( cbind(rownames(data1), data1),
data2$color,
by=0, all.x=T)
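One possible explanation for the mixed-up values, as far as I can
tell: data2$color is a bare vector, so it no longer carries the row
names of data2. merge() then treats it as a one-column dataframe with
row names 1 to 15, and the colors land on rows 1 to 15 of data1
rather than on the sampled rows. A minimal sketch that keeps the row
names by passing data2["color"] instead. The set.seed() call and the
name merged are my own additions, there only to make the sketch
reproducible and readable:

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# Same setup as in the original post
data1 <- data.frame(height = rnorm(20, 3, 0.2), width = rnorm(20, 2, 0.5))
data2 <- data1[sample(1:20, 15, replace = FALSE), ]
data2$color <- sample(c("red", "blue"), 15, replace = TRUE)

# data2["color"] is a one-column dataframe that keeps the row names of
# data2, so by = 0 matches each color to the row it was sampled from
merged <- merge(data1, data2["color"], by = 0, all.x = TRUE)

# Row.names is character: order numerically to get back the order of data1
merged <- merged[order(as.numeric(as.character(merged$Row.names))), ]

# Move the Row.names column back into the row names
rownames(merged) <- as.character(merged$Row.names)
merged$Row.names <- NULL

print(merged)
```

Rows of data1 that were not sampled into data2 should come out with
NA in the color column.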
Can anyone see where the mistake is?
Thank you!
Cheers,
Joao D.
--
View this message in context:
http://r.789695.n4.nabble.com/Merge-dataframes-tp3882222p3882222.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT