Sorry, I thought it worked, but I am definitely doing something wrong...

The problem might be that there are NAs and also duplicated values. My
fault. I can't figure out what is going wrong.
I'll be more thorough and modify the two data frames so they mirror my
real data more closely:

df1 is:

Name Position location
francesca A 75
maria A 75
cristina B 36

And df2 is:

location Country
75 UK
75 Italy
56 France
56 Austria
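
For anyone who wants to reproduce this, the two example data frames above could be built as follows. This is only a sketch: the column types (numeric location, character everything else) are an assumption, since the tables above do not show them.

```r
# Example data copied from the tables above; column types are assumed.
df1 <- data.frame(
  Name     = c("francesca", "maria", "cristina"),
  Position = c("A", "A", "B"),
  location = c(75, 75, 36),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  location = c(75, 75, 56, 56),
  Country  = c("UK", "Italy", "France", "Austria"),
  stringsAsFactors = FALSE
)
```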

So I thought I had to first eliminate the duplicates like this:
df1_unique<-subset(df1, !duplicated(location))
df2_unique<-subset(df2, !duplicated(location))

After doing this I get:

df1_unique:

Name Position location
francesca A 75
cristina B 36

And df2_unique:

location Country
75 UK
56 France

I would like to match on "location" and have the output tell me which
records are in df1 but not in df2, which are in both, and which are in
df2 but not in df1...

Name Position location Match
francesca A 75 1
cristina B 36 0

As William suggested,


df12 <- merge(df1, cbind(df2, fromDF2=TRUE), all.x=TRUE, by="location")
df12$Match <- !is.na(df12$fromDF2)
new_common <- df12[which(df12$Match), ]
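
One possible way to get all three groups at once is to extend the merge above with an indicator column on each side and `all=TRUE`. This is a sketch only: the names `inDF1`, `inDF2`, `m` and `status` are made up for illustration, and the data are the de-duplicated examples from above.

```r
# De-duplicated example data from above (column types are assumed).
df1_unique <- data.frame(Name = c("francesca", "cristina"),
                         Position = c("A", "B"),
                         location = c(75, 36),
                         stringsAsFactors = FALSE)
df2_unique <- data.frame(location = c(75, 56),
                         Country = c("UK", "France"),
                         stringsAsFactors = FALSE)

# Full outer join, tagging each side so we can tell where a row came from.
m <- merge(cbind(df1_unique, inDF1 = TRUE),
           cbind(df2_unique, inDF2 = TRUE),
           by = "location", all = TRUE)

# Rows missing from one side get NA in that side's indicator; turn NA into FALSE.
m$inDF1 <- !is.na(m$inDF1)
m$inDF2 <- !is.na(m$inDF2)

# Label each location as shared, df1 only, or df2 only.
m$status <- ifelse(m$inDF1 & m$inDF2, "both",
                   ifelse(m$inDF1, "df1 only", "df2 only"))
```

With this layout, `m[m$status == "df2 only", ]` would hold the locations found in df2 but not in df1.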

This gives me the matching records, which should be correct, but I am
not getting the correct result for the non-shared elements (the records
that are in df2 but not in df1):
df2_only <- subset(df1_unique, !(location %in% df2_unique))
df2_only<- df2_unique[-which(df2_unique$location %in% df1_unique$location),]


Neither of these works; both give me the wrong records...
My questions are:

1. How do I calculate the records from df2 which are NOT in df1?
2. Do I need to eliminate the duplicates (or is there a way to record where
they came from)?
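
A minimal sketch of one way both questions might be approached, using plain logical indexing with `%in%` on the `location` columns and no de-duplication. The data are the example frames from above with assumed column types; `df1_only` is a name introduced here for illustration.

```r
# Example data from above (column types are assumed).
df1 <- data.frame(Name = c("francesca", "maria", "cristina"),
                  Position = c("A", "A", "B"),
                  location = c(75, 75, 36),
                  stringsAsFactors = FALSE)
df2 <- data.frame(location = c(75, 75, 56, 56),
                  Country = c("UK", "Italy", "France", "Austria"),
                  stringsAsFactors = FALSE)

# Question 1: rows of df2 whose location never appears in df1.
# Compare column to column, and index with the logical vector directly:
# x[-which(cond), ] returns zero rows whenever nothing satisfies cond.
df2_only <- df2[!(df2$location %in% df1$location), ]

# The mirror image: rows of df1 whose location never appears in df2.
df1_only <- df1[!(df1$location %in% df2$location), ]

# Question 2: flag rows in place instead of dropping duplicates,
# so every original record is kept along with its match status.
df1$Match <- as.integer(df1$location %in% df2$location)
df2$Match <- as.integer(df2$location %in% df1$location)
```

Note that `%in%` never returns NA, so rows with a missing location are kept rather than dropped; it does, however, treat an NA in one data frame as matching an NA in the other, which may or may not be what is wanted here.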

Any help is much appreciated...
THANK YOU very much!


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.