I believe I am using the improved version of setdiff(...) that handles 
data.frames, so I expected some odd behavior, but this case is escaping 
me.  

It appears that the addition of duplicate entries is not caught by 
setdiff(...).  Is this expected behavior? 

If so, is there another method or approach that should be used to identify 
duplicate row entries between two different data frames? 

Thanks in advance for any feedback. 

Test1_DF<-data.frame(HouseSize=c(1:100))
Test2_DF<-rbind(Test1_DF, Test1_DF)
setdiff(Test1_DF, Test2_DF)
integer(0)
setdiff(Test2_DF, Test1_DF)
integer(0)

However, when one data frame is a strict subset of the other, a difference 
is reported in one direction:
Test3_DF<-data.frame(HouseSize=c(1:25))
setdiff(Test1_DF, Test3_DF)
 [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
[17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
[33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
[49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
[65]  90  91  92  93  94  95  96  97  98  99 100

setdiff(Test3_DF, Test1_DF)
integer(0)
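
For reference, a minimal sketch of one possible way to find duplicate rows, 
using only base R. Set operations ignore how many times a value occurs, which 
would be consistent with the results above, so this sketch relies on 
duplicated(), merge() and aggregate() instead of setdiff(). The names 
dup_within, common and counts are made up for illustration, and the helper 
column n is an assumption, not part of the original data.

```r
# Sketch only: base R, rebuilding the data frames from the post.
Test1_DF <- data.frame(HouseSize = 1:100)
Test2_DF <- rbind(Test1_DF, Test1_DF)
Test3_DF <- data.frame(HouseSize = 1:25)

# Rows that repeat an earlier row within a single data frame.
dup_within <- Test2_DF[duplicated(Test2_DF), , drop = FALSE]

# Distinct rows that appear in both data frames (joined on shared columns).
common <- merge(unique(Test1_DF), unique(Test3_DF))

# How many times each distinct row occurs; n is a helper column of ones.
counts <- aggregate(n ~ HouseSize, data = cbind(Test2_DF, n = 1L), FUN = sum)

nrow(dup_within)
nrow(common)
head(counts)
```

Rows with counts$n greater than 1 are the ones that were added more than 
once; whether merge() on all shared columns is the right notion of "same row" 
depends on the real data.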
