Jay, 

Thanks again for all your help.  

I ended up with something similar that appears to work: it gives the 
difference of two data frames, including all the duplicate rows that the 
filtering removes.  

Thanks again. This will be very helpful going forward, since the data I 
receive often has duplicate rows that I filter out, and I want to double 
check that they were actually filtered out. 


library("prob")   # prob's setdiff() is applied to data frames below

# Read the raw entries
Entry_DF <- read.csv("RSetDiffEntry.csv", header = TRUE)

# Drop exact duplicate rows, then keep 0 < CostPerSquareFoot < 300
# (the > 0 test also covers the separate == 0 exclusion)
EntryFiltered_DF <- subset(Entry_DF, !duplicated(Entry_DF))
EntryFiltered_DF <- subset(EntryFiltered_DF, CostPerSquareFoot > 0)
EntryFiltered_DF <- subset(EntryFiltered_DF, CostPerSquareFoot < 300)

# Rows of Entry_DF that are not in EntryFiltered_DF
setDiff_DF <- setdiff(Entry_DF, EntryFiltered_DF)

# Repeated copies of rows, collected separately
DuplicateRows_DF <- subset(Entry_DF, duplicated(Entry_DF))

# Everything the filtering removed: duplicates plus the set difference
DesiredDFDiff_DF <- rbind(DuplicateRows_DF, setDiff_DF)

DesiredDFDiff_DF
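
For comparison, here is a minimal sketch of the same idea using only base R
(no "prob"). It is not the code above: it builds one logical "keep" vector
and splits the data frame on it. The toy data frame and the names is_dup,
in_range, keep and Removed_DF are made up for illustration, and the 0 to 300
range is simply copied from the filter above. I would expect it to report the
same removed rows, though in original row order rather than duplicates first;
that has not been checked against prob's setdiff().

```r
# Toy stand-in for RSetDiffEntry.csv (values are invented for illustration)
Entry_DF <- data.frame(
  EntryDate         = c("1/1/09", "1/1/09", "1/2/09", "1/3/09", "1/4/09", "1/4/09"),
  HouseNumber       = c(1, 1, 2, 3, 4, 4),
  CostPerSquareFoot = c(100, 100, 0, 350, 120, 120),
  stringsAsFactors  = FALSE
)

# TRUE for every repeated copy of an earlier row
is_dup <- duplicated(Entry_DF)

# TRUE where the cost is present and strictly between 0 and 300
cost     <- Entry_DF$CostPerSquareFoot
in_range <- !is.na(cost) & cost > 0 & cost < 300

# A row is kept only if it is a first occurrence and passes the cost filter
keep <- !is_dup & in_range

EntryFiltered_DF <- Entry_DF[keep, ]
Removed_DF       <- Entry_DF[!keep, ]

# Double check: every input row is either kept or reported as removed
stopifnot(nrow(EntryFiltered_DF) + nrow(Removed_DF) == nrow(Entry_DF))

Removed_DF
```

Because both pieces come from the same logical vector, the row counts always
add up to the size of the input, which is the double check I was after.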




--- On Sat, 5/30/09, G. Jay Kerns <gke...@ysu.edu> wrote:

> From: G. Jay Kerns <gke...@ysu.edu>
> Subject: Re: setdiff bizarre (was: odd behavior out of setdiff)
> To: "Jason Rupert" <jasonkrup...@yahoo.com>
> Cc: "David Winsemius" <dwinsem...@comcast.net>, "r-help@r-project.org" 
> <r-help@r-project.org>
> Date: Saturday, May 30, 2009, 5:19 PM
> Jason,
> 
> (moved back to R-help)
> 
> On Sat, May 30, 2009 at 3:30 PM, Jason Rupert <jasonkrup...@yahoo.com>
> wrote:
> >
> > Jay,
> >
> >
> > I really appreciate all your help.
> >
> > I posted to Nabble an R file and input CSV files more accurately
> > demonstrating what I am seeing and the output I desire to achieve
> > when I difference two dataframes.
> > http://n2.nabble.com/Support-SetDiff-Discussion-Items...-td2999739.html
> >
> >
> > It may be that "setdiff" as intended in the base R functionality
> > and "prob" was never intended to provide the type of result I
> > desire.  If that is the case then I will need to ask the "Ninjas"
> > for help to produce the outcome I seek.
> >
> > That is, when I difference the data within RSetDiffEntry.csv and
> > RSetDuplicatesRemoved.csv, I desire to get the result shown in
> > RDesired.csv.
> >
> > Note that it would not be enough to just remove duplicate
> > "CostPerSquareFoot" values, since that variable is tied to
> > "EntryDate" and "HouseNumber".
> >
> > Any further help and insights are much appreciated.
> >
> > Thanks again,
> > Jason
> >
> 
> From your description, something like the following should
> work:
> 
> Let A = your RSetDiffEntry
> Let B = your RSetDuplicatesRemoved...
> 
> library(prob)
> C <- setdiff(A,B)
> D <- rbind(A,C)
> E <- D[duplicated(D),]
> 
> The E should = your RDesired.
> 
> Hope this helps,
> Jay
> 
> P.S.  I notice your row number 7 in "RSetDuplicatesRemoved" is
> duplicated by the following row. That's a typo, yes?  If so, then E
> should have one more row than your "RDesired."
> 




