The easiest thing might be to sort on the Loaded column first. duplicated() flags every occurrence of a value after the first one, so if the rows are ordered with "Y" ahead of the other codes, dropping duplicates of ESR_ref_edit keeps a "Y" row whenever a sample has one:

#### start of example
## Read the example data; colClasses = "character" stops IDs such as
## "2.10" from being converted to the number 2.1 (and colliding with "2.1")
d <- read.table(textConnection("ESR_ref ESR_ref_edit Loaded
1.1 1.1 Y
1.1.1 1.1 NC
1.1.2 1.1 Y
2.1 2.1 N
2.1.1 2.1 Y
2.1.2 2.1 PU
2.1.3 2.1 Y
3.1 3.1 Y
4.1 4.1 N
4.1.1 4.1 PU"), header = TRUE, colClasses = "character")

## Rank the codes from most to least preferred
d$Loaded <- ordered(d$Loaded, levels = c("Y", "NC", "PU", "N"))
## Sort so that each sample's preferred row comes first...
dSorted <- d[order(d$Loaded),]
## ...then keep only the first row for each sample
subset(dSorted, !duplicated(dSorted$ESR_ref_edit))
#### end of code

You could also try using tapply.

--Gray

On Thu, Dec 17, 2009 at 11:31 AM, gcam <gcam...@gmail.com> wrote:
>
> Thanks Gray,
>
> This helps, I'd completely forgotten about the subset command. However, it
> doesn't quite get me where I need. Perhaps an example will help. I will
> simplify my dataframe to the three important variables:
>
> ESR_ref ESR_ref_edit Loaded
> 1.1     1.1          Y
> 1.1.1   1.1          NC
> 1.1.2   1.1          Y
> 2.1     2.1          N
> 2.1.1   2.1          Y
> 2.1.2   2.1          PU
> 2.1.3   2.1          Y
> 3.1     3.1          Y
> 4.1     4.1          N
> 4.1.1   4.1          PU
>
> So I've created the "edit" variable so I can test for duplicates (i.e.
> samples with more than one sub-sample) because this is not of interest at
> this point. I just want one subsample per sample. However, if we consider
> 2.1, this would result in a subset (if duplicates were removed) with the
> first line, which has an "N". But it is of interest to me that if at least
> one of the subsamples has a "Y" then I want that line rather than a
> subsample with another code. 1.1 in this example works by default because
> the first subsample is a "Y", so it will retain that information.
>
> Thanks
>
> Gareth
>
>
> Gray Calhoun-2 wrote:
>>
>> Hi,
>> Try:
>>
>> subset(Samps, !duplicated(Samps$ESR_ref_edit) | Samps$Loaded == "Y")
>>
>> I'd need specific code to be sure that this is exactly what you want
>> (i.e. you specify input and desired output), but indexing with a logical
>> vector is probably going to be the solution.
>>
>> Best,
>> Gray
>>
>> On Wed, Dec 16, 2009 at 7:55 PM, gcam <gcam...@gmail.com> wrote:
>>>
>>> Hi all.
>>>
>>> So I have a data frame with multiple columns/variables. The first
>>> variable is a major sample name for which there are some sub-samples.
>>> Currently I have used the following command to remove the duplicates:
>>>
>>> Samps_working<-Samps[-c(which(duplicated(Samps$ESR_Ref_edit))),]
>>>
>>> This removes all of the duplicated sample rows.
>>>
>>> However, I just realised that, of course, this removes the first
>>> observation of each duplicated set. However, I wish to retain any that
>>> have the code "Y" in another variable Samps$Loaded. I'm at a bit of a
>>> loss as to how best to approach this problem.
>>>
>>> Just to reiterate. I want to remove all duplicate lines based on sample
>>> name, but, I want the lines to be removed with a preference given to
>>> those that do not include a "Y" in the Loaded variable column.
>>> --
>>> View this message in context:
>>> http://n4.nabble.com/Remove-duplicates-from-a-data-frame-but-with-some-special-requirements-tp965745p965745.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Gray Calhoun
>>
>> Assistant Professor of Economics
>> Iowa State University
>>
>
> --
> View this message in context:
> http://n4.nabble.com/Remove-duplicates-from-a-data-frame-but-with-some-special-requirements-tp965745p974312.html
> Sent from the R help mailing list archive at Nabble.com.
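Gray's suggestion that tapply would also work can be sketched roughly as follows. This is a minimal sketch, not code from the thread. The helper name pick_preferred and its prefs argument are hypothetical, and the sketch assumes the same preference order (Y, then NC, then PU, then N) as the sort-based example. Within each ESR_ref_edit group it picks the row whose Loaded code ranks best, and on ties it keeps the earliest row. The example data is rebuilt with data.frame() so that the block runs on its own.

```r
# Example data from the thread, built directly as character columns
d <- data.frame(
  ESR_ref      = c("1.1", "1.1.1", "1.1.2", "2.1", "2.1.1", "2.1.2", "2.1.3",
                   "3.1", "4.1", "4.1.1"),
  ESR_ref_edit = c("1.1", "1.1", "1.1", "2.1", "2.1", "2.1", "2.1",
                   "3.1", "4.1", "4.1"),
  Loaded       = c("Y", "NC", "Y", "N", "Y", "PU", "Y", "Y", "N", "PU"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (assumption, not from the thread): for each sample,
# keep the row whose Loaded code comes earliest in `prefs`.
pick_preferred <- function(d, prefs = c("Y", "NC", "PU", "N")) {
  # Position of each row's code in the preference list; codes not listed
  # are ranked after all listed ones instead of becoming NA
  code_rank <- match(d$Loaded, prefs)
  code_rank[is.na(code_rank)] <- length(prefs) + 1
  # tapply() splits the row numbers by sample; which.min() returns the
  # first position of the minimum, so ties keep the earliest row
  best <- tapply(seq_len(nrow(d)), d$ESR_ref_edit,
                 function(i) i[which.min(code_rank[i])])
  # as.vector() drops the array attributes; sort() restores input order
  d[sort(as.vector(best)), ]
}

kept <- pick_preferred(d)
print(kept)
```

On this example data the helper should keep the same rows as the sort-and-subset approach: 1.1, 2.1.1, 3.1 and 4.1.1. The sort-based version is shorter. The tapply version makes the per-sample choice explicit and leaves the surviving rows in their original order.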