Try this: had to add extra names to your data since it was not clear how it was organized. Next time use 'dput' to enclose data.
> x <- read.table(textConnection(" index time key date values + 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 + 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 + 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 + 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 + 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 + 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 + 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 + 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 + 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 + 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 + 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 + 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 + 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 + 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 + 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 + 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 + 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43") + , header = TRUE + , as.is = TRUE + ) > closeAllConnections() > x index time key date values 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 2 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 3 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 5 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 7 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 11 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 13 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 14 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 16 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 17 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43 > # get index of first occurance of 'key' column > indx <- !duplicated(x$key) > x[indx,] index time key date values 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 > > On Mon, Aug 1, 2011 at 11:13 AM, Francesca <francesca.panco...@gmail.com> wrote: > Dear Contributors > thanks for any help you can provide. I searched the threads > but I could not find any query that satisfied my needs. > This is my database: > index time values > 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 > 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 > 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 > 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 > 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 > 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 > 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 > 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 > 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 > 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 > 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 > 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 > 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 > 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 > 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 > 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 > 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43 > > I need to select only the rows of this database that correspond to each > of the first occurrences of the string represented in column > index. In the example shown I would like to obtain a new > data.frame which is > > index time values > 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 > 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 > 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 > 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 > 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 > 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 > 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 > 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 > > As you can see, it is not the whole string to change, > rather a substring that is part of it. I want to select > only the first values related to the row that presents for the first time > the different part of the string(substring). > I know how to select rows according to a substring condition on the > index column, but I cannot use it here because the substring changes > and moreover the number of occurrences per substring is variable. > > Thank you for any help you can provide. > Francesca > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.