Hi: Here's one approach, although I imagine there are more efficient ways.

# A function to strip spaces and return the first three non-blank
# characters of a string
keyset <- function(x) substr(gsub(' ', '', x)[1], 1, 3)

# Apply the function to the data frame to generate the key:
a$key <- sapply(a$product, keyset)
> a
      date         product sales key
1 20081201       a b c d e     1 abc
2 20081202     a b c g h t     2 abc
3 20081201 d e h a c e h g     3 deh

# Use aggregate to sum sales by key:
aggregate(sales ~ key, data = a, FUN = sum)
  key sales
1 abc     3
2 deh     3

HTH,
Dennis

On Wed, Mar 9, 2011 at 6:02 PM, Hui Du <hui...@dataventures.com> wrote:
>
> Hi All,
>
> I have a data frame like
>
> a = data.frame(date = c(20081201, 20081202, 20081201),
>                product = c("a b c d e", "a b c g h t", "d e h a c e h g"),
>                sales = c(1, 2, 3))
>
> Now I want to aggregate the sales by part of a$product. 'Product' is
> the product name, a string of items separated by spaces. The key in my
> aggregate function is the first three items in the "product" field. In
> my example, the keys are "a b c", "a b c" and "d e h", respectively. Do
> you know how to do it? I thought of an awkward way that needs several
> function calls (like strsplit, lapply, paste etc.) to manipulate the
> string in the 'product' field. I guess there could be a more elegant
> way to do it.
>
> Thanks in advance.
>
> HXD
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
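
One caveat on keyset() above: it takes the first three characters once the spaces are removed, so it matches "the first three items" only while every item is a single character, as in the example data. If items can be longer, a key built from whole items may be closer to what was asked. Here is a minimal sketch, assuming items are separated by whitespace; keyset_tokens is a name made up for illustration and is not part of the original reply:

```r
# Hypothetical helper: build the key from the first n whitespace-separated
# items of each string, rather than from the first n characters.
keyset_tokens <- function(x, n = 3) {
  # as.character() guards against 'product' being stored as a factor
  parts <- strsplit(trimws(as.character(x)), "[[:space:]]+")
  # Keep at most the first n items and paste them back together
  vapply(parts, function(p) paste(head(p, n), collapse = " "), character(1))
}

# Same example data as in the question
a <- data.frame(
  date    = c(20081201, 20081202, 20081201),
  product = c("a b c d e", "a b c g h t", "d e h a c e h g"),
  sales   = c(1, 2, 3)
)

a$key <- keyset_tokens(a$product)

# Sum sales by key, as before
res <- aggregate(sales ~ key, data = a, FUN = sum)
print(res)
```

On the example data this produces the keys "a b c" and "d e h", each with summed sales of 3, so the totals agree with the character-based version. The two would differ only when an item is longer than one character.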