In the Links box to the right on the sqldf home page click on "SQLite - aggregate functions" and lookup group_concat.
On Wed, Mar 25, 2009 at 9:05 AM, Schragi Schwartz <schra...@post.tau.ac.il> wrote: > Thank you, your answer was extremely helpful. One last problem though: one > of the aggregate functions I'd like to apply on the columns is > concatentation (equivalent to the paste() function). So if I have a given > character column in three separate rows sharing the same ids with the value > "apple" in the first, "banana" in the second, and "orange" in the third, in > the summarizing row I'd like to receive output in the form > "apple|banana|orange". Is there any way to do this? > > Thanks again, > Schragi > > -----Original Message----- > From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] > Sent: Tuesday, March 24, 2009 12:50 AM > To: Schraga Schwartz > Cc: r-help@r-project.org > Subject: Re: [R] Merging rows in dataframes > > Using sqldf you only need two statements, infile <- file(...) and > DF <- sqldf("select min(a), max(b), mean(c), ... from infile group by id"). > The file statement identifies the filename and the second reads it > into sqlite (without > going through R), summarizes it and then reads the summarized version > into R. You may also need to provide info on its format if its not in the > default format. See example 4a on home page and the other examples > there: > http://sqldf.googlecode.com > > > On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz > <schra...@post.tau.ac.il> wrote: >> Hello, >> >> I have a dataframe with 40 columns and around 450,000 rows. The first > column >> in each row is a factor id and the remaining are numeric. Some rows have > the >> same ids. What I want to do is to merge each set of rows sharing the same >> ids (id set) into one single row (summarizing row) with that id. To create >> the summarizing row, I'd like to apply a different function on each of the >> original columns in the id set. Some columns within the summarizing row > will >> equal the mean of the columns in the id set, others will equal the > minimum, >> others the maximum. >> >> To do this, I tried using the by() function. However, this was extremely >> slow (it ran for more than two hours before I stopped it). Also, it used > up >> all of 16 GB of memory on my machine. Is there any more efficient > function, >> both in terms of time and memory, to do this sort of thing? >> >> Thank you very much, >> Schraga Schwartz >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.