Hi, Again, thank you for your extremely helpful answer. I'll try my luck a third time with a similar but different question: assuming I have two dataframes, a and b, each of which containing a set of start and end coordinates, and I want to create a new dataframe c, which contains all coordinates in a which are are overlapped by any coordinate in b. Can I do this via sqldf()? Or is there any other way to do this?
Thank you very much! Schragi -----Original Message----- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Wednesday, March 25, 2009 3:44 PM To: Schragi Schwartz Cc: r-help@r-project.org Subject: Re: [R] Merging rows in dataframes I've added an example to FAQ 3 on the home page that illustrates group_concat. http://sqldf.googlecode.com On Wed, Mar 25, 2009 at 9:06 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote: > In the Links box to the right on the sqldf home page click on > "SQLite - aggregate functions" and lookup group_concat. > > On Wed, Mar 25, 2009 at 9:05 AM, Schragi Schwartz > <schra...@post.tau.ac.il> wrote: >> Thank you, your answer was extremely helpful. One last problem though: one >> of the aggregate functions I'd like to apply on the columns is >> concatentation (equivalent to the paste() function). So if I have a given >> character column in three separate rows sharing the same ids with the value >> "apple" in the first, "banana" in the second, and "orange" in the third, in >> the summarizing row I'd like to receive output in the form >> "apple|banana|orange". Is there any way to do this? >> >> Thanks again, >> Schragi >> >> -----Original Message----- >> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] >> Sent: Tuesday, March 24, 2009 12:50 AM >> To: Schraga Schwartz >> Cc: r-help@r-project.org >> Subject: Re: [R] Merging rows in dataframes >> >> Using sqldf you only need two statements, infile <- file(...) and >> DF <- sqldf("select min(a), max(b), mean(c), ... from infile group by id"). >> The file statement identifies the filename and the second reads it >> into sqlite (without >> going through R), summarizes it and then reads the summarized version >> into R. You may also need to provide info on its format if its not in the >> default format. See example 4a on home page and the other examples >> there: >> http://sqldf.googlecode.com >> >> >> On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz >> <schra...@post.tau.ac.il> wrote: >>> Hello, >>> >>> I have a dataframe with 40 columns and around 450,000 rows. The first >> column >>> in each row is a factor id and the remaining are numeric. Some rows have >> the >>> same ids. What I want to do is to merge each set of rows sharing the same >>> ids (id set) into one single row (summarizing row) with that id. To create >>> the summarizing row, I'd like to apply a different function on each of the >>> original columns in the id set. Some columns within the summarizing row >> will >>> equal the mean of the columns in the id set, others will equal the >> minimum, >>> others the maximum. >>> >>> To do this, I tried using the by() function. However, this was extremely >>> slow (it ran for more than two hours before I stopped it). Also, it used >> up >>> all of 16 GB of memory on my machine. Is there any more efficient >> function, >>> both in terms of time and memory, to do this sort of thing? >>> >>> Thank you very much, >>> Schraga Schwartz >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.