As I replied to Rich privately about another message, I suggest that you may well be able to fit what you need in memory, if you are careful. But my main point is that when you have so much data, you do not need all of it to make a representative graph. A boxplot made from 100,000 data points may well have too many outliers to display, resulting in a bushy tail, and may not be all that much more accurate than one made from 10,000 data points chosen at random from it.
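To illustrate the sampling point, here is a minimal sketch in base R. The data are simulated (a made-up cfs column drawn with rlnorm()), and the names big and small are only placeholders, so treat the numbers as illustrative and not as anything taken from Rich's data.

```r
# Simulated stand-in for one large data frame (an assumption, not real data).
# A log-normal gives a skewed variable with a long upper tail.
set.seed(42)  # fixed seed so the same rows are drawn on every run
big <- data.frame(cfs = rlnorm(100000, meanlog = 5, sdlog = 1))

# Keep a random 10,000 rows; drop = FALSE keeps the result a data frame.
small <- big[sample(nrow(big), 10000), , drop = FALSE]

# boxplot.stats() returns the five numbers a boxplot draws ($stats)
# and the points that would be plotted individually as outliers ($out).
full_stats   <- boxplot.stats(big$cfs)
sample_stats <- boxplot.stats(small$cfs)

# Compare whisker ends, hinges and median side by side.
print(rbind(full = full_stats$stats, sampled = sample_stats$stats))

# Compare how many outlier points each version would have to draw.
cat("outliers drawn:", length(full_stats$out), "vs",
    length(sample_stats$out), "\n")
```

On data like this the hinges and median should come out close to each other, while the sampled version has far fewer outlier points to draw. Once small exists, rm(big) frees the memory held by the full version.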
So the idea would be to read df1 into memory, trim away any columns not needed, then use something like sample() to make a smaller version, rm() the original, and repeat for the second, the third and so on. Now add a PLACE column to each of df1 through dfN and then rbind() them together, so the rows stack into one long data frame, and again throw away anything no longer needed. Finally, you can use factors, as already discussed, including as a way to use less memory, since a factor is just an integer vector attached to a sort of dictionary containing one copy of each distinct text value in your data. Then call ggplot and ...

The results may vary depending on the sample size chosen, and it may be wise to call set.seed() with some fixed value so it does the same thing each time you run it.

Your thought of making separate boxplots can also use as much memory, or more, if you keep everything in memory as you go along.

And, BTW, for people using truly big data, there are approaches that get them huge amounts of memory, either within their own machines or using web services.

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Rich Shepard
Sent: Thursday, November 11, 2021 12:56 PM
To: R-help <r-help@r-project.org>
Subject: Re: [R] ggplot2: multiple box plots, different tibbles/dataframes

On Thu, 11 Nov 2021, Bert Gunter wrote:

> You can always create a graphics layout and then plot different
> ggplot objects in the separate regions of the layout. See ?grid.layout
> (since ggplots are grobs) and ?plot.ggplot . This also **may** be
> useful by showing examples using grid.arrange()
>
> https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html
>
> Still, I suspect that Jeff Newmiller may be right about needing to
> structure your data more appropriately for what you wish to do.

Bert,

For this plot I could create a new data set with only site_nbr, year and cfs
columns; it would be 3,016,005 rows long. Or, I could create separate
boxplots and arrange them in a row.
That might be the easiest.

Thanks,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.