> > You want to process all invoices to count them and to sum up the
> > amounts on a per month/area/type basis. The initial data size is in
> > GB, but the size of the expected result is in KB (namely 2 values for
> > each of 100 areas * 12 months * 4 types).
>
> The key to handling large datasets for data mining is pre-aggregation
> based on the smallest time frame needed for details. I'd suggest running
> these large queries and storing the results in other tables, and then
> writing a set of functions to work with those aggregate tables.
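For instance, the suggested pre-aggregation step might look roughly like
this (just a sketch, assuming a hypothetical invoices(invoice_date, area,
type, amount) table; the table and column names are made up):

    -- build the per month/area/type aggregate once, then query it
    -- instead of joining against the 120M-row detail table
    CREATE TABLE invoice_summary AS
      SELECT date_trunc('month', invoice_date) AS month,
             area,
             type,
             count(*)    AS invoice_count,
             sum(amount) AS total_amount
        FROM invoices
       GROUP BY 1, 2, 3;

    -- the "set of functions" would then read from invoice_summary,
    -- e.g. monthly totals for one area:
    SELECT month, sum(total_amount)
      FROM invoice_summary
     WHERE area = 42
     GROUP BY month
     ORDER BY month;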
Sure, that's what I do: I do not want to pay for several joins over 120M
tuples. However, the one or few "initial" queries still take some time and
a lot of space, hence my mail about temporary storage and 'on the fly' data
fetching, to help improve both speed and temporary storage requirements for
this type of application.

> No sense in summing up the same set of static data more than once if you
> can help it.

Sure, I never did that. Thanks for your advice anyway,

--
Fabien.