> Hannu Krosing <[EMAIL PROTECTED]> writes: > > OTOH, I'm not sure if hash aggregates can already spill to disk if not > > enough memory is available for keeping them all. > > They do not, which is something it'd be good to fix, since if the > planner drastically underestimates the number of groups, you could end > up with a hashtable far bigger than the advertised SortMem. Even if > this doesn't end up with an "out of memory" error, it could still drive > you into swap hell. > > (Maybe that should be Robert's get-your-feet-wet project.)
Ok, I've been looking through the executor - nodeGroup.c, nodeAgg.c, dynahash.c, execGrouping.c, etc.. I've found the places where hashed aggregation occurs. It looks like hashed groups simply read the entire result in memory via a function chain through lookup_hash_entry() -> LookupTupleHashEntry() -> hash_search(). I think that LookupTupleHashEntry is the best place to put the code to spill over unto disk, since that is used only by the Group, Agg, and Subplan executor nodes. Putting it there gives the added benefit of letting hashed subselect results spill over unto disk as well (not sure if that's the right thing to do). I have a couple of questions: 1) When does the optimizer set the nodeAgg plan to HASHED? The current implementation simply reads through the entire result before returning anything, so obviously it's not always done. Sorry if I should RTFM on this one.... 2) What mechanism would be best to use for storing the data on disk? I know there is a temporary table mechanism, I'll be hunting for that shortly.. 3) What should define the spillover point. The documentation points to the 'sort_mem' parameter for this, but the code doesn't look to actually implement that yet. Similarly for the hash_search() function - I didn't see anything to store it to disk. 4) Should LookupTupleHashEntry() be worried about the pointers it receives...similarly for hash_search()? Obviously (2) is the really big question. What's the best way to do this? I still have a bit more to go before I understand what's going on, but I'm starting to grasp it. Any tips would be appreciated! :) Cheers! Robert ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])