Re: Spilling hashed SetOps and aggregates to disk

2018-06-21, David Gershuni
On Jun 21, 2018, at 1:04 PM, Jeff Davis wrote:
> On Thu, 2018-06-21 at 11:04 -0700, David Gershuni wrote:
>> This approach seems functionally correct, but I don't like the idea of transforming O(N) tuples of disk I/O into O(S*N) tuples of disk I/O (in the
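One hypothetical way to see where O(S*N) can come from is the small Python model below. It assumes S is the number of grouping sets, that each grouping set keeps its own hash table capped at a made-up `max_groups` limit, and that a tuple is written to that set's spill file whenever its group is missing from the set's full table. The function name and the cap are illustrative only; nothing here is PostgreSQL code.

```python
def count_spill_writes(rows, grouping_sets, max_groups):
    """Hypothetical model: count tuples written to per-grouping-set spill files.

    Each grouping set has its own in-memory hash table holding at most
    max_groups groups. A tuple whose group is not already in a full table
    is written to that set's spill file (modelled here as a counter).
    """
    tables = [set() for _ in grouping_sets]
    writes = 0
    for row in rows:
        for table, cols in zip(tables, grouping_sets):
            key = tuple(row[c] for c in cols)
            if key in table:
                continue  # group already in memory: aggregate in place
            if len(table) < max_groups:
                table.add(key)  # room left: create the group in memory
            else:
                writes += 1  # table full: spill this tuple for this set
    return writes


rows = [(i, i) for i in range(10)]  # 10 tuples, every group distinct
print(count_spill_writes(rows, [(0,)], max_groups=2))        # 8
print(count_spill_writes(rows, [(0,), (1,)], max_groups=2))  # 16
```

With every group distinct, going from one grouping set to two doubles the number of spilled tuple writes in this toy model, which is the worst-case growth proportional to S.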

Re: Spilling hashed SetOps and aggregates to disk

2018-06-21, David Gershuni
> On Jun 19, 2018, at 10:36 PM, Jeff Davis wrote:
>
> But I am worried that I am missing something, because it appears that for AGG_MIXED, we wait until the end of the last phase to emit the contents of the hash tables. Wouldn't they be complete after the first phase?

You're right. They'

Re: Spilling hashed SetOps and aggregates to disk

2018-06-15, David Gershuni
> On Jun 13, 2018, at 12:53 PM, Jeff Davis wrote:
>
>>
>> An adaptive hash agg node would start as a hash agg, and turn into a "partial hash agg + sort + final group agg" when OOM is detected.
>>
>> The benefit over ordinary sort+group agg is that the sort is happening on a potential
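A minimal Python sketch of the adaptive idea quoted above, for SUM(value) GROUP BY key. Assumptions not taken from the thread: "OOM" is modelled as the hash table reaching a hypothetical `max_groups` limit, the spill file is an in-memory list of partial (key, sum) states, and the name `adaptive_hash_sum` is made up for illustration.

```python
def adaptive_hash_sum(rows, max_groups):
    """Hypothetical adaptive hash agg: SUM(value) GROUP BY key.

    Starts as a plain hash agg. When a new group would exceed max_groups
    (standing in for OOM), the partial states are flushed to a "spill"
    list and a fresh table is started. If anything was flushed, the
    partials are sorted by key and merged by a final group agg.
    """
    table = {}
    partials = []  # stands in for the spilled partial aggregates
    for key, value in rows:
        if key not in table and len(table) >= max_groups:
            partials.extend(table.items())  # emit partial states
            table.clear()
        table[key] = table.get(key, 0) + value
    if not partials:
        # Never hit the limit: pure hash agg. Sorted here only so the
        # output order is deterministic and comparable.
        return sorted(table.items())
    partials.extend(table.items())
    partials.sort(key=lambda kv: kv[0])  # sort step
    result = []
    for key, value in partials:  # final group agg over sorted partials
        if result and result[-1][0] == key:
            result[-1] = (key, result[-1][1] + value)
        else:
            result.append((key, value))
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(adaptive_hash_sum(rows, max_groups=2))   # [('a', 4), ('b', 7), ('c', 4)]
print(adaptive_hash_sum(rows, max_groups=10))  # same result, no spill
```

In this toy run the sort sees 4 partial states rather than the 5 input rows. The final merge relies on SUM partial states being combinable by addition; other aggregates would need their own combine step.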

Re: Spilling hashed SetOps and aggregates to disk

2018-06-07, David Gershuni
As Serge mentioned, we’ve implemented spill-to-disk for SetOps and Aggregates at Salesforce. We were hitting OOMs often enough that this became a high priority for us. However, our current spill implementation is based on dynahash from 9.6, and we’re not happy with its performance (it was primar