Re: [D] Multiple 'group by's, one scan [datafusion]

2025-05-15 Thread via GitHub
GitHub user pepijnve added a comment to the discussion: Multiple 'group by's, one scan Just FYI, in the particular case I'm working on, the problem is that I want to compute a whole bunch of aggregates over a table with a cardinality on the order of billions. For `n`

Re: [D] Multiple 'group by's, one scan [datafusion]

2025-05-15 Thread via GitHub
GitHub user alamb added a comment to the discussion: Multiple 'group by's, one scan There is some additional discussion on a similar-sounding feature here: https://github.com/apache/datafusion/issues/8777 Another potential approach is to fully materialize the input (`INSERT INTO temp_file.p

Re: [D] Multiple 'group by's, one scan [datafusion]

2025-05-15 Thread via GitHub
GitHub user alamb added a comment to the discussion: Multiple 'group by's, one scan The other type of plan where I have seen this cause problems is one whose operators rely on sort order, such as `SortPreservingMerge` or a group by where the data is partially sorted on some of the group keys Gi

Re: [D] Multiple 'group by's, one scan [datafusion]

2025-05-15 Thread via GitHub
GitHub user pepijnve added a comment to the discussion: Multiple 'group by's, one scan Perhaps satisfying the non-general case would already be of value? For the diamond self-join example in the linked issue it doesn't make much sense indeed. Can you think of other examples besides joins wher

Re: [D] Multiple 'group by's, one scan [datafusion]

2025-05-15 Thread via GitHub
GitHub user pepijnve added a comment to the discussion: Multiple 'group by's, one scan I read through the linked issues in the meantime. I think what we're trying to do is closest to the Splitter idea described in the linked document at https://github.com/apache/datafusion/pull/8558#issuecomm