Re: [I] Very high memory usage for unnest + group by [datafusion]

via GitHub Fri, 13 Mar 2026 08:38:57 -0700


psuszyns commented on issue #20788:
URL: https://github.com/apache/datafusion/issues/20788#issuecomment-4056074712


   The query I provided in the description is a bit incorrect. Corrected query:
   
   ```sql
       WITH indexed AS (
           SELECT ROW_NUMBER() OVER () as row_idx, metadata, values_a, 
values_b, values_c 
           FROM test_data
       ),
       unnested AS (
           SELECT 
               row_idx, metadata,
               ROW_NUMBER() OVER (PARTITION BY row_idx ORDER BY row_idx) as 
val_idx,
               unnest(values_a) as val_a,
               unnest(values_b) as val_b,
               unnest(values_c) as val_c
           FROM indexed
       ),
       transformed AS (
           SELECT
               row_idx, val_idx, metadata, val_a, val_b, val_c,
               CASE WHEN val_c > 100 THEN val_a * val_b ELSE val_a + val_b END 
AS val_d
           FROM unnested
       )
       SELECT
           row_idx, metadata,
           array_agg(val_a ORDER BY val_idx) AS values_a,
           array_agg(val_b ORDER BY val_idx) AS values_b,
           array_agg(val_c ORDER BY val_idx) AS values_c,
           array_agg(val_d ORDER BY val_idx) AS values_d
       FROM transformed
       GROUP BY row_idx, metadata
       ORDER BY row_idx
   ```
   
   and when I remove the "ORDER BY val_idx" from all array_agg the memory no 
longer explodes, although it remain quite high.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Very high memory usage for unnest + group by [datafusion]

Reply via email to