psuszyns commented on issue #20788:
URL: https://github.com/apache/datafusion/issues/20788#issuecomment-4056074712
The query I provided in the description is a bit incorrect. Corrected query:
```sql
WITH indexed AS (
SELECT ROW_NUMBER() OVER () as row_idx, metadata, values_a,
values_b, values_c
FROM test_data
),
unnested AS (
SELECT
row_idx, metadata,
ROW_NUMBER() OVER (PARTITION BY row_idx ORDER BY row_idx) as
val_idx,
unnest(values_a) as val_a,
unnest(values_b) as val_b,
unnest(values_c) as val_c
FROM indexed
),
transformed AS (
SELECT
row_idx, val_idx, metadata, val_a, val_b, val_c,
CASE WHEN val_c > 100 THEN val_a * val_b ELSE val_a + val_b END
AS val_d
FROM unnested
)
SELECT
row_idx, metadata,
array_agg(val_a ORDER BY val_idx) AS values_a,
array_agg(val_b ORDER BY val_idx) AS values_b,
array_agg(val_c ORDER BY val_idx) AS values_c,
array_agg(val_d ORDER BY val_idx) AS values_d
FROM transformed
GROUP BY row_idx, metadata
ORDER BY row_idx
```
and when I remove the "ORDER BY val_idx" from all array_agg the memory no
longer explodes, although it remain quite high.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]