2010YOUY01 opened a new issue, #12937:
URL: https://github.com/apache/datafusion/issues/12937
### Describe the bug
Actual memory consumption remains high even if the memory limit is specified
### To Reproduce
Compile and run datafusion-cli with:
```sh
# /usr/bin/time to measure actual memory usage
/usr/bin/time -l cargo run -- --mem-pool-type fair -m 1G -f 'aggr.sql'
```
And it showed max memory resident set size is ~400M
Next, set the memory limit to 200M and run it again, max RSS is still ~400M
Use `explain analyze` can show that the spill actually happened
And the query has ~1M groups, if changing it to 4 groups (see `aggr.sql`),
max RSS is around 80M, the major memory consumer should be aggregation, so it's
likely there is some issue with external aggregation implementation
<details>
<summary>aggr.sql</summary>
Change file path to parquet lineitem file generated by
https://github.com/apache/datafusion/tree/main/benchmarks
```sql
-- aggr.sql
CREATE EXTERNAL TABLE IF NOT EXISTS lineitem (
l_orderkey BIGINT,
l_partkey BIGINT,
l_suppkey BIGINT,
l_linenumber INTEGER,
l_quantity DECIMAL(15, 2),
l_extendedprice DECIMAL(15, 2),
l_discount DECIMAL(15, 2),
l_tax DECIMAL(15, 2),
l_returnflag VARCHAR,
l_linestatus VARCHAR,
l_shipdate DATE,
l_commitdate DATE,
l_receiptdate DATE,
l_shipinstruct VARCHAR,
l_shipmode VARCHAR,
l_comment VARCHAR,
) STORED AS parquet
LOCATION
'/Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem';
explain analyze
SELECT
count(*)
FROM lineitem
GROUP BY l_suppkey, l_partkey, l_returnflag, l_linestatus; -- cardinality is
~36% of lineitem
-- GROUP BY l_returnflag, l_linestatus -- 4 groups
```
</details>
### Expected behavior
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]