bharath-techie commented on issue #19216: URL: https://github.com/apache/datafusion/issues/19216#issuecomment-3635177078
So this is the error I get :
```
datafusion-cli -m 8G
DataFusion CLI v51.0.0
> SET datafusion.execution.target_partitions=1;
0 row(s) fetched.
Elapsed 0.001 seconds.
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION '/home/ec2-user/clickdata/partitioned/hits/*.parquet';
0 row(s) fetched.
Elapsed 0.054 seconds.
> SELECT "URL", COUNT(*) AS c FROM hits GROUP BY "URL" ORDER BY c DESC LIMIT
10;
Resources exhausted: Additional allocation failed for TopK[0] with top
memory consumers (across reservations) as:
TopK[0]#4(can spill: false) consumed 7.8 GB, peak 7.8 GB,
GroupedHashAggregateStream[0] (count(1))#3(can spill: true) consumed 80.4
KB, peak 4.6 GB,
DataFusion-Cli#2(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 3.9 GB for TopK[0] with 7.8 GB already
allocated for this reservation - 187.0 MB remain available for the total pool
>
```
Once i added spill to `TopK`, I was able to get results back with same
memory config
```
explain analyze SELECT "URL", COUNT(*) AS c FROM hits GROUP BY "URL" ORDER
BY c DESC LIMIT 10;
+-------------------
| Plan with Metrics | SortExec: TopK(fetch=10), expr=[c@1 DESC],
preserve_partitioning=[false], filter=[c@1 IS NULL OR c@1 > 57340],
metrics=[output_rows=10, elapsed_compute=81.54ms, output_bytes=50.0 MB,
output_batches=1, spill_count=19, spilled_bytes=115.1 MB, spilled_rows=125,
row_replacements=189]
|
| | ProjectionExec: expr=[URL@0 as URL,
count(Int64(1))@1 as c], metrics=[output_rows=18.34 M, elapsed_compute=1.54ms,
output_bytes=8.6 TB, output_batches=2.24 K]
|
| | AggregateExec: mode=Single, gby=[URL@0 as URL],
aggr=[count(Int64(1))], metrics=[output_rows=18.34 M, elapsed_compute=14.45s,
output_bytes=8.6 TB, output_batches=2.24 K, spill_count=0, spilled_bytes=0.0 B,
spilled_rows=0, peak_mem_used=4.90 B, aggregate_arguments_time=38.36ms,
aggregation_time=389.52ms, emitting_time=1.30ms,
time_calculating_group_ids=13.93s]
|
| | DataSourceExec: file_groups={1 group:
[[home/ec2-user/clickdata/partitioned/hits/hits_0.parquet:0..122446530,
home/ec2-user/clickdata/partitioned/hits/hits_1.parquet:0..174965044,
home/ec2-user/clickdata/partitioned/hits/hits_10.parquet:0..101513258,
home/ec2-user/clickdata/partitioned/hits/hits_11.parquet:0..118419888,
home/ec2-user/clickdata/partitioned/hits/hits_12.parquet:0..149514164, ...]]},
projection=[URL], file_type=parquet, metrics=[output_rows=100.00 M,
elapsed_compute=1ns, output_bytes=20.0 GB, output_batches=12.31 K,
files_ranges_pruned_statistics=100 total → 100 matched,
row_groups_pruned_statistics=325 total → 325 matched,
row_groups_pruned_bloom_filter=325 total → 325 matched,
page_index_rows_pruned=0 total → 0 matched, batches_split=0, bytes_scanned=2.62
B, file_open_errors=0, file_scan_errors=0, num_predicate_creation_errors=0,
predicate_cache_inner_records=0, predicate_cache_records=0,
predicate_evaluation_errors=0, pushdown_r
ows_matched=0, pushdown_rows_pruned=0, bloom_filter_eval_time=200ns,
metadata_load_time=8.04ms, page_index_eval_time=200ns,
row_pushdown_eval_time=200ns, statistics_eval_time=200ns,
time_elapsed_opening=2.69ms, time_elapsed_processing=6.99s,
time_elapsed_scanning_total=21.96s, time_elapsed_scanning_until_data=241.98ms,
scan_efficiency_ratio=1600% (2.62 B/160.3 M)] |
| |
```
The suggestion I was asking was whether anything I can do anything in
planning configuration to make this work without spill for example. I have all
defaults. I tried reducing the batch size as topK holds 20*batchSize before
compacting, but it didn't seem to help.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
