[PR] fix: External sort failing on an edge case [datafusion]

via GitHub Wed, 05 Mar 2025 01:52:35 -0800


2010YOUY01 opened a new pull request, #15017:
URL: https://github.com/apache/datafusion/pull/15017

## Which issue does this PR close?

- Closes #.

NA
## Rationale for this change

I came across one sorting query with memory limit fail indefinitely, here is
a simpler reproducer (running in `datafusion-cli` with commit 75977692c)
```sh
# Compile datafusion-cli
cargo run --profile release-nonlto -- --mem-pool-type fair -m 10M
```
2 sort queries are executed: Q1 get executed with no issue, Q2 has smaller
input size than Q1, but it failed.
```
DataFusion CLI v46.0.0
> set datafusion.execution.sort_spill_reservation_bytes = 3000000;
0 row(s) fetched.
Elapsed 0.001 seconds.

> select * from generate_series(1,10000000) as t1(v1) order by v1;
...Query succeed

> select * from generate_series(1,9000000) as t1(v1) order by v1;
Resources exhausted: Failed to allocate additional 65536 bytes for
ExternalSorterMerge[0] with 0 bytes already allocated for this reservation -
49152 bytes remain available for the total pool
```

### Query failure reason
At the final stage of sorting, all buffered in-memory batches and all the
spilled files will be sort-preserving merged at the same time, and this caused
the issue.

https://github.com/apache/datafusion/blob/75977692c12bda72301ccf65067532c5135fbd5c/datafusion/physical-plan/src/sorts/sort.rs#L342-L355
For example, there is one workload, let's say it's executing in a single
partition. It's memory limit can hold 10 batches.
- Sorting 100 batches can be executed without issue:
- Every time 10 batches are read, mempool is full and one spill file
will be written to disk
- Finally, there are 10 spill files, only one batch of each file is
required to load to memory at the same time, so there is enough memory budget
to do the final merging.
- Sorting 49 batches fails:
- When the input is exhausted, there are 9 in-mem batches and 4 spill
files. 9 + 4 batches are required to load to memory for final merging, it
exceeds the memory pool limit which is around 10 batches.

A common workaround I believe is to set
`datafusion.execution.sort_spill_reservation_bytes` to larger, its used for
extra space for merging. However, workloads' row/batch size can vary
drastically, also it's possible to see the case in-memory batches has almost
reached the memory limit but not yet triggered on spilling, so this parameter
is very tricky to configure it correct.
To make this simpler, we can always spill the in-memory batches (if it has
spilled previously) at the final stage.

## What changes are included in this PR?

Change the final sort-preserving merge logic of sorting: when it has spilled
before, always spill all in-mem batches first, then start the merging phase.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example, are
they covered by existing tests)?
-->
Regression test is added

## Are there any user-facing changes?

No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] fix: External sort failing on an edge case [datafusion]

Reply via email to