2010YOUY01 commented on code in PR #13133:
URL: https://github.com/apache/datafusion/pull/13133#discussion_r1821958842
##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -326,18 +343,97 @@ mod tests {
use arrow::compute::SortOptions;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
+ use arrow_array::Int64Array;
use arrow_schema::SchemaRef;
 use datafusion_common::{assert_batches_eq, assert_contains, DataFusionError};
use datafusion_common_runtime::SpawnedTask;
use datafusion_execution::config::SessionConfig;
+ use datafusion_execution::runtime_env::RuntimeEnvBuilder;
use datafusion_execution::RecordBatchStream;
use datafusion_physical_expr::expressions::Column;
use datafusion_physical_expr::EquivalenceProperties;
use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
use futures::{FutureExt, Stream, StreamExt};
+ use hashbrown::HashMap;
use tokio::time::timeout;
+ fn generate_task_ctx_for_round_robin_tie_breaker() -> Result<Arc<TaskContext>> {
+ let mut pool_per_consumer = HashMap::new();
Review Comment:
(Previous conversation for context: https://github.com/apache/datafusion/pull/13133/files#r1818369721)
I tried removing the per-operator memory limit and setting the total limit to 20 MB, then reran the two unit tests in this file.
```rust
let mut pool_per_consumer = HashMap::new();
// Bytes from 660_000 to 30_000_000 (or even more) are all valid limits
// pool_per_consumer.insert("RepartitionExec[0]".to_string(), 10_000_000);
// pool_per_consumer.insert("RepartitionExec[1]".to_string(), 10_000_000);
let runtime = RuntimeEnvBuilder::new()
    // Random large number for the total memory limit; we only care about RepartitionExec
    .with_memory_limit_per_consumer(20_000_000, 1.0, pool_per_consumer)
    .build_arc()?;
```
`test_round_robin_tie_breaker_fail` and `test_round_robin_tie_breaker_success` both passed. The error message for the failing case is: `Expected error: ResourcesExhausted("Additional allocation failed with top memory consumers (across reservations) as: RepartitionExec[1] consumed 18206496 bytes, SortPreservingMergeExec[0] consumed 1446684 bytes, RepartitionExec[0] consumed 216744 bytes. Error: Failed to allocate additional 216744 bytes for SortPreservingMergeExec[0] with 433488 bytes already allocated for this reservation - 130076 bytes remain available for the total pool")`
I think this error message is expected: SPM only reserves a constant amount of memory, while `RepartitionExec`'s memory consumption indeed keeps growing.
If that's the case, the memory-pool-related changes are no longer necessary, and I'd prefer to remove them from this PR.
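For reference, a minimal sketch of what the simplified helper could look like if the per-consumer pool is dropped, using only the existing `with_memory_limit` builder. The 20 MB figure comes from the experiment above; the exact structure is an assumption, not the final code:
```rust
// Sketch only: assumes the per-consumer pool changes are reverted and the
// helper keeps just a plain total memory limit.
fn generate_task_ctx_for_round_robin_tie_breaker() -> Result<Arc<TaskContext>> {
    let runtime = RuntimeEnvBuilder::new()
        // Total limit shared by all consumers; large enough for SPM's
        // constant reservation, small enough that RepartitionExec
        // eventually exhausts it in the failing test.
        .with_memory_limit(20_000_000, 1.0)
        .build_arc()?;
    let config = SessionConfig::new();
    let task_ctx = TaskContext::default()
        .with_runtime(runtime)
        .with_session_config(config);
    Ok(Arc::new(task_ctx))
}
```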
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]