2010YOUY01 commented on code in PR #13133:
URL: https://github.com/apache/datafusion/pull/13133#discussion_r1821958842


##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -326,18 +343,97 @@ mod tests {
     use arrow::compute::SortOptions;
     use arrow::datatypes::{DataType, Field, Schema};
     use arrow::record_batch::RecordBatch;
+    use arrow_array::Int64Array;
     use arrow_schema::SchemaRef;
     use datafusion_common::{assert_batches_eq, assert_contains, DataFusionError};
     use datafusion_common_runtime::SpawnedTask;
     use datafusion_execution::config::SessionConfig;
+    use datafusion_execution::runtime_env::RuntimeEnvBuilder;
     use datafusion_execution::RecordBatchStream;
     use datafusion_physical_expr::expressions::Column;
     use datafusion_physical_expr::EquivalenceProperties;
     use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
 
     use futures::{FutureExt, Stream, StreamExt};
+    use hashbrown::HashMap;
     use tokio::time::timeout;
 
+    fn generate_task_ctx_for_round_robin_tie_breaker() -> Result<Arc<TaskContext>> {
+        let mut pool_per_consumer = HashMap::new();

Review Comment:
   (Previous conversation for context: https://github.com/apache/datafusion/pull/13133/files#r1818369721)
   
   I tried removing the per-operator memory limits and setting the total limit to 20M, then reran the two unit tests in this file.
   ```rust
           let mut pool_per_consumer = HashMap::new();
        // Bytes from 660_000 to 30_000_000 (or even more) are all valid limits
        // pool_per_consumer.insert("RepartitionExec[0]".to_string(), 10_000_000);
        // pool_per_consumer.insert("RepartitionExec[1]".to_string(), 10_000_000);
   
           let runtime = RuntimeEnvBuilder::new()
            // Arbitrary large number for the total memory limit; we only care about RepartitionExec
            .with_memory_limit_per_consumer(20_000_000, 1.0, pool_per_consumer)
               .build_arc()?;
   ```
   `test_round_robin_tie_breaker_fail` and `test_round_robin_tie_breaker_success` both passed; the error message for the fail case is: `Expected error: ResourcesExhausted("Additional allocation failed with top memory consumers (across reservations) as: RepartitionExec[1] consumed 18206496 bytes, SortPreservingMergeExec[0] consumed 1446684 bytes, RepartitionExec[0] consumed 216744 bytes. Error: Failed to allocate additional 216744 bytes for SortPreservingMergeExec[0] with 433488 bytes already allocated for this reservation - 130076 bytes remain available for the total pool")`
   I think this error message is expected: SPM only reserves a constant amount of memory, while `RepartitionExec`'s memory consumption indeed keeps growing.
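   For reference, a minimal sketch of how the fail case can assert that error; the helper name is hypothetical (not from this PR), and I'm assuming the plan is collected with the memory-limited task context:
   ```rust
   // Sketch only: `build_spm_plan_with_many_ties()` is a hypothetical stand-in
   // for however the test constructs the SortPreservingMergeExec under test.
   let task_ctx = generate_task_ctx_for_round_robin_tie_breaker()?;
   let spm: Arc<dyn ExecutionPlan> = build_spm_plan_with_many_ties();
   // Collecting should fail once RepartitionExec exhausts the shared pool;
   // the quoted message above contains this substring in its Display form.
   let err = collect(spm, task_ctx).await.unwrap_err();
   assert_contains!(err.to_string(), "Failed to allocate additional");
   ```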
   If that's the case, the memory-pool-related changes are no longer necessary, and I'd prefer to remove them from this PR.
   I think setting per-consumer memory limits by name makes the API less user-friendly and increases the risk of implementation issues in future changes. Perhaps we can explore a better solution later if a more suitable use case arises.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

