2010YOUY01 commented on issue #12136: URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2579434876
For the given reproducer, I got this error:

```
Error: ResourcesExhausted("Failed to allocate additional 117568 bytes for ExternalSorter[0] with 0 bytes already allocated for this reservation - 0 bytes remain available for the total pool
```

Setting [datafusion.execution.sort_spill_reservation_bytes](https://datafusion.apache.org/user-guide/configs.html) to 1 MB lets it run successfully. (I don't know whether the Parquet-related error message is caused by the same issue.)

<details>

```rust
// Reproducer: place in datafusion/core/tests/memory_limit/mod.rs
#[tokio::test]
async fn test_sort_with_memory_limit() -> Result<()> {
    // initialize logging to see DataFusion's internal logging
    let _ = env_logger::try_init();

    // how much data to sort
    let row_limit = 10 * 1000;

    let mem_limit = 10 * 1024 * 1024; // 10 MB
    let sort_spill_reservation_bytes = 1024 * 1024; // 1 MB

    let generator = AccessLogGenerator::new()
        .with_row_limit(row_limit)
        .with_max_batch_size(100); // 100 rows per batch

    let pool = Arc::new(GreedyMemoryPool::new(mem_limit));
    let runtime = RuntimeEnvBuilder::new()
        .with_memory_pool(pool)
        .with_disk_manager(DiskManagerConfig::new())
        .build()?;

    let session_config = SessionConfig::new()
        .with_sort_spill_reservation_bytes(sort_spill_reservation_bytes);

    let state = SessionStateBuilder::new()
        .with_config(session_config)
        .with_runtime_env(Arc::new(runtime))
        .build();

    let ctx = SessionContext::new_with_state(state);

    // create a plan that simply sorts on the hostname
    let df = ctx
        .read_batches(generator)?
        .sort(vec![col("host").sort(true, true)])?;

    // execute the plan (it should succeed)
    let _results: Vec<RecordBatch> = df.collect().await?;

    Ok(())
}
```

</details>

Reasons:

- How `sort_spill_reservation_bytes` works: suppose a sort query runs with a 10 MB memory limit and `sort_spill_reservation_bytes` is 1 MB. The sort accumulates batches in memory until they reach 9 MB, then sorts them in memory.
  The in-memory sort first sorts each batch in place, then runs a sort-preserving merge to produce one final sorted run. The 1 MB is reserved for the internal data structures of the merging phase.
- The query in the reproducer has only a 10 MB memory budget, and all 10 MB is reserved for the later merge, so nothing is left for buffering batches and execution can fail.

Thoughts:

Note that I'm not 100% sure about the implementation details. If this is indeed the case, I think we can get rid of this configuration option and derive the value of `sort_spill_reservation_bytes` from the available memory budget, to prevent similar failures.
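
The budget arithmetic above can be sketched with a small standalone model. This is only an illustration of the reasoning in this comment, not DataFusion's actual implementation. The helper names `buffer_budget` and `derived_reservation` are hypothetical, and the 10% fraction is an arbitrary choice for the example. It also assumes, as the comment implies, that the failing run reserved the full 10 MB.

```rust
// Toy model of the memory split described in the comment.
// NOT DataFusion code: both helpers below are hypothetical.

const MB: usize = 1024 * 1024;

/// Bytes left for buffering input batches once the merge reservation
/// (`sort_spill_reservation_bytes`) has been set aside from the pool.
fn buffer_budget(mem_limit: usize, sort_spill_reservation_bytes: usize) -> usize {
    // saturating_sub avoids underflow if the reservation exceeds the pool
    mem_limit.saturating_sub(sort_spill_reservation_bytes)
}

/// Hypothetical version of the "Thoughts" idea: derive the reservation
/// from the pool size instead of using a fixed setting.
/// The 10% fraction is an assumption made for this sketch only.
fn derived_reservation(mem_limit: usize) -> usize {
    mem_limit / 10
}

fn main() {
    let mem_limit = 10 * MB;

    // Reservation equal to the whole pool: no room to buffer even one batch,
    // which matches the "0 bytes remain available" error.
    println!("10 MB reserved -> {} bytes for batches", buffer_budget(mem_limit, 10 * MB));

    // Reservation of 1 MB: 9 MB can be buffered before the in-memory sort.
    println!("1 MB reserved  -> {} bytes for batches", buffer_budget(mem_limit, MB));

    // Derived reservation: scales with the pool, so some buffer always remains.
    let reservation = derived_reservation(mem_limit);
    println!(
        "derived {} bytes reserved -> {} bytes for batches",
        reservation,
        buffer_budget(mem_limit, reservation)
    );
}
```

Under this model, any reservation that is a fixed fraction of the pool leaves a non-zero buffer, which is the property the proposal is after. Whether 10% is enough for the merge phase would need to be checked against the real sorter.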