2010YOUY01 commented on issue #12136:
URL: https://github.com/apache/datafusion/issues/12136#issuecomment-2579434876

   For the given reproducer, I got the error
   ```
   Error: ResourcesExhausted("Failed to allocate additional 117568 bytes for ExternalSorter[0] with 0 bytes already allocated for this reservation - 0 bytes remain available for the total pool")
   ```
   Configuring [datafusion.execution.sort_spill_reservation_bytes](https://datafusion.apache.org/user-guide/configs.html) to 1 MB lets it run successfully. (I don't know whether the Parquet-related error message is caused by the same issue.)
   
   <details>
   
   ```rust
   // Reproducer: place in datafusion/core/tests/memory_limit/mod.rs
   #[tokio::test]
   async fn test_sort_with_memory_limit() -> Result<()> {
       // initialize logging to see DataFusion's internal logging
       let _ = env_logger::try_init();
   
       // how much data to sort
       let row_limit = 10 * 1000;
       let mem_limit = 10 * 1024 * 1024; // 10 MB
       let sort_spill_reservation_bytes = 1024 * 1024; // 1 MB
   
       let generator = AccessLogGenerator::new()
           .with_row_limit(row_limit)
           .with_max_batch_size(100); // 100 rows per batch
   
       let pool = Arc::new(GreedyMemoryPool::new(mem_limit));
       let runtime = RuntimeEnvBuilder::new()
           .with_memory_pool(pool)
           .with_disk_manager(DiskManagerConfig::new())
           .build()?;
       let session_config = SessionConfig::new()
           .with_sort_spill_reservation_bytes(sort_spill_reservation_bytes);
       let state = SessionStateBuilder::new()
           .with_config(session_config)
           .with_runtime_env(Arc::new(runtime))
           .build();
   
       let ctx = SessionContext::new_with_state(state);
   
       // create a plan that simply sorts on the hostname
       let df = ctx
           .read_batches(generator)?
           .sort(vec![col("host").sort(true, true)])?;
   
       // execute the plan (it should succeed)
       let _results: Vec<RecordBatch> = df.collect().await?;
   
       Ok(())
   }
   ```
   </details>
   
   Reasons:
   - How `sort_spill_reservation_bytes` works: for example, given a sort query with a 10 MB memory limit and `sort_spill_reservation_bytes` set to 1 MB, the sorter accumulates batches in memory until it reaches 9 MB, then does the in-memory sort. That sort sorts each individual batch in place and finally does a sort-preserving merge to produce the single sorted run; the 1 MB is reserved for the internal data structures of the merging phase.
   - The query in the reproducer has only a 10 MB memory budget, and all 10 MB is reserved for the later merge phase, so nothing is left for accumulating batches and the execution can fail (see the sketch after this list).
   
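   To make that arithmetic concrete, here is a minimal sketch in plain Rust (not DataFusion internals) of how the pool splits under the description above; the 10 MB default for `sort_spill_reservation_bytes` is taken from the config docs and is otherwise an assumption here.

   ```rust
   /// Illustrative only: memory left for accumulating batches before the
   /// in-memory sort, given the pool size and the merge-phase reservation.
   fn accumulation_budget(pool_bytes: usize, sort_spill_reservation_bytes: usize) -> usize {
       pool_bytes.saturating_sub(sort_spill_reservation_bytes)
   }

   fn main() {
       let pool = 10 * 1024 * 1024; // 10 MB memory limit, as in the reproducer

       // With the (assumed) 10 MB default reservation, nothing is left to
       // accumulate batches in, so the first allocation fails.
       assert_eq!(accumulation_budget(pool, 10 * 1024 * 1024), 0);

       // Lowering the reservation to 1 MB leaves ~9 MB for accumulation,
       // which is why the reproducer passes with that setting.
       assert_eq!(accumulation_budget(pool, 1024 * 1024), 9 * 1024 * 1024);
   }
   ```
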
   Thoughts:
   Note that I'm not 100% sure about the implementation details, but if that's the case, I think we can get rid of this configuration option and derive `sort_spill_reservation_bytes` from the available memory budget, to prevent similar failures.
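
   A rough sketch of that idea, assuming the current 10 MB default and an arbitrary, purely illustrative rule of capping the reservation at a fraction of the pool; `derive_spill_reservation` is a hypothetical helper, not an existing DataFusion API:

   ```rust
   /// Hypothetical: pick the merge-phase reservation from the pool size instead
   /// of a fixed config value, so some memory is always left for accumulating
   /// batches before the in-memory sort.
   fn derive_spill_reservation(pool_bytes: usize) -> usize {
       const DEFAULT_RESERVATION: usize = 10 * 1024 * 1024; // assumed current default
       DEFAULT_RESERVATION.min(pool_bytes / 10) // cap at 10% of the pool (illustrative)
   }

   fn main() {
       // With the reproducer's 10 MB pool, the reservation would drop to 1 MB
       // instead of consuming the whole budget.
       assert_eq!(derive_spill_reservation(10 * 1024 * 1024), 1024 * 1024);
   }
   ```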

