alamb commented on code in PR #20820:
URL: https://github.com/apache/datafusion/pull/20820#discussion_r2936567717
##########
datafusion/datasource/src/file_stream.rs:
##########
@@ -39,30 +39,91 @@ use datafusion_physical_plan::metrics::{
use arrow::record_batch::RecordBatch;
use datafusion_common::instant::Instant;
+use crate::morsel::{FileOpenerMorselizer, Morsel, MorselPlanner, Morselizer};
use futures::future::BoxFuture;
use futures::stream::BoxStream;
-use futures::{FutureExt as _, Stream, StreamExt as _, ready};
+use futures::{FutureExt, Stream, StreamExt as _};
+
+/// How many planners can be active (performing I/O or producing morsels) at
once for a given `FileStream`?
+///
+/// This setting controls the potential number of concurrent IOs.
+///
+/// Setting this to 1 means that the `FileStream` will only have one active
+/// planner at a time, and will not start opening the next file until the
+/// current file is fully processed. Setting this to a higher number allows the
+/// `FileStream` to start opening the next file while still processing the
+/// current file, which can improve performance by overlapping IO and CPU work.
+/// However, setting this too high may lead to more memory buffering and
+/// resource contention if there are too many concurrent IOs.
+///
+/// TODO make this a config option
+const TARGET_CONCURRENT_PLANNERS: usize = 2;
+
+/// Keep at most this many morsels buffered before pausing additional planning.
+///
+/// The default is one morsel per available core. The intent is that once work
+/// stealing is added, each other core can find at least one morsel to steal
+/// without requiring the scan to eagerly buffer an unbounded amount of work.
+///
+/// TODO make this a config option
+fn max_buffered_morsels() -> usize {
Review Comment:
Good points. I am hoping that we'll soon be in a position to test these
strategies (I think we can express it all in FileStream, and the basic
morselizing API is th esame)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]