AdamGS commented on code in PR #20481:
URL: https://github.com/apache/datafusion/pull/20481#discussion_r2853322395
##########
datafusion/datasource/src/file_stream.rs:
##########
@@ -130,9 +130,16 @@ impl FileStream {
///
/// Since file opening is mostly IO (and may involve a
/// bunch of sequential IO), it can be parallelized with decoding.
+ ///
+ /// In morsel-driven mode this prefetches the next already-morselized item
+ /// from the shared queue (leaf morsels only — items that still need
+ /// async morselization are left in the queue for the normal Idle →
+ /// Morselizing path).
fn start_next_file(&mut self) -> Option<Result<FileOpenFuture>> {
if self.morsel_driven {
- return None;
+ let queue = Arc::clone(self.shared_queue.as_ref()?);
Review Comment:
It's not as well structured, but the general flow can be followed
starting at
[`RepeatedScan::execute`](https://github.com/vortex-data/vortex/blob/89cee35fc8d153f153646f5402e71062d5b34ca5/vortex-scan/src/repeated_scan.rs#L120).
It returns a Vec of futures, where each future yields a chunk of data
(roughly a RecordBatch).
Each of the futures runs the full pipeline (filter -> project) for a row
split, with the underlying `LayoutReader` layer de-duplicating some of the IO
so we only fetch a physical chunk of an array once, even if it happens to be
spread across two splits row-wise.
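
To make the shape concrete, here is a rough std-only sketch of that pattern. It is not the Vortex API: `ChunkCache`, `execute`, `block_on`, and the even-rows filter with a times-ten projection are all hypothetical stand-ins, and the "IO" is just an in-memory range.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the reader layer: caches physical chunks so
/// each one is "fetched" at most once, however many splits touch it.
struct ChunkCache {
    chunk_len: usize,
    chunks: Mutex<HashMap<usize, Arc<Vec<i64>>>>,
    fetches: AtomicUsize,
}

impl ChunkCache {
    fn new(chunk_len: usize) -> Self {
        Self { chunk_len, chunks: Mutex::new(HashMap::new()), fetches: AtomicUsize::new(0) }
    }

    /// Returns chunk `idx`, simulating the IO only on a cache miss.
    async fn chunk(&self, idx: usize) -> Arc<Vec<i64>> {
        let mut chunks = self.chunks.lock().unwrap();
        if let Some(c) = chunks.get(&idx) {
            return Arc::clone(c);
        }
        self.fetches.fetch_add(1, Ordering::Relaxed);
        // Fake data: the value of each row is simply its row number.
        let start = (idx * self.chunk_len) as i64;
        let data = Arc::new((start..start + self.chunk_len as i64).collect::<Vec<i64>>());
        chunks.insert(idx, Arc::clone(&data));
        data
    }

    fn fetch_count(&self) -> usize {
        self.fetches.load(Ordering::Relaxed)
    }
}

/// One future per row split; each runs the whole filter -> project pipeline.
fn execute(
    cache: Arc<ChunkCache>,
    total_rows: usize,
    split_len: usize,
) -> Vec<impl Future<Output = Vec<i64>>> {
    (0..total_rows)
        .step_by(split_len)
        .map(move |start| {
            let cache = Arc::clone(&cache);
            let end = (start + split_len).min(total_rows);
            async move {
                let mut out = Vec::new();
                let mut row = start;
                while row < end {
                    let idx = row / cache.chunk_len;
                    let chunk = cache.chunk(idx).await;
                    let chunk_end = ((idx + 1) * cache.chunk_len).min(end);
                    for r in row..chunk_end {
                        let v = chunk[r - idx * cache.chunk_len];
                        if v % 2 == 0 {
                            out.push(v * 10); // filter: even rows; project: times ten
                        }
                    }
                    row = chunk_end;
                }
                out
            }
        })
        .collect()
}

/// Minimal executor so the sketch needs no async runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
        std::thread::yield_now();
    }
}
```

With 8 rows, a chunk length of 4, and a split length of 3, `execute` yields three futures. The middle split straddles both physical chunks, yet driving all three through `block_on` still fetches each chunk only once.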
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]