edmondop commented on issue #9370:
URL: https://github.com/apache/datafusion/issues/9370#issuecomment-2090358005

   > > Still work in progress, @mustafasrepo you recommended the `split` to 
return a `Vec<SendableStream>`, however once I have a `RecordBatch` I got lost 
a bit. Should that part be implemented using a RepartitionStream too?
   > 
   > I think so, `PerPartitionStream` might work also. I think we need to 
consume `input: Arc<dyn ExecutionPlan>`, which generates `RecordBatch`. Then 
these `RecordBatch`es should be fed to the output channels, (as in 
`pull_from_input` method). Then once channels are fed with `RecordBatch`es we 
can construct `SendableRecordBatchStream`s from them using either 
`RepartitionStream` or `PerPartitionStream`. I think the best way to proceed is 
to write a method (this method can assume input partition is always 1)
   > 
   > ```rust
   > async fn pull_from_input_helper(
   >         input: Arc<dyn ExecutionPlan>,
   >         partition: usize,
   >         mut output_channels: HashMap<
   >             usize,
   >             (DistributionSender<MaybeBatch>, SharedMemoryReservation),
   >         >,
   >         metrics: RepartitionMetrics,
   >         context: Arc<TaskContext>,
   >     ) -> Result<()> 
   > ```
   > 
   > similar to `fn pull_from_input`, where partitioning is always round-robin, 
and then writing another method
   > 
   > ```rust
   > fn generate_streams(necessary_data, receivers, context, etc) -> Vec<SendableRecordBatchStream>
   > ```
   > 
   > I will try to experiment with these in my spare time (if you already have a 
branch you are working on, we can collaborate there, if that is OK with you).
   
   There is a draft PR linked to the issue, but I missed the part about the 
`output_channels`. I will do some more work and ping you again.
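
To check my understanding of the two-step shape suggested above (feed the output channels round-robin from a single input partition, then build one stream per receiver), here is a minimal std-only sketch. It is not DataFusion code: `Batch` is a hypothetical stand-in for `RecordBatch`, `std::sync::mpsc` channels stand in for `DistributionSender`/receiver pairs, a boxed iterator stands in for `SendableRecordBatchStream`, and `pull_from_input_round_robin` and `split` are illustrative names only. Metrics, memory reservations, and async are left out.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical stand-in for an Arrow `RecordBatch`: just a list of values.
type Batch = Vec<i32>;

/// Drain a single input partition and feed the output channels round-robin.
/// Taking the senders by value means they are dropped on return, which
/// closes the channels and lets the receiving side terminate.
fn pull_from_input_round_robin(
    input: impl Iterator<Item = Batch>,
    output_channels: HashMap<usize, Sender<Batch>>,
) -> Result<(), String> {
    let num_outputs = output_channels.len();
    if num_outputs == 0 {
        return Err("at least one output channel is required".to_string());
    }
    for (i, batch) in input.enumerate() {
        let partition = i % num_outputs;
        let tx = output_channels
            .get(&partition)
            .ok_or_else(|| format!("missing output channel {}", partition))?;
        tx.send(batch).map_err(|e| e.to_string())?;
    }
    Ok(())
}

/// Wrap each receiver in a boxed iterator, one "stream" per output partition.
fn generate_streams(
    receivers: Vec<Receiver<Batch>>,
) -> Vec<Box<dyn Iterator<Item = Batch>>> {
    receivers
        .into_iter()
        .map(|rx| Box::new(rx.into_iter()) as Box<dyn Iterator<Item = Batch>>)
        .collect()
}

/// Wire both steps together and collect each output stream for inspection.
fn split(input: Vec<Batch>, num_outputs: usize) -> Result<Vec<Vec<Batch>>, String> {
    let mut senders = HashMap::new();
    let mut receivers = Vec::new();
    for partition in 0..num_outputs {
        let (tx, rx) = channel();
        senders.insert(partition, tx);
        receivers.push(rx);
    }
    // The channels are unbounded, so all batches can be sent before any is read.
    pull_from_input_round_robin(input.into_iter(), senders)?;
    Ok(generate_streams(receivers)
        .into_iter()
        .map(|stream| stream.collect())
        .collect())
}
```

In this toy version, calling `split` with five single-row batches and two outputs sends batches 0, 2, 4 to output 0 and batches 1, 3 to output 1. In the real operator the channels would be bounded by the memory reservation and the pull loop would run as a spawned task, so the two steps could not simply run back to back as they do here.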


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

