milenkovicm opened a new issue, #1238: URL: https://github.com/apache/datafusion-ballista/issues/1238
At the moment we have three different task distribution strategies:

- binding
- round robin
- consistent hashing

I believe we should open up the scheduler interface and expose a pluggable way to reconfigure task distribution. It may also provide a way to implement location-aware tasks.

**Describe the solution you'd like**

Extend `TaskDistributionPolicy` by adding a `TaskDistributionPolicy::Custom(Arc<dyn DistributionPolicy>)` variant, and add a trait

```
use std::{collections::HashMap, sync::Arc};

#[async_trait::async_trait]
pub trait DistributionPolicy: std::fmt::Debug + Send + Sync {
    /// User provided custom task distribution policy
    ///
    /// # Parameters
    ///
    /// * `slots` - vector of available executor slots, there may be no available slots
    /// * `running_jobs` - (JobId -> JobInfoCache) cache, must contain only running jobs
    ///
    /// # Returns
    ///
    /// vector of (task, executor) bindings
    ///
    async fn bind_tasks(
        &self,
        mut slots: Vec<&mut AvailableTaskSlots>,
        running_jobs: Arc<HashMap<String, JobInfoCache>>,
    ) -> datafusion::error::Result<Vec<BoundTask>>;
}
```

which would provide a method to bind tasks.

**Describe alternatives you've considered**

Use `ClusterState::bind_schedulable_tasks` directly, which would introduce code duplication.

**Additional context**
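
To illustrate how a location-aware policy might plug into such a trait, here is a minimal, dependency-free sketch. It is not Ballista code: `AvailableTaskSlots` and `BoundTask` are simplified stand-ins for the scheduler's real types, `PendingTask`, its `preferred_executor` field, and `LocalityFirst` are hypothetical names invented for the example, and `running_jobs` is modelled as a plain map of pending tasks rather than a `JobInfoCache`. The trait is also synchronous here, whereas the proposal above is async and returns a `datafusion::error::Result`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-ins for the scheduler's types; the real ones carry more state.
#[derive(Debug)]
pub struct AvailableTaskSlots {
    pub executor_id: String,
    pub slots: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PendingTask {
    pub job_id: String,
    pub partition: usize,
    // Assumed for illustration: the executor that already holds this task's input.
    pub preferred_executor: Option<String>,
}

#[derive(Debug, PartialEq)]
pub struct BoundTask {
    pub executor_id: String,
    pub task: PendingTask,
}

// Simplified, synchronous version of the proposed trait.
pub trait DistributionPolicy: std::fmt::Debug + Send + Sync {
    fn bind_tasks(
        &self,
        slots: Vec<&mut AvailableTaskSlots>,
        running_jobs: &HashMap<String, Vec<PendingTask>>,
    ) -> Vec<BoundTask>;
}

// Binds each task to its preferred executor when that executor has a free
// slot, otherwise to the first executor with any free slot.
#[derive(Debug)]
pub struct LocalityFirst;

impl DistributionPolicy for LocalityFirst {
    fn bind_tasks(
        &self,
        mut slots: Vec<&mut AvailableTaskSlots>,
        running_jobs: &HashMap<String, Vec<PendingTask>>,
    ) -> Vec<BoundTask> {
        let mut bound = Vec::new();
        // Visit jobs in sorted order so the result is deterministic.
        let mut job_ids: Vec<&String> = running_jobs.keys().collect();
        job_ids.sort();
        for job_id in job_ids {
            for task in &running_jobs[job_id] {
                let preferred = task.preferred_executor.as_deref();
                let idx = slots
                    .iter()
                    .position(|s| s.slots > 0 && Some(s.executor_id.as_str()) == preferred)
                    .or_else(|| slots.iter().position(|s| s.slots > 0));
                // No free slot anywhere: stop and return what has been bound so far.
                let Some(idx) = idx else { return bound };
                slots[idx].slots -= 1;
                bound.push(BoundTask {
                    executor_id: slots[idx].executor_id.clone(),
                    task: task.clone(),
                });
            }
        }
        bound
    }
}

// Usage: two executors with one slot each, two tasks that both prefer "exec-b".
pub fn example_bindings() -> Vec<BoundTask> {
    let mut a = AvailableTaskSlots { executor_id: "exec-a".into(), slots: 1 };
    let mut b = AvailableTaskSlots { executor_id: "exec-b".into(), slots: 1 };
    let tasks = (0..2)
        .map(|partition| PendingTask {
            job_id: "job-1".into(),
            partition,
            preferred_executor: Some("exec-b".into()),
        })
        .collect();
    let jobs = HashMap::from([("job-1".to_string(), tasks)]);
    // Held as a trait object, mirroring the proposed `Custom(Arc<dyn DistributionPolicy>)`.
    let policy: Arc<dyn DistributionPolicy> = Arc::new(LocalityFirst);
    policy.bind_tasks(vec![&mut a, &mut b], &jobs)
}
```

In this sketch the first task lands on its preferred executor and the second falls back to the remaining free slot. A real implementation would presumably derive locality from the job's stage and shuffle metadata rather than from a field on the task.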