I’m -1 to accepting this. Ray is created by AnyScale, a commercial for-profit company, and I don’t think we should accept the provider simply because it has been offered. Doubly so:
AnyScale should write, maintain and test this, and show that they are actively supporting it _before_ we consider accepting it. Ray in particular is complex, there have been three or more previous attempts at a provider, and we (the PMC/community) do not have the capacity to maintain this right now.

In short: build it, show us it is a) maintained, and b) popular enough, before we’ll accept it.

-ash

> On 29 May 2025, at 11:12, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Yeah. Ray seems like a good candidate, and having prior art with
> Astronomer's and Anyscale's implementations might indeed make it easier
> and faster to bring in, I guess. Ray is hugely popular in the ML world,
> and from what I remember there were some cool ideas in the astro
> provider, like custom decorators and making Ray not just execute single
> tasks, but also have dependencies between them and use Ray's shared
> memory, like https://airflowsummit.org/sessions/2021/airflow-ray/. Also
> (but that's something that might be implemented extra if someone is
> interested), having a Ray executor seems like an option.
> The question is, as usual, about maintainability. It's easier to accept if:
>
> * there is an organisation that is committed to maintaining it
> * we can run end-to-end integration tests locally
> * if that is not possible because it's a service, there is a system
>   dashboard maintained by a third party that we can use as a decision
>   point to suspend the provider if it is not really maintained in the future
> * there are no complicated/outdated dependencies that complicate our
>   dependency management
>
> I think if we got good answers to all four questions, Ray is a no-brainer
> to have (especially since you can install a local cluster, which means
> that end-to-end tests are possible), similarly to Kafka and recently
> Tinkerpop/Gremlin. While those were Apache projects, both had one thing
> in common: full integration tests were possible, and we are now running
> them in CI, for basic scenarios, on top of unit tests. There is
> potentially a bit of a problem with the last point: at least recently we
> had some problems with adding Ray to the AI integration of the Google
> provider (https://github.com/apache/airflow/pull/49797), so it might be
> a bit problematic, but hopefully that was only "google"-specific.
>
> J.
>
> On Wed, May 28, 2025 at 5:42 PM Constance Martineau
> <consta...@astronomer.io.invalid> wrote:
>
>> We do see a not insignificant number of our customers using Ray and
>> Airflow together on Astro and Astronomer-Software, so there is
>> definitely interest, and I believe this could be really valuable.
>> https://github.com/astronomer/astro-provider-ray/ was created for
>> validation purposes, but we were not able to invest significantly due
>> to priority changes. I'd love to see an official provider within the
>> project. Checking pypistats quickly, there were more than 13M downloads
>> of Ray last month.
>>
>> On Tue, May 27, 2025 at 3:30 PM Jens Scheffler <j_scheff...@gmx.de.invalid>
>> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the proposal.
>>> I assume you have read the
>>> https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers
>>> docs?
>>>
>>> By accident I also had a (not maturing) discussion about integrating
>>> Ray as a cluster backend into Airflow workflows. But I am not sure how
>>> common the demand is. Are there any voices?
>>>
>>> Have you considered just providing the operators separately and linking
>>> them in the ecosystem?
>>>
>>> Have you seen that a few years ago a provider was made on GitHub, but
>>> it seems it was not maintained for a while:
>>> https://github.com/anyscale/airflow-provider-ray as well as
>>> https://github.com/astronomer/astro-provider-ray/
>>>
>>> Jens
>>>
>>> On 27.05.25 16:11, Maksim Yermakou wrote:
>>>> Hello all,
>>>>
>>>> I would like to propose adding a new provider for the Ray[1] service
>>>> to the Airflow providers.
>>>>
>>>> Ray is an open source framework for building and scaling ML and Python
>>>> applications. Currently, the Google provider has two services that can
>>>> work with Ray: GKE and Vertex AI. It is important to note, however,
>>>> that the operators for working with Ray on GKE and on Vertex AI only
>>>> cover creating a Ray cluster on Google Cloud infrastructure. To start
>>>> a Ray application, we need to start a Ray job on the Ray cluster, and
>>>> for that users need to use Ray's Python SDK.
>>>>
>>>> Knowing all of this, I suggest creating a new provider for Ray itself,
>>>> with operators that can manage Ray jobs. Here[2] is the code for the
>>>> client for working with jobs. We need a new provider because Ray is
>>>> not a Google service, and if we want to give users the ability to
>>>> submit jobs to clusters, then we need to create new operators and put
>>>> them into the new provider. Also, Ray can work with clusters deployed
>>>> on AWS, Azure, and more.
>>>> It means that operators from this provider can be used in combination
>>>> with operators from the amazon and microsoft providers.
>>>>
>>>> I have started the implementation, and I will be glad to hear feedback
>>>> about my proposal from all of you.
>>>>
>>>> [1] https://docs.ray.io/en/latest/index.html
>>>> [2] https://github.com/ray-project/ray/blob/master/python/ray/dashboard/modules/job/sdk.py#L35
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>> For additional commands, e-mail: dev-h...@airflow.apache.org
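[Editor's note for context: Maksim's proposal is for operators that submit and manage jobs through Ray's Jobs SDK (the client referenced in [2]). A minimal sketch of what such an operator's shape could look like is below. The class name and parameters are invented for illustration and are not code from any of the providers linked in the thread; only `ray.job_submission.JobSubmissionClient` and its `submit_job` call are real Ray APIs.]

```python
class RaySubmitJobOperator:
    """Hypothetical sketch of a job-submission operator; the class and
    parameter names are illustrative, not from any released provider."""

    def __init__(self, entrypoint, address="http://127.0.0.1:8265", runtime_env=None):
        self.entrypoint = entrypoint          # e.g. "python my_script.py"
        self.address = address                # Ray dashboard / Jobs API endpoint
        self.runtime_env = runtime_env or {}  # e.g. {"working_dir": "./"}

    def execute(self, context):
        # JobSubmissionClient is the public entry point of Ray's Jobs SDK
        # (referenced as [2] above). It is imported lazily so the class can
        # be defined and unit-tested without Ray installed.
        from ray.job_submission import JobSubmissionClient

        client = JobSubmissionClient(self.address)
        # submit_job returns a job id string, which a companion sensor or
        # deferrable trigger could later poll for terminal status.
        return client.submit_job(
            entrypoint=self.entrypoint,
            runtime_env=self.runtime_env,
        )
```

Because the client targets the cluster's HTTP Jobs endpoint rather than a Google-specific API, the same operator could submit to Ray clusters on AWS, Azure, or GKE, which is the portability argument made in the proposal.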