I’m -1 to accepting this.

Ray is created by Anyscale, a commercial for-profit company, and I don’t think 
we should accept the provider simply because it has been offered. Doubly so here.

Anyscale should write, maintain and test this, and show that they are actively 
supporting it _before_ we consider accepting it. Ray in particular is complex, 
there have been three or so previous attempts at a provider, and we (the 
PMC/community) do not have the capacity to maintain this right now.

In short: build it, and show us that it is a) maintained and b) popular enough, 
before we accept it.

-ash


> On 29 May 2025, at 11:12, Jarek Potiuk <ja...@potiuk.com> wrote:
> 
> Yeah. Ray seems like a good candidate, and having prior art with
> Astronomer's and Anyscale's implementations might indeed make it easier and
> faster to bring in, I guess. Ray is hugely popular in the ML world, and from
> what I remember there were some cool ideas in the astro provider, like
> custom decorators and making Ray not just execute single tasks but also
> handle dependencies between them and use Ray's shared memory - see
> https://airflowsummit.org/sessions/2021/airflow-ray/ . Also (though that is
> something that might be implemented as an extra if someone is interested),
> having a Ray executor seems like an option.
> 
> The question is - as usual - about maintainability. It's easier to accept
> if:
> 
> * there is an organisation that is committed to maintaining it
> * we can run end-to-end integration tests locally
> * if that is not possible because it's a service - there is a system
> dashboard maintained by a 3rd party that we can use as a decision point to
> suspend the provider if it is not really maintained in the future
> * there are no complicated/outdated dependencies that complicate our
> dependency management
> 
> I think if we get good answers to all four points, Ray is a no-brainer to
> have (especially since you can install a local cluster, which means that
> end-to-end tests are possible) - similarly to Kafka and, recently,
> Tinkerpop/Gremlin. While those were Apache projects, both had one thing in
> common: full integration tests were possible, and we are now running them
> in CI - for basic scenarios - on top of unit tests. There is potentially a
> bit of a problem with the last point - at least recently we had some
> problems with adding Ray to the AI integration of the Google provider
> https://github.com/apache/airflow/pull/49797 - so it might be a bit
> problematic, but hopefully that was only "google"-specific.
> 
> J.
> 
> 
> 
> 
> 
> On Wed, May 28, 2025 at 5:42 PM Constance Martineau
> <consta...@astronomer.io.invalid> wrote:
> 
>> We do see a not insignificant number of our customers using Ray & Airflow
>> together on Astro and Astronomer-Software, so there is definitely interest,
>> and I believe this could be really valuable.
>> https://github.com/astronomer/astro-provider-ray/ was created for
>> validation purposes, but we were not able to invest significantly due to
>> priority changes. I'd love to see an official provider within the project
>> -- checking pypistats quickly, Ray had more than 13M downloads last month.
>> 
>> On Tue, May 27, 2025 at 3:30 PM Jens Scheffler <j_scheff...@gmx.de.invalid
>>> 
>> wrote:
>> 
>>> Hi,
>>> 
>>> Thanks for the proposal. I assume you have read the
>>> 
>>> 
>> https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers
>>> docs?
>>> 
>>> Coincidentally, I also had a (not maturing) discussion about integrating
>>> Ray as a cluster backend for Airflow workflows. But I am not sure how
>>> common the demand is. Are there other voices?
>>> 
>>> Have you considered just providing the operators separately and linking
>>> them in the ecosystem?
>>> 
>>> Have you seen that a provider was made on GitHub a few years ago, but it
>>> seems it has not been maintained for a while:
>>> https://github.com/anyscale/airflow-provider-ray As well as
>>> https://github.com/astronomer/astro-provider-ray/
>>> 
>>> Jens
>>> 
>>> On 27.05.25 16:11, Maksim Yermakou wrote:
>>>> Hello all,
>>>> 
>>>> I would like to propose adding a new provider for the Ray[1] service to
>>> the
>>>> Airflow providers.
>>>> 
>>>> Ray is an open source framework to build and scale ML and Python
>>>> applications. Currently, the Google provider has two services that can
>>>> work with Ray: GKE and Vertex AI. But it is important to know that the
>>>> operators for working with Ray on GKE and on Vertex AI only cover
>>>> creating a Ray cluster on Google Cloud infrastructure. To start a Ray
>>>> application, we need to start a Ray Job on the Ray cluster, and to do
>>>> that users need to use Ray's Python SDK.
>>>> 
>>>> Knowing all of this, I suggest creating a new provider for Ray itself,
>>>> with operators that can manage Ray Jobs. Here[2] is the code for the
>>>> client for working with Jobs. We need a new provider because Ray is not
>>>> a Google service, and if we want to give users the ability to submit
>>>> jobs to clusters, then we need to create new operators and put them into
>>>> the new provider. Also, Ray can work with clusters deployed on AWS,
>>>> Azure and more, which means that operators from this provider can be
>>>> used in combination with operators from the amazon and microsoft
>>>> providers.
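>>>> For illustration, the core loop such an operator could use to wait for
>>>> a Ray Job to finish might look roughly like this (a hypothetical
>>>> sketch, not the real provider API; get_status here stands in for the
>>>> SDK client's get_job_status call):

```python
# Hypothetical sketch of the polling loop a Ray Job operator could use.
# wait_for_job and the stubbed status source below are illustrative only; in
# a real operator, get_status would wrap the Ray job client's status call.

# Ray's terminal JobStatus values.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}

def wait_for_job(get_status, job_id, poll=lambda: None):
    """Poll get_status(job_id) until the job reaches a terminal state."""
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        poll()  # e.g. time.sleep(...) between polls in a real operator

# Stubbed status source: the job is pending, then running, then it succeeds.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda _job: next(states), "raysubmit_123"))  # SUCCEEDED
```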
>>>> 
>>>> I have started the implementation, and I will be glad to hear any
>>>> feedback from all of you about my proposal.
>>>> 
>>>> 
>>>> [1] https://docs.ray.io/en/latest/index.html
>>>> [2]
>>>> 
>>> 
>> https://github.com/ray-project/ray/blob/master/python/ray/dashboard/modules/job/sdk.py#L35
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>>> For additional commands, e-mail: dev-h...@airflow.apache.org
>>> 
>>> 
>> 


