[ 
https://issues.apache.org/jira/browse/SPARK-55278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069566#comment-18069566
 ] 

Haiyang Sun commented on SPARK-55278:
-------------------------------------

Hi [~varun244], thank you for your interest in contributing to the project. 
Although the provided design doc is relevant to the original SPIP proposal at 
a high level, many details require more careful design, so it is still far 
from production-ready.

We are setting up several key abstractions for implementing the UDF protocol, 
and there will be more follow-ups to build up the architecture incrementally. 
Please feel free to follow this epic for updates; feedback is welcome.

Once more core interfaces and specs are in place, there will be more concrete 
micro-tasks that should be easier to contribute to.

> SPIP: Language-agnostic UDF Protocol for Spark
> ----------------------------------------------
>
>                 Key: SPARK-55278
>                 URL: https://issues.apache.org/jira/browse/SPARK-55278
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Connect, PySpark
>    Affects Versions: 4.2.0
>            Reporter: Haiyang Sun
>            Priority: Major
>              Labels: SPIP, pull-request-available
>
> Run user-provided code in Spark {*}consistently across many programming 
> languages{*}.
> Today, Spark Connect allows users to write queries from multiple languages, 
> but support for user-defined functions is incomplete. In practice, only 
> Python has a mature solution, and it relies on language-specific mechanisms 
> that do not generalize to other languages such as 
> [Go|https://github.com/apache/spark-connect-go] / 
> [Rust|https://github.com/apache/spark-connect-rust] / 
> [Swift|https://github.com/apache/spark-connect-swift] / 
> [.NET|https://github.com/GoEddie/spark-connect-dotnet] (where UDFs are not 
> supported).
> Our objective is to define a *unified API and execution protocol* for 
> user-defined functions that run outside the Spark engine process via 
> inter-process communication (IPC). This allows Spark to interact with 
> external workers in a consistent way, regardless of the language used to 
> implement the function.
> SPIP doc: 
> [https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0]
> Worker Specification doc: 
> [https://docs.google.com/document/d/1Dx9NqHRNuUpatH9DYoFF9cmvUl2fqHT4Rjbyw4EGLHs/edit?tab=t.0]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
