Hi all,

I’d like to start a discussion on a draft SPIP: Language-agnostic UDF
Protocol for Spark

JIRA: https://issues.apache.org/jira/browse/SPARK-55278

Doc:
https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0

tl;dr

The SPIP proposes a structured, language-agnostic execution protocol for
running user-defined functions (UDFs) in Spark across multiple programming
languages.

Today, Spark Connect allows users to write queries from multiple languages,
but support for user-defined functions remains incomplete. In practice,
only Scala / Java / Python / R have working UDF support, and each relies on
language-specific mechanisms that do not generalize well to other clients
such as Go (Apache Spark Connect Go
<https://github.com/apache/spark-connect-go>), Rust (Apache Spark Connect
Rust <https://github.com/apache/spark-connect-rust>), Swift (Apache Spark
Connect Swift <https://github.com/apache/spark-connect-swift>), or .NET (Spark
Connect DotNet <https://github.com/GoEddie/spark-connect-dotnet>), where
UDF support is currently unavailable. There are also long-standing
limitations in the existing PySpark worker.py implementation that this
proposal could address.

This proposal aims to define a unified API and execution protocol for UDFs
that run outside the Spark executor process and communicate with Spark via
inter-process communication (IPC). The goal is to enable Spark to interact
with external workers in a consistent and extensible way, regardless of the
implementation language.
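To make the idea concrete, here is a minimal sketch of what such an external
UDF worker's receive loop could look like. Everything in it is hypothetical:
the SPIP does not specify this wire format, and the length-prefixed JSON
framing, the serve() function, and the frame() helper are illustrative
stand-ins for whatever protocol the design doc ultimately defines.

```python
import io
import json
import struct

def serve(udf, reader, writer):
    """Hypothetical external-worker loop: read length-prefixed requests
    from Spark, apply the UDF, and write length-prefixed responses back.
    The 4-byte big-endian framing and JSON payloads are placeholders,
    not the SPIP's actual wire format."""
    while True:
        header = reader.read(4)
        if len(header) < 4:  # stream closed by the executor side
            break
        (size,) = struct.unpack(">I", header)
        args = json.loads(reader.read(size))
        result = json.dumps(udf(*args)).encode("utf-8")
        writer.write(struct.pack(">I", len(result)) + result)

def frame(payload):
    """Encode one request in the same illustrative framing."""
    data = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(data)) + data

# Simulate Spark sending two UDF invocations over a byte stream.
requests = io.BytesIO(frame([2, 3]) + frame([10, 20]))
responses = io.BytesIO()
serve(lambda a, b: a + b, requests, responses)
```

Because the loop only depends on a byte stream, the same shape works over a
socket, pipe, or any other IPC channel, and can be reimplemented in any
language that can frame bytes, which is the point of a language-agnostic
protocol.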

I’m happy to help drive the discussion and development of this proposal,
and I would greatly appreciate feedback from the community.

Thanks,

Haiyang Sun
