Re: [DISCUSS][Erlang] Erlang Apache Arrow Implementation

Benjamin Philip Wed, 20 Aug 2025 01:21:22 -0700

On Wed, 20 Aug 2025 at 04:08, Jacob Wujciak <[email protected]> wrote:

> > Secondly, this will be the first time I will be maintaining an Apache
> > project, and I am not very familiar with the internal processes you use.
> > I feel I might move faster with a repo under my own user
>
> This does sound like it might be another use case for the 'arrow-contrib'
> org:
> Apache Datafusion has a community run, non-apache org called
> 'datafusion-contrib' [1], where unofficial extensions and datafusion
> related crates are developed. Once a project is mature/used enough it
> can be donated to the ASF Datafusion TLP (so that is not a necessity).
> This was for example done for Datafusion for Ray [2]. Though
> apparently it will now be archived due to a lack of maintenance [3].
> (So maybe not the best example xD)
>
> The idea of creating a similar org for arrow has been brought up a
> number of times in the community meeting. This would not come with the
> 'red tape' of an ASF project and would allow faster initial
> development for the Erlang implementation.
>
>
That sounds like a good option. However, I don't want to rule out
developing this as an ASF project from the start. I figure that this will
eventually become a regular ASF project, so I might as well get accustomed
to the process now. Is there a document listing all the "red tape" an ASF
project entails?

If we were to do this, would the Erlang implementation be considered
"official" and linked from the docs? I would like to improve awareness of
the project, and I'd prefer it be mentioned in the official docs even as an
alpha release. I think that is important in addition to promoting it on
Elixir/Erlang-specific channels.

I also forgot to mention this in my previous email, but would any Arrow
maintainer be able to review PRs to this project, maybe multiple times a
week? I remember having many Arrow-specific questions while working on this,
and I think it would be wise to have someone re-check my work to ensure I
haven't misinterpreted anything in the specifications and to generally keep
an eye on things from the Apache side. I also have two other reviewers from
the Erlang Ecosystem Foundation reviewing my Erlang code, so that part is
already taken care of.

> Regarding the ip clearance process (that as you say will need to
> happen at some point of moving the implementation into
> apache/arrow-erlang), IIRC as long as the code has always been
> licensed under ASL 2.0 the process is more of a formality and
> shouldn't be too hard to do.
>

The code is indeed licensed under ASL 2.0, so I think we can go with the IP
clearance process then. Are there any other legal matters that need to be
addressed?

On Tue, 19 Aug 2025 at 14:09, Antoine Pitrou <[email protected]> wrote:

> There isn't an official criterion for declaring an implementation
> "complete" (and we don't really use that term, either).
>
> What is important is to address the most common needs that your users
> may have (such as OpenTelemetry data payloads).

That makes sense.

> I would personally suggest:
>
> - support the most common data types (all primitive types + at least
> list and struct + dictionary + basic support for extension types)
> - support either the C Data Interface or the IPC format (preferably both)
>
> In the IPC format, you don't need to support everything (tensors are
> rarely used, for example; endianness conversion is only useful if you
> plan to exchange data with big-endian systems...).
>
>
As of right now, we support about half of all primitive types and most of
the list types (under nested types), but none of the special or extension
types. We also have some rudimentary support for IPC (since that's needed
for OTel). I plan to add support for everything under the Columnar Format
anyway, so it's just a matter of time. Are Flight and friends handled by the
Arrow team? How often and where is Flight used?

> Hi Benjamin,
>
> On 14/08/2025 at 20:17, Benjamin Philip wrote:
> >
> >> serialization/deserialization features but arrow-rs provides
> >> more features such as computation features.
> >
> > This reminds me. What features will I have to support out of
> > (de)serialization for an implementation to be considered complete?
>
> You're probably aware of https://arrow.apache.org/docs/dev/status.html ,
> otherwise it will give you an idea of the variety of features that *can*
> be implemented.
>

This list only covers support for serialization and deserialization of
various data types, whether that be the Columnar Format, the IPC Format or
Flight. I realize that the words "out of" weren't very clear, but what I
meant was: what should I support *apart from* serde? For example, Sutou
mentioned computation. I don't see a list of supported computations
anywhere, so which computations must I provide? I'm guessing serde (i.e. R/W
of Arrow arrays) and computations (i.e. transformations of Arrow arrays) are
it, but are there any other high-level features I should support?

-- bp
