On Wed, 20 Aug 2025 at 04:08, Jacob Wujciak <assignu...@apache.org> wrote:
> > Secondly, this will be the first time I will be maintaining an Apache
> > project, and I am not very familiar with the internal processes you
> > use. I feel I might move faster with a repo under my own user
>
> This does sound like it might be another use case for the 'arrow-contrib'
> org:
> Apache Datafusion has a community run, non-apache org called
> 'datafusion-contrib' [1], where unofficial extensions and datafusion
> related crates are developed. Once a project is mature/used enough it
> can be donated to the ASF Datafusion TLP (so that is not a necessity).
> This was for example done for Datafusion for Ray [2]. Though
> apparently it will now be archived due to a lack of maintenance [3].
> (So maybe not the best example xD)
>
> The idea of creating a similar org for arrow has been brought up a
> number of times in the community meeting, This would not come with the
> 'red tape' of an ASF project and would allow faster initial
> development for the Erlang implementation.

That sounds like a good option. However, I don't want to rule out
developing this as an ASF project from the start. I figure that this
will eventually become a regular ASF project, so I might as well get
accustomed to it now. Is there a document listing all the "red tape" an
ASF project entails?

If we were to do this, would the Erlang implementation be considered
"official" and linked from the docs? I would like to improve awareness
of the project, and I'd prefer it be mentioned in the official docs even
as an alpha release. I think that is important in addition to promoting
it on Elixir/Erlang specific channels.

I also forgot to mention this in my previous email, but would any Arrow
maintainer be able to review PRs to this project, maybe multiple times a
week?
I remember having many Arrow-specific doubts while working on this, and
I think it would be wise to have someone re-check my work to ensure I
haven't misinterpreted anything in the specifications and to generally
keep an eye on it from the Apache side. I also have 2 other reviewers
from the Erlang Ecosystem Foundation reviewing my Erlang code, so that
part is already taken care of.

> Regarding the ip clearance process (that as you say will need to
> happen at some point of moving the implementation into
> apache/arrow-erlang), IIRC as long as the code has always been
> licensed under ASL 2.0 the process is more of a formality and
> shouldn't be too hard to do.

The code is indeed licensed under ASL 2.0, so I think we can go with the
IP clearance process then. Are there any other legal matters that need
to be addressed?

On Tue, 19 Aug 2025 at 14:09, Antoine Pitrou <anto...@python.org> wrote:

> There isn't an official criterion for declaring an implementation
> "complete" (and we don't really use that term, either).
>
> What is important is to address the most common needs that your users
> may have (such as OpenTelemetry data payloads).

That makes sense.

> I would personally suggest:
>
> - support the most common data types (all primitive types + at least
>   list and struct + dictionary + basic support for extension types)
> - support either the C Data Interface or the IPC format (preferably both)
>
> In the IPC format, you don't need to support everything (tensors are
> rarely used, for example; endianness conversion is only useful if you
> plan to exchange data with big-endian systems...).

As of right now, we support about half of all primitive types and most
of the lists (under nested types), but none of the special or extension
types. We also have some rudimentary support for IPC (since that's
needed for OTel). I plan to add support for everything under the
Columnar Format anyway, so it's just a matter of time.

Are Flight and friends handled by the Arrow team?
How often and where is Flight used?

> Hi Benjamin,
>
> On 14/08/2025 at 20:17, Benjamin Philip wrote:
> >
> >> serialization/deserialization features but arrow-rs provides
> >> more features such as computation features.
> >
> > This reminds me. What features will I have to support out of
> > (de)serialization for an implementation to be considered complete?
>
> You're probably aware of https://arrow.apache.org/docs/dev/status.html ,
> otherwise it will give you an idea of the variety of features that *can*
> be implemented.

That page only lists support for serialization and deserialization of
various data types, whether that be the Columnar Format, the IPC Format
or Flight. I realize that the words "out of" weren't very clear; what I
meant was: what should I support *apart from* serde? For example, Sutou
mentioned computation. I don't see a list of supported computations
anywhere. Which computations must I provide?

I'm guessing serde (i.e. R/W of Arrow arrays) and computations (i.e.
transformations of Arrow arrays) are it, but are there any other
high-level features I should support?

--
bp