This is essentially the same idea as the proposal here I think -- row/map-based representation & conversion functions for ease of use:
[RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry, increase adoption/audience and productivity. · Issue #12618 · apache/arrow (github.com) <https://github.com/apache/arrow/issues/12618> Definitely a worthwhile pursuit IMO. On Thu, Jul 28, 2022 at 4:46 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote: > Hi everyone, > I just wanted to chime in that we already do have a form of row-oriented > storage inside of `arrow/compute/row/row_internal.h`. It is used to store > rows inside of GroupBy and Join within Acero. We also have utilities for > converting to/from columnar storage (and AVX2 implementations of these > conversions) inside of `arrow/compute/row/encode_internal.h`. Would it be > useful to standardize this row-oriented format? > > As far as I understand fixed-width rows would be trivially convertible > into this representation (just a pointer to your array of structs), while > variable-width rows would need a little bit of massaging (though not too > much) to be put into this representation. > > Sasha Krassovsky > > > On Jul 28, 2022, at 1:10 PM, Laurent Quérel <laurent.que...@gmail.com> > wrote: > > > > Thank you Micah for a very clear summary of the intent behind this > > proposal. Indeed, I think that clarifying from the beginning that this > > approach aims at facilitating experimentation more than efficiency in > terms > > of performance of the transformation phase would have helped to better > > understand my objective. > > > > Regarding your question, I don't think there is a specific technical > reason > > for such an integration in the core library. I was just thinking that it > > would make this infrastructure easier to find for the users and that this > > topic was general enough to find its place in the standard library. > > > > Best, > > Laurent > > > > On Thu, Jul 28, 2022 at 12:50 PM Micah Kornfield <emkornfi...@gmail.com> > > wrote: > > > >> Hi Laurent, > >> I'm retitling this thread to include the specific languages you seem to > be > >> targeting in the subject line to hopefully get more eyes from > maintainers > >> in those languages. > >> > >> Thanks for clarifying the goals. If I can restate my understanding, the > >> intended use-case here is to provide easy (from the developer point of > >> view) adaptation of row based formats to Arrow. The means of achieving > >> this is creating an API for a row-base structure, and having utility > >> classes that can manipulate the interface to build up batches (there > are no > >> serialization or in memory spec associated with this API). People > wishing > >> to integrate a specific row based format, can extend that API at > whatever > >> level makes sense for the format. > >> > >> I think this would be useful infrastructure as long as it was made clear > >> that in many cases this wouldn't be the most efficient way to convert to > >> Arrow from other formats. > >> > >> I don't work much with either the Rust or Go implementation, so I can't > >> speak to if there is maintainer support for incorporating the changes > >> directly in Arrow. Is there any technical reasons for preferring to > have > >> this included directly in Arrow vs a separate library? > >> > >> Cheers, > >> Micah > >> > >> On Thu, Jul 28, 2022 at 12:34 PM Laurent Quérel < > laurent.que...@gmail.com> > >> wrote: > >> > >>> Far be it from me to think that I know more than Jorge or Wes on this > >>> subject. Sorry if my post gives that perception, that is clearly not my > >>> intention. 
I'm just trying to defend the idea that when designing this > >> kind > >>> of transformation, it might be interesting to have a library to test > >>> several mappings and evaluate them before doing a more direct > >>> implementation if the performance is not there. > >>> > >>> On Thu, Jul 28, 2022 at 12:15 PM Benjamin Blodgett < > >>> benjaminblodg...@gmail.com> wrote: > >>> > >>>> He was trying to nicely say he knows way more than you, and your ideas > >>>> will result in a low performance scheme no one will use in production > >>>> ai/machine learning. > >>>> > >>>> Sent from my iPhone > >>>> > >>>>> On Jul 28, 2022, at 12:14 PM, Benjamin Blodgett < > >>>> benjaminblodg...@gmail.com> wrote: > >>>>> > >>>>> I think Jorge’s opinion has is that of an expert and him being > >> humble > >>>> is just being tactful. Probably listen to Jorge on performance and > >>>> architecture, even over Wes as he’s contributed more than anyone else > >> and > >>>> know the bleeding edge of low level performance stuff more than > anyone. > >>>>> > >>>>> Sent from my iPhone > >>>>> > >>>>>> On Jul 28, 2022, at 12:03 PM, Laurent Quérel < > >>> laurent.que...@gmail.com> > >>>> wrote: > >>>>>> > >>>>>> Hi Jorge > >>>>>> > >>>>>> I don't think that the level of in-depth knowledge needed is the > >> same > >>>>>> between using a row-oriented internal representation and "Arrow" > >> which > >>>> not > >>>>>> only changes the organization of the data but also introduces a set > >> of > >>>>>> additional mapping choices and concepts. > >>>>>> > >>>>>> For example, assuming that the initial row-oriented data source is a > >>>> stream > >>>>>> of nested assembly of structures, lists and maps. The mapping of > >> such > >>> a > >>>>>> stream to Protobuf, JSON, YAML, ... is straightforward because on > >> both > >>>>>> sides the logical representation is exactly the same, the schema is > >>>>>> sometimes optional, the interest of building batches is optional, > >> ... > >>> In > >>>>>> the case of "Arrow" things are different - the schema and the > >> batching > >>>> are > >>>>>> mandatory. The mapping is not necessarily direct and will generally > >> be > >>>> the > >>>>>> result of the combination of several trade-offs (normalization vs > >>>>>> denormalization representation, mapping influencing the compression > >>>> rate, > >>>>>> queryability with Arrow processors like DataFusion, ...). Note that > >>>> some of > >>>>>> these complexities are not intrinsically linked to the fact that the > >>>> target > >>>>>> format is column oriented. The ZST format ( > >>>>>> https://zed.brimdata.io/docs/formats/zst/) for example does not > >>>> require an > >>>>>> explicit schema definition. > >>>>>> > >>>>>> IMHO, having a library that allows you to easily experiment with > >>>> different > >>>>>> types of mapping (without having to worry about batching, > >>> dictionaries, > >>>>>> schema definition, understanding how lists of structs are > >> represented, > >>>> ...) > >>>>>> and to evaluate the results according to your specific goals has a > >>> value > >>>>>> (especially if your criteria are compression ratio and > >> queryability). > >>> Of > >>>>>> course there is an overhead to such an approach. In some cases, at > >> the > >>>> end > >>>>>> of the process, it will be necessary to manually perform this direct > >>>>>> transformation between a row-oriented XYZ format and "Arrow". 
> >> However, > >>>> this > >>>>>> effort will be done after a simple experimentation phase to avoid > >>>> changes > >>>>>> in the implementation of the converter which in my opinion is not so > >>>> simple > >>>>>> to implement with the current Arrow API. > >>>>>> > >>>>>> If the Arrow developer community is not interested in integrating > >> this > >>>>>> proposal, I plan to release two independent libraries (Go and Rust) > >>> that > >>>>>> can be used on top of the standard "Arrow" libraries. This will have > >>> the > >>>>>> advantage to evaluate if such an approach is able to raise interest > >>>> among > >>>>>> Arrow users. > >>>>>> > >>>>>> Best, > >>>>>> > >>>>>> Laurent > >>>>>> > >>>>>> > >>>>>> > >>>>>>> On Wed, Jul 27, 2022 at 9:53 PM Jorge Cardoso Leitão < > >>>>>>> jorgecarlei...@gmail.com> wrote: > >>>>>>> > >>>>>>> Hi Laurent, > >>>>>>> > >>>>>>> I agree that there is a common pattern in converting row-based > >>> formats > >>>> to > >>>>>>> Arrow. > >>>>>>> > >>>>>>> Imho the difficult part is not to map the storage format to Arrow > >>>>>>> specifically - it is to map the storage format to any in-memory > >> (row- > >>>> or > >>>>>>> columnar- based) format, since it requires in-depth knowledge about > >>>> the 2 > >>>>>>> formats (the source format and the target format). > >>>>>>> > >>>>>>> - Understanding the Arrow API which can be challenging for complex > >>>> cases of > >>>>>>>> rows representing complex objects (list of struct, struct of > >> struct, > >>>>>>> ...). > >>>>>>>> > >>>>>>> > >>>>>>> the developer would have the same problem - just shifted around - > >>> they > >>>> now > >>>>>>> need to convert their complex objects to the intermediary > >>>> representation. > >>>>>>> Whether it is more "difficult" or "complex" to learn than Arrow is > >> an > >>>> open > >>>>>>> question, but we would essentially be shifting the problem from > >>>> "learning > >>>>>>> Arrow" to "learning the Intermediate in-memory". > >>>>>>> > >>>>>>> @Micah Kornfield, as described before my goal is not to define a > >>> memory > >>>>>>>> layout specification but more to define an API and a translation > >>>>>>> mechanism > >>>>>>>> able to take this intermediate representation (list of generic > >>> objects > >>>>>>>> representing the entities to translate) and to convert it into one > >>> or > >>>>>>> more > >>>>>>>> Arrow records. > >>>>>>>> > >>>>>>> > >>>>>>> imho a spec of "list of generic objects representing the entities" > >> is > >>>>>>> specified by an in-memory format (not by an API spec). > >>>>>>> > >>>>>>> A second challenge I anticipate is that in-memory formats > >> inneerently > >>>> "own" > >>>>>>> the memory they outline (since by definition they outline how this > >>>> memory > >>>>>>> is outlined). An Intermediate in-memory representation would be no > >>>>>>> different. Since row-based formats usually require at least one > >>>> allocation > >>>>>>> per row (and often more for variable-length types), the > >>> transformation > >>>>>>> (storage format -> row-based in-memory format -> Arrow) incurs a > >>>>>>> significant cost (~2x slower last time I played with this problem > >> in > >>>> JSON > >>>>>>> [1]). 
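To make the row/column pivot under discussion concrete (Sasha's point that a batch of fixed-width rows is "just a pointer to your array of structs", and Jorge's point about the extra cost of an intermediate row step), here is a minimal Go sketch. The field names are made up, OTEL-flavored illustrations only, and this is not code from any of the implementations mentioned in the thread:

package main

import "fmt"

// Array-of-structs: the natural row-oriented layout. For fixed-width fields
// a batch of rows really is just a pointer to an array of structs.
type SpanRow struct {
    StartUnixNano uint64
    DurationNanos uint64
    StatusCode    int32
}

// Struct-of-arrays: the Arrow-style columnar layout the rows are pivoted
// into; each field becomes its own contiguous buffer.
type SpanColumns struct {
    StartUnixNano []uint64
    DurationNanos []uint64
    StatusCode    []int32
}

// pivot is mechanical for fixed-width fields; strings, lists and nested
// structs are where the per-row allocations and offset bookkeeping
// mentioned above come in.
func pivot(rows []SpanRow) SpanColumns {
    cols := SpanColumns{
        StartUnixNano: make([]uint64, 0, len(rows)),
        DurationNanos: make([]uint64, 0, len(rows)),
        StatusCode:    make([]int32, 0, len(rows)),
    }
    for _, r := range rows {
        cols.StartUnixNano = append(cols.StartUnixNano, r.StartUnixNano)
        cols.DurationNanos = append(cols.DurationNanos, r.DurationNanos)
        cols.StatusCode = append(cols.StatusCode, r.StatusCode)
    }
    return cols
}

func main() {
    cols := pivot([]SpanRow{{1, 10, 200}, {2, 20, 404}})
    fmt.Println(cols.StatusCode) // [200 404]
}

For fixed-width fields the pivot is a tight loop over preallocated slices; the "~2x slower" figure Jorge cites comes from the variable-width case, where each intermediate row typically carries one or more extra heap allocations before anything lands in a columnar buffer.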
> >>>>>>> > >>>>>>> A third challenge I anticipate is that given that we have 10+ > >>>> languages, we > >>>>>>> would eventually need to convert the intermediary representation > >>> across > >>>>>>> languages, which imo just hints that we would need to formalize an > >>>> agnostic > >>>>>>> spec for such representation (so languages agree on its > >>>> representation), > >>>>>>> and thus essentially declare a new (row-based) format. > >>>>>>> > >>>>>>> (none of this precludes efforts to invent an in-memory row format > >> for > >>>>>>> analytics workloads) > >>>>>>> > >>>>>>> @Wes McKinney <wesmck...@gmail.com> > >>>>>>> > >>>>>>> I still think having a canonical in-memory row format (and > >> libraries > >>>>>>>> to transform to and from Arrow columnar format) is a good idea — > >> but > >>>>>>>> there is the risk of ending up in the tar pit of reinventing Avro. > >>>>>>>> > >>>>>>> > >>>>>>> afaik Avro does not have O(1) random access neither to its rows nor > >>>> columns > >>>>>>> - records are concatenated back to back, every record's column is > >>>>>>> concatenated back to back within a record, and there is no indexing > >>>>>>> information on how to access a particular row or column. There are > >>>> blocks > >>>>>>> of rows that reduce the cost of accessing large offsets, but imo it > >>> is > >>>> far > >>>>>>> from the O(1) offered by Arrow (and expected by analytics > >> workloads). > >>>>>>> > >>>>>>> [1] https://github.com/jorgecarleitao/arrow2/pull/1024 > >>>>>>> > >>>>>>> Best, > >>>>>>> Jorge > >>>>>>> > >>>>>>> On Thu, Jul 28, 2022 at 5:38 AM Laurent Quérel < > >>>> laurent.que...@gmail.com> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Let me clarify the proposal a bit before replying to the various > >>>> previous > >>>>>>>> feedbacks. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> It seems to me that the process of converting a row-oriented data > >>>> source > >>>>>>>> (row = set of fields or something more hierarchical) into an Arrow > >>>> record > >>>>>>>> repeatedly raises the same challenges. A developer who must > >> perform > >>>> this > >>>>>>>> kind of transformation is confronted with the following questions > >>> and > >>>>>>>> problems: > >>>>>>>> > >>>>>>>> - Understanding the Arrow API which can be challenging for complex > >>>> cases > >>>>>>> of > >>>>>>>> rows representing complex objects (list of struct, struct of > >> struct, > >>>>>>> ...). > >>>>>>>> > >>>>>>>> - Decide which Arrow schema(s) will correspond to your data > >> source. > >>> In > >>>>>>> some > >>>>>>>> complex cases it can be advantageous to translate the same > >>>> row-oriented > >>>>>>>> data source into several Arrow schemas (e.g. OpenTelementry data > >>>>>>> sources). > >>>>>>>> > >>>>>>>> - Decide on the encoding of the columns to make the most of the > >>>>>>>> column-oriented format and thus increase the compression rate > >> (e.g. > >>>>>>> define > >>>>>>>> the columns that should be represent as dictionaries). > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> By experience, I can attest that this process is usually > >> iterative. > >>>> For > >>>>>>>> non-trivial data sources, arriving at the arrow representation > >> that > >>>>>>> offers > >>>>>>>> the best compression ratio and is still perfectly usable and > >>> queryable > >>>>>>> is a > >>>>>>>> long and tedious process. 
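As an illustration of the first bullet above ("Understanding the Arrow API ... list of struct, struct of struct, ..."), this is roughly what building a single record with a list-of-struct column looks like when driving the Go Arrow builder API directly. The schema and field names are made up for the example, and the import path assumes the v9 line of the Go library that was current around the time of this thread:

package main

import (
    "fmt"

    "github.com/apache/arrow/go/v9/arrow"
    "github.com/apache/arrow/go/v9/arrow/array"
    "github.com/apache/arrow/go/v9/arrow/memory"
)

func main() {
    // A schema with a nested "list of struct" column, the kind of shape
    // (e.g. a span with a list of key/value attributes) discussed above.
    schema := arrow.NewSchema([]arrow.Field{
        {Name: "name", Type: arrow.BinaryTypes.String},
        {Name: "attrs", Type: arrow.ListOf(arrow.StructOf(
            arrow.Field{Name: "key", Type: arrow.BinaryTypes.String},
            arrow.Field{Name: "value", Type: arrow.BinaryTypes.String},
        ))},
    }, nil)

    b := array.NewRecordBuilder(memory.DefaultAllocator, schema)
    defer b.Release()

    // Appending one row means driving three nested builders in the right
    // order: the list builder, the struct builder, then each struct field.
    b.Field(0).(*array.StringBuilder).Append("span-1")
    lb := b.Field(1).(*array.ListBuilder)
    sb := lb.ValueBuilder().(*array.StructBuilder)
    lb.Append(true) // open a new list slot for this row
    sb.Append(true) // open a new struct element inside that list
    sb.FieldBuilder(0).(*array.StringBuilder).Append("http.method")
    sb.FieldBuilder(1).(*array.StringBuilder).Append("GET")

    rec := b.NewRecord()
    defer rec.Release()
    fmt.Println(rec.NumRows(), rec.NumCols())
}

Deciding on the schema, the nesting, and possible dictionary encodings has to happen before any of this builder code is written, which is exactly the iteration loop described in the paragraph above.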
> >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> I see two approaches to ease this process and consequently > >> increase > >>>> the > >>>>>>>> adoption of Apache Arrow: > >>>>>>>> > >>>>>>>> - Definition of a canonical in-memory row format specification > >> that > >>>> every > >>>>>>>> row-oriented data source provider can progressively adopt to get > >> an > >>>>>>>> automatic translation into the Arrow format. > >>>>>>>> > >>>>>>>> - Definition of an integration library allowing to map any > >>>> row-oriented > >>>>>>>> source into a generic row-oriented source understood by the > >>>> converter. It > >>>>>>>> is not about defining a unique in-memory format but more about > >>>> defining a > >>>>>>>> standard API to represent row-oriented data. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> In my opinion these two approaches are complementary. The first > >>> option > >>>>>>> is a > >>>>>>>> long-term approach targeting directly the data providers, which > >> will > >>>>>>>> require to agree on this generic row-oriented format and whose > >>>> adoption > >>>>>>>> will be more or less long. The second approach does not directly > >>>> require > >>>>>>>> the collaboration of data source providers but allows an > >>> "integrator" > >>>> to > >>>>>>>> perform this transformation painlessly with potentially several > >>>>>>>> representation trials to achieve the best results in his context. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> The current proposal is an implementation of the second approach, > >>>> i.e. an > >>>>>>>> API that maps a row-oriented source XYZ into an intermediate > >>>> row-oriented > >>>>>>>> representation understood mechanically by the translator. This > >>>> translator > >>>>>>>> also adds a series of optimizations to make the most of the Arrow > >>>> format. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> You can find multiple examples of a such transformation in the > >>>> following > >>>>>>>> examples: > >>>>>>>> > >>>>>>>> - > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>> > >>> > >> > https://github.com/lquerel/otel-arrow-adapter/blob/main/pkg/otel/trace/otlp_to_arrow.go > >>>>>>>> this example converts OTEL trace entities into their > >> corresponding > >>>>>>> Arrow > >>>>>>>> IR. At the end of this conversion the method returns a collection > >>> of > >>>>>>>> Arrow > >>>>>>>> Records. > >>>>>>>> - A more complex example can be found here > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>> > >>> > >> > https://github.com/lquerel/otel-arrow-adapter/blob/main/pkg/otel/metrics/otlp_to_arrow.go > >>>>>>>> . > >>>>>>>> In this example a stream of OTEL univariate row-oriented metrics > >>> are > >>>>>>>> translate into multivariate row-oriented metrics and then > >>>>>>> automatically > >>>>>>>> translated into Apache Records. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> In these two examples, the creation of dictionaries and > >> multi-column > >>>>>>>> sorting is automatically done by the framework and the developer > >>>> doesn’t > >>>>>>>> have to worry about the definition of Arrow schemas. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Now let's get to the answers. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> @David Lee, I don't think Parquet and from_pylist() solve this > >>> problem > >>>>>>>> particularly well. Parquet is a column-oriented data file format > >> and > >>>>>>>> doesn't really help to perform this transformation. The Python > >>> method > >>>> is > >>>>>>>> relatively limited and language specific. 
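To give a concrete feel for the second approach (the integration API), here is a minimal, hypothetical sketch of the developer-facing side in Go. The types and names below are illustrative only, not the actual pkg/air API: the integrator writes one mechanical mapping from their domain object to a generic record, and schema derivation, batching, dictionary encoding and sorting are left to the framework:

package main

import "fmt"

// Hypothetical generic record type for the intermediate representation;
// the real pkg/air types may look different.
type Record map[string]interface{}

// A simplified OTEL-like span as it arrives from a row-oriented source.
type Span struct {
    Name  string
    Start uint64
    Attrs map[string]string
}

// toRecord is the only piece the integrator writes: a mechanical mapping
// from the domain object to a generic record. Here the attribute map is
// kept as a list of key/value structs, but it could just as well be
// flattened into a struct if that maps better to the intended queries.
func toRecord(s Span) Record {
    attrs := make([]Record, 0, len(s.Attrs))
    for k, v := range s.Attrs {
        attrs = append(attrs, Record{"key": k, "value": v})
    }
    return Record{
        "name":            s.Name,
        "start_time_unix": s.Start,
        "attributes":      attrs,
    }
}

func main() {
    r := toRecord(Span{Name: "GET /items", Start: 1, Attrs: map[string]string{"http.method": "GET"}})
    fmt.Println(r)
    // In the proposal, r would then be added to a record repository, which
    // derives the Arrow schema and builds batches (see the sketch at the
    // end of this thread).
}

Trying a different mapping (attributes as a struct, a denormalized layout, a dictionary-encoded column) only changes this one function, which is the kind of cheap experimentation argued for above.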
> >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> @Micah Kornfield, as described before my goal is not to define a > >>>> memory > >>>>>>>> layout specification but more to define an API and a translation > >>>>>>> mechanism > >>>>>>>> able to take this intermediate representation (list of generic > >>> objects > >>>>>>>> representing the entities to translate) and to convert it into one > >>> or > >>>>>>> more > >>>>>>>> Arrow records. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> @Wes McKinney, If I interpret your answer correctly, I think you > >> are > >>>>>>>> describing the option 1 mentioned above. Like you I think it is an > >>>>>>>> interesting approach although complementary to the one I propose. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> Looking forward to your feedback. > >>>>>>>> > >>>>>>>> On Wed, Jul 27, 2022 at 4:19 PM Wes McKinney <wesmck...@gmail.com > >>> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> We had an e-mail thread about this in 2018 > >>>>>>>>> > >>>>>>>>> https://lists.apache.org/thread/35pn7s8yzxozqmgx53ympxg63vjvggvm > >>>>>>>>> > >>>>>>>>> I still think having a canonical in-memory row format (and > >>> libraries > >>>>>>>>> to transform to and from Arrow columnar format) is a good idea — > >>> but > >>>>>>>>> there is the risk of ending up in the tar pit of reinventing > >> Avro. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Wed, Jul 27, 2022 at 5:11 PM Micah Kornfield < > >>>> emkornfi...@gmail.com > >>>>>>>> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> Are there more details on what exactly an "Arrow Intermediate > >>>>>>>>>> Representation (AIR)" is? We've talked about in the past maybe > >>>>>>> having > >>>>>>>> a > >>>>>>>>>> memory layout specification for row-based data as well as column > >>>>>>> based > >>>>>>>>>> data. There was also a recent attempt at least in C++ to try to > >>>>>>> build > >>>>>>>>>> utilities to do these pivots but it was decided that it didn't > >> add > >>>>>>> much > >>>>>>>>>> utility (it was added a comprehensive example). > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Micah > >>>>>>>>>> > >>>>>>>>>> On Tue, Jul 26, 2022 at 2:26 PM Laurent Quérel < > >>>>>>>> laurent.que...@gmail.com > >>>>>>>>>> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> In the context of this OTEP > >>>>>>>>>>> < > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>> > >>> > >> > https://github.com/lquerel/oteps/blob/main/text/0156-columnar-encoding.md > >>>>>>>>>>>> > >>>>>>>>>>> (OpenTelemetry > >>>>>>>>>>> Enhancement Proposal) I developed an integration layer on top > >> of > >>>>>>>> Apache > >>>>>>>>>>> Arrow (Go an Rust) to *facilitate the translation of > >> row-oriented > >>>>>>>> data > >>>>>>>>>>> stream into an arrow-based columnar representation*. In this > >>>>>>>> particular > >>>>>>>>>>> case the goal was to translate all OpenTelemetry entities > >>> (metrics, > >>>>>>>>> logs, > >>>>>>>>>>> or traces) into Apache Arrow records. These entities can be > >> quite > >>>>>>>>> complex > >>>>>>>>>>> and their corresponding Arrow schema must be defined on the > >> fly. > >>>>>>> IMO, > >>>>>>>>> this > >>>>>>>>>>> approach is not specific to my specific needs but could be used > >>> in > >>>>>>>> many > >>>>>>>>>>> other contexts where there is a need to simplify the > >> integration > >>>>>>>>> between a > >>>>>>>>>>> row-oriented source of data and Apache Arrow. 
The trade-off is > >> to > >>>>>>>> have > >>>>>>>>> to > >>>>>>>>>>> perform the additional step of conversion to the intermediate > >>>>>>>>>>> representation, but this transformation does not require to > >>>>>>>> understand > >>>>>>>>> the > >>>>>>>>>>> arcana of the Arrow format and allows to potentially benefit > >> from > >>>>>>>>>>> functionalities such as the encoding of the dictionary "for > >>> free", > >>>>>>>> the > >>>>>>>>>>> automatic generation of Arrow schemas, the batching, the > >>>>>>> multi-column > >>>>>>>>>>> sorting, etc. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> I know that JSON can be used as a kind of intermediate > >>>>>>> representation > >>>>>>>>> in > >>>>>>>>>>> the context of Arrow with some language specific > >> implementation. > >>>>>>>>> Current > >>>>>>>>>>> JSON integrations are insufficient to cover the most complex > >>>>>>>> scenarios > >>>>>>>>> and > >>>>>>>>>>> are not standardized; e.g. support for most of the Arrow data > >>> type, > >>>>>>>>> various > >>>>>>>>>>> optimizations (string|binary dictionaries, multi-column > >> sorting), > >>>>>>>>> batching, > >>>>>>>>>>> integration with Arrow IPC, compression ratio optimization, ... > >>> The > >>>>>>>>> object > >>>>>>>>>>> of this proposal is to progressively cover these gaps. > >>>>>>>>>>> > >>>>>>>>>>> I am looking to see if the community would be interested in > >> such > >>> a > >>>>>>>>>>> contribution. Above are some additional details on the current > >>>>>>>>>>> implementation. All feedback is welcome. > >>>>>>>>>>> > >>>>>>>>>>> 10K ft overview of the current implementation: > >>>>>>>>>>> > >>>>>>>>>>> 1. Developers convert their row oriented stream into records > >>>>>>> based > >>>>>>>>> on > >>>>>>>>>>> the Arrow Intermediate Representation (AIR). At this stage the > >>>>>>>>>>> translation > >>>>>>>>>>> can be quite mechanical but if needed developers can decide > >> for > >>>>>>>>> example > >>>>>>>>>>> to > >>>>>>>>>>> translate a map into a struct if that makes sense for them. > >> The > >>>>>>>>> current > >>>>>>>>>>> implementation support the following arrow data types: bool, > >> all > >>>>>>>>> uints, > >>>>>>>>>>> all > >>>>>>>>>>> ints, all floats, string, binary, list of any supported types, > >>>>>>> and > >>>>>>>>>>> struct > >>>>>>>>>>> of any supported types. Additional Arrow types could be added > >>>>>>>>>>> progressively. > >>>>>>>>>>> 2. The row oriented record (i.e. AIR record) is then added to > >> a > >>>>>>>>>>> RecordRepository. This repository will first compute a schema > >>>>>>>>> signature > >>>>>>>>>>> and > >>>>>>>>>>> will route the record to a RecordBatcher based on this > >>>>>>> signature. > >>>>>>>>>>> 3. The RecordBatcher is responsible for collecting all the > >>>>>>>>> compatible > >>>>>>>>>>> AIR records and, upon request, the "batcher" is able to build > >> an > >>>>>>>>> Arrow > >>>>>>>>>>> Record representing a batch of compatible inputs. In the > >> current > >>>>>>>>>>> implementation, the batcher is able to convert string columns > >> to > >>>>>>>>>>> dictionary > >>>>>>>>>>> based on a configuration. Another configuration allows to > >>>>>>> evaluate > >>>>>>>>> which > >>>>>>>>>>> columns should be sorted to optimize the compression ratio. > >> The > >>>>>>>> same > >>>>>>>>>>> optimization process could be applied to binary columns. > >>>>>>>>>>> 4. Steps 1 through 3 can be repeated on the same > >>>>>>> RecordRepository > >>>>>>>>>>> instance to build new sets of arrow record batches. 
Subsequent > >>>>>>>>>> iterations > >>>>>>>>>>> will be slightly faster due to different techniques used (e.g. > >>>>>>>>> object > >>>>>>>>>>> reuse, dictionary reuse and sorting, ...) > >>>>>>>>>>> > >>>>>>>>>>> The current Go implementation > >>>>>>>>>>> <https://github.com/lquerel/otel-arrow-adapter> (WIP) is currently part of > >>>>>>>>>>> this repo (see pkg/air package). If the community is interested, I could do > >>>>>>>>>>> a PR in the Arrow Go and Rust sub-projects. > >>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> Laurent Quérel
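Finally, as a rough illustration of steps 2 and 3 of the 10K ft overview (schema signature computation and routing to a per-schema batcher), here is a small self-contained Go sketch; again the types are hypothetical simplifications, not the pkg/air implementation:

package main

import (
    "fmt"
    "sort"
    "strings"
)

// Hypothetical generic record type (see the earlier sketch in this thread).
type Record map[string]interface{}

// signature derives a canonical schema signature from a record: the sorted
// list of field names and their Go-level types. Records sharing a signature
// can be batched under the same Arrow schema.
func signature(r Record) string {
    parts := make([]string, 0, len(r))
    for name, v := range r {
        parts = append(parts, fmt.Sprintf("%s:%T", name, v))
    }
    sort.Strings(parts)
    return strings.Join(parts, ",")
}

// Repository groups compatible records; each group would be handed to a
// batcher that builds an Arrow Record (applying the dictionary and
// multi-column sorting configuration) when a batch is requested.
type Repository struct {
    groups map[string][]Record
}

func NewRepository() *Repository { return &Repository{groups: map[string][]Record{}} }

func (rep *Repository) Add(r Record) {
    sig := signature(r)
    rep.groups[sig] = append(rep.groups[sig], r)
}

func main() {
    rep := NewRepository()
    rep.Add(Record{"name": "GET /items", "status": int64(200)})
    rep.Add(Record{"name": "GET /users", "status": int64(404)})
    rep.Add(Record{"name": "queue.poll"}) // different shape => different batch
    for sig, rows := range rep.groups {
        fmt.Printf("%d row(s) for schema signature %q\n", len(rows), sig)
    }
}

In the actual proposal the per-signature groups would be consumed by a RecordBatcher that derives the Arrow schema, applies the configured optimizations and emits Arrow Records, as described in steps 3 and 4 above.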