Re: [Discussion][Gandiva] Migration JIT engine from MCJIT to ORC v2

2023-12-06 Thread Yue Ni
the full compatibility status will require some more testing to figure out. Thanks. Regards Yue Ni [1] https://llvm.org/docs/ORCv2.html [2] https://releases.llvm.org/8.0.0/docs/ReleaseNotes.html#changes-to-the-jit-apis [3] https://github.com/apache/arrow/blob/92723f34f8df40b35e5840e61011c0076680

[Discussion][Gandiva] Migration JIT engine from MCJIT to ORC v2

2023-12-04 Thread Yue Ni
cted to be roughly the same. Any feedback is appreciated. Thanks. *References:* [1] https://llvm.org/docs/ORCv2.html [2] https://github.com/apache/arrow/issues/37848 Regards, Yue Ni

Re: Apache Arrow file format

2023-10-22 Thread Yue Ni
> Looks like projecting columns isn't available by default. > One of the benefits of Parquet file format is column projection, where the IO is limited to just the columns projected. > Unfortunately, you are correct that it doesn't allow for easy column projecting (you're going to read all the col

Re: [DISCUSS][Gandiva] External function registry proposal

2023-09-26 Thread Yue Ni
a, rather than > letting Gandiva discover them by itself. > > On Tue, Sep 26, 2023 at 2:14 AM Yue Ni wrote: > > > > The definition of an external function registry can certainly belong in > > Gandiva, but how it's populated should be left to third-party projects >

Re: [DISCUSS][Gandiva] External function registry proposal

2023-09-25 Thread Yue Ni
der` can return a list of bitcode buffers, so that the specific metadata/bitcode data population logic can be moved out of Gandiva? Thanks. Regards, Yue On Tue, Sep 26, 2023 at 12:25 AM Antoine Pitrou wrote: > > Hi Yue, > > Le 25/09/2023 à 18:15, Yue Ni a écrit : > > > >>

Re: [DISCUSS][Gandiva] External function registry proposal

2023-09-25 Thread Yue Ni
where so that it is generally > useful, not only for the contributors of the feature. > > Also, I hope that this will get more people interested in Gandiva > maintenance. > > Regards > > Antoine. > > > Le 25/09/2023 à 16:17, Yue Ni a écrit : > > Hi there, >

[DISCUSS][Gandiva] External function registry proposal

2023-09-25 Thread Yue Ni
https://github.com/apache/arrow/pull/37787 Regards, Yue Ni

Re: modeling column group

2023-01-01 Thread Yue Ni
row/blob/master/cpp/src/arrow/compute/row/row_internal.h > > On Sun, Jan 1, 2023 at 6:02 AM Yue Ni wrote: > > > > Hi there, > > > > Happy new year. > > > > I store some data in arrow IPC files. And I have two fields that are > always > > acce

modeling column group

2023-01-01 Thread Yue Ni
Hi there, Happy new year. I store some data in arrow IPC files. And I have two fields that are always accessed at the same time, namely, when accessing these two fields, they are accessed in a row oriented manner and are always fetched together, but other fields are accessed in columnar manner. O

Re: Apache Arrow development using CLion

2022-06-07 Thread Yue Ni
Hi Dulvin, I used CLion when working with Apache Arrow C++. You can give CMake presets [1] a try. CLion started to support CMake presets last year (2021.2) and your version (2022.1.2) should work fine [2]. And Apache Arrow provides several CMake presets [3], which you can choose from depending on

Re: ExecBatch in arrow execution engine

2022-05-09 Thread Yue Ni
> case. However, note that when there are intermediate nodes in between such > sending and receiving nodes, this may well break because an intermediate > node could output a fresh ExecBatch even when receiving a RichExecBatch as > input, like filter_node does [1], for example. >

ExecBatch in arrow execution engine

2022-05-09 Thread Yue Ni
Hi there, I would like to use the Apache Arrow execution engine for some computation. I found that the execution engine's nodes use `ExecBatch` instead of `RecordBatch`, and I wonder how I can attach some additional information, such as schema/metadata, to the `ExecBatch` during execution so that they ca

Re: Designing standards for "sandboxed" Arrow user-defined functions [was Re: User defined "Arrow Compute Function"]

2022-04-26 Thread Yue Ni
This is a very interesting topic. I wonder if we have a UDF mechanism in arrow compute, is there any chance Gandiva's UDF could be integrated with arrow compute's UDF function registry? [1] From an external user's perspective, Gandiva is part of arrow project, having two UDF registries that are no

Re: storing per record batch metadata in arrow IPC file

2022-04-06 Thread Yue Ni
bly doesn't matter much either way. > > So there might be some potential here but I wouldn't say it is a sure > thing. > > [1] https://github.com/apache/arrow/tree/master/format > [2] https://issues.apache.org/jira/browse/ARROW-6940 > > On Tue, Apr 5, 2022 at 7:26 PM Y

Re: storing per record batch metadata in arrow IPC file

2022-04-05 Thread Yue Ni
r.cc#L644 > [3]: > > https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L665 > [4]: > > https://github.com/apache/arrow/blob/apache-arrow-7.0.0/cpp/src/arrow/ipc/writer.cc#L1253 > > Aldrin Montana > Computer Science PhD Student > UC San

storing per record batch metadata in arrow IPC file

2022-04-05 Thread Yue Ni
Hi there, I am investigating analyzing time series data using apache arrow. I would like to store some record batch specific metadata, for example, some statistics/tags about data in a particular record batch. More specifically, I may use a single record batch to store metric samples for a certain

Arrow IPC file format mindmap

2021-12-08 Thread Yue Ni
Hi there, Recently I read Arrow's documentation about the columnar format (https://arrow.apache.org/docs/format/Columnar.html) and the FlatBuffers structures (Message.fbs/File.fbs/etc.), but found it hard to connect the pieces of what I read, so I created a mindmap to visualize the basic structu

Re: output_schema for ExecNode

2021-11-17 Thread Yue Ni
al algebra operators) the scanner should be enough. You can > get an iterator of batches and setup your own custom processing > pipeline from that. > > These are just ideas from the hip. I think most relational algebra > systems rely on fixed schemas, and I would worry a little abou

Re: output_schema for ExecNode

2021-11-15 Thread Yue Ni
> you have a different use case for multiple schemas in mind that > doesn't quite fit the "promote to common schema" case? > > [1] https://issues.apache.org/jira/browse/ARROW-11003 > > > On Sun, Nov 14, 2021 at 7:03 PM Yue Ni wrote: > > > > Hi there, >

output_schema for ExecNode

2021-11-14 Thread Yue Ni
Hi there, I am evaluating Apache Arrow C++ compute engine for my project, and wonder what the schema assumption is for execution operators in the compute engine. In my use case, multiple record batches for computation may have different schemas. I read the Apache Arrow Query Engine for C++ design

Re: Feather v2 random access

2020-06-24 Thread Yue Ni
Hi François, Thanks so much for the very detailed explanation, and that makes sense to me. I will check out the links for more information. @Wes, ARROW-8250 is very useful to me as well and I will keep an eye on it. Thanks. On Wed, Jun 24, 2020 at 11:08 PM Wes McKinney wrote: > See also this J

Feather v2 random access

2020-06-22 Thread Yue Ni
Hi there, I am evaluating using feather v2 on disk to store some data that needs random access. I did some experiments to see the performance, but since there are many scenarios I cannot verify each of them, I am searching for some details about how it works internally to understand if it satisfie

Re: Gandiva projector for dictionary array

2020-04-21 Thread Yue Ni
o the dev mailing list, is that correct? Or do we have another place/process requiring a more formal proposal, like PEPs for Python? On Wed, Apr 22, 2020 at 12:22 AM Wes McKinney wrote: > On Tue, Apr 21, 2020 at 6:34 AM Yue Ni wrote: > > > > Hi there, > > > > I am currently

Gandiva projector for dictionary array

2020-04-21 Thread Yue Ni
Hi there, I am currently using the Gandiva C++ library to do projection/selection on Arrow record batches. In my record batch, I have some fields encoded with dictionary encoding, and I wonder how I can apply Gandiva functions to these dictionary-encoded fields. Currently, there is no gandiva function ha