Re: [ANNOUNCE] New Arrow PMC chair: Kouhei Sutou

2022-01-29 Thread Fan Liya
Congratulations, Kou! Best, Liya Fan On Fri, Jan 28, 2022 at 16:04, Daniël Heres wrote: > Congratulations! > > On Fri, Jan 28, 2022 at 05:00, Bryan Cutler wrote: > > > Congratulations Kou, thanks for all your work! > > > > On Thu, Jan 27, 2022, 4:36 PM Sutou Kouhei wrote: > > > > > Thanks everyone!!! > > > >

Re: [ANNOUNCE] New Arrow committer: Alessandro Molina

2022-01-06 Thread Fan Liya
Congratulations, Alessandro! Best, Liya Fan On Thu, Jan 6, 2022 at 17:45, Nic wrote: > Congratulations! > > On Thu, 6 Jan 2022 at 07:58, Alenka Frim wrote: > > > Congratulations! > > > > On Wed, Jan 5, 2022 at 8:37 PM Krisztián Szűcs < > szucs.kriszt...@gmail.com> > > wrote: > > > > > Congrats Alessandro! >

Re: [ANNOUNCE] New Arrow PMC member: Yibo Cai

2022-01-06 Thread Fan Liya
Congratulations, Yibo! Best, Liya Fan On Thu, Jan 6, 2022 at 19:37, Krisztián Szűcs wrote: > Congrats Yibo! > > On Thu, Jan 6, 2022 at 10:45 AM Nic wrote: > > > > Congratulations! > > > > On Tue, 4 Jan 2022 at 17:23, Ian Joiner wrote: > > > > > Congrats Yibo! > > > > > > Ian > > > > > > On Tuesday, January

Re: Java code review request

2022-01-06 Thread Fan Liya
I will take a look. Thanks. Best, Liya Fan On Thu, Jan 6, 2022 at 22:18, Laurent Goujon wrote: > Hi, > > I made a small change to the Java Arrow library to support newer versions > of Jackson, as more projects use Jackson 2.12 or higher but Arrow still > uses Jackson 2.11. > > Jira ticket link is https://issu

Re: [ANNOUNCE] New Arrow committer: Matt Topol

2021-08-31 Thread Fan Liya
Congratulations, Matt! Best, Liya Fan On Tue, Aug 31, 2021 at 1:38 PM Weston Pace wrote: > Congratulations Matt! > > On Mon, Aug 30, 2021 at 5:36 PM Micah Kornfield > wrote: > > > > On behalf of the Apache Arrow PMC, I'm happy to announce that Matt Topol > > has accepted an invitation to becom

Re: [Java] C Data Interface and dictionaries

2021-08-25 Thread Fan Liya
Hi roee, It seems that we have both raw value and encoded value types in the Java implementation, so there is no information loss? In particular, we have org.apache.arrow.vector.types.pojo.FieldType#type for the raw type and org.apache.arrow.vector.types.pojo.FieldType#dictionary#indexType for th

Re: [VOTE][Format] Clarify allowed value range for the Time types

2021-08-23 Thread Fan Liya
+1 On Fri, Aug 20, 2021 at 11:37 PM Micah Kornfield wrote: > +1 (binding) > > On Fri, Aug 20, 2021 at 7:46 AM Keith Kraus > wrote: > > > +1 (non-binding) > > > > On Fri, Aug 20, 2021 at 9:49 AM Rok Mihevc wrote: > > > > > +1 (non-binding) > > > > > > On Fri, Aug 20, 2021 at 3:46 PM Jorge Cardo

Re: [VOTE][Format] Add in a new interval type can combines Month, Days and Nanoseconds

2021-08-17 Thread Fan Liya
+1 On Wed, Aug 18, 2021 at 8:28 AM Keith Kraus wrote: > +1 (non-binding) > > On Tue, Aug 17, 2021 at 7:34 PM Jorge Cardoso Leitão < > jorgecarlei...@gmail.com> wrote: > > > +1 > > > > On Tue, Aug 17, 2021 at 8:50 PM Micah Kornfield > > wrote: > > > > > Hello, > > > As discussed previously [1],

Re: [ANNOUNCE] New Arrow committer: QP Hou

2021-07-27 Thread Fan Liya
Congratulations, QP! Best, Liya Fan On Tue, Jul 27, 2021 at 11:39 PM Weston Pace wrote: > Congratulations QP! > > On Tue, Jul 27, 2021, 12:37 AM Rok Mihevc wrote: > > > Congrats QP! > > > > Rok > > > > On Tue, Jul 27, 2021 at 9:21 AM QP Hou wrote: > > > > > > Thank you all for the warm welcom

Re: [Java] Is hardcoding NullVector .getField() intentional?

2021-07-27 Thread Fan Liya
Hi AI, I understand your concern. It makes sense to me. I am not aware of any special reason for this. So if there are no objections, I think it would be reasonable to change this to make the NullVector consistent with other vectors. Best, Liya Fan On Fri, Jul 23, 2021 at 8:26 PM Al Taylor wro

Re: [ANNOUNCE] New Arrow PMC member: David M Li

2021-06-22 Thread Fan Liya
Congratulations David! Best, Liya Fan On Wed, Jun 23, 2021 at 9:44 AM Yibo Cai wrote: > Congrats David! > > On 6/22/21 8:56 PM, David Li wrote: > > Thanks everyone! > > > > I've learned a lot and had a great time contributing here, and I look > > forward to continuing to work with everybody. >

Re: [ANNOUNCE] New Arrow committer: Kazuaki Ishizaki

2021-06-07 Thread Fan Liya
Congratulations, Kazuaki! Best, Liya Fan On Tue, Jun 8, 2021 at 7:59 AM Rok Mihevc wrote: > Congrats! > > On Tue, Jun 8, 2021 at 1:36 AM Micah Kornfield > wrote: > > > Congrats! > > > > On Monday, June 7, 2021, Bryan Cutler wrote: > > > > > Congratulations!! > > > > > > On Sun, Jun 6, 2021, 7

Re: [ANNOUNCE] New Arrow committer: Dominik Moritz

2021-06-04 Thread Fan Liya
Congratulations Dominik! Best, Liya Fan On Thu, Jun 3, 2021 at 10:45 AM David Li wrote: > Congratulations Dominik! > > -David > > On Wed, Jun 2, 2021, at 18:09, Rok Mihevc wrote: > > Congrats Dominik! > > > > On Thu, Jun 3, 2021 at 1:03 AM Micah Kornfield > > >

Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-06 Thread Fan Liya
Congratulations, Ben! Best, Liya Fan On Fri, May 7, 2021 at 4:23 AM Bryan Cutler wrote: > Congrats Ben! > > On Thu, May 6, 2021 at 12:05 PM Antoine Pitrou wrote: > > > > > Congratulations Ben :-) > > > > > > On 06/05/2021 at 21:02, Rok Mihevc wrote: > > > Congrats! > > > > > > On Thu, May 6,

Re: [JAVA] issues encountered during build

2021-03-17 Thread Fan Liya
Hi Bob, Thanks a lot for your follow-up. Maybe you need to send a separate email to the dev to apply for the contributor permission. Best, Liya Fan On Thu, Mar 18, 2021 at 3:37 AM bobtins wrote: > > > On 2021/03/12 06:36:24, Fan Liya wrote: > > Hi Bob, > > >

Re: [JAVA] issues encountered during build

2021-03-11 Thread Fan Liya
Hi Bob, Thanks for reporting the issues. I remember encountering the same problems with the JDBC tests (over one year ago). Maybe it is not just related to the time zone, it is also related to the machine locale. I think we can open an issue to track it. Best, Liya Fan On Fri, Mar 12, 2021 at

Re: [VOTE] Allow source-only release vote for patch releases

2021-02-28 Thread Fan Liya
+1 On Sun, Feb 28, 2021 at 12:17 PM Ying Zhou wrote: > +1 (non-binding) > > > On Feb 27, 2021, at 11:19 AM, Neal Richardson < > neal.p.richard...@gmail.com> wrote: > > > > We've had some discussion about ways to reduce the cost of releasing and > > ways to allow maintainers of subprojects to mak

Re: [Java] Problem with maven build in docker

2021-02-26 Thread Fan Liya
issue is that current master is version 4.0.0-SNAPSHOT now, > but your PR is 3.0.0-SNAPSHOT: > https://github.com/apache/arrow/blob/master/java/format/pom.xml#L18 > > Thanks, > > Emilio > > On 2/26/21 4:58 AM, Fan Liya wrote: > > Dear all, > > > > In a recent

[Java] Problem with maven build in docker

2021-02-26 Thread Fan Liya
Dear all, In a recent PR [1], I have created a new sub-module of the Java project (arrow-compression). It works on my local machine, and the build finished successfully. However, it fails in the docker build [2], pointing to an error in the new pom.xml [3] file: *Non-resolvable parent POM for or

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Fan Liya
Hi Joris, The Java support for lz4 compression is on-going ( https://github.com/apache/arrow/pull/8949). Integration with C++/Python is not finished yet. We would appreciate it if you could share the file to help us with the integration test. Best, Liya Fan On Fri, Jan 29, 2021 at 2:41 AM Antoi

Re: [Java] PR review for ARROW-11173

2021-01-21 Thread Fan Liya
I will take a look in one or two days. Best, Liya Fan On Wed, Jan 20, 2021 at 3:48 AM Bryan Cutler wrote: > Hi Nick, > I left a note in the PR that I will try to review soon, thanks! > > > On Sun, Jan 17, 2021 at 8:22 PM Nick Bruno wrote: > > > Hi All, > > > > I'd like to get feedback on the p

Re: [Discuss] Should dense union offsets be always increasing?

2020-11-19 Thread Fan Liya
I think the Java implementation does not align with the spec either. IMO, option 2 provides more performance optimization opportunities. However, it may lead to some unexpected behaviors. For example, when we change the value of one slot, the values of several other slots may be changed as well.
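A minimal Java sketch of the side effect described above, assuming a simplified dense union in which repeated offsets are allowed. The class and field names are hypothetical and are not the Arrow Java API; plain arrays stand in for the type-id buffer, the offsets buffer, and the child vectors.

```java
/** Hypothetical, simplified model of a dense union; not the Arrow Java API. */
final class DenseUnionSketch {
    final byte[] typeIds;     // per slot: which child holds the value
    final int[] offsets;      // per slot: index into that child
    final long[][] children;  // children[typeId][offset] is the stored value

    DenseUnionSketch(byte[] typeIds, int[] offsets, long[][] children) {
        this.typeIds = typeIds;
        this.offsets = offsets;
        this.children = children;
    }

    /** Reads the value of a slot by following its type id and offset. */
    long get(int slot) {
        return children[typeIds[slot]][offsets[slot]];
    }

    /** Writes through the offset, so every slot sharing that offset sees the change. */
    void set(int slot, long value) {
        children[typeIds[slot]][offsets[slot]] = value;
    }
}
```

With type ids {0, 0, 1} and offsets {0, 0, 0}, slots 0 and 1 point at the same child entry, so set(0, 99) also changes what get(1) returns, while slot 2 is unaffected.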

Re: Graph model in arrow

2020-11-18 Thread Fan Liya
Hi Leo, For the graph data model, I can think of two popular representations: 1) adjacency matrix: an n x n matrix A (where n is the number of vertices), where Aij = 1 indicates an arc from i to j. 2) adjacency list: a table head node for each vertex, and a list for each vertex to store arcs. For
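The two representations listed above can be sketched in plain Java as follows. This is only an illustration using the standard library; the class and method names are hypothetical, and nothing here is Arrow-specific.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds both graph representations from a list of arcs. */
final class GraphRepresentations {
    /** Adjacency matrix: a[i][j] == 1 means there is an arc from i to j. */
    static int[][] toAdjacencyMatrix(int n, int[][] arcs) {
        int[][] a = new int[n][n];
        for (int[] arc : arcs) {
            a[arc[0]][arc[1]] = 1;
        }
        return a;
    }

    /** Adjacency list: one entry per vertex holding the targets of its outgoing arcs. */
    static List<List<Integer>> toAdjacencyList(int n, int[][] arcs) {
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            lists.add(new ArrayList<>());
        }
        for (int[] arc : arcs) {
            lists.get(arc[0]).add(arc[1]);
        }
        return lists;
    }
}
```

The matrix always takes n x n cells regardless of how many arcs exist, while the list form grows with the number of arcs, which is the usual trade-off between the two.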

Re: [ANNOUNCE] New Arrow committer: Andrew Lamb

2020-11-10 Thread Fan Liya
Congratulations, Andrew. Best, Liya Fan On Wed, Nov 11, 2020 at 7:50 AM Keerat Singh wrote: > Congratulations Andrew. 👏 > > On Tue, Nov 10, 2020 at 11:12 AM Andrew Lamb wrote: > > > Thank you all for your welcome and the effort you put into fostering this > > great community. I look forward to

Re: [ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-25 Thread Fan Liya
Congratulations, Wes! Best, Liya Fan On Sun, Oct 25, 2020 at 9:58 PM Wes McKinney wrote: > Thanks all! > > On Sun, Oct 25, 2020 at 6:29 AM Krisztián Szűcs > wrote: > > > > Congrats Wes! > > > > On Sun, Oct 25, 2020 at 2:40 AM David Li wrote: > > > > > > Congratulations Wes! > > > > > > Best,

Re: [Java] ArrowBuf bounds checking in getBytes/setBytes

2020-10-15 Thread Fan Liya
Hi Benjamin, Nice catch! The code has been like this for quite some time. I think one reason is that the 'setBytes' and 'getBytes' APIs support manipulating data in large batches, so it is less performance-critical. IMO, it is reasonable to respect the constant to improve performance. Best, Liy

Re: [VOTE] Accept donation of Julia implementation for Apache Arrow

2020-10-13 Thread Fan Liya
+1 (non-binding) Best, Liya Fan On Wed, Oct 14, 2020 at 9:02 AM Sutou Kouhei wrote: > +1 (binding) > > In > "[VOTE] Accept donation of Julia implementation for Apache Arrow" on > Mon, 12 Oct 2020 13:35:14 -0700, > Neal Richardson wrote: > > > Hi all, > > Last month [1] Jacob Quinn propos

Re: [VOTE][Format] Allow for 256-bit Decimal's in the Arrow specification

2020-09-29 Thread Fan Liya
+1 Best, Liya Fan On Tue, Sep 29, 2020 at 4:55 PM Antoine Pitrou wrote: > > +1 (binding) > > I didn't look at the implementation. > > Regards > > Antoine. > > > On 29/09/2020 at 06:54, Micah Kornfield wrote: > > I've opened a PR that updates the specification to allow for 256-bit > > Decimal

Re: Hello to the Arrow dev community

2020-09-23 Thread Fan Liya
Welcome, Bob. Thanks for sharing the interesting story. Best, Liya Fan On Wed, Sep 23, 2020 at 12:28 PM Micah Kornfield wrote: > Welcome to the community Bob. > > On Tue, Sep 22, 2020 at 12:27 PM Bob Tinsman wrote: > > > I'd like to introduce myself, because I've had an interest in Arrow for

Re: How to run Java benchmark?

2020-09-22 Thread Fan Liya
Hi Kazuaki, It seems the reason is that we have missed exec-maven-plugin in the pom.xml. We did not include it, because it would run all the benchmarks during maven build, which is extremely time consuming. I have opened ARROW-10069 to track this issue. Hopefully, I will provide a PR soon. Best,

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-09-21 Thread Fan Liya
with it and for Java at least Jacques is > opposed > > to it? > > > > Testing changes that break big-endian can be a potential drag on > developer > > productivity but there are methods to run locally (at least on more > recent > > OSes). > > > > Thoug

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-11 Thread Fan Liya
in the vector class for this > > specialized use case? If the user is advanced, that short memory access > > invocation seems fine to use. The whole idea with Arrow is that if you > have > > a specialized algorithm, you can hand write memory reads and writes > because

Re: Arrow as a streaming format

2020-09-09 Thread Fan Liya
+1 for introducing Arrow in stream processing, as we have made some attempts at this. IMO, the metadata overhead is not likely to be a problem. If the streaming data has a high arrival rate, we can compensate for this with a large batch size without impacting the response time, while if

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-09 Thread Fan Liya
://github.com/apache/arrow/pull/8147 On Fri, Mar 13, 2020 at 9:47 PM Fan Liya wrote: > Hi Jacques, > > Thanks a lot for your valuable comments. > > I agree with you that collapsing nullable and non-nullable implementations > is a good idea, and it does not contradict with the ide

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Fan Liya
Thanks to Kazuaki for the survey and to Micah for starting the discussion. I do not oppose supporting BE. In fact, I am in general optimistic about the performance impact (for Java). IMO, this is going to be a painful path (many byte-order-related problems are tricky to debug), so I hope we can make

Re: [Java] Supporting Big Endian

2020-08-16 Thread Fan Liya
Thanks to Kazuaki Ishizaki for working on this. IMO, supporting big-endian would be a large change, as in many places in the code base we have implicitly assumed a little-endian platform (e.g. https://github.com/apache/arrow/blob/master/java/memory/memory-core/src/main/java/org/apache/arrow/mem

Re: [DISSCUSS][JAVA] Avoid set reader/writer indices in FieldVector#getFieldBuffers

2020-08-04 Thread Fan Liya
Hi Ji, IMO, for the correct order, the validity buffer should precede the offset buffer (e.g. this is the order used by BaseVariableWidthVector & BaseLargeVariableWidthVector). In ListVector#getBuffers, the offset buffer precedes the validity buffer, so I am a little confused why you say the order

Re: [DISCUSS] How to extended time value range for Timestamp type?

2020-08-04 Thread Fan Liya
Hi Ji, This sounds like a universal requirement, as 64-bit is not sufficient to hold the precision for nano-second. For the extension type, we have two choices: 1. Extending struct(int64, int32), which represents the design of SoA (Struct of Arrays). 2. Extending fixed width binary(12), which rep
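A rough Java sketch of the two storage choices mentioned above. The split into a 64-bit seconds field and a 32-bit nanoseconds field, the little-endian packing, and all names are assumptions made for illustration; the thread itself does not specify them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical layouts for a 96-bit timestamp; the field meanings are assumed. */
final class WideTimestampLayouts {
    static final int WIDTH = 12; // bytes per value in the fixed-width binary(12) layout

    /** Choice 1, struct(int64, int32): one array per field (Struct of Arrays). */
    static final class StructOfArrays {
        final long[] seconds;
        final int[] nanos;

        StructOfArrays(long[] seconds, int[] nanos) {
            this.seconds = seconds;
            this.nanos = nanos;
        }
    }

    /** Choice 2, fixed width binary(12): both fields of a value stored side by side. */
    static byte[] pack(long[] seconds, int[] nanos) {
        ByteBuffer buf = ByteBuffer.allocate(seconds.length * WIDTH).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < seconds.length; i++) {
            buf.putLong(seconds[i]); // bytes 0..7 of the value
            buf.putInt(nanos[i]);    // bytes 8..11 of the value
        }
        return buf.array();
    }

    static long secondsAt(byte[] packed, int index) {
        return ByteBuffer.wrap(packed).order(ByteOrder.LITTLE_ENDIAN).getLong(index * WIDTH);
    }

    static int nanosAt(byte[] packed, int index) {
        return ByteBuffer.wrap(packed).order(ByteOrder.LITTLE_ENDIAN).getInt(index * WIDTH + 8);
    }
}
```

The first layout keeps each field contiguous, which suits scans over a single field; the second keeps each value contiguous, which suits reading whole timestamps one at a time.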

Re: [DISCUSS] Support of higher bit-width Decimal type

2020-07-28 Thread Fan Liya
Hi Micah, Thanks for opening the discussion. I am aware of some scenarios where decimal requires more than 16 bytes, so I think it would be beneficial to support this in Arrow. Best, Liya Fan On Tue, Jul 28, 2020 at 11:12 AM Micah Kornfield wrote: > Hi Arrow Dev, > ZetaSQL (Google's open sour

Re: Question: How to pass data between two languages interprocess without extra libraries?

2020-07-06 Thread Fan Liya
Hi Teng, Arrow provides two formats for IPC between different languages: streaming and file. This article gives a tutorial for Java: https://arrow.apache.org/docs/java/ipc.html For other languages, it may be helpful to read the test cases. Best, Liya Fan On Sun, Jul 5, 2020 at 4:24 PM Teng Pen

Problem with master build failing

2020-07-02 Thread Fan Liya
Dear all, Currently, the master build is failing occasionally. After investigation, we found it is caused by a cyclic dependency during class loading. We have provided a patch for it [1]. Please take a look. Best, Liya Fan [1] https://github.com/apache/arrow/pull/7628

Re: [ACTION REQUIRED] Changes to Arrow JIRA-related e-mail notifications

2020-06-18 Thread Fan Liya
JIRA e-mail notifications from within JIRA, > too. > > On Wed, Jun 17, 2020 at 11:08 PM Fan Liya wrote: > > > > Hi Wes, > > > > Thank you for your effort. > > I sent an email to issues-subscr...@arrow.apache.org, but got no > response. > > In add

Re: [ACTION REQUIRED] Changes to Arrow JIRA-related e-mail notifications

2020-06-17 Thread Fan Liya
Hi Wes, Thank you for your effort. I sent an email to issues-subscr...@arrow.apache.org, but got no response. In addition, I am not receiving JIRA information now. Best, Liya Fan On Mon, Jun 15, 2020 at 3:50 AM Wes McKinney wrote: > hi folks, > > Per the mailing list discussion and INFRA-20419

Re: [ANNOUNCE] New Arrow committers: Ji Liu and Liya Fan

2020-06-11 Thread Fan Liya
Dear all, I want to thank you all for all your kind help. It is a great honor to work with you in this great community. I hope we can contribute more and make the community better. Best, Liya Fan On Fri, Jun 12, 2020 at 12:02 PM Ji Liu wrote: > Thanks everyone for the warm welcome! > It's a gr

Re: Help with Java PR backlog

2020-06-11 Thread Fan Liya
I would like to help with the review. I will spend some time on it late today. Best, Liya Fan On Fri, Jun 12, 2020 at 9:56 AM Wes McKinney wrote: > hi folks, > > There's a number of Java PRs that seem like they are close to being in > a merge-ready state, could we try to get the Java backlog m

Re: Problem with building C++ flight code

2020-05-11 Thread Fan Liya
f > which are test helpers. So I am guessing that if you build with the C++ > tests turned off, then it should compile. > > Neal > > On Mon, May 11, 2020 at 2:25 AM Fan Liya wrote: > > > Hi Antoine, > > > > I manually downloaded a boost package from https:/

Re: Problem with building C++ flight code

2020-05-11 Thread Fan Liya
release/1.71.0/source/boost_1_71_0.tar.gz;https://github.com/boostorg/boost/archive/boost-1.71.0.tar.gz;https://github.com/ursa-labs/thirdparty/releases/download/latest/boost_1_71_0.tar.gz ' tag='' Best, Liya Fan On Mon, May 11, 2020 at 3:45 PM Antoine Pitrou wrote: > > Le 11/

Re: Problem with building C++ flight code

2020-05-10 Thread Fan Liya
don't need to depend on your system package manager's Boost > > On Sun, May 10, 2020 at 9:23 PM Fan Liya wrote: > > > > Hi all, > > > > I was using the following command to build the flight code: > > > > cmake -DCMAKE_BUILD_TYPE=Debug -DARROW_FLIGHT

Problem with building C++ flight code

2020-05-10 Thread Fan Liya
Hi all, I was using the following command to build the flight code: cmake -DCMAKE_BUILD_TYPE=Debug -DARROW_FLIGHT=ON ../arrow/cpp make arrow_flight and got the following error: fatal error: boost/process.hpp: No such file or directory After some investigation, it seems the boost/process.hpp f

Re: [VOTE] Add "trivial" RecordBatch body compression to Arrow IPC protocol

2020-04-22 Thread Fan Liya
My vote: +1 Best, Liya Fan On Thu, Apr 23, 2020 at 8:24 AM Wes McKinney wrote: > Hello, > > I have proposed adding a simple RecordBatch IPC message body > compression scheme (using either LZ4 or ZSTD) to the Arrow IPC > protocol in GitHub PR [1] as discussed on the mailing list [2]. This > is d

Re: [Java] Memory Allocation Tips

2020-04-20 Thread Fan Liya
Hi Razvan, Arrow Java is based on off-heap memory, so it does not rely on GC. Some recommended best practices can be found at https://arrow.apache.org/docs/java/vector.html Best, Liya Fan On Mon, Apr 20, 2020 at 8:05 PM Razvan Chitu wrote: > Hi, > > Does the Arrow community have any ti

Re: 0.17 release blog post: help needed

2020-04-20 Thread Fan Liya
I have added some Java items. Best, Liya Fan On Mon, Apr 20, 2020 at 10:49 AM Kenta Murata wrote: > I've edited Ruby and C GLib parts. > Kou and Shiro will check them later. > > Mon, Apr 20, 2020 at 11:09, Wes McKinney : > > > > I made a pass through the changelog and added a bunch of TODOs related > >

Re: ORC JNI wrapper bugs [Re: 0.17 release procedure]

2020-04-16 Thread Fan Liya
One way to skip a test class is to place a "@Ignore" annotation in front of the class declaration. Best, Liya Fan On Thu, Apr 16, 2020 at 7:29 PM Krisztián Szűcs wrote: > On Thu, Apr 16, 2020 at 11:47 AM Antoine Pitrou > wrote: > > > > > > The ORC JNI wrapper is currently crashing on these lin

Re: Java: DefaultVectorComparators - invalid implementation

2020-04-09 Thread Fan Liya
Hi Martin, Thank you so much for reporting this problem. In the current implementation, we do not consider corner cases related to integer overflow, and this problem should be fixed. I have opened an issue to track this problem [1]. Do you want to provide a patch for it? Best, Liya Fan [1] htt
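To illustrate the kind of overflow corner case mentioned above, here is a small Java sketch. It assumes the common subtraction-based comparison pattern as the source of the problem; the thread snippet does not show the actual code in DefaultVectorComparators, so the class and method names below are hypothetical.

```java
/** Hypothetical illustration of an overflow-prone comparison and a safe alternative. */
final class OverflowSafeCompare {
    /** Overflow-prone: a - b wraps around when the true difference exceeds the int range. */
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    /** Overflow-safe: Integer.compare never performs the subtraction. */
    static int compareSafely(int a, int b) {
        return Integer.compare(a, b);
    }
}
```

For example, comparing Integer.MAX_VALUE with -1 by subtraction wraps around to a negative number, which wrongly reports the larger value as smaller; Integer.compare returns a positive result for the same inputs.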

Re: Preparing for 0.17.0 Arrow release

2020-03-31 Thread Fan Liya
I see ARROW-6871 in the list. It seems it has some bugs, which are being fixed by ARROW-8239. So I have added ARROW-8239 to the list. The PR for ARROW-8239 is already approved, so it is expected to be resolved soon. Best, Liya Fan On Wed, Apr 1, 2020 at 12:01 PM Micah Kornfield wrote: > I move

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-25 Thread Fan Liya
d decompression overhead > > > > is little compared with the time savings due to high compression > > > > ratios. If people would like to see these numbers to help make a > > > > decision I can take a closer look > > > > > > > > As

Re: [DISCUSS][Java] Enhance code style checking for Java code

2020-03-14 Thread Fan Liya
e feedback. Thank you in advance. Best, Liya Fan [1] https://issues.apache.org/jira/browse/ARROW-8121 [2] https://github.com/apache/arrow/pull/6622 On Tue, Dec 24, 2019 at 4:17 PM Fan Liya wrote: > Hi Micah, > > Thanks a lot for your feedback. > > In the PR, I have update

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-13 Thread Fan Liya
And there is a "nullable" metadata-only flag at the > > > Field level. Could the same kinds of optimizations be implemented in > > > Java without introducing a "nullable" concept? > > > > Note Liya Fan did suggest pulling the nullable flag from t

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-11 Thread Fan Liya
omplexity this would > introduce. > > Thanks, > Micah > > On Tue, Mar 10, 2020 at 6:42 AM Fan Liya wrote: > > > Hi Wes, > > > > Thanks a lot for your quick reply. > > I think what you mentioned is almost exactly what we want to do in > Java. The > >

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-10 Thread Fan Liya
le" concept? > > - Wes > > On Tue, Mar 10, 2020 at 8:13 AM Fan Liya wrote: > > > > Dear all, > > > > A non-nullable vector is one that is guaranteed to contain no nulls. We > > want to support non-nullable vectors in Java. > > > > *Motiva

[DISCUSS][Java] Support non-nullable vectors

2020-03-10 Thread Fan Liya
Dear all, A non-nullable vector is one that is guaranteed to contain no nulls. We want to support non-nullable vectors in Java. *Motivations:* 1. It is widely used in practice. For example, in a database engine, a column can be declared as not null, so it cannot contain null values. 2. Non-nullabl

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-06 Thread Fan Liya
ZSTD. > > > https://github.com/apache/arrow/blob/apache-arrow-0.16.0/docs/source/format/Columnar.rst#extension-types > > On Thu, Mar 5, 2020 at 7:56 AM Fan Liya wrote: > > > > Hi Wes, > > > > Thanks a lot for your further clarification. > > > >

Re: [ANNOUNCE] New Arrow PMC member: Francois Saint-Jacques

2020-03-05 Thread Fan Liya
Congratulations, Francois Saint-Jacques! Best, Liya Fan On Thu, Mar 5, 2020 at 12:52 AM Wes McKinney wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Francois Saint-Jacques to become a PMC member and we are pleased to > announce > that Francois has accepted. > > C

Re: [ANNOUNCE] New Arrow PMC member: Neal Richardson

2020-03-05 Thread Fan Liya
Congratulations, Neal Richardson! Best, Liya Fan On Thu, Mar 5, 2020 at 12:51 AM Wes McKinney wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Neal Richardson to become a PMC member and we are pleased to announce > that Neal has accepted. > > Congratulations and we

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-05 Thread Fan Liya
n details more concrete) > > So in the USER_DEFINED case, how will the library know how to obtain > the uncompressed buffer? Is some additional metadata structure > required to provide instructions? > > On Wed, Mar 4, 2020 at 8:05 AM Fan Liya wrote: > > > > Hi Wes, >

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-04 Thread Fan Liya
Hi Wes, I am thinking of adding an option named "USER_DEFINED" (or something similar) to enum CompressionType in your proposal. IMO, this option should be used primarily in Flight. Best, Liya Fan On Wed, Mar 4, 2020 at 11:12 AM Wes McKinney wrote: > On Tue, Mar 3, 2020, 8:1

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-03 Thread Fan Liya
Sure. I agree with you that we should not overdo this. I am wondering if we should provide an option to allow users to plug in their customized compression strategies. Best, Liya Fan On Tue, Mar 3, 2020 at 9:47 PM Wes McKinney wrote: > On Tue, Mar 3, 2020, 7:36 AM Fan Liya wrote: > >

Re: [DISCUSS] Adding "trivial" buffer compression option to IPC protocol (ARROW-300)

2020-03-03 Thread Fan Liya
I am so glad to see this discussion, and I am willing to provide help from the Java side. In the proposal, I see support for basic compression strategies (e.g. gzip, snappy). IMO, applying a single basic strategy is not likely to improve performance in most scenarios. The optimal c

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-13 Thread Fan Liya
+1 (binding) On Thu, Feb 13, 2020 at 11:52 AM Wes McKinney wrote: > +1 (binding) > > On Tue, Feb 11, 2020 at 4:29 PM Antoine Pitrou wrote: > > > > > > Ah, you're right, it's PR 6040: > > https://github.com/apache/arrow/pull/6040 > > > > Similarly, the C++ implementation is at PR 6026: > > https

Re: [Java] Issues with IntelliJ + errorprone + OpenJDK

2020-02-04 Thread Fan Liya
> > > installing in my local m2 repo but that didn't work. > > > > > > If anyone could scan their local drive for this file and let me know > > where > > > it is installed that could unblock me. > > > > > > Thanks, > > > > > > Andy. >

Re: [Java] Issues with IntelliJ + errorprone + OpenJDK

2020-02-03 Thread Fan Liya
I had the same problem, and it was solved by: 1. Installing the "Error Prone Compiler" plugin in IntelliJ. 2. Setting "Settings/Build, Execution, Deployment/Compiler/Java Compiler/Use compiler" to "Javac with error-prone". I am using IntelliJ 2019.3 (Community Edition). Best, Liya Fan On Tue, F

Re: [Java] PR Reviewers

2020-01-28 Thread Fan Liya
Hi Micah, Thank you so much for investing huge amounts of effort in reviewing Java PRs. I understand that you will stop reviewing Java PRs and focus on higher priority issues. However, I still hope you can (if possible) participate in relatively important Java discussions and give your valuable

Re: [Java] Large Memory Allocators (Taking a dependency on JNA?)

2020-01-19 Thread Fan Liya
> > memory. > > > > A simple way to do this is either to use unsafe directly or call the > > existing netty unsafe facade directly. > > > > PlatformDependent.allocateMemory(long) > > PlatformDependent.freeMemory(long) > > > > Should be re

Re: [Java] Large Memory Allocators (Taking a dependency on JNA?)

2020-01-18 Thread Fan Liya
Hi Micah, Thanks for the good suggestion. JNA seems like a good and reasonable tool for allocating large memory chunks. How about we directly use Java UNSAFE API? It seems the allocateMemory API is also based on the malloc method of the native implementation [1]. Best, Liya Fan [1] http://hg.op

Re: [C++] Arrow added to OSS-Fuzz

2020-01-15 Thread Fan Liya
Hi Antoine, Good job! And thanks for sharing the great news! Best, Liya Fan On Thu, Jan 16, 2020 at 2:59 AM Antoine Pitrou wrote: > > Hello, > > I would like to announce that Arrow has been accepted on the OSS-Fuzz > infrastructure (a continuous fuzzing infrastructure operated by Google): > ht

Re: Looking to 1.0

2020-01-04 Thread Fan Liya
> > On Fri, Jan 3, 2020 at 8:16 PM Fan Liya wrote: > > > Hi Jacques, > > > > I am interested in the issues, and if it is possible, I would like to try > > to resolve them. > > > > Thanks. > > > > Liya Fan > > > > On Sat, Jan 4, 2

Re: Looking to 1.0

2020-01-03 Thread Fan Liya
I am sorry. I did not notice the issues have already been assigned. Best, Liya Fan On Sat, Jan 4, 2020 at 12:15 PM Fan Liya wrote: > Hi Jacques, > > I am interested in the issues, and if it is possible, I would like to try > to resolve them. > > Thanks. > > Liya Fan

Re: Looking to 1.0

2020-01-03 Thread Fan Liya
Hi Jacques, I am interested in the issues, and if it is possible, I would like to try to resolve them. Thanks. Liya Fan On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau wrote: > I identified three things in the java library that I think are top of mind > and should be fixed before 1.0 to avoid

Re: [DISCUSS][Java] Enhance code style checking for Java code

2019-12-24 Thread Fan Liya
nge) and from that point > forward validate the format as part of CI. > > Cheers, > Micah > > On Wed, Dec 18, 2019 at 1:44 AM Fan Liya wrote: > > > Dear all, > > > > We want to enhance the Java code style checking. > > > > This is due to a discussion

Re: [DISCUSS][C++] Pointer name aliasing

2019-12-22 Thread Fan Liya
IMO, this question relates to something general and fundamental. Generally, name aliasing leads to two results: 1) It makes writing code easier. 2) It makes reading code more difficult. Personally, I prefer readability to writability. However, I am wondering if we have some general principles regardi

[DISCUSS][Java] Enhance code style checking for Java code

2019-12-18 Thread Fan Liya
Dear all, We want to enhance the Java code style checking. This is due to a discussion in [1]. In the discussion, we found the current style checking for Java code is not sufficient. So we want to enhance it in a series of "small" steps, in order to avoid having to change too many files at once.

Re: [ANNOUNCE] New Arrow committer: Joris van den Bossche

2019-12-09 Thread Fan Liya
Congratulations, Joris! Best, Liya Fan On Mon, Dec 9, 2019 at 7:55 PM Wes McKinney wrote: > On behalf of the Arrow PMC, I'm happy to announce that Joris has > accepted an invitation to become a committer on Apache Arrow. > > Welcome, and thank you for your contributions! >

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-08 Thread Fan Liya
+1, as this is useful IMO. Best, Liya Fan On Sat, Dec 7, 2019 at 12:21 PM Jacques Nadeau wrote: > -1 (binding) > > I'm voting -1 on this. I posted the thinking why on the PR. The high-level > is that I think it needs to better address the pipelined use case as right > now it fails to support th

Re: Java - Spark dataframe to Arrow format

2019-12-06 Thread Fan Liya
ney > *Sent:* Thursday, December 5, 2019 6:53 AM > *To:* dev > *Cc:* Fan Liya ; > jeetendra.jais...@impetus.co.in.invalid > > *Subject:* Re: Java - Spark dataframe to Arrow format > > hi folks, > > I understand the question to be about serialization. > > see >

Re: Java - Spark dataframe to Arrow format

2019-12-05 Thread Fan Liya
Hi Jeetendra, I am not sure if I understand your question correctly. Arrow is an in-memory columnar data format, and Spark has its own in-memory data format for DataFrame, which is invisible to end users. So the Spark user has no control over the underlying in-memory layout. If you really want t

Re: Unions: storing type_ids or type_codes?

2019-11-26 Thread Fan Liya
Hi Antoine, For Java, the physical child id is the same as the logical type code, as the index of each child vector is the code (ordinal) of the vector's minor type. This leads to a problem, that only a single vector for each type can exist in a union vector, so strictly speaking, the Java impleme

Re: [VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-25 Thread Fan Liya
I am sorry I did not follow the thread closely (will follow up later). However, the proposal above looks good to me. So I am +0.5 for this. Best, Liya Fan On Tue, Nov 26, 2019 at 1:12 PM Micah Kornfield wrote: > Could other members of the community chime in on this? In particular > getting vie

Re: Java API for Arrow Compute

2019-11-25 Thread Fan Liya
Hi Yuan, Currently, we have some APIs in the algorithm module of the Java project. If you have more requirements, maybe you can describe your requirements/scenarios, and start a discussion in the mailing list. Best, Liya Fan On Mon, Nov 25, 2019 at 11:17 PM Wes McKinney wrote: > There is a li

Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-24 Thread Fan Liya
Hi Wes, Thanks for your clarification. I agree with you that the problem should be considered in the implementation level. Best, Liya Fan On Mon, Nov 25, 2019 at 10:34 AM Wes McKinney wrote: > On Sun, Nov 24, 2019 at 8:07 PM Fan Liya wrote: > > > > Hi Wes, > > > >

Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-24 Thread Fan Liya
s no conflict with repeated or non-monotonic offset values. > > On Fri, Nov 22, 2019 at 1:49 AM Fan Liya wrote: > > > > This is an interesting question. > > IMO, to support repeated values, we also need to design a "coherency > > protocol", to avoid the

Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-21 Thread Fan Liya
This is an interesting question. IMO, to support repeated values, we also need to design a "coherency protocol", to avoid the scenario where once a value is written, the change is propagated to another slot unexpectedly. Best, Liya Fan On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield wrote: > Hmm

Re: [Discuss][Java] Provide default for io.netty.tryReflectionSetAccessible to prevent errors

2019-11-20 Thread Fan Liya
Hi Bryan, Thanks for bringing this up. +1 for the change. It is not clear to me where the right place to override the JVM property is. It is possible that when we try to override it (possibly in a static block), the old property value has already been read by the Netty library. To avoid this problem, do we n
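
A minimal sketch of "provide a default without overriding the user" in plain Java follows. The class SketchNettyDefaults and the method applyDefault are hypothetical names, and the default value "true" passed in the usage note is an assumption for illustration. The sketch does not solve the ordering concern raised above: it only helps if it runs before the Netty class that reads the property is initialized.

```java
/** Hypothetical helper; not part of Arrow or Netty. */
final class SketchNettyDefaults {
    /** The property name discussed in this thread. */
    static final String KEY = "io.netty.tryReflectionSetAccessible";

    private SketchNettyDefaults() {}

    /**
     * Sets the property only when it is currently unset, so that an explicit
     * user setting (for example via -D on the command line) always wins.
     * Returns the effective value after the call.
     */
    static String applyDefault(String key, String defaultValue) {
        String current = System.getProperty(key);
        if (current == null) {
            System.setProperty(key, defaultValue);
            return defaultValue;
        }
        return current;
    }
}
```

A caller would invoke SketchNettyDefaults.applyDefault(SketchNettyDefaults.KEY, "true") as early as possible, for instance from a static initializer of the first class that touches Netty-backed memory.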

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-11-15 Thread Fan Liya
>> > >> cell >>>>>>>> > >> > counts against massive contiguous memory is an anti pattern >>>>>>>> to scalable >>>>>>>> > >> > analytical processing--purely subjectiv

Re: [Java] Append multiple record batches together?

2019-11-14 Thread Fan Liya
chunked array with multiple vector buffers would be > >>> ideal, similar to C++. It might take a fair amount of work to add this > but > >>> would open up a lot more functionality. As for the API, > >>> VectorSchemaRoot.concat(Collection) seems good to me. >

[Discuss][Java] Appropriate semantics for comparing values in UnionVector

2019-11-14 Thread Fan Liya
Dear all, The problem arises from the discussion in a PR: https://github.com/apache/arrow/pull/5544#discussion_r338394941. We are trying to come up with proper semantics for comparing values in UnionVectors. According to the current logic in the code base, two values from two UnionVectors are com

Re: [Java] Question About Vector Allocation

2019-11-14 Thread Fan Liya
;} > > > > Thanks, > > > Azim Afroozeh > > On Fri, Nov 8, 2019 at 10:57 AM Fan Liya wrote: > > > Hi Azim, > > > > I think we should be aware of two distinct concepts: > > > > 1. vector capacity: the max number of values that can be sto

Re: [Java] Question About Vector Allocation

2019-11-08 Thread Fan Liya
Hi Azim, I think we should be aware of two distinct concepts: 1. vector capacity: the max number of values that can be stored in the vector, without reallocation 2. vector length: the number of values actually filled in the vector For any valid vector, we always have vector length <= vector capa
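
The two concepts can be sketched in plain Java as below. This is not Arrow's IntVector; SketchIntVector is a hypothetical class backed by a direct ByteBuffer, and the doubling policy on reallocation is an assumption made for the example.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; this is not Arrow's IntVector. */
final class SketchIntVector {
    private ByteBuffer data;   // off-heap storage for the values
    private int length;        // number of values actually filled

    SketchIntVector(int initialCapacity) {
        data = ByteBuffer.allocateDirect(initialCapacity * Integer.BYTES);
    }

    /** Vector capacity: max number of values that fit without reallocation. */
    int capacity() {
        return data.capacity() / Integer.BYTES;
    }

    /** Vector length: number of values actually filled. */
    int length() {
        return length;
    }

    /** Appends a value, reallocating (doubling, by assumption) when the capacity is exhausted. */
    void append(int value) {
        if (length == capacity()) {
            int newCapacity = Math.max(1, capacity() * 2);
            ByteBuffer bigger = ByteBuffer.allocateDirect(newCapacity * Integer.BYTES);
            data.clear();                          // position = 0, limit = capacity
            data.limit(length * Integer.BYTES);    // copy only the filled region
            bigger.put(data);
            data = bigger;
        }
        data.putInt(length * Integer.BYTES, value); // absolute write, independent of position
        length++;
    }

    /** Reads a filled value; indexes at or beyond length are rejected. */
    int get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return data.getInt(index * Integer.BYTES);
    }
}
```

In this sketch length() never exceeds capacity(), and only an append that hits the capacity triggers a reallocation.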

Re: [Java] Append multiple record batches together?

2019-11-07 Thread Fan Liya
Hi Micah, Thanks for bringing this up. > 1. An efficient solution already exists? It seems like TransferPair implementations could possibly be improved upon or have they already been optimized? Fundamentally, memory copy is unavoidable, IMO, because the source and target memory regions are like

Re: Arrow for low latency IPC

2019-11-01 Thread Fan Liya
Hi Samrat, Arrow has flexible support for IPC through gRPC. The C++ benchmark can be found in: https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc The Java benchmark can be found in: https://github.com/apache/arrow/blob/master/java/flight/src/test/java/org/apache

Re: [DISCUSS][Java] Builders for java classes

2019-10-24 Thread Fan Liya
Hi Micah, IMO, we need an adapter from on-heap array to off-heap array. This is useful because many third-party Java libraries populate data to an on-heap array. And I see this API in your design: IntVectorBuilder addAll(int[] values); So I am +1 for this. Best, Liya Fan On Thu, Oct 24, 2019
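
As a rough sketch of such an on-heap to off-heap adapter, the plain-Java class below copies int[] arrays into a direct buffer. It is not the builder from the design under discussion; SketchIntVectorBuilder, its initial capacity of 8 values, and its growth policy are assumptions for illustration, and only the addAll(int[]) shape is taken from the message above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

/** Hypothetical illustration only; not the builder API proposed for Arrow. */
final class SketchIntVectorBuilder {
    private IntBuffer offHeap = newDirect(8);   // assumed initial capacity of 8 values

    /** Allocates an off-heap (direct) buffer viewed as ints. */
    private static IntBuffer newDirect(int capacity) {
        return ByteBuffer.allocateDirect(capacity * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    /** Bulk-copies an on-heap array into the off-heap buffer, growing it if needed. */
    SketchIntVectorBuilder addAll(int[] values) {
        if (offHeap.remaining() < values.length) {
            int needed = offHeap.position() + values.length;
            IntBuffer bigger = newDirect(Math.max(needed, offHeap.capacity() * 2));
            offHeap.flip();          // expose the values written so far for copying
            bigger.put(offHeap);
            offHeap = bigger;
        }
        offHeap.put(values);         // on-heap -> off-heap copy
        return this;
    }

    /** Number of values added so far. */
    int size() {
        return offHeap.position();
    }

    /** Reads back a value that was added earlier. */
    int get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size());
        }
        return offHeap.get(index);
    }

    /** True when the backing storage lives outside the Java heap. */
    boolean isDirect() {
        return offHeap.isDirect();
    }
}
```

Returning the builder from addAll keeps calls chainable, e.g. new SketchIntVectorBuilder().addAll(a).addAll(b), so a library that hands back several on-heap arrays can feed them in one expression.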
