Re: Best way to store ragged packet data in Parquet files

2020-11-04 Thread Rémi Dettai
Hi Jason! I guess this question would get more traction on the Parquet mailing list (https://parquet.apache.org/community/). Very interesting remark, though. I looked into it and didn't find any obvious explanation. The entire size of the file is taken up by the "data" column, as storing df[['data']] yields

[NIGHTLY] Arrow Build Report for Job nightly-2020-11-04-0

2020-11-04 Thread Crossbow
Arrow Build Report for Job nightly-2020-11-04-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-11-04-0 Failed Tasks: - gandiva-jar-xenial: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-11-04-0-travis-gandiva-jar-xenial - test-co

PyArrow: Using is_in compute to filter list of strings in a Table

2020-11-04 Thread Niklas B
Hi, I’m trying in Python to filter out certain rows (based on UUID strings) without reading the entire Parquet file into memory. My approach is to read each row group, then try to filter it without converting it to pandas (since that is expensive for data frames with lots of strings in them). Looking in

Re: PyArrow: Using is_in compute to filter list of strings in a Table

2020-11-04 Thread Joris Van den Bossche
Hi Niklas, The "is_in" docstring is not directly clear about it, but you need to pass the second argument as a keyword argument using "value_set" keyword name. Small example: In [19]: pc.is_in(pa.array(["a", "b", "c", "d"]), value_set=pa.array(["a", "c"])) Out[19]: [ true, false, true, f

Re: PyArrow: Using is_in compute to filter list of strings in a Table

2020-11-04 Thread Niklas B
Thank you! This looks awesome. Any good way to invert the ChunkedArray? I know I can convert to NumPy (experimental) and do it there, but would love a native Arrow version :) > On 4 Nov 2020, at 14:45, Joris Van den Bossche > wrote: > > Hi Niklas, > > The "is_in" docstring is not directly clea

Re: PyArrow: Using is_in compute to filter list of strings in a Table

2020-11-04 Thread Niklas B
Never mind, I realized I can use pyarrow.compute.invert. Thank you again for the super-fast answer > On 4 Nov 2020, at 15:13, Niklas B wrote: > > Thank you! This looks awesome. Any good way to inverse the ChunkedArray? I > know I can cast to Numpy (experimental) and do it there, but would

Crossbow + AppVeyor doesn't work

2020-11-04 Thread Sutou Kouhei
Hi, It seems that AppVeyor integration for https://github.com/ursa-labs/crossbow/ is disabled. https://ci.appveyor.com/project/ursa-labs/crossbow shows "Project not found or access denied.". Could anyone in Ursa Labs enable the AppVeyor integration? Context: https://github.com/apache/arrow/pull

Re: Github check error with ORC JNI adapter

2020-11-04 Thread Terence Honles
Hi Bryan, I started looking into the issue myself. I bisected one of the issues to a change in apache/arrow#8533 and asked the author about the check that seemed to be causing the issue.

Re: Crossbow + AppVeyor doesn't work

2020-11-04 Thread Neal Richardson
Krisztián has said there has been some trouble with AppVeyor, presumably because we got rate-limited (or otherwise flagged) due to our high usage on crossbow. He's tried a few things and has reached out to their support for help, but last I heard he hasn't had any luck. We'll keep on it, though. Neal On Wed,

Re: Crossbow + AppVeyor doesn't work

2020-11-04 Thread Krisztián Szűcs
Crossbow has been deleted from AppVeyor (without any notice) and I'm banned from doing anything. Support has not responded yet, either by email or on GitHub. You may try to set up the AppVeyor integration for ursa-labs/crossbow; perhaps it would work for your account. Otherwise we can switch to an

Re: Crossbow + AppVeyor doesn't work

2020-11-04 Thread Sutou Kouhei
Thanks! I understand. It seems that our Crossbow + AppVeyor usage isn't suitable for AppVeyor. We should use other CI such as GitHub Actions for Crossbow. Thanks, -- kou In "Re: Crossbow + AppVeyor doesn't work" on Wed, 4 Nov 2020 23:45:19 +0100, Krisztián Szűcs wrote: > Crossbow has bee

Re: Github check error with ORC JNI adapter

2020-11-04 Thread Terence Honles
I believe I have addressed the issue in https://issues.apache.org/jira/browse/ARROW-10499 / https://github.com/apache/arrow/pull/8595 but it looks like there are other unrelated CI errors happening right now 😕. On 2020/11/04 17:17:14, Terence Honles wrote: > Hi Bryan, > > I started looking i

Re: Github check error with ORC JNI adapter

2020-11-04 Thread Terence Honles
Hi Bryan, I tried sending the following earlier today, but it appears I was having issues with the mailing list. I started looking into the issue myself. I bisected one of the issues to a change in https://github.com/apache/arrow/pull/8533 and asked the author about the check that seemed to be

[DISCUSS] Extend specification with the definition of equality?

2020-11-04 Thread Jorge Cardoso Leitão
Hi, Recently, I revisited the code for array equality in Rust. While going through it, I observed some assumptions about how we conclude that two elements of an Arrow array are equal, and when two arrays are equal. The notion of equality is also used throughout the document, e.g. when we offer exa