[jira] [Created] (ARROW-7025) on 3.8 pip install -> fatal error: 'arrow/util/thread-pool.h' file not found

2019-10-29 Thread Yale Yng-Wong (Jira)
Yale Yng-Wong created ARROW-7025: Summary: on 3.8 pip install -> fatal error: 'arrow/util/thread-pool.h' file not found Key: ARROW-7025 URL: https://issues.apache.org/jira/browse/ARROW-7025 Project: A

Arrow sync call October 30 at 12:00 US/Eastern, 16:00 UTC

2019-10-29 Thread Neal Richardson
Hi all, reminder that our biweekly call is 12 hours from now at https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will be sent out to the mailing list afterwards. Neal

Re: [DISCUSS][Java] Builders for java classes

2019-10-29 Thread Micah Kornfield
> Just to clarify, how will this be different than the current vector writers that they are wrapping? Is it just the ability to add multiple values at once, or more efficiently?
Based on the discussion on the thread it sounds like we should only wrap the writers if they can provide best po

Re: [DISCUSS][Java] Builders for java classes

2019-10-29 Thread Bryan Cutler
Just to clarify, how will this be different than the current vector writers that they are wrapping? Is it just the ability to add multiple values at once, or more efficiently? Also, if we are going to be adding new APIs, maybe we can try to match more closely the existing builders in C++? I believ

Re: [VOTE] Release Apache Arrow 0.15.1 - RC0

2019-10-29 Thread Krisztián Szűcs
I have the same binary locally, so something must have gone wrong silently during the download, without exiting with an error. The proper wheel is available under the GitHub release for that particular crossbow task here [1]. I'll download, sign and upload it to Bintray tomorrow evening (C

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-29-0

2019-10-29 Thread Neal Richardson
I'll fix the docker-r-conda here: https://issues.apache.org/jira/browse/ARROW-7024
On Tue, Oct 29, 2019 at 5:17 AM Crossbow wrote:
> Arrow Build Report for Job nightly-2019-10-29-0
>
> All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-29-0
>
> Failed Tasks

[jira] [Created] (ARROW-7024) [CI][R] Update R dependencies for Conda build

2019-10-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7024: -- Summary: [CI][R] Update R dependencies for Conda build Key: ARROW-7024 URL: https://issues.apache.org/jira/browse/ARROW-7024 Project: Apache Arrow Issue

Re: Achieving parity with Java extension types in Python

2019-10-29 Thread Justin Polchlopek
That sounds about right. We're doing some work here that might require this feature sooner rather than later, and if we decide to go the route that needs this improved support, I'd be happy to make this PR. Thanks for showing that issue. I'll be sure to tag any contribution with that ticket number. On

Re: PySpark read Arrow C++ tables with unsigned field types?

2019-10-29 Thread Wes McKinney
I don't think there is a JIRA issue to that effect, but you can certainly create one describing what kind of C++ API you would be looking for. This is not a priority for me personally or my colleagues, at least, but might be of interest to others in the community. Apache projects are communities o

Re: PySpark read Arrow C++ tables with unsigned field types?

2019-10-29 Thread Isaac Myers
Wes, We use Arrow C++ (not PyArrow) exclusively for writing and PySpark for manipulation and analysis. I'm wondering if there are any plans for Arrow C++ to implement something similar to flavor='spark' in PyArrow. ‐‐‐ Original Message ‐‐‐ On Tuesda

[jira] [Created] (ARROW-7023) [Python] pa.array does not use "from_pandas" semantics for pd.Index

2019-10-29 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7023: Summary: [Python] pa.array does not use "from_pandas" semantics for pd.Index Key: ARROW-7023 URL: https://issues.apache.org/jira/browse/ARROW-7023 Pro

Re: State of decimal support in Arrow (from/to Parquet Decimal Logicaltype)

2019-10-29 Thread Wes McKinney
It depends on the origin of your data. If your data is not originating from Arrow, then it may be better to produce an array of FixedLenByteArray and pass that to the low level WriteBatch API. If you would like some other API, please feel free to propose something. On Tue, Oct 29, 2019 at 10:13 A
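To illustrate the FixedLenByteArray route mentioned in this reply, here is a minimal Python sketch (not Arrow or Parquet API code) of how a single decimal value could be encoded into the fixed-width bytes that the Parquet DECIMAL logical type stores: the unscaled integer in big-endian two's complement. The function name decimal_to_flba and its parameters are hypothetical and chosen only for this example; a C++ caller would presumably build an equivalent byte buffer per value before handing the values to WriteBatch.

```python
from decimal import Decimal


def decimal_to_flba(value: Decimal, scale: int, byte_width: int) -> bytes:
    """Encode one decimal as fixed-width big-endian two's-complement bytes.

    Hypothetical helper for illustration only; not part of Arrow or Parquet.
    """
    # Shift the decimal point right by `scale` digits to get the unscaled value.
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    unscaled = int(scaled)
    # to_bytes raises OverflowError if the value does not fit in byte_width.
    return unscaled.to_bytes(byte_width, byteorder="big", signed=True)


if __name__ == "__main__":
    print(decimal_to_flba(Decimal("1.23"), scale=2, byte_width=4).hex())
    print(decimal_to_flba(Decimal("-1.23"), scale=2, byte_width=4).hex())
```

The byte width has to be large enough for the declared precision; this sketch leaves that choice to the caller and simply fails loudly when a value does not fit.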

AW: State of decimal support in Arrow (from/to Parquet Decimal Logicaltype)

2019-10-29 Thread roman.karlstetter
Hi Wes, that was a bit unclear, sorry for that. With "an array", I'm referring to a plain C++ array, i.e. an array of float, uint32_t, ... This means that I do not use the arrow::Array-based write API, but I use the TypedColumnWriter::WriteBatch() function directly and do not have any arrow

Re: State of decimal support in Arrow (from/to Parquet Decimal Logicaltype)

2019-10-29 Thread Wes McKinney
On Tue, Oct 29, 2019 at 3:11 AM wrote:
> Hi Wes,
>
> thanks for the response. There's one thing that is still a little unclear to me:
> I had a look at the code for function WriteArrowSerialize<FLBAType, arrow::Decimal128Type> in the reference you provided. I don't have arrow data in the first place

Re: PySpark read Arrow C++ tables with unsigned field types?

2019-10-29 Thread Wes McKinney
Hi Isaac -- you are more than welcome to submit a PR to cause unsigned types to be written as signed integers when using flavor='spark' from pyarrow. The simplest thing would be to cast unsigned types to signed before writing the Parquet file. - Wes On Tue, Oct 29, 2019 at 9:09 AM I
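The "cast before writing" suggestion in this reply can be sketched in plain Python as a schema-rewriting step. This is only an illustration under stated assumptions: the type names are plain strings, the widening table (each unsigned width mapped to the next wider signed type) is an assumption of this example, and signed_type_for and sanitize_schema are hypothetical helpers, not pyarrow or Arrow C++ functions.

```python
# Assumed widening rule: each unsigned type maps to the next wider signed type.
_WIDENED = {"uint8": "int16", "uint16": "int32", "uint32": "int64"}
INT64_MAX = 2**63 - 1


def signed_type_for(type_name, values=()):
    """Return a signed type name able to hold the column's values (hypothetical)."""
    if type_name in _WIDENED:
        return _WIDENED[type_name]
    if type_name == "uint64":
        # No wider signed integer exists, so check that every value fits in int64.
        if any(v is not None and v > INT64_MAX for v in values):
            raise OverflowError("uint64 value does not fit in int64")
        return "int64"
    return type_name  # already signed or not an integer: leave unchanged


def sanitize_schema(schema, columns):
    """Rewrite (name, type) pairs so that no unsigned type remains."""
    return [(name, signed_type_for(t, columns.get(name, ()))) for name, t in schema]


if __name__ == "__main__":
    schema = [("id", "uint64"), ("count", "uint8"), ("x", "float")]
    print(sanitize_schema(schema, {"id": [1, 2, 3]}))
```

Widening avoids silent wrap-around at the cost of larger columns; uint64 is the one case where no wider signed integer is available, so the sketch checks the values instead.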

PySpark read Arrow C++ tables with unsigned field types?

2019-10-29 Thread Isaac Myers
Fields with unsigned types written with Arrow C++ can't be read by PySpark, due to Spark's lack of support for unsigned types (per https://issues.apache.org/jira/browse/SPARK-10113). There is already an issue to address the same problem when writing tables with unsigned fields using PyArrow (https:

[jira] [Created] (ARROW-7022) [Python] __arrow_array__ does not work for ExtensionTypes in Table.from_pandas

2019-10-29 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7022: Summary: [Python] __arrow_array__ does not work for ExtensionTypes in Table.from_pandas Key: ARROW-7022 URL: https://issues.apache.org/jira/browse/ARROW-7022

Re: Achieving parity with Java extension types in Python

2019-10-29 Thread Joris Van den Bossche
On Mon, 28 Oct 2019 at 22:41, Wes McKinney wrote:
> Adding dev@
>
> I don't believe we have APIs yet for plugging in user-defined Array subtypes. I assume you've read
>
> http://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types
>
> There may be some

[jira] [Created] (ARROW-7021) [Java] UnionFixedSizeListWriter decimal type should check writer index

2019-10-29 Thread Ji Liu (Jira)
Ji Liu created ARROW-7021: - Summary: [Java] UnionFixedSizeListWriter decimal type should check writer index Key: ARROW-7021 URL: https://issues.apache.org/jira/browse/ARROW-7021 Project: Apache Arrow

[jira] [Created] (ARROW-7020) [Java] Fix the bugs when calculating vector hash code

2019-10-29 Thread Liya Fan (Jira)
Liya Fan created ARROW-7020: --- Summary: [Java] Fix the bugs when calculating vector hash code Key: ARROW-7020 URL: https://issues.apache.org/jira/browse/ARROW-7020 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-7019) [Java] Improve the performance of loading validity buffers

2019-10-29 Thread Liya Fan (Jira)
Liya Fan created ARROW-7019: --- Summary: [Java] Improve the performance of loading validity buffers Key: ARROW-7019 URL: https://issues.apache.org/jira/browse/ARROW-7019 Project: Apache Arrow Issue T

[NIGHTLY] Arrow Build Report for Job nightly-2019-10-29-0

2019-10-29 Thread Crossbow
Arrow Build Report for Job nightly-2019-10-29-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-29-0 Failed Tasks: - docker-clang-format: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-29-0-circle-docker-clang-format - docke

AW: State of decimal support in Arrow (from/to Parquet Decimal Logicaltype)

2019-10-29 Thread roman.karlstetter
Hi Wes, thanks for the response. There's one thing that is still a little unclear to me: I had a look at the code for function WriteArrowSerialize in the reference you provided. I don't have arrow data in the first place, but as I understand it, I need to have an array of FixedLenByteArrays obje

[jira] [Created] (ARROW-7018) Special characters as question mark in parquet files in R

2019-10-29 Thread Vidar Ingason (Jira)
Vidar Ingason created ARROW-7018: Summary: Special characters as question mark in parquet files in R Key: ARROW-7018 URL: https://issues.apache.org/jira/browse/ARROW-7018 Project: Apache Arrow