[jira] [Created] (ARROW-6849) Can not read a list of items type

2019-10-10 Thread Yevgeni Litvin (Jira)
Yevgeni Litvin created ARROW-6849: Summary: Can not read a list of items type Key: ARROW-6849 URL: https://issues.apache.org/jira/browse/ARROW-6849 Project: Apache Arrow Issue Type: Bug

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-10 Thread Renjie Liu
Thanks, Wes. Sure, I'll fix it. Wes McKinney wrote on Fri, Oct 11, 2019 at 6:10 AM: > I just merged the PR https://github.com/apache/arrow-testing/pull/11 > > Various aspects of this make me uncomfortable, so I hope they can be > addressed in follow-up work > > On Thu, Oct 10, 2019 at 5:41 AM Renjie Liu > wrot

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Hey there, I meant to remove the issues section at top and replace with the one in the community health section but forgot to remove the top part. I just submitted with the removed top part. Let me know if people want me to further edit. Thanks On Thu, Oct 10, 2019 at 1:54 PM Antoine Pitrou wrot

Re: Field metadata not retrievable from parquet file

2019-10-10 Thread Isaac Myers
Thanks for the quick response. When I use pyspark to read a parquet file written by arrow, I can't see even file-level metadata. Is that also a known issue? (Note: I searched the JIRA issues and couldn't find any info.) Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Thu

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-10 Thread Wes McKinney
I just merged the PR https://github.com/apache/arrow-testing/pull/11 Various aspects of this make me uncomfortable, so I hope they can be addressed in follow-up work. On Thu, Oct 10, 2019 at 5:41 AM Renjie Liu wrote: > > I've created a ticket to track this here: > https://issues.apache.org/jira/browse/ARR

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-10 Thread Wes McKinney
@Neal -- that's fine, I just want to make sure that in the next release there is a responsible party (the RM?) to seek out someone to help build the documentation rather than let it sit silently unpublished for a week or two. So we may just want to amend the RM guide to include "Find someone to upd

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Antoine Pitrou
It's good with me. Regards, Antoine. On 10/10/2019 at 22:51, Jacques Nadeau wrote: > Antoine, is my synopsis fair? > > On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney wrote: > >> +1 >> >> On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau wrote: >> >>> Proposed report update below. LMK your thou

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Antoine, is my synopsis fair? On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney wrote: > +1 > > On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau wrote: > > > Proposed report update below. LMK your thoughts. > > > > ## Description: > > The mission of Apache Arrow is the creation and maintenance of softw

pyarrow and macOS 10.15

2019-10-10 Thread Brian Hulette
In Beam we've had a few users report issues importing Beam Python after upgrading to macOS 10.15 Catalina, and it seems like our pyarrow import is the root cause [1]. Given that I don't see any reports of this on the arrow side I suspect that this is an issue just with pyarrow 0.14 (in Beam we've r

[jira] [Created] (ARROW-6848) [C++] Specify -std=c++11 instead of -std=gnu++11 when building

2019-10-10 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-6848: Summary: [C++] Specify -std=c++11 instead of -std=gnu++11 when building Key: ARROW-6848 URL: https://issues.apache.org/jira/browse/ARROW-6848 Project: Apache Arrow

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-10 Thread Joris Van den Bossche
Wes, if you don't get to it today, I can try to update the docs tomorrow. Joris On Thu, 10 Oct 2019 at 21:51, Neal Richardson wrote: > I updated the R docs because I had everything I needed to do that > locally: https://github.com/apache/arrow-site/pull/30 Doing the others > wasn't feasible for

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Wes McKinney
+1 On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau wrote: > Proposed report update below. LMK your thoughts. > > ## Description: > The mission of Apache Arrow is the creation and maintenance of software > related to columnar in-memory processing and data interchange > > ## Issues: > > * We are stru

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-10 Thread Neal Richardson
I updated the R docs because I had everything I needed to do that locally: https://github.com/apache/arrow-site/pull/30 Doing the others wasn't feasible for me on my computer (I don't have CUDA, and the case insensitivity of the macOS file system always bites me with the pyarrow docs anyway). IMO

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Proposed report update below. LMK your thoughts. ## Description: The mission of Apache Arrow is the creation and maintenance of software related to columnar in-memory processing and data interchange ## Issues: * We are struggling with Continuous Integration scalability as the project has defin

Re: [VOTE] Release Apache Arrow 0.15.0 - RC2

2019-10-10 Thread Wes McKinney
The docs on http://arrow.apache.org/docs/ haven't been updated yet. This happened with the last release, too -- I ended up updating the docs manually after a week or two. Is this included in the release management guide? If no one beats me to it, I can update the docs by hand again later today. On Mon,

Re: Field metadata not retrievable from parquet file

2019-10-10 Thread Wes McKinney
We haven't implemented storing field-level metadata in Parquet files yet. It's somewhat tricky. See https://issues.apache.org/jira/browse/ARROW-4359 On Thu, Oct 10, 2019 at 11:51 AM Isaac Myers wrote: > > I can write both field- and schema-level metadata and read the values back > from schema o

[jira] [Created] (ARROW-6847) [C++] Add a range_expression interface to Iterator<>

2019-10-10 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-6847: Summary: [C++] Add a range_expression interface to Iterator<> Key: ARROW-6847 URL: https://issues.apache.org/jira/browse/ARROW-6847 Project: Apache Arrow Issue

Re: Question about timestamps ...

2019-10-10 Thread David Boles
Joris, Thank you for the response. There's such a trail of stale information online w/r to the overall that it wasn't clear what the status was. For example, simple searches take you into the "INT96 is deprecated therefore support for nanoseconds is as well" cul-de-sac. Absent that confusing con

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Arg... accidental send before ready. What do you think about the statement below for community health? Does it fairly capture the concerns/perspective? On Thu, Oct 10, 2019 at 10:24 AM Jacques Nadeau wrote: > Many contributors are struggling with the slowness of pre-commit CI. Arrow > has a large n

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Many contributors are struggling with the slowness of pre-commit CI. Arrow has a large number of different platforms and components and a complex build matrix. As new commits come in, they frequently take a long time to complete. The community is trying several ways to solve this. Some of those hav

[jira] [Created] (ARROW-6846) [C++] Build failures with glog enabled

2019-10-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6846: Summary: [C++] Build failures with glog enabled Key: ARROW-6846 URL: https://issues.apache.org/jira/browse/ARROW-6846 Project: Apache Arrow Issue Type: Bug

Field metadata not retrievable from parquet file

2019-10-10 Thread Isaac Myers
I can write both field- and schema-level metadata and read the values back from schema or relevant field. I write the schema and table described by the schema to a local parquet file. Upon reading the table or schema from the parquet file, only schema metadata are present and field metadata are

Re: Simple Join Implementation Questions

2019-10-10 Thread Antoine Pitrou
Hi David, You should look into the visitor facilities provided by Arrow C++, in arrow/visitor_inline.h. I would especially look at two of them: - VisitArrayInline() will call the visitor's overloaded Visit() method with the right array concrete type (for example Int16Array, ListArray...) - On

Simple Join Implementation Questions

2019-10-10 Thread david sherrier
Hey all, I'm working on a simple serial join implementation and need to be able to compare data across two columns of the same type. Right now the only way I have found to do this is to use ArrayData::GetValues(1) and then iterate over the returned buffer, comparing the values. The problem I am
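The comparison step asked about above can be sketched in plain Python. This is only an illustration of a serial hash join over two same-typed columns; it does not use the Arrow C++ API mentioned in the message. The function name `hash_join_indices`, the use of Python lists as stand-ins for columns, and the choice that `None` (null) never matches are all assumptions made for this sketch.

```python
from collections import defaultdict


def hash_join_indices(left, right):
    """Return (left_index, right_index) pairs whose values are equal.

    `left` and `right` are plain Python lists standing in for two columns
    of the same type. None represents a null and never matches anything
    (an assumption of this sketch, mirroring SQL join semantics).
    """
    # Build phase: remember every position at which each right value occurs.
    positions = defaultdict(list)
    for j, value in enumerate(right):
        if value is not None:
            positions[value].append(j)

    # Probe phase: look up each left value in the table built above.
    pairs = []
    for i, value in enumerate(left):
        if value is None:
            continue
        for j in positions.get(value, ()):
            pairs.append((i, j))
    return pairs


matches = hash_join_indices([1, 2, None, 2], [2, 3, 1, None])
```

Building the lookup table on one side and probing with the other avoids comparing every pair of values, which a nested loop over two raw buffers would do.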

Re: Looking ahead to 1.0

2019-10-10 Thread John Muehlhausen
The format change is ARROW-6836 ... add a custom_metadata:[KeyValue] field to the Footer table in File.fbs The other change (slicing a recordbatch to honor RecordBatch.length rather than array length if the former is smaller) will hopefully not affect the format. On Wed, Oct 9, 2019 at 11:55 PM

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-10-0

2019-10-10 Thread Krisztián Szűcs
Disabled it in https://github.com/apache/arrow/pull/5617 On Thu, Oct 10, 2019 at 3:12 PM Wes McKinney wrote: > Seems like CircleCI might be paywalling some features now > > " > #!/bin/sh -eo pipefail > # Blocked due to free-plan-docker-layer-caching-unavailable > # > # --- > # Warning: This

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-10-0

2019-10-10 Thread Wes McKinney
Seems like CircleCI might be paywalling some features now " #!/bin/sh -eo pipefail # Blocked due to free-plan-docker-layer-caching-unavailable # # --- # Warning: This configuration was auto-generated to show you the message above. # Don't rerun this job. Rerunning will have no effect. false "

Re: [C++] The quest for zero-dependency builds

2019-10-10 Thread Antoine Pitrou
Yes, we could express dependencies in a Python script and have it generate a CMake module of if/else chains in cmake_modules (which we would check in git to avoid having people depend on a Python install, perhaps). Still, that is an additional maintenance burden. Regards, Antoine. On 10/10/20

Re: [C++] The quest for zero-dependency builds

2019-10-10 Thread Wes McKinney
I guess one question we should first discuss is: who is the C++ build system for? The users who are most sensitive to benchmark-driven decision making will generally be consuming the project through pre-built binaries, like our Python or R packages. If C++ developers build the project from source

Re: [DISCUSS][Java] Design of the algorithm module

2019-10-10 Thread Fan Liya
Dear all, I have added the draft for the fourth part of the document. This part contains discussion of more algorithms, some of which are already in progress. Please pay special attention to Section 4.2.1, as it contains a general discussion about the representation of integer vectors. Please tak

Re: [C++] The quest for zero-dependency builds

2019-10-10 Thread Francois Saint-Jacques
There's always the route of vendoring some libraries and not exposing external CMake options. This would achieve the goal of compiling out of the box and enable important features in the basic build. It would also simplify dependency requirements (which benefits CI and developers). The downside is following securit

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Wes McKinney
Here is a rejection of CircleCI more than 18 months ago https://issues.apache.org/jira/browse/INFRA-15964 On Thu, Oct 10, 2019 at 4:33 AM Antoine Pitrou wrote: > > > For the record, here is the ticket for Azure Pipelines integration: > https://issues.apache.org/jira/browse/INFRA-17030 > > I open

[NIGHTLY] Arrow Build Report for Job nightly-2019-10-10-0

2019-10-10 Thread Crossbow
Arrow Build Report for Job nightly-2019-10-10-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-10-0 Failed Tasks: - docker-go: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-10-0-circle-docker-go - docker-cpp-cmake32: URL

Re: [C++] The quest for zero-dependency builds

2019-10-10 Thread Tim Paine
FWIW, for perspective, we ended up just using our own CMake file to build Arrow. We needed a minimal subset of functionality on a tight size budget, and it was easier doing that than configuring all the flags. https://github.com/finos/perspective/blob/master/cmake/arrow/CMakeLists.txt Tim Paine

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-10 Thread Renjie Liu
I've created a ticket to track this here: https://issues.apache.org/jira/browse/ARROW-6845 For the moment, can we check in the pregenerated data to unblock the Rust version's Arrow reader? On Thu, Oct 10, 2019 at 1:20 PM Renjie Liu wrote: > It would be fine in that case. > > Wes McKinney wrote on Oct 10, 2019,

[jira] [Created] (ARROW-6845) Setup process to generate random data for integration tests

2019-10-10 Thread Renjie Liu (Jira)
Renjie Liu created ARROW-6845: Summary: Setup process to generate random data for integration tests Key: ARROW-6845 URL: https://issues.apache.org/jira/browse/ARROW-6845 Project: Apache Arrow I

Re: Question about timestamps ...

2019-10-10 Thread Joris Van den Bossche
Hi David, This is intentional, see https://arrow.apache.org/docs/python/parquet.html#storing-timestamps for some explanation in the documentation. Basically, the Parquet format only supports ms and us resolution, and so nanosecond timestamps (which are supported by Arrow) are converted to one of tho
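The resolution change described in this message can be sketched as integer arithmetic in plain Python. This is only an illustration of converting a nanosecond timestamp to millisecond or microsecond resolution; it is not the pyarrow API. The function name `coerce_ns_timestamp` and the `allow_truncation` flag are hypothetical names chosen for this sketch.

```python
# Illustrative only: plain integer arithmetic, not a pyarrow call.
NANOS_PER_UNIT = {"ms": 1_000_000, "us": 1_000}


def coerce_ns_timestamp(ns_since_epoch, unit="us", allow_truncation=False):
    """Convert an integer nanosecond timestamp to a coarser unit.

    Raises ValueError when sub-unit precision would be lost, unless
    allow_truncation is True (a hypothetical flag for this sketch).
    """
    factor = NANOS_PER_UNIT[unit]
    # divmod floors, so negative (pre-epoch) values round toward -infinity.
    quotient, remainder = divmod(ns_since_epoch, factor)
    if remainder and not allow_truncation:
        raise ValueError(
            f"converting {ns_since_epoch} ns to {unit} would lose data"
        )
    return quotient


exact_us = coerce_ns_timestamp(1_570_700_000_123_456_000, unit="us")
lossy_ms = coerce_ns_timestamp(
    1_570_700_000_123_456_789, unit="ms", allow_truncation=True
)
```

Raising by default keeps a lossy conversion from passing unnoticed; a caller who accepts the loss has to say so explicitly.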

[C++] The quest for zero-dependency builds

2019-10-10 Thread Antoine Pitrou
Hi all, I'm a bit concerned that we're planning to add many additional build options in the quest to have a core zero-dependency build in C++. See for example https://issues.apache.org/jira/browse/ARROW-6633 or https://issues.apache.org/jira/browse/ARROW-6612. The problem is that this is creati

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Antoine Pitrou
For the record, here is the ticket for Azure Pipelines integration: https://issues.apache.org/jira/browse/INFRA-17030 I opened an issue back in May about the Travis-CI capacity situation: https://issues.apache.org/jira/browse/INFRA-18533 Apparently CI capacity has been a "hot topic as of late":