Just want to give some updates on the dispatching.
Now we have workable runtime functionality, including the dispatch mechanism[1][2] and
the build framework for both the compute kernels and other parts of the C++ code. There is
some remaining SIMD static compiler code in the code base that I will try to
work l
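For anyone curious about the shape of this, here is a minimal Python sketch of the runtime-dispatch pattern itself (all names are hypothetical; the real mechanism is the Arrow C++ one referenced in [1][2]): detect CPU features once at startup, then bind each kernel to the best available implementation.

    def sum_int32_scalar(values):
        # Portable baseline kernel
        total = 0
        for v in values:
            total += v
        return total

    def sum_int32_simd(values):
        # Stand-in for a kernel compiled with SIMD intrinsics (e.g. AVX2);
        # Python's builtin sum plays that role here.
        return sum(values)

    def cpu_supports_simd():
        # Stand-in for a CPUID-style feature probe, run once at startup
        return True

    # Bind once; every call site goes through the chosen symbol.
    sum_int32 = sum_int32_simd if cpu_supports_simd() else sum_int32_scalar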
With regard to scale, my colleague discovered some inconsistencies and filed a
JIRA with a proposed fix (a PR should be attached shortly).
I think this is an edge case that should be fixed, but if someone with more
historical context has opinions, I'd like to hear them.
[1] https://issues.apach
Hi Radu,
This is a conversation best had on dev@parquet. It came up recently [1]
and I cross-posted there as well.
[1]
https://lists.apache.org/thread.html/re4fe4bc80c9eadd446761588f9b03d827193f91269a7c14ce0c444dd%40%3Cdev.arrow.apache.org%3E
On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu wrote:
Hello,
What is the current thinking around allowing the logical content of a Parquet
file to be split across multiple files?
I see that in theory there is support for reading files where different row
groups are in separate files, but I cannot see any features that allow that for
writing.
On a s
I am working on an engine for processing time-series data. Unsurprisingly
for such a system, values of timestamp type feature prominently, and we need
basic support for them in DataFusion.
Initially, we want to use DataFusion with predicates such as '=', '<', '>',
etc. on timestamp columns and times
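For concreteness, the desired comparison semantics can be sketched with pyarrow's compute functions (an illustration only, not DataFusion's API; the dates are made up and a reasonably recent pyarrow is assumed):

    import datetime

    import pyarrow as pa
    import pyarrow.compute as pc

    # A timestamp column and a literal bound (values are made up)
    ts = pa.array(
        [datetime.datetime(2020, 9, 1), datetime.datetime(2020, 9, 4)],
        type=pa.timestamp("us"),
    )
    bound = pa.scalar(datetime.datetime(2020, 9, 3), type=pa.timestamp("us"))

    # The '>' predicate on a timestamp column, yielding a boolean mask
    mask = pc.greater(ts, bound)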
For outsiders, it would be useful to spell out what those two API levels
are, and to what usage each of them corresponds.
Is Parquet encryption only used with Spark? While Spark
interoperability is important, Parquet files are more ubiquitous than that.
Regards
Antoine.
On 03/09/2020 at 22:31, Gidon
Why would the low-level API be exposed directly? This will break the
interop between the two analytic ecosystems down the road.
Again, let me suggest leveraging the high-level interface, based on the
PropertiesDrivenCryptoFactory.
It should address your technical requirements; if it doesn't, we ca
Hi Itamar,
I implemented some Python wrappers for the low-level API and would be happy to
collaborate on that. The reason I didn't push this forward yet is what Gidon
mentioned: the API to expose to Python users needs to be finalized first, and it
must include the key tools API for interop with
On Thu, Sep 3, 2020, at 11:01 AM, Antoine Pitrou wrote:
>
> Hi Gidon,
>
> On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> > Hi Itamar,
> >
> > My suggestion would be to wrap a different API in Python - the high-level
> > encryption interface of
> > https://github.com/apache/arrow/pull/8023
>
Hi Antoine,
Sounds good to me. This PR is already being actively reviewed, and it'd be
good to have Itamar's assessment.
Cheers, Gidon
On Thu, Sep 3, 2020 at 6:01 PM Antoine Pitrou wrote:
>
> Hi Gidon,
>
> On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> > Hi Itamar,
> >
> > My suggestion
Hi Gidon,
On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> Hi Itamar,
>
> My suggestion would be to wrap a different API in Python - the high-level
> encryption interface of
> https://github.com/apache/arrow/pull/8023
We need a strategy for reviewing those changes. The PR is quite large,
touc
Hi Itamar,
My suggestion would be to wrap a different API in Python - the high-level
encryption interface of
https://github.com/apache/arrow/pull/8023
This will enable interoperability with Apache Spark (and other frameworks),
where we don't expose the low-level Parquet encryption API.
If such a low
There are various open source columnar database engines you could look
at to get inspiration for a varargs variant of sort_indices.
On Thu, Sep 3, 2020 at 9:26 AM Ben Kietzman wrote:
>
> Hi Rares,
>
> The arrow API does not currently support sorting against multiple columns.
> We'd welcome a JIRA
Hi Rares,
The arrow API does not currently support sorting against multiple columns.
We'd welcome a JIRA/PR to add that support.
One potential workaround is storing the tuple as a single column of
fixed_size_list(int32, 2), which could then be viewed [1] as int64 (for
which sorting
is supported).
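A sketch of that workaround with pyarrow (hedged: on little-endian hardware the first list slot lands in the int64's low-order bytes, so the major sort key has to be stored second, and negative values would need extra care; pc.sort_indices availability depends on the pyarrow version):

    import pyarrow as pa
    import pyarrow.compute as pc

    # Tuples (1, 10), (1, 15), (2, 10), (2, 15) as fixed_size_list(int32, 2).
    # Stored as (second, first) so the major key occupies the high-order
    # bytes of the int64 view on a little-endian machine.
    pairs = pa.array(
        [[10, 1], [15, 1], [10, 2], [15, 2]],
        type=pa.list_(pa.int32(), 2),
    )

    keys = pairs.view(pa.int64())       # zero-copy reinterpretation [1]
    indices = pc.sort_indices(keys)     # sort the packed int64 keys
    sorted_pairs = pairs.take(indices)  # reorder the original pairs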
Hi,
I'm looking into implementing this, and it seems like there are two parts:
packaging, but also wrapping the APIs in Python. Is the latter item accurate?
If so, any examples of similar existing wrapped APIs, or should I just come up
with something on my own?
Context:
https://github.com/apac
The C++/Python authentication implementation is entirely different
(because the C++/Python/Java gRPC APIs are in turn entirely
different). In particular, gRPC middleware in C++ is still
experimental (compared to Java) and much more limited (unless recent
versions changed this). C++/Python might fun
Thanks for sharing! It's cool to see the new PyFileSystem directly being
used ;)
Note that there is also an fsspec-compatible Azure filesystem
implementation that should support Data Lake Gen2 (
https://github.com/dask/adlfs), as another Python-based implementation,
which can be used with pyarr
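A minimal sketch of that combination (the account name and path are hypothetical; adlfs and a pyarrow version that ships pyarrow.fs.FSSpecHandler are assumed):

    import adlfs
    import pyarrow.fs
    import pyarrow.parquet as pq

    # Wrap the fsspec-compatible Azure filesystem for use with pyarrow
    azure = adlfs.AzureBlobFileSystem(account_name="myaccount")  # hypothetical account
    filesystem = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(azure))

    # Hypothetical path; any pyarrow reader that accepts a filesystem works
    table = pq.read_table("container/data.parquet", filesystem=filesystem)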
Hello,
I have a set of integer tuples that need to be collected and sorted at a
coordinator. Here is an example with tuples of length 2:
[(1, 10),
(1, 15),
(2, 10),
(2, 15)]
I am considering storing each column in an Arrow array, e.g., [1, 1, 2, 2]
and [10, 15, 10, 15], and have the Arrow arr
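That layout, plus one possible stopgap for the multi-column sort itself, sketched with pyarrow and numpy (the lexsort detour is my own illustration, not an Arrow API):

    import numpy as np
    import pyarrow as pa

    # One Arrow array per tuple position, as described above
    firsts = pa.array([1, 1, 2, 2], type=pa.int32())
    seconds = pa.array([10, 15, 10, 15], type=pa.int32())

    # np.lexsort takes keys in reverse priority order: primary key last
    order = np.lexsort((seconds.to_numpy(), firsts.to_numpy()))
    sorted_firsts = firsts.take(pa.array(order))
    sorted_seconds = seconds.take(pa.array(order))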
Arrow Build Report for Job nightly-2020-09-03-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-03-0
Failed Tasks:
- test-conda-python-3.7-hdfs-2.9.2:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-03-0-github-test-conda-pyt