Just want to give some updates on the dispatching.
Now we have workable runtime functionality, including the dispatch mechanism[1][2] and
the build framework for both the compute kernels and other parts of the C++ code. There is
some remaining SIMD static compiler code in the code base that I will try to
work l
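For anyone curious about the shape of this, here is a minimal Python sketch of the runtime-dispatch pattern itself (all names are hypothetical; the real mechanism is the Arrow C++ one referenced in [1][2]): detect CPU features once at startup, then bind each kernel to the best available implementation.

    def sum_int32_scalar(values):
        # Portable baseline kernel
        total = 0
        for v in values:
            total += v
        return total

    def sum_int32_simd(values):
        # Stand-in for a kernel compiled with SIMD intrinsics (e.g. AVX2);
        # Python's builtin sum plays that role here.
        return sum(values)

    def cpu_supports_simd():
        # Stand-in for a CPUID-style feature probe, run once at startup
        return True

    # Bind once; every call site goes through the chosen symbol.
    sum_int32 = sum_int32_simd if cpu_supports_simd() else sum_int32_scalar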
With regard to scale, my colleague discovered some inconsistencies and filed a
JIRA with a proposed fix (a PR should be attached shortly).
I think this is an edge case that should be fixed, but if someone with more
historical context has opinions, I'd like to hear them.
[1] https://issues.apach
Hi Radu,
This is a conversation best had on dev@parquet. It came up recently [1]
and I cross-posted there as well.
[1]
https://lists.apache.org/thread.html/re4fe4bc80c9eadd446761588f9b03d827193f91269a7c14ce0c444dd%40%3Cdev.arrow.apache.org%3E
On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu wrote:
Hello,
What is the current thinking around allowing the logical content of a Parquet
file to be split across multiple files?
I see that in theory there is support for reading files where different row
groups are in separate files, but I cannot see any features that allow that for
writing.
On a s
I am working on an engine for processing time-series data. Unsurprisingly
for such a system, values of timestamp type feature prominently, and we need
basic support for them in DataFusion.
Initially, we want to use DataFusion with predicates such as '=', '<', '>',
etc. on timestamp columns and times
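For concreteness, the desired comparison semantics can be sketched with pyarrow's compute functions (an illustration only, not DataFusion's API; the dates are made up and a reasonably recent pyarrow is assumed):

    import datetime

    import pyarrow as pa
    import pyarrow.compute as pc

    # A timestamp column and a literal bound (values are made up)
    ts = pa.array(
        [datetime.datetime(2020, 9, 1), datetime.datetime(2020, 9, 4)],
        type=pa.timestamp("us"),
    )
    bound = pa.scalar(datetime.datetime(2020, 9, 3), type=pa.timestamp("us"))

    # The '>' predicate on a timestamp column, yielding a boolean mask
    mask = pc.greater(ts, bound)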
For outsiders, it would be useful to spell out what those two API levels
are, and to what usage each of them corresponds.
Is Parquet encryption only used with Spark? While Spark
interoperability is important, Parquet files are more ubiquitous than that.
Regards
Antoine.
On 03/09/2020 at 22:31, Gidon
Why would the low-level API be exposed directly? This will break the
interop between the two analytic ecosystems down the road.
Again, let me suggest leveraging the high-level interface, based on the
PropertiesDrivenCryptoFactory.
It should address your technical requirements; if it doesn't, we ca
Hi Itamar,
I implemented some Python wrappers for the low-level API and would be happy to
collaborate on that. The reason I didn't push this forward yet is what Gidon
mentioned: the API to expose to Python users needs to be finalized first, and it
must include the key tools API for interop with
On Thu, Sep 3, 2020, at 11:01 AM, Antoine Pitrou wrote:
>
> Hi Gidon,
>
> On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> > Hi Itamar,
> >
> > My suggestion would be to wrap a different API in Python - the high-level
> > encryption interface of
> > https://github.com/apache/arrow/pull/8023
>
Hi Antoine,
Sounds good to me. This PR is already being actively reviewed, and it'd be
good to have Itamar's assessment.
Cheers, Gidon
On Thu, Sep 3, 2020 at 6:01 PM Antoine Pitrou wrote:
>
> Hi Gidon,
>
> On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> > Hi Itamar,
> >
> > My suggestion
Hi Gidon,
On 03/09/2020 at 16:53, Gidon Gershinsky wrote:
> Hi Itamar,
>
> My suggestion would be to wrap a different API in Python - the high-level
> encryption interface of
> https://github.com/apache/arrow/pull/8023
We need a strategy for reviewing those changes. The PR is quite large,
touc
Hi Itamar,
My suggestion would be to wrap a different API in Python - the high-level
encryption interface of
https://github.com/apache/arrow/pull/8023
This will enable interoperability with Apache Spark (and other frameworks),
where we don't expose the low-level Parquet encryption API.
If such a low
There are various open source columnar database engines you could look
at to get inspiration for a varargs variant of sort_indices.
On Thu, Sep 3, 2020 at 9:26 AM Ben Kietzman wrote:
>
> Hi Rares,
>
> The arrow API does not currently support sorting against multiple columns.
> We'd welcome a JIRA
Hi Rares,
The arrow API does not currently support sorting against multiple columns.
We'd welcome a JIRA/PR to add that support.
One potential workaround is storing the tuple as a single column of
fixed_size_list(int32, 2), which could then be viewed [1] as int64 (for
which sorting
is supported).
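A sketch of that workaround with pyarrow (hedged: on little-endian hardware the first list slot lands in the int64's low-order bytes, so the major sort key has to be stored second, and negative values would need extra care; pc.sort_indices availability depends on the pyarrow version):

    import pyarrow as pa
    import pyarrow.compute as pc

    # Tuples (1, 10), (1, 15), (2, 10), (2, 15) as fixed_size_list(int32, 2).
    # Stored as (second, first) so the major key occupies the high-order
    # bytes of the int64 view on a little-endian machine.
    pairs = pa.array(
        [[10, 1], [15, 1], [10, 2], [15, 2]],
        type=pa.list_(pa.int32(), 2),
    )

    keys = pairs.view(pa.int64())       # zero-copy reinterpretation [1]
    indices = pc.sort_indices(keys)     # sort the packed int64 keys
    sorted_pairs = pairs.take(indices)  # reorder the original pairs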
Hi,
I'm looking into implementing this, and it seems like there are two parts:
packaging, but also wrapping the APIs in Python. Is the latter item accurate?
If so, any examples of similar existing wrapped APIs, or should I just come up
with something on my own?
Context:
https://github.com/apac
The C++/Python authentication implementation is entirely different
(because the C++/Python/Java gRPC APIs are in turn entirely
different). In particular, gRPC middleware in C++ is still
experimental (compared to Java) and much more limited (unless recent
versions changed this). C++/Python might fun
Thanks for sharing! It's cool to see the new PyFileSystem directly being
used ;)
Note that there is also an fsspec-compatible Azure filesystem
implementation that should support Data Lake Gen2 (
https://github.com/dask/adlfs), as another Python-based implementation,
which can be used with pyarr
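A minimal sketch of that combination (the account name and path are hypothetical; adlfs and a pyarrow version that ships pyarrow.fs.FSSpecHandler are assumed):

    import adlfs
    import pyarrow.fs
    import pyarrow.parquet as pq

    # Wrap the fsspec-compatible Azure filesystem for use with pyarrow
    azure = adlfs.AzureBlobFileSystem(account_name="myaccount")  # hypothetical account
    filesystem = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(azure))

    # Hypothetical path; any pyarrow reader that accepts a filesystem works
    table = pq.read_table("container/data.parquet", filesystem=filesystem)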
Hello,
I have a set of integer tuples that need to be collected and sorted at a
coordinator. Here is an example with tuples of length 2:
[(1, 10),
(1, 15),
(2, 10),
(2, 15)]
I am considering storing each column in an Arrow array, e.g., [1, 1, 2, 2]
and [10, 15, 10, 15], and have the Arrow arr
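That layout, plus one possible stopgap for the multi-column sort itself, sketched with pyarrow and numpy (the lexsort detour is my own illustration, not an Arrow API):

    import numpy as np
    import pyarrow as pa

    # One Arrow array per tuple position, as described above
    firsts = pa.array([1, 1, 2, 2], type=pa.int32())
    seconds = pa.array([10, 15, 10, 15], type=pa.int32())

    # np.lexsort takes keys in reverse priority order: primary key last
    order = np.lexsort((seconds.to_numpy(), firsts.to_numpy()))
    sorted_firsts = firsts.take(pa.array(order))
    sorted_seconds = seconds.take(pa.array(order))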
Arrow Build Report for Job nightly-2020-09-03-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-03-0
Failed Tasks:
- test-conda-python-3.7-hdfs-2.9.2:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-03-0-github-test-conda-pyt