Sure, I'll prepare a brief summary on this by Sunday; the weekend is just
starting here today.
Cheers, Gidon
On Thu, Sep 3, 2020 at 11:44 PM Antoine Pitrou wrote:
>
> It would be useful to explain to outsiders what those two API levels
> are, and what usage each corresponds to.
> Is Parquet encryption
Hello,
This may be a stupid question, but is Arrow used for, or designed with,
streaming-processing use cases in mind, where data is non-stationary (e.g.
Flink stream-processing jobs)?
In particular, is it possible, from a given event source (say Kafka), to
efficiently generate incremental record batches?
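A minimal sketch of what incremental record batches can look like with
pyarrow; poll_messages is a hypothetical stand-in for a Kafka consumer loop,
and the schema is made up for illustration:

    import pyarrow as pa

    SCHEMA = pa.schema([("ts", pa.int64()), ("value", pa.float64())])

    def batches_from_source(poll_messages, batch_size=1024):
        """Yield RecordBatches built incrementally from a message source.

        poll_messages is a hypothetical stand-in for e.g. a Kafka
        consumer; it should yield (ts, value) tuples.
        """
        ts, values = [], []
        for msg_ts, msg_value in poll_messages():
            ts.append(msg_ts)
            values.append(msg_value)
            if len(ts) >= batch_size:
                yield pa.record_batch(
                    [pa.array(ts, pa.int64()), pa.array(values, pa.float64())],
                    schema=SCHEMA)
                ts, values = [], []
        if ts:  # flush the partial tail batch
            yield pa.record_batch(
                [pa.array(ts, pa.int64()), pa.array(values, pa.float64())],
                schema=SCHEMA)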
Arrow Build Report for Job nightly-2020-09-04-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-04-0
Failed Tasks:
- test-conda-python-3.7-hdfs-2.9.2:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-04-0-github-test-conda-pyt
Micah and all,
Thanks for that pointer; I certainly didn’t follow it in detail at the time.
My question/thoughts are actually more limited in scope: I am specifically
targeting features that are supported by the standard AND by other major
Parquet implementations.
Specifically, I would li
Hi Pedro,
You should be able to use Flight for this: pack your subscription call in a
DoGet and listen on the FlightDataStream for new data.
I think you can control the granularity of your messages through the size of
the record batches you are writing, but I am not a Flight developer, so don’t
t
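A hedged sketch of that pattern with pyarrow.flight (poll_topic is an
assumption, not an established API): the subscription is packed into the
opaque DoGet ticket, and each yielded batch becomes one message on the
stream, so batch size sets the message granularity.

    import pyarrow as pa
    import pyarrow.flight as flight

    class SubscriptionServer(flight.FlightServerBase):
        def do_get(self, context, ticket):
            # The subscription request is packed into the ticket bytes.
            topic = ticket.ticket.decode()
            schema = pa.schema([("value", pa.float64())])

            def batches():
                # poll_topic is a hypothetical generator polling the event
                # source; each yielded batch is one Flight data message.
                for chunk in poll_topic(topic):
                    yield pa.record_batch([pa.array(chunk, pa.float64())],
                                          schema=schema)

            return flight.GeneratorStream(schema, batches())

On the client side, client.do_get(flight.Ticket(b"my-topic")) returns a
stream reader that can be iterated as new batches arrive.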
Sounds good. In the suggestion above, the builders for
FileEncryptionProperties/FileDecryptionProperties should not be exposed, so
only key tools would create them. This is just one option, of course.
On 2020/09/03 20:44:26, Antoine Pitrou wrote:
>
> It would be useful to explain to outsiders
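For illustration only: the high-level API that later shipped in pyarrow as
pyarrow.parquet.encryption roughly follows this shape (at the time of this
thread it was still under discussion). The application never builds
FileEncryptionProperties itself; a CryptoFactory backed by a KMS client (the
key tools) creates them. The KMS client below, the key IDs, and the table
are all made up for the sketch.

    import base64
    import pyarrow as pa
    import pyarrow.parquet as pq
    import pyarrow.parquet.encryption as pe

    class InMemoryKmsClient(pe.KmsClient):
        """Hypothetical stand-in for real key tools; an actual client
        would delegate wrap/unwrap to a KMS service."""

        def __init__(self, kms_connection_config):
            pe.KmsClient.__init__(self)

        def wrap_key(self, key_bytes, master_key_identifier):
            return base64.b64encode(key_bytes)

        def unwrap_key(self, wrapped_key, master_key_identifier):
            return base64.b64decode(wrapped_key)

    # The key-tools layer creates the opaque properties object; the
    # application never calls a FileEncryptionProperties builder itself.
    crypto_factory = pe.CryptoFactory(InMemoryKmsClient)
    kms_config = pe.KmsConnectionConfig()
    encryption_config = pe.EncryptionConfiguration(
        footer_key="footer_key_id",
        column_keys={"column_key_id": ["secret_column"]})

    file_encryption_properties = crypto_factory.file_encryption_properties(
        kms_config, encryption_config)

    table = pa.table({"secret_column": ["a", "b"]})
    with pq.ParquetWriter("data.parquet.encrypted", table.schema,
                          encryption_properties=file_encryption_properties) as writer:
        writer.write_table(table)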
Hi Radu,
It might be easier to get feedback on some concrete code. Perhaps make a PR
with a proof of concept and we can discuss there?
Neal
On Fri, Sep 4, 2020 at 4:27 AM Radu Teodorescu
wrote:
> Micah and all,
> Thanks for that pointer, I certainly didn’t follow it in detail at the
> time.
>
>
Hi Pedro,
I think the answer is: it likely depends. The main trade-off in using Arrow
in a streaming process is the high metadata overhead if you have very few
rows per batch. There have been prior discussions on the mailing list about
row-based and streaming use cases that might be useful [1][2] in expanding on the
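The metadata-overhead point can be made concrete with a small, hedged
experiment: write the same rows as one large batch and as many tiny batches,
and compare the resulting Arrow IPC stream sizes.

    import pyarrow as pa

    table = pa.table({"x": list(range(10_000))})

    def stream_size(batches):
        sink = pa.BufferOutputStream()
        with pa.ipc.new_stream(sink, table.schema) as writer:
            for batch in batches:
                writer.write_batch(batch)
        return sink.getvalue().size

    # Each batch carries its own metadata header, so the many-small-batches
    # stream is noticeably larger for exactly the same rows.
    print("1 batch of 10000 rows:",
          stream_size(table.to_batches(max_chunksize=10_000)))
    print("1000 batches of 10 rows:",
          stream_size(table.to_batches(max_chunksize=10)))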
Hello Radu,
If your goal is strictly "append" with a common schema, then maybe the
terminology you are looking for is "append a Parquet file to a Parquet
dataset" rather than "append a row group to a multi-file Parquet file".
Parquet datasets (and Arrow datasets) support having a common schema,
which is u
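A minimal sketch of the "append a Parquet file to a Parquet dataset" idea;
the directory name, file name, and schema are made up for illustration:

    import pyarrow as pa
    import pyarrow.parquet as pq
    import pyarrow.dataset as ds

    schema = pa.schema([("id", pa.int64()), ("v", pa.float64())])

    # "Append" = write one more file with the common schema into the
    # dataset directory (assumed to already exist and hold earlier files).
    new_rows = pa.table({"id": [4, 5], "v": [0.4, 0.5]}, schema=schema)
    pq.write_table(new_rows, "my_dataset/part-0002.parquet")

    # The directory reads back as one logical dataset with that schema.
    dataset = ds.dataset("my_dataset", format="parquet", schema=schema)
    table = dataset.to_table()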
Are we concerned about backward compatibility with older FlightClients?
Would it make sense to continue to support handshakes with auth payloads
in addition to header-based authentication using middleware? Perhaps we
could create a dedicated endpoint for server capabilities if we need to remain
backward
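For context, client-side header-based auth via middleware looks roughly like
this in pyarrow.flight (the token value and server location are made up):
the factory attaches an Authorization header to every call, with no
handshake round-trip.

    import pyarrow.flight as flight

    class BearerTokenMiddleware(flight.ClientMiddleware):
        """Attaches an Authorization header to each outgoing call."""

        def __init__(self, token):
            self.token = token

        def sending_headers(self):
            return {"authorization": "Bearer " + self.token}

    class BearerTokenMiddlewareFactory(flight.ClientMiddlewareFactory):
        def __init__(self, token):
            self.token = token

        def start_call(self, info):
            return BearerTokenMiddleware(self.token)

    # No handshake needed: the token rides along on every request.
    client = flight.FlightClient(
        "grpc://localhost:8815",
        middleware=[BearerTokenMiddlewareFactory("my-secret-token")])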