Are we concerned about backward compatibility with older FlightClients?
Would it make sense to continue to support handshakes with auth payloads
in addition to header-based authentication using middlewares? Perhaps we
create a dedicated endpoint for server capabilities if we need to remain
backward compatible.
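For reference, a minimal sketch of the header-based alternative using pyarrow's client middleware hooks (the token value and endpoint URI are placeholders, not part of the proposal):

    import pyarrow.flight as flight

    class BearerAuthMiddlewareFactory(flight.ClientMiddlewareFactory):
        def __init__(self, token):
            self._token = token

        def start_call(self, info):
            return BearerAuthMiddleware(self._token)

    class BearerAuthMiddleware(flight.ClientMiddleware):
        def __init__(self, token):
            self._token = token

        def sending_headers(self):
            # The credential travels as a gRPC header on every call,
            # so no Handshake RPC is needed.
            return {"authorization": "Bearer " + self._token}

    client = flight.FlightClient(
        "grpc://localhost:8815",
        middleware=[BearerAuthMiddlewareFactory("placeholder-token")],
    )

An older client that only knows the Handshake-based flow would simply never send such a header, which is where the compatibility question above comes in.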
Hello Radu,
If your goal is strictly "append" with a common schema, then maybe the
terminology you are looking for is "append a parquet file to a parquet
dataset" and not "append a row group to a multi-file parquet file".
Parquet datasets (and arrow datasets) support having a common schema
which is u
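For what it's worth, a small sketch of that dataset-level "append" in pyarrow (the directory name and columns are made up): each write adds one more parquet file under the dataset root, and readers see a single dataset with the common schema.

    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pa.schema([("id", pa.int64()), ("value", pa.float64())])

    # First file in the dataset.
    pq.write_to_dataset(
        pa.table({"id": [1, 2], "value": [0.1, 0.2]}, schema=schema),
        root_path="my_dataset")

    # "Appending" is just writing another file into the same directory.
    pq.write_to_dataset(
        pa.table({"id": [3], "value": [0.3]}, schema=schema),
        root_path="my_dataset")

    # Reads back as one logical table spanning both files.
    combined = pq.read_table("my_dataset")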
Hi Pedro,
I think the answer is it likely depends. The main trade-off in using Arrow
in a streaming process is the high metadata overhead if you have very few
rows. There have been prior discussions on the mailing list about
row-based and streaming that might be useful [1][2] in expanding on the
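As a rough illustration of that overhead (a sketch only; the schema and row counts are arbitrary), compare the IPC stream size of one large batch against many single-row batches of the same data:

    import pyarrow as pa

    schema = pa.schema([("ts", pa.int64()), ("value", pa.float64())])

    def stream_size(batches):
        sink = pa.BufferOutputStream()
        with pa.ipc.new_stream(sink, schema) as writer:
            for batch in batches:
                writer.write_batch(batch)
        return sink.getvalue().size

    n = 1000
    ts = pa.array(range(n), pa.int64())
    value = pa.array([float(i) for i in range(n)], pa.float64())

    one_batch = [pa.RecordBatch.from_arrays([ts, value], schema=schema)]
    row_batches = [
        pa.RecordBatch.from_arrays(
            [ts.slice(i, 1), value.slice(i, 1)], schema=schema)
        for i in range(n)
    ]

    # Same data, but per-batch headers and padding dominate
    # in the single-row case.
    print(stream_size(one_batch), stream_size(row_batches))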
Hi Radu,
It might be easier to get feedback on some concrete code. Perhaps make a PR
with a proof of concept and we can discuss there?
Neal
On Fri, Sep 4, 2020 at 4:27 AM Radu Teodorescu
wrote:
> Micah and all,
> Thanks for that pointer, I certainly didn’t follow it in detail at the
> time.
>
>
Sounds good. In the suggestion above, the builders for
FileEncryptionProperties/FileDecryptionProperties should not be exposed, so
only the key tools would create them. This is just one option, of course.
On 2020/09/03 20:44:26, Antoine Pitrou wrote:
>
> It would be useful for outsiders to expose
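To illustrate the separation being discussed, here is a sketch in the style of the key-tools layer as it later appeared in pyarrow (assuming pyarrow >= 4.0; the toy KMS client and key IDs are made up): the application talks only to the key-management layer and never calls a FileEncryptionProperties builder itself.

    import base64
    import pyarrow.parquet.encryption as pe

    class DemoKmsClient(pe.KmsClient):
        """Toy in-process 'KMS' that just base64-wraps data keys."""
        def __init__(self, kms_connection_config):
            super().__init__()

        def wrap_key(self, key_bytes, master_key_identifier):
            return base64.b64encode(key_bytes)

        def unwrap_key(self, wrapped_key, master_key_identifier):
            return base64.b64decode(wrapped_key)

    crypto_factory = pe.CryptoFactory(lambda config: DemoKmsClient(config))
    encryption_config = pe.EncryptionConfiguration(
        footer_key="footer_key_id",
        column_keys={"column_key_id": ["secret_col"]},
    )
    # The properties object is produced by the key tools, not by the
    # application invoking a FileEncryptionProperties builder directly.
    file_encryption_properties = crypto_factory.file_encryption_properties(
        pe.KmsConnectionConfig(), encryption_config)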
Hi Pedro,
You should be able to use Flight for this: pack your subscription call in a
DoGet and listen on the FlightDataStream for new data.
I think you can control the granularity of your messages through the size of
the record batches you are writing, but I am not a Flight developer so don’t
t
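Roughly, the client side of that pattern looks like the sketch below (the URI and ticket contents are placeholders):

    import pyarrow.flight as flight

    client = flight.FlightClient("grpc://localhost:8815")
    reader = client.do_get(flight.Ticket(b"subscribe:some-topic"))

    while True:
        try:
            chunk = reader.read_chunk()
        except StopIteration:
            break
        # chunk.data is one record batch; the batch size the server writes
        # sets the message granularity.
        print(chunk.data.num_rows)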
Micah and all,
Thanks for that pointer, I certainly didn’t follow it in detail at the time.
My question/thoughts are actually more limited in scope: I am specifically
targeting features that are supported by the standard AND supported by other
major parquet implementations.
Specifically I would li
Arrow Build Report for Job nightly-2020-09-04-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-04-0
Failed Tasks:
- test-conda-python-3.7-hdfs-2.9.2:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-04-0-github-test-conda-pyt
Hello,
This may be a stupid question, but is Arrow used for, or designed with,
stream processing use cases in mind, where data is non-stationary, i.e.
Flink stream processing jobs?
In particular, is it possible, from a given event source (say Kafka), to
efficiently generate incremental record batche
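In case it helps frame the question, here is a sketch of incremental batch production from an event source (the `consumer` iterator and the (ts, value) message shape are made-up stand-ins for a Kafka client, not a real API):

    import pyarrow as pa

    def stream_batches(consumer, sink, batch_rows=1024):
        schema = pa.schema([("ts", pa.int64()), ("value", pa.float64())])
        ts_buf, val_buf = [], []
        with pa.ipc.new_stream(sink, schema) as writer:
            for ts, value in consumer:
                ts_buf.append(ts)
                val_buf.append(value)
                if len(ts_buf) >= batch_rows:
                    # Flush one incremental record batch downstream.
                    writer.write_batch(pa.RecordBatch.from_arrays(
                        [pa.array(ts_buf, pa.int64()),
                         pa.array(val_buf, pa.float64())],
                        schema=schema))
                    ts_buf, val_buf = [], []
            if ts_buf:
                # Flush the final partial batch.
                writer.write_batch(pa.RecordBatch.from_arrays(
                    [pa.array(ts_buf, pa.int64()),
                     pa.array(val_buf, pa.float64())],
                    schema=schema))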
Sure, I'll prep a brief summary on this by Sunday, got a weekend kicking in
here today.
Cheers, Gidon
On Thu, Sep 3, 2020 at 11:44 PM Antoine Pitrou wrote:
>
> It would be useful for outsiders to expose what those two API levels
> are, and to what usage they correspond.
> Is Parquet encryption