Hi Micah,
Thanks for letting me know.
I will ping Ji Liu on the JIRA and see how I can help with the issue.
Thanks
Karuppayya
On Fri, 24 Apr 2020, 21:05 Micah Kornfield wrote:
> Hi Karuppayya,
> Welcome!
>
> The only issue I can think of off the top of my head on the Java side that
> i
Micah Kornfield created ARROW-8592:
--
Summary: [C++] Docs still list LLVM 7 as compiler used
Key: ARROW-8592
URL: https://issues.apache.org/jira/browse/ARROW-8592
Project: Apache Arrow
Issue
Hi Karuppayya,
Welcome!
The only issue I can think of off the top of my head on the Java side that
is on the basic side is: https://issues.apache.org/jira/browse/ARROW-6931
I'm not sure if Ji Liu is planning on working on it; you might ping Ji Liu
on the JIRA and see if you can help out. In parti
Hi Wes,
Thanks for your pointers.
It seems that, to skip pandas as an intermediary, I can only construct a
pyarrow.RecordBatch from pyarrow.Array or pyarrow.StructArray:
https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html
And StructArray.from_pandas()'s description states, "Convert
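For illustration, a minimal sketch (column names and values are invented) of building a pyarrow.RecordBatch directly from pyarrow arrays, with no pandas involved:

import pyarrow as pa

# Build plain Arrow arrays directly from Python values.
ids = pa.array([1, 2, 3], type=pa.int64())
names = pa.array(["a", "b", "c"], type=pa.string())

# Assemble them into a RecordBatch (and, if desired, a Table) without pandas.
batch = pa.RecordBatch.from_arrays([ids, names], names=["id", "name"])
table = pa.Table.from_batches([batch])
print(table.schema)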
Thanks for the tips, Micah and Wes. The storage type is an int64 list, which
round-trips through Parquet by itself. I'll look into it a bit more to
see what is going on.
On Fri, Apr 24, 2020 at 11:50 AM Wes McKinney wrote:
> Extension types will round trip correctly through Parquet so long a
Mahmut Bulut created ARROW-8591:
---
Summary: [Rust] Reverse lookup for a key in DictionaryArray
Key: ARROW-8591
URL: https://issues.apache.org/jira/browse/ARROW-8591
Project: Apache Arrow
Issue T
Mark Hildreth created ARROW-8590:
Summary: [Rust] Use Arrow pretty print utility in DataFusion
Key: ARROW-8590
URL: https://issues.apache.org/jira/browse/ARROW-8590
Project: Apache Arrow
Issu
gRPC breaks large buffers into smaller pieces that have to be
reassembled after receipt -- this does add some overhead. I would
guess that circumventing gRPC for the transfer of each IPC message
would be the route to throughput beyond the 20-40Gbps that we're able
to achieve now.
On Fri, Apr 24,
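As a rough illustration of measuring this throughput on the client side, a hedged sketch with pyarrow (the endpoint and ticket bytes are placeholders, not a real service):

import time
import pyarrow.flight as flight

# Hypothetical server location and ticket; substitute a real Flight service.
client = flight.connect("grpc://localhost:5005")
reader = client.do_get(flight.Ticket(b"large-dataset"))

start = time.time()
table = reader.read_all()   # reassembles the gRPC-chunked IPC messages
elapsed = time.time() - start
print(f"{table.nbytes / elapsed / 1e9:.2f} GB/s")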
I'm not sure a new transport for gRPC would change anything. gRPC
currently uses HTTP (HTTP/2, I believe), and there's no reason to think
HTTP is the culprit here.
Regards
Antoine.
On 24/04/2020 at 20:48, Micah Kornfield wrote:
> A couple of questions:
> 1. For same node transport would doing
Extension types will round trip correctly through Parquet so long as
the storage type can be round-tripped (as Micah pointed out, support for
reading all nested types is not yet available).
Note for reinforcement that Feather V2 is exactly an Arrow IPC file --
so IPC files could already do this prio
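As a hedged sketch of that point: the following round-trips an extension type through the Arrow IPC file format (the same format Feather V2 uses). The list<int64> storage type mirrors the one mentioned in the thread; the extension name, class, and file path are invented for illustration.

import pyarrow as pa

# A toy extension type whose storage type is list<int64>.
class IntListType(pa.ExtensionType):
    def __init__(self):
        super().__init__(pa.list_(pa.int64()), "example.int_list")

    def __arrow_ext_serialize__(self):
        return b""  # no parameters to persist

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return IntListType()

pa.register_extension_type(IntListType())

storage = pa.array([[1, 2], [3]], type=pa.list_(pa.int64()))
arr = pa.ExtensionArray.from_storage(IntListType(), storage)
table = pa.table({"col": arr})

# Write and read back via the Arrow IPC file format (what Feather V2 uses).
with pa.OSFile("data.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

reader = pa.ipc.open_file("data.arrow")
print(reader.read_all().schema)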
A couple of questions:
1. For same node transport would doing something with Plasma be a
reasonable approach?
2. What are the advantages/disadvantages of creating a new transport for
gRPC [1] vs building an entirely new backend of flight?
Thanks,
Micah
[1] https://github.com/grpc/grpc/issues/79
ryan created ARROW-8589:
---
Summary: ModuleNotFoundError: No module named 'pyarrow._orc'
Key: ARROW-8589
URL: https://issues.apache.org/jira/browse/ARROW-8589
Project: Apache Arrow
Issue Type: Bug
Having alternative backends for Flight has been a goal from the start,
which is why gRPC is wrapped and generally not exposed to the user. I
would be interested in collaborating on an HTTP/1 backend that is
accessible from the browser (or via an alternative transport meeting
the same requirements, e.g
Jack Fan created ARROW-8588:
---
Summary: `driver` param removed from `hdfs.connect()`
Key: ARROW-8588
URL: https://issues.apache.org/jira/browse/ARROW-8588
Project: Apache Arrow
Issue Type: Bug
Hi Bryan,
Extension types aren't explicitly called out, but
https://issues.apache.org/jira/browse/ARROW-1644 (and related subtasks)
might be a good place to track this.
Thanks,
Micah
On Fri, Apr 24, 2020 at 11:13 AM Bryan Cutler wrote:
> I've been trying out IO with Arrow's extension types and I
I've been trying out IO with Arrow's extension types and I was able to write a
parquet file, but reading it back causes an error:
"pyarrow.lib.ArrowInvalid: Unsupported nested type: ...". Looking at the
code for the parquet reader, it checks nested types and only allows a few
specific ones. Is this a k
Hi Jiajia,
I see. I think there are two possible avenues to try and improve this:
* make better use of gRPC in the hope of achieving higher performance. This
doesn't seem to be easy, though; I've already tried changing some of
the parameters listed here, but didn't get any benefit:
https://grpc.gi
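For context, the parameters in question are gRPC channel arguments. A hedged sketch with the plain grpc package (the target address and the specific values are guesses for illustration, and per the above they did not yield any benefit):

import grpc

options = [
    ("grpc.max_send_message_length", -1),       # -1 means unlimited
    ("grpc.max_receive_message_length", -1),
    ("grpc.http2.write_buffer_size", 1 << 20),  # larger HTTP/2 write buffer
]
channel = grpc.insecure_channel("localhost:5005", options=options)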
Hi Antoine,
>The question, though, is: do you *need* those higher speeds on localhost? In
>which context are you considering Flight?
We want to send large data (held in a cache) to the data analytics application (running locally).
Thanks,
Jiajia
-Original Message-
From: Antoine Pitrou
Sent: Saturday
Hi Jiajia,
It's true one should be able to reach higher speeds. For example, I can
reach more than 7 GB/s on a simple TCP connection, in pure Python, using
only two threads:
https://gist.github.com/pitrou/6cdf7bf6ce7a35f4073a7820a891f78e
The question, though, is: do you *need* those higher spe
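For reference, a minimal sketch along the lines of that gist (byte counts, port, and buffer sizes chosen arbitrarily): one thread sends over a localhost TCP socket while the main thread receives and times it.

import socket
import threading
import time

PORT = 56789
CHUNK = b"\x00" * (1 << 20)   # 1 MiB payload
TOTAL = 1 << 32               # send 4 GiB in total

def server():
    # socket.create_server requires Python 3.8+
    with socket.create_server(("127.0.0.1", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            sent = 0
            while sent < TOTAL:
                conn.sendall(CHUNK)
                sent += len(CHUNK)

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)               # crude wait for the listener to come up

with socket.create_connection(("127.0.0.1", PORT)) as sock:
    received = 0
    start = time.time()
    while received < TOTAL:
        data = sock.recv(1 << 20)
        if not data:
            break
        received += len(data)
    elapsed = time.time() - start

print(f"{received / elapsed / 1e9:.2f} GB/s")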
Hi Antoine,
I think the 5 GB/s here is on localhost. Since localhost does not depend on network
speed, and I've checked that the CPU is not the bottleneck when running the benchmark, I
think Flight can reach a higher throughput.
Thanks,
Jiajia
-Original Message-
From: Antoine Pitrou
Sent: Friday, April
Chengxin Ma created ARROW-8587:
--
Summary: Compilation error when linking arrow-flight-perf-server
Key: ARROW-8587
URL: https://issues.apache.org/jira/browse/ARROW-8587
Project: Apache Arrow
Issu
I recommend going directly via Arrow instead of routing through pandas (or
at least only using pandas as an intermediary to convert smaller chunks to
Arrow). Tables can be composed from smaller RecordBatch objects (see
Table.from_batches) so you don't need to accumulate much non-Arrow data in
memor
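A hedged sketch of that pattern (the decode_chunk() helper, field names, and types are placeholders): convert each chunk to a small RecordBatch, then assemble a Table at the end.

import pyarrow as pa

schema = pa.schema([("id", pa.int64()), ("value", pa.float64())])

def decode_chunk(chunk):
    # Placeholder: turn one chunk of the raw input into Python lists.
    return [1, 2, 3], [0.1, 0.2, 0.3]

batches = []
for chunk in range(10):   # stand-in for iterating over the input file chunkwise
    ids, values = decode_chunk(chunk)
    batches.append(pa.RecordBatch.from_arrays(
        [pa.array(ids, type=pa.int64()), pa.array(values, type=pa.float64())],
        schema=schema))

table = pa.Table.from_batches(batches)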
Hi,
I am new to Arrow and Parquet.
My goal is to decode a 4 GB binary file (packed C structs) and write all records
to a file that can be loaded as an R data frame or a pandas DataFrame, so that others
can do some heavy analysis on the big dataset efficiently (in terms of loading
time and running statistic
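One possible approach, sketched under assumptions (the record layout, field names, and paths are invented): decode the packed structs in chunks and stream them into a Parquet file, which both pandas and the R arrow package can read.

import struct
import pyarrow as pa
import pyarrow.parquet as pq

FORMAT = "<qd"            # hypothetical layout: int64 followed by float64
RECORD = struct.calcsize(FORMAT)
RECORDS_PER_BATCH = 1_000_000

schema = pa.schema([("id", pa.int64()), ("value", pa.float64())])

with open("data.bin", "rb") as f, pq.ParquetWriter("data.parquet", schema) as writer:
    while True:
        buf = f.read(RECORD * RECORDS_PER_BATCH)
        if not buf:
            break
        ids, values = zip(*struct.iter_unpack(FORMAT, buf))
        writer.write_table(pa.table({"id": list(ids), "value": list(values)},
                                    schema=schema))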
+1 (binding)
On Fri, Apr 24, 2020 at 5:41 AM Krisztián Szűcs
wrote:
>
> +1 (binding)
>
> On 2020. Apr 24., Fri at 1:51, Micah Kornfield
> wrote:
>
> > +1 (binding)
> >
> > On Thu, Apr 23, 2020 at 2:35 PM Sutou Kouhei wrote:
> >
> > > +1 (binding)
> > >
> > > In
> > > "[VOTE] Add "trivial" Re
Hei created ARROW-8586:
--
Summary: Failed to Install arrow From CRAN
Key: ARROW-8586
URL: https://issues.apache.org/jira/browse/ARROW-8586
Project: Apache Arrow
Issue Type: Bug
Components: R
Hi,
On 24/04/2020 at 01:36, karuppayya wrote:
> Hi All,
> I am interested in contributing to the Arrow project
>
> I am planning to start with some jiras on Arrow Java component.
> I tried looking for jiras with component *Java* and labels *beginner*,
> *beginners*, *newbie.*
We're not using the
Krisztian Szucs created ARROW-8585:
--
Summary: [Packaging][Python] Windows wheels fail to build because
of link error
Key: ARROW-8585
URL: https://issues.apache.org/jira/browse/ARROW-8585
Project: Apa
On Fri, Apr 24, 2020 at 12:07 PM Crossbow wrote:
>
>
> Arrow Build Report for Job nightly-2020-04-24-0
>
> All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-24-0
>
> Failed Tasks:
> - debian-stretch-amd64:
> URL:
> https://github.com/ursa-labs/crossbow/branc
Krisztian Szucs created ARROW-8584:
--
Summary: [Packaging][C++] Protobuf link error in debian-stretch
build
Key: ARROW-8584
URL: https://issues.apache.org/jira/browse/ARROW-8584
Project: Apache Arrow
Krisztian Szucs created ARROW-8583:
--
Summary: [C++][Doc] Undocumented parameter in Dataset namespace
Key: ARROW-8583
URL: https://issues.apache.org/jira/browse/ARROW-8583
Project: Apache Arrow
Krisztian Szucs created ARROW-8582:
--
Summary: [Packaging][Python] macOS wheels occasionally exceed
travis build time limit
Key: ARROW-8582
URL: https://issues.apache.org/jira/browse/ARROW-8582
Projec
Arrow Build Report for Job nightly-2020-04-24-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-24-0
Failed Tasks:
- debian-stretch-amd64:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-24-0-github-debian-stretch-amd64
- tes
Hi Wes,
Thanks for your reply!
Thanks,
Jiajia
-Original Message-
From: Wes McKinney
Sent: Friday, April 24, 2020 11:15 AM
To: dev
Subject: Re: Question regarding Arrow Flight Throughput
On Thu, Apr 23, 2020 at 10:02 PM Wes McKinney wrote:
>
> hi Jiajia,
>
> See my TODO here
>
> htt
The problem with gRPC is that it was designed with relatively small
requests and payloads in mind. We're using it for a large data
application which it wasn't optimized for. Also, its threading model is
inscrutable (yielding those weird benchmark results).
However, 5 GB/s is indeed very good i
+1 (binding)
On 2020. Apr 24., Fri at 1:51, Micah Kornfield
wrote:
> +1 (binding)
>
> On Thu, Apr 23, 2020 at 2:35 PM Sutou Kouhei wrote:
>
> > +1 (binding)
> >
> > In
> > "[VOTE] Add "trivial" RecordBatch body compression to Arrow IPC
> > protocol" on Wed, 22 Apr 2020 19:24:09 -0500,
> >
Adam Szmigin created ARROW-8581:
---
Summary: [C#] Date32/64Array write & read back introduces
off-by-one error
Key: ARROW-8581
URL: https://issues.apache.org/jira/browse/ARROW-8581
Project: Apache Arrow