Hi Antoine,

>The question, though, is: do you *need* those higher speeds on localhost?  In 
>which context are you considering Flight?

We want to send large amounts of cached data to a data analytics application 
running on the same machine.

Thanks,
Jiajia

-----Original Message-----
From: Antoine Pitrou <anto...@python.org> 
Sent: Saturday, April 25, 2020 1:01 AM
To: dev@arrow.apache.org
Subject: Re: Question regarding Arrow Flight Throughput


Hi Jiajia,

It's true one should be able to reach higher speeds.  For example, I can reach 
more than 7 GB/s on a simple TCP connection, in pure Python, using only two 
threads:
https://gist.github.com/pitrou/6cdf7bf6ce7a35f4073a7820a891f78e
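
For readers who want to try this kind of measurement, here is a minimal sketch of a localhost TCP throughput test with one sender thread and one receiver thread. It is not the code from the gist above. The chunk size, the total transfer size, and the names `_serve` and `measure_throughput` are assumptions chosen for illustration, and the number it prints will depend on the machine.

```python
import socket
import threading
import time

CHUNK = 1 << 20          # 1 MiB per send call (assumed size)
TOTAL = 64 * CHUNK       # 64 MiB overall, kept small so the run is short


def _serve(listener: socket.socket) -> None:
    """Accept one connection and stream TOTAL bytes to it."""
    conn, _ = listener.accept()
    payload = bytes(CHUNK)
    with conn:
        for _ in range(TOTAL // CHUNK):
            conn.sendall(payload)


def measure_throughput() -> tuple[int, float]:
    """Return (bytes received, elapsed seconds) for one localhost transfer."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        sender = threading.Thread(target=_serve, args=(listener,))
        sender.start()

        buf = bytearray(CHUNK)            # reused buffer, avoids per-read allocation
        received = 0
        start = time.perf_counter()
        with socket.create_connection(("127.0.0.1", port)) as client:
            while received < TOTAL:
                n = client.recv_into(buf)
                if n == 0:                # sender closed the connection early
                    break
                received += n
        elapsed = time.perf_counter() - start
        sender.join()
    return received, elapsed


if __name__ == "__main__":
    nbytes, seconds = measure_throughput()
    print(f"{nbytes / seconds / 1e6:.0f} MB/s")
```

A larger TOTAL gives a steadier figure, since connection setup is included in the timed region.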

The question, though, is: do you *need* those higher speeds on localhost?  In 
which context are you considering Flight?

Regards

Antoine.


Le 24/04/2020 à 18:52, Li, Jiajia a écrit :
> Hi Antoine,
> 
> I think the 5 GB/s here is on localhost. Since localhost traffic does not 
> depend on network speed, and I've checked that the CPU is not the bottleneck 
> when running the benchmark, I think Flight can reach a higher throughput.
> 
> Thanks,
> Jiajia
> 
> -----Original Message-----
> From: Antoine Pitrou <anto...@python.org>
> Sent: Friday, April 24, 2020 5:47 PM
> To: dev@arrow.apache.org
> Subject: Re: Question regarding Arrow Flight Throughput
> 
> 
> The problem with gRPC is that it was designed with relatively small requests 
> and payloads in mind.  We're using it for a large data application which it 
> wasn't optimized for.  Also, its threading model is inscrutable (yielding 
> those weird benchmark results).
> 
> However, 5 GB/s is indeed very good if between different machines.
> 
> Regards
> 
> Antoine.
> 
> 
> Le 24/04/2020 à 05:15, Wes McKinney a écrit :
>> On Thu, Apr 23, 2020 at 10:02 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>> hi Jiajia,
>>>
>>> See my TODO here
>>>
>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/flight_benchmark.cc#L182
>>>
>>> My guess is that if you want to get faster throughput with multiple 
>>> cores, you need to run more than one server and serve on different 
>>> ports rather than having all threads go to the same server through 
>>> the same port. I don't think we've made any manycore scalability 
>>> claims, though.
>>>
>>> I tried to run this myself but I can't get the benchmark executable 
>>> to run on my machine right now -- this seems to be a regression.
>>>
>>> https://issues.apache.org/jira/browse/ARROW-8578
>>
>> This turned out to be a false alarm and went away after a reboot.
>>
>> On my laptop a single thread is faster than multiple threads making 
>> requests to a sole server, so this supports the hypothesis that 
>> concurrent requests on the same port do not increase throughput.
>>
>> $ ./release/arrow-flight-benchmark -num_threads 1
>> Speed: 5131.73 MB/s
>>
>> $ ./release/arrow-flight-benchmark -num_threads 16
>> Speed: 4258.58 MB/s
>>
>> I'd suggest improving the benchmark executable to spawn multiple 
>> servers as the next step to study multicore throughput. That said, 
>> with the above being ~40 Gbps already, it's unclear how much higher 
>> throughput can realistically go.
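
The unit arithmetic in this thread (MB/s from the benchmark, Gbps and GB/s from the articles) can be checked with a short sketch. It assumes the benchmark's "MB" means 10^6 bytes, which the thread does not state; the function names are illustrative only. Under that assumption the single-thread result above works out to about 41 Gbps.

```python
def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


if __name__ == "__main__":
    # Benchmark figures quoted in this thread.
    for label, speed in [("1 thread", 5131.73), ("16 threads", 4258.58)]:
        print(f"{label}: {mb_per_s_to_gbps(speed):.1f} Gbps")
    # The "20 Gbps per core" figure from the Dremio article.
    print(f"20 Gbps = {gbps_to_gb_per_s(20):.1f} GB/s")
```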
>>
>>
>>>
>>> - Wes
>>>
>>> On Thu, Apr 23, 2020 at 8:17 PM Li, Jiajia <jiajia...@intel.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have some questions about Arrow Flight throughput. This 
>>>> article (https://www.dremio.com/understanding-apache-arrow-flight/) 
>>>> says "High efficiency. Flight is designed to work without any 
>>>> serialization or deserialization of records, and with zero memory copies, 
>>>> achieving over 20 Gbps per core."  And this other article 
>>>> (https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) 
>>>> says "As far as absolute speed, in our C++ data throughput benchmarks, we 
>>>> are seeing end-to-end TCP throughput in excess of 2-3GB/s on localhost 
>>>> without TLS enabled. This benchmark shows a transfer of ~12 gigabytes of 
>>>> data in about 4 seconds:"
>>>>
>>>> Here 20 Gbps / 8 = 2.5 GB/s. Does this mean that if we run the benchmark 
>>>> on a server with two cores, the throughput will be 5 GB/s?  I ran 
>>>> arrow-flight-benchmark on my server, which has 40 cores, but the result 
>>>> is "Speed: 2420.82 MB/s".
>>>>
>>>> So what should I do to increase the throughput? Please correct me if I am 
>>>> wrong. Thank you in advance!
>>>>
>>>> Thanks,
>>>> Jiajia
>>>>
>>>>
>>>>
