Re: [c++] Futures API review & help understanding benchmark result

Antoine Pitrou Tue, 27 Oct 2020 07:49:05 -0700

On 27/10/2020 at 15:47, Antoine Pitrou wrote:
> 
> Hi Weston,
> 
> Note: I'm on vacation, so won't be able to look at this before ~2 weeks.
> 
> For information, there's a micro-benchmark of thread pools and task
> groups in src/arrow/util/thread_pool_benchmark.cc.  It should allow you
> to isolate performance concerns a bit better.

Oops, sorry, just saw that you are already using it.  That's what I get for
reading too quickly.

Regards

Antoine.


> 
> Regards
> 
> Antoine.
> 
> 
> On 26/10/2020 at 16:48, Weston Pace wrote:
>> Hi all,
>>
>> I've completed the initial composable futures API and iterator work.
>> The CSV reader portion is still WIP.
>>
>> First, I'm interested in getting any feedback on the futures API, in
>> particular Future<T>.Then in future.h (and the type-erased
>> Composable.Compose).  The actual implementation can probably be
>> cleaned up with regard to DRY (the 10 specializations of the Continue
>> function), which I plan to do at the end.
>>
>> This approach is a little different than my earlier prototype.  In the
>> prototype it would always submit continuations on the thread pool as
>> new tasks.  Instead I've changed it so continuations will run
>> synchronously when the future is marked complete.  If there is a
>> desire to move the continuation into a thread pool task it can be done
>> with Executor.Transfer.  As an example usage this is done in
>> AsyncForEachHelper so that the applied for-each function is not run on
>> the reader thread.
>>
>> Second, and perhaps what I'm more interested in, I've switched
>> ThreadedTaskGroup to using futures (e.g. using Compose to add a
>> callback that calls OneTaskDone instead of making a wrapper lambda
>> function).  In theory this should be more or less the exact same work
>> as the previous task group implementation.  However, I am seeing
>> noticeable overhead in arrow-thread-pool-benchmark for small tasks.
>> The benchmark runs on my system at ~950k items/s for no task group,
>> ~890k items/s with the old task group implementation, and ~450k
>> items/s with the futures-based implementation.  The change is isolated
>> to one method in task_group.cc so if you replace the method at line
>> 102 with the commented out version at line 127 the original
>> performance returns.  I've verified that the task is not getting
>> copied.  There are a few extra moves and function calls, and futures
>> have to be created and copied around, so it is possible that is the
>> cause, but I'm curious whether a second pair of eyes could spot some
>> other cause for the degradation that I am missing.  I'll also see if I
>> can get gprof running later, in the hope that it can provide some insight.
>> However, I probably won't spend too much more time on it before
>> finishing up the CSV reader work and checking the performance of the
>> CSV reader.
>>
>> If I can't figure out the cause of the performance I can always allow
>> task group to keep the implementation it has for Append(task) while
>> using future for Append(future).  I suspect that the CSV reader tasks
>> are long enough tasks that the overhead won't be an issue.
>>
>> Code: 
>> https://github.com/apache/arrow/compare/master...westonpace:feature/arrow-10183?expand=1
>>
>> -Weston
>>
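
A minimal sketch of the continuation model described in the quoted message:
continuations run synchronously on the thread that marks the future complete,
unless they are explicitly moved to an executor.  This is a toy illustration
only, not the code from the linked branch.  ToyFuture, ToyExecutor and
TransferTo are hypothetical names invented for this example; they are not
Arrow's Future, Executor or Executor.Transfer, and the real signatures may
differ.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for a future (hypothetical; NOT arrow::Future).
template <typename T>
class ToyFuture {
 public:
  using Callback = std::function<void(const T&)>;

  // Register a continuation. If the future is already finished, run it now.
  void AddCallback(Callback cb) {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      if (!finished_) {
        callbacks_.push_back(std::move(cb));
        return;
      }
    }
    cb(value_);
  }

  // Complete the future and run every continuation synchronously on the
  // calling thread; no new thread pool task is created.
  void MarkFinished(T value) {
    std::vector<Callback> to_run;
    {
      std::unique_lock<std::mutex> lock(mutex_);
      assert(!finished_ && "future completed twice");
      value_ = std::move(value);
      finished_ = true;
      to_run.swap(callbacks_);
    }
    for (auto& cb : to_run) cb(value_);
  }

 private:
  std::mutex mutex_;
  bool finished_ = false;
  T value_{};
  std::vector<Callback> callbacks_;
};

// Toy executor (hypothetical): one thread per task, joined on demand.
class ToyExecutor {
 public:
  ~ToyExecutor() { JoinAll(); }

  void Spawn(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mutex_);
    threads_.emplace_back(std::move(task));
  }

  void JoinAll() {
    std::vector<std::thread> threads;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      threads.swap(threads_);
    }
    for (auto& t : threads) t.join();
  }

 private:
  std::mutex mutex_;
  std::vector<std::thread> threads_;
};

// Wrap a continuation so that, when the future completes, the real work is
// handed to the executor instead of running on the completing thread.
template <typename T>
typename ToyFuture<T>::Callback TransferTo(ToyExecutor* executor,
                                           typename ToyFuture<T>::Callback cb) {
  return [executor, cb](const T& value) {
    executor->Spawn([cb, value] { cb(value); });
  };
}
```

With this toy, a callback added directly runs on whichever thread calls
MarkFinished (the "reader thread" in the for-each example), while a callback
wrapped in TransferTo runs on an executor thread.  Even in the synchronous
case, each continuation costs a std::function construction, a lock and a
vector push; whether comparable costs explain the benchmark gap reported
above is not something this sketch can establish.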