Le 27/10/2020 à 15:47, Antoine Pitrou a écrit :
>
> Hi Weston,
>
> Note: I'm on vacation, so won't be able to look at this before ~2 weeks.
>
> For information, there's a micro-benchmark of thread pools and task
> groups in src/arrow/util/thread_pool_benchmark.cc. It should allow you
> isolate performance concerns a bit better.
Oops, sorry, just saw that you already using it. That's what I get for
reading too quickly.
Regards
Antoine.
>
> Regards
>
> Antoine.
>
>
> Le 26/10/2020 à 16:48, Weston Pace a écrit :
>> Hi all,
>>
>> I've completed the initial composable futures API and iterator work.
>> The CSV reader portion is still WIP.
>>
>> First, I'm interested in getting any feedback on the futures API. In
>> particular Future<T>.Then in future.h (and the type erased
>> Composable.Compose). The actual implementation can probably be
>> cleaned up with regards to DRY (the 10 specializations of the Continue
>> function) which I plan to do at the end.
>>
>> This approach is a little different than my earlier prototype. In the
>> prototype it would always submit continuations on the thread pool as
>> new tasks. Instead I've changed it so continuations will run
>> synchronously when the future is marked complete. If there is a
>> desire to move the continuation into a thread pool task it can be done
>> with Executor.Transfer. As an example usage this is done in
>> AsyncForEachHelper so that the applied for-each function is not run on
>> the reader thread.
>>
>> Second, and perhaps what I'm more interested in, I've switched
>> ThreadedTaskGroup to using futures (e.g. using Compose to add a
>> callback that calls OneTaskDone instead of making a wrapper lambda
>> function). In theory this should be more or less the exact same work
>> as the previous task group implementation. However, I am seeing
>> noticeable overhead in arrow-thread-pool-benchmark for small tasks.
>> The benchmark runs on my system at ~950k items/s for no task group,
>> ~890k items/s with the old task group implementation, and ~450k
>> items/s with the futures based implementation. The change is isolated
>> to one method in task_group.cc so if you replace the method at line
>> 102 with the commented out version at line 127 the original
>> performance returns. I've verified that the task is not getting
>> copied. There are a few extra moves and function calls and futures
>> have to be created and copied around so it is possible that is the
>> cause of it but I'm curious if a second eye could see some other cause
>> for the degradation that I am missing. I'll also be seeing if I can
>> get gprof running later in hopes that can provide some insight.
>> However, I probably won't spend too much more time on it before
>> finishing up the CSV reader work and checking the performance of the
>> CSV reader.
>>
>> If I can't figure out the cause of the performance I can always allow
>> task group to keep the implementation it has for Append(task) while
>> using future for Append(future). I suspect that the CSV reader tasks
>> are long enough tasks that the overhead won't be an issue.
>>
>> Code:
>> https://github.com/apache/arrow/compare/master...westonpace:feature/arrow-10183?expand=1
>>
>> -Weston
>>