Thanks. I think TaskGroup almost suits my needs; I might need some extra layer
around it.

Here is my use case. When converting a record batch to R data structures,
all R allocation has to happen on the main thread, but filling the vectors
can (for some of them) be done in a task that runs on a different thread. Not
all of them: filling R character vectors, for example, needs the main thread.

So I was thinking of doing something like this pseudo code:

auto n = num_columns(); 
auto serial = TaskGroup::MakeSerial();
auto threaded = TaskGroup::MakeThreaded(...);

for (int i = 0; i < n; i++) {
   - Allocate column i

   if (<can run in parallel>) {
      threaded.AddTask(...)
   } else {
      serial.AddTask(...)
   }
}

- start threaded tasks
- start serial tasks 

- combine


I guess that just means I need some way to hold the tasks before they go
into the task groups.



> Le 3 janv. 2019 à 14:36, Antoine Pitrou <anto...@python.org> a écrit :
> 
> 
> Hi Romain,
> 
> No, it's better if you use the CPU thread pool directly (or through
> TaskGroup, if that suits your execution model better).
> 
> Regards
> 
> Antoine.
> 
> 
> Le 03/01/2019 à 14:29, Romain Francois a écrit :
>> Hello, 
>> 
>> Are the functions in parallel.h the de facto model for parallelisation in 
>> arrow ? 
>> https://github.com/apache/arrow/blob/42cf69abfc1368c9884f4581811e2e7900c98fcd/cpp/src/arrow/util/parallel.h
>>  
>> <https://github.com/apache/arrow/blob/42cf69abfc1368c9884f4581811e2e7900c98fcd/cpp/src/arrow/util/parallel.h>
>> 
>> Just wondering if things like intel tbb were considered, IIRC managing 
>> threads manually can be expensive and tasks are usually cheaper. 
>> 
>> Romain
>> 
