On 13/11/2011 19:23, Abdelrazak Younes wrote:
On 11/11/2011 11:27, Tommaso Cucinotta wrote:
On 10/11/2011 22:07, Abdelrazak Younes wrote:
Threads are not always about using all available processors.
OK, but here I'm really addressing a different problem: I'm not using
threads to improve the design or anything like that, but merely to speed
up the search process in case you have multiple cores.
This means that you're going to run a CPU-intensive task on each core,
and in effect this means that you're going to give less CPU time to
other running processes in your system, more context switching, and as
a result less responsiveness.
CPU-intensive tasks are quickly de-prioritized by the OS scheduler, at
least on Linux: I'm not sure about the other OSes, but I'd guess each
one has some sort of heuristic to discriminate among interactive tasks
that are boosted in their dynamic priority (i.e., just woken up tasks)
as compared to long running CPU hogs. When I have long compile sessions
on my dual-core (e.g., Linux kernel, LyX, ffmpeg, ...), I always use
"make -j4", and it runs for ~15-30 mins (depending on options etc.),
during which I enjoy browsing, e-mail-ing, bash-ing, emacs-ing, LyX-ing,
without problems.
Now, if you argue that the OS should always leave one spare core just
for keeping the system more responsive(*), and double the execution time
of any batch computing task, then I'd guess the right place for this
"heuristic" (or, personal preference, I'd say) should be in an OS
preference option, reflected somehow in
QThread::idealThreadCount(), and *not* in the applications' code.
(*) ...assuming it would be, if the core is idle -- AFAIK, an idle core
might go into a deep sleep state [or not, depending on how its power
state is tied to that of the other running cores], so waking up from
deep idle requires some extra time anyway -- and if it is running, you
pay a context switch plus the duration of a possible kernel section with
preemption disabled -- I don't think these delays fall within the
perception of the user...
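Just to illustrate where such a preference could live, here is a minimal
sketch: workerCount() is a hypothetical helper (not LyX code), and
std::thread::hardware_concurrency() stands in for Qt's
QThread::idealThreadCount():

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: honour a "leave one spare core" preference in one
// central place, instead of hard-coding it in each application's code.
// std::thread::hardware_concurrency() is used here as a stand-in for
// Qt's QThread::idealThreadCount().
unsigned workerCount(bool leaveSpareCore)
{
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0)               // the query may fail; fall back to one worker
        n = 1;
    if (leaveSpareCore && n > 1)
        --n;                  // keep one core free for interactive tasks
    return std::max(1u, n);
}
```

Applications would then just size their pools with workerCount(), and the
"spare core" policy stays a single user-visible knob.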
Because you do not manage which paragraph is searched next, you indeed
have to use mutexes, which are not so cheap. You could alternatively
split the document in 2 or 3 parts (or the number of cores minus 1 or 2)
and forget about mutexes.
Right; however, I'm not interested in finding all the matches, so far,
but only the first one (from the cursor position). So I could have:
a) each thread synchronizing over a shared par ID in order to decide the
next paragraph to search (as in the current patch);
b) interleaved searched paragraphs, i.e., if the cursor is on pit, then
the first thread out of m threads searches pit, pit+m, pit+2m, ...; the
second thread searches pit+1, pit+1+m, pit+1+2m, ...; etc.
You're claiming that with a) we need to sync too often on the mutex. I
argue that with b) we may end up waiting much longer than needed before
finding a match, because once a thread has finished a paragraph, it
doesn't immediately move on to the next one not yet searched, but skips
"blindly" m paragraphs forward. Furthermore, what happens once a
thread's subsequence of paragraphs is exhausted? I can only see the
thread stopping there (i.e., decreased parallelism).
Perhaps a nice choice lies somewhere between a) and b), but I suspect it
would be more complex...
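For reference, a minimal sketch of scheme a) with plain C++11 threads.
This is not the patch's code: matches() is a hypothetical stand-in for
the real per-paragraph search, and searchParallel() is an invented name.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical predicate standing in for the per-paragraph match test.
bool matches(int pit) { return pit == 7; }

// Scheme a): workers claim the next unsearched paragraph under a mutex,
// and stop as soon as every paragraph before the best match is searched.
int searchParallel(int startPit, int numPars, int numThreads)
{
    std::mutex m;
    int next = startPit;   // next paragraph to claim
    int found = -1;        // smallest matching pit seen so far

    auto worker = [&]() {
        for (;;) {
            int pit;
            {
                std::lock_guard<std::mutex> lk(m);
                if (next >= numPars || (found != -1 && next > found))
                    return;        // nothing earlier than 'found' left
                pit = next++;      // claim the next paragraph
            }
            if (matches(pit)) {
                std::lock_guard<std::mutex> lk(m);
                if (found == -1 || pit < found)
                    found = pit;   // keep the earliest match
            }
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < numThreads; ++i)
        pool.emplace_back(worker);
    for (auto & t : pool)
        t.join();
    return found;              // -1 if no match
}
```

Note the mutex is taken once per paragraph, so its cost is amortized over
the (much larger) cost of actually searching the paragraph text.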
Actually, I'm thinking that a) may be realized without the mutex,
relying exclusively on atomic ops (all I need is an atomic increment,
after all), e.g., atomic_ops.h (though I'm not sure how to emulate the
condition variable in such a case). However, I'm not going to play with
such a solution if the parallelized search is not appreciated by others
anyway... (still, my laptop keeps being the only one that runs Advanced
Find at double speed :-) ).
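A sketch of what the atomic variant could look like, using C++11
std::atomic in place of atomic_ops.h (again, matches() and the function
name are hypothetical, and termination is handled by joining rather than
a condition variable):

```cpp
#include <atomic>
#include <climits>
#include <thread>
#include <vector>

// Hypothetical predicate standing in for the per-paragraph match test.
bool matches(int pit) { return pit == 7; }

// Scheme a) without a mutex: an atomic increment hands out paragraphs,
// and a CAS loop keeps the smallest matching pit.
int searchAtomic(int startPit, int numPars, int numThreads)
{
    std::atomic<int> next(startPit);
    std::atomic<int> found(INT_MAX);   // INT_MAX == no match yet

    auto worker = [&]() {
        for (;;) {
            int pit = next.fetch_add(1);           // atomic claim
            if (pit >= numPars || pit > found.load())
                return;
            if (matches(pit)) {
                int cur = found.load();            // keep the minimum
                while (pit < cur && !found.compare_exchange_weak(cur, pit))
                    ;                              // cur reloaded on failure
            }
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < numThreads; ++i)
        pool.emplace_back(worker);
    for (auto & t : pool)
        t.join();
    int f = found.load();
    return f == INT_MAX ? -1 : f;      // -1 if no match
}
```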
T.