On 13/11/2011 19:23, Abdelrazak Younes wrote:
On 11/11/2011 11:27, Tommaso Cucinotta wrote:
On 10/11/2011 22:07, Abdelrazak Younes wrote:
Threads are not always about using all available processors.

OK, but here I'm really addressing a different problem: I'm not using threads to improve the design or anything like that, but merely to speed up the search when multiple cores are available.

This means that you're going to run a CPU-intensive task on each core, which in effect gives less CPU time to other running processes on your system, causes more context switching, and as a result reduces responsiveness.

CPU-intensive tasks are quickly de-prioritized by the OS scheduler, at least on Linux. I'm not sure about the other OSes, but I'd guess each one has some heuristic to discriminate interactive tasks, whose dynamic priority is boosted (e.g., just-woken tasks), from long-running CPU hogs. When I have long compile sessions on my dual-core machine (e.g., Linux kernel, LyX, ffmpeg, ...), I always use "make -j4", and it runs for ~15-30 mins (depending on options etc.), during which I enjoy browsing, e-mail-ing, bash-ing, emacs-ing and LyX-ing without problems.

Now, if you argue that the OS should always leave one spare core just to keep the system more responsive(*), doubling the execution time of any batch computing task, then I'd guess the right place for this "heuristic" (or personal preference, I'd say) is an OS preference option, reflected somehow in QThread::idealThreadCount(), and *not* in the applications' code.
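Just to make the point concrete: the thread-count policy can live in a single helper rather than being scattered through application code. A minimal sketch, using std::thread::hardware_concurrency() as a portable stand-in for Qt's QThread::idealThreadCount(); the helper name and the "spare core" flag are made up for illustration, not LyX or Qt API:

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: decide how many search workers to spawn.
// In Qt code the core count would come from QThread::idealThreadCount().
int searchWorkerCount(bool leaveSpareCore)
{
    int cores = static_cast<int>(std::thread::hardware_concurrency());
    if (cores <= 0)
        cores = 1;  // hardware_concurrency() may return 0 if unknown
    if (leaveSpareCore)
        --cores;    // the "spare core for responsiveness" preference
    return std::max(1, cores);
}
```

This way, whether a core is left idle becomes a one-line policy decision, not something hard-wired into each threaded feature.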

(*) ...assuming the core would actually be idle -- AFAIK, an idle core may go into a deep sleep state [or not, depending on how its power state is tied to the other running cores], so waking up from deep idle takes extra time anyway; if it is running, you pay a context switch plus the duration of a possible kernel section with preemption disabled. I don't think these times fall within the user's perception...

Because you do not manage which paragraph is searched next, you indeed have to use mutexes, which are not so cheap. You could alternatively split the document in 2 or 3 parts (or the number of cores minus 1 or 2) and forget about mutexes.
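For the record, the mutex-free split suggested here could look like the following sketch (the names are illustrative, not LyX code); each returned chunk would then be handed to one worker:

```cpp
#include <utility>
#include <vector>

// Split the paragraph range [firstPit, lastPit] into up to nChunks
// contiguous chunks of (nearly) equal size, one per worker.
std::vector<std::pair<int, int>>
splitRange(int firstPit, int lastPit, int nChunks)
{
    std::vector<std::pair<int, int>> chunks;
    int const total = lastPit - firstPit + 1;
    int start = firstPit;
    for (int i = 0; i < nChunks && start <= lastPit; ++i) {
        // spread the remainder over the first chunks
        int len = total / nChunks + (i < total % nChunks ? 1 : 0);
        chunks.emplace_back(start, start + len - 1);
        start += len;
    }
    return chunks;
}
```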

Right, but so far I'm not interested in finding all the matches, only the first one (from the cursor position). So, I could have: a) each thread synchronizing on a shared paragraph ID in order to decide the next paragraph to search (as in the current patch); b) interleaved searched paragraphs, i.e., if the cursor is on pit, then the first of m threads searches pit, pit+m, pit+2m, ...; the second thread searches pit+1, pit+1+m, pit+1+2m, ...; and so on.
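To make option a) concrete, here is a minimal sketch of the shared-counter scheme with std::mutex and std::thread (C++11, whereas the actual patch uses Qt threading; matchesAt() and all the names are hypothetical stand-ins for the real per-paragraph search). Because paragraphs are handed out in increasing order, any paragraph earlier than a reported match has already been claimed, so keeping the minimum yields the first match:

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Scheme a): workers pull the next unsearched paragraph from a shared
// counter protected by a mutex, and stop once a match is known.
int parallelFindFirst(int firstPit, int lastPit, int nThreads,
                      std::function<bool(int)> const & matchesAt)
{
    std::mutex m;
    int nextPit = firstPit;   // next paragraph to hand out
    int found = -1;           // earliest matching paragraph, or -1

    auto worker = [&]() {
        for (;;) {
            int pit;
            {
                std::lock_guard<std::mutex> lock(m);
                if (found != -1 || nextPit > lastPit)
                    return;       // a match is known, or no work left
                pit = nextPit++;  // claim the next paragraph
            }
            if (matchesAt(pit)) {
                std::lock_guard<std::mutex> lock(m);
                if (found == -1 || pit < found)
                    found = pit;  // keep the earliest match
            }
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < nThreads; ++i)
        pool.emplace_back(worker);
    for (auto & t : pool)
        t.join();
    return found;
}
```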

You're claiming that with a) we need to synchronize too often on the mutex. I argue that with b) we may end up waiting much longer than necessary before finding a match, because once a thread finishes a paragraph it doesn't immediately move to the next not-yet-searched one, but skips "blindly" m paragraphs forward. Furthermore, what happens once a thread's own slice (the "odds" or the "evens", for m=2) is exhausted? I can only see the thread stopping there (i.e., decreased parallelism).

Perhaps a nice choice lies somewhere between a) and b), but I suspect it would be more complex...

Actually, I'm thinking that a) may be realized without the mutex, relying exclusively on atomic ops (all I need is an atomic increment, after all), e.g., via atomic_ops.h (though I'm not sure how to emulate the condition variable in such a case). However, I'm not going to play with such a solution if the parallelized search is not appreciated by others anyway... (still, my laptop remains the only one that runs Advanced Find at double speed :-) ).
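For what it's worth, with C++11's std::atomic (the modern equivalent of atomic_ops.h) the mutex-free variant of a) is straightforward: fetch_add hands out paragraphs and a compare-exchange loop keeps the smallest matching pit. This sketch simply joins the workers instead of emulating a condition variable, and all names are hypothetical:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Scheme a) without a mutex: the shared paragraph counter is a
// std::atomic<int> and each worker claims work with fetch_add.
int lockFreeFindFirst(int firstPit, int lastPit, int nThreads,
                      std::function<bool(int)> const & matchesAt)
{
    std::atomic<int> nextPit(firstPit);
    std::atomic<int> found(lastPit + 1);  // sentinel: "no match yet"

    auto worker = [&]() {
        for (;;) {
            int pit = nextPit.fetch_add(1);       // atomic claim, no lock
            if (pit > lastPit || pit >= found.load())
                return;  // no work left, or an earlier match is known
            if (matchesAt(pit)) {
                // keep the smallest matching pit seen so far
                int cur = found.load();
                while (pit < cur && !found.compare_exchange_weak(cur, pit))
                    ;  // cur is refreshed by compare_exchange on failure
            }
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < nThreads; ++i)
        pool.emplace_back(worker);
    for (auto & t : pool)
        t.join();
    int const f = found.load();
    return f <= lastPit ? f : -1;
}
```

As with the mutex version, paragraphs are claimed in increasing order, so every paragraph smaller than a reported match has already been claimed and will contribute to the min before its worker exits.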

    T.
