On 13/11/2011 19:23, Abdelrazak Younes wrote:
On 11/11/2011 11:27, Tommaso Cucinotta wrote:
On 10/11/2011 22:07, Abdelrazak Younes wrote:
Threads are not always about using all available processors.
OK, but here I'm really addressing a different problem: I'm not using
threads to improve the design or anything like that, but merely to speed
up the search process in case you have multiple cores.
This means that you're going to run a CPU-intensive task on each core,
and in effect this means that you're going to give less CPU time to
other running processes in your system, more context switching, and as
a result less responsiveness.
CPU-intensive tasks are quickly de-prioritized by the OS scheduler, at
least on Linux: I'm not sure about the other OSes, but I'd guess each
one has some sort of heuristic to discriminate among interactive tasks
that are boosted in their dynamic priority (i.e., just woken up tasks)
as compared to long running CPU hogs. When I have long compile sessions
on my dual-core (e.g., Linux kernel, LyX, ffmpeg, ...), I always use
"make -j4", and it runs for ~15-30 mins (depending on options etc.),
during which I enjoy browsing, e-mail-ing, bash-ing, emacs-ing, LyX-ing,
without problems.
Now, if you argue that the OS should always leave one spare core just
for keeping the system more responsive(*), and double the execution time
of any batch computing task, then I'd guess the right place for this
"heuristic" (or, personal preference, I'd say) should be in an OS
preference option, reflected somehow in
QThread::idealThreadCount(), and *not* in the applications' code.
(*) ...assuming it would be, if the core is idle -- AFAIK, an idle core
might go into a deep sleep state [or not, depending on how its power
state is tied to that of the other running cores], so waking up from
deep idle requires some extra time anyway -- and if it is running, you
pay a context switch plus the duration of a possible kernel section with
preemption disabled -- I don't think these delays fall within the
perception of the user...
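Just to illustrate where such a preference could live, here is a minimal
sketch: workerCount() is a hypothetical helper (not LyX code), and
std::thread::hardware_concurrency() stands in for Qt's
QThread::idealThreadCount():

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: honour a "leave one spare core" preference in one
// central place, instead of hard-coding it in each application's code.
// std::thread::hardware_concurrency() is used here as a stand-in for
// Qt's QThread::idealThreadCount().
unsigned workerCount(bool leaveSpareCore)
{
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0)               // the query may fail; fall back to one worker
        n = 1;
    if (leaveSpareCore && n > 1)
        --n;                  // keep one core free for interactive tasks
    return std::max(1u, n);
}
```

Applications would then just size their pools with workerCount(), and the
"spare core" policy stays a single user-visible knob.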
Because you do not manage which paragraph is searched next, you indeed
have to use mutexes, which are not so cheap. You could alternatively
split the document in 2 or 3 parts (or the number of cores minus 1 or 2)
and forget about mutexes.
Right; however, I'm not interested in finding all the matches, so far,
but only the first one (from the cursor position). So I could have:
a) each thread synchronizing over a shared par ID in order to decide the
next paragraph to search (as in the current patch);
b) interleaved searched paragraphs, i.e., if the cursor is on pit, then
the first thread out of m threads searches pit, pit+m, pit+2m, ...; the
second thread searches pit+1, pit+1+m, pit+1+2m, ...; etc.
You're claiming that with a) we need to sync too often on the mutex. I
argue that with b) we may end up waiting much longer than needed before
finding a match, because once a thread has finished a paragraph, it
doesn't immediately move on to the next one not yet searched, but skips
"blindly" m paragraphs forward. Furthermore, what happens once a
thread's subsequence of paragraphs is exhausted? I can only see the
thread stopping there (i.e., decreased parallelism).
Perhaps a nice choice lies somewhere between a) and b), but I suspect it
would be more complex...
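For reference, a minimal sketch of scheme a) with plain C++11 threads.
This is not the patch's code: matches() is a hypothetical stand-in for
the real per-paragraph search, and searchParallel() is an invented name.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical predicate standing in for the per-paragraph match test.
bool matches(int pit) { return pit == 7; }

// Scheme a): workers claim the next unsearched paragraph under a mutex,
// and stop as soon as every paragraph before the best match is searched.
int searchParallel(int startPit, int numPars, int numThreads)
{
    std::mutex m;
    int next = startPit;   // next paragraph to claim
    int found = -1;        // smallest matching pit seen so far

    auto worker = [&]() {
        for (;;) {
            int pit;
            {
                std::lock_guard<std::mutex> lk(m);
                if (next >= numPars || (found != -1 && next > found))
                    return;        // nothing earlier than 'found' left
                pit = next++;      // claim the next paragraph
            }
            if (matches(pit)) {
                std::lock_guard<std::mutex> lk(m);
                if (found == -1 || pit < found)
                    found = pit;   // keep the earliest match
            }
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < numThreads; ++i)
        pool.emplace_back(worker);
    for (auto & t : pool)
        t.join();
    return found;              // -1 if no match
}
```

Note the mutex is taken once per paragraph, so its cost is amortized over
the (much larger) cost of actually searching the paragraph text.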
Actually, I'm thinking that a) may be realized without the mutex,
relying exclusively on atomic ops (all I need is an atomic increment,
after all), e.g., atomic_ops.h (though I'm not sure how to emulate the
condition variable in such a case). However, I'm not going to play with
such a solution if the parallelized search is not appreciated by others
anyway... (still, my laptop keeps being the only one that runs Advanced
Find at double speed :-) ).
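A sketch of what the atomic variant could look like, using C++11
std::atomic in place of atomic_ops.h (again, matches() and the function
name are hypothetical, and termination is handled by joining rather than
a condition variable):

```cpp
#include <atomic>
#include <climits>
#include <thread>
#include <vector>

// Hypothetical predicate standing in for the per-paragraph match test.
bool matches(int pit) { return pit == 7; }

// Scheme a) without a mutex: an atomic increment hands out paragraphs,
// and a CAS loop keeps the smallest matching pit.
int searchAtomic(int startPit, int numPars, int numThreads)
{
    std::atomic<int> next(startPit);
    std::atomic<int> found(INT_MAX);   // INT_MAX == no match yet

    auto worker = [&]() {
        for (;;) {
            int pit = next.fetch_add(1);           // atomic claim
            if (pit >= numPars || pit > found.load())
                return;
            if (matches(pit)) {
                int cur = found.load();            // keep the minimum
                while (pit < cur && !found.compare_exchange_weak(cur, pit))
                    ;                              // cur reloaded on failure
            }
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < numThreads; ++i)
        pool.emplace_back(worker);
    for (auto & t : pool)
        t.join();
    int f = found.load();
    return f == INT_MAX ? -1 : f;      // -1 if no match
}
```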
T.