Jonas Maebe wrote:
On 18 Sep 2009, at 21:30, Dariusz Mazur wrote:
On 18 Sep 2009, at 16:24, Dariusz Mazur wrote:
I use my own lock-free FIFO (http://www.emadar.com/fpc/lockfree.htm) to
distribute tasks between threads.
It's much faster and scales well on multicore.
Note that it won't work as is on non-x86 machines, because it's
missing memory barriers (and I think that you may actually need
memory barriers on x86 too). Atomic operations are not memory
barriers by themselves, and the fact that you perform an atomic
operation does not mean that afterwards all CPUs will immediately
see this new value.
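To illustrate the point about barriers, here is a minimal sketch of handing a value from one thread to another with explicit barriers. It is not taken from the FIFO under discussion: Payload, Ready and TProducer are hypothetical names, and the sketch assumes an FPC version whose System unit provides WriteBarrier and ReadBarrier.

```pascal
program BarrierSketch;
{$mode objfpc}
uses
  {$ifdef unix}cthreads,{$endif} Classes;

var
  Payload: LongInt = 0;  // hypothetical data being handed over
  Ready: LongInt = 0;    // hypothetical flag: 1 once Payload is valid

type
  TProducer = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TProducer.Execute;
begin
  Payload := 42;
  WriteBarrier;                   // order the Payload store before the flag store
  InterlockedExchange(Ready, 1);  // publish the flag atomically
end;

var
  P: TProducer;
begin
  P := TProducer.Create(False);
  // CAS with NewValue = Comperand = 1 never changes Ready; it only reads it atomically.
  while InterlockedCompareExchange(Ready, 1, 1) = 0 do
    ThreadSwitch;
  ReadBarrier;                    // order the flag load before the Payload load
  WriteLn(Payload);
  P.WaitFor;
  P.Free;
end.
```

Without the two barriers, a weakly ordered CPU such as PowerPC would be allowed to make the flag visible before the payload, which is the failure mode described above.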
I don't know other machines. On x86 I use atomic operations and, according
to Intel, all the necessary cache synchronization is then taken care of.
I've run heavy stress tests on several computers, and it works well.
At least on PowerPC it will fail (and I guess on SPARC as well).
I'm not saying that it will work on anything other than x86. I have only
tested on Intel and AMD.
The algorithm is very simple and needs only a 32-bit CAS, so implementing it
should be possible on most platforms.
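For readers unfamiliar with the pattern, a CAS retry loop can be sketched as follows. This is only an illustration of the primitive, not the code from the linked FIFO: Tail and ClaimSlot are hypothetical names, and the loop relies on nothing beyond InterlockedCompareExchange from the System unit.

```pascal
program CasSketch;
{$mode objfpc}

var
  Tail: LongInt = 0;  // hypothetical shared index

// Atomically claims the next slot index using only a 32-bit CAS.
function ClaimSlot(var Index: LongInt): LongInt;
var
  Seen: LongInt;
begin
  repeat
    Seen := Index;  // snapshot the current value
    // Store Seen + 1 only if Index still equals Seen; otherwise retry.
  until InterlockedCompareExchange(Index, Seen + 1, Seen) = Seen;
  Result := Seen;
end;

begin
  WriteLn(ClaimSlot(Tail));
  WriteLn(ClaimSlot(Tail));
end.
```

Each caller gets a distinct index even under contention, because a thread that loses the race sees a return value different from its snapshot and tries again.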
One thing is needed: a multi-platform thread switch.
I use Sleep(0), but I don't think it's the best solution.
At least on Mac OS X that will not do anything. There's a procedure
called "ThreadSwitch" in the interface of the system unit though. At
least for Unix platforms it will work if cthreads is used.
I saw it, but have not tested it.
Seems enough to me.
But is this the optimal solution:
procedure SysThreadSwitch;
begin
  Sleep(0);
end;
WinThreadManager.ThreadSwitch := @SysThreadSwitch;
procedure ThreadSwitch;
begin
  CurrentTM.ThreadSwitch;
end;
That makes two unnecessary function calls. Can the compiler optimize this
away? Or would it be better to make ThreadSwitch and SysThreadSwitch inline?
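A rough sketch of what inlining could and could not remove here, using stand-in names rather than the real RTL declarations (CurrentSwitch, DummySwitch and MyThreadSwitch are hypothetical). The inline directive is only a hint to the compiler, and the call through the procedural variable stays an indirect call either way:

```pascal
program InlineSketch;
{$mode objfpc}
{$inline on}

type
  TThreadSwitchHandler = procedure;

var
  Switches: LongInt = 0;                // counts calls, for demonstration only
  CurrentSwitch: TThreadSwitchHandler;  // stand-in for the thread manager's field

// Stand-in for a platform-specific handler such as SysThreadSwitch.
procedure DummySwitch;
begin
  Inc(Switches);
end;

// The wrapper may be inlined at the call site; the indirect call inside it remains.
procedure MyThreadSwitch; inline;
begin
  CurrentSwitch();
end;

begin
  CurrentSwitch := @DummySwitch;
  MyThreadSwitch;
  WriteLn(Switches);
end.
```

So at best one of the two call levels can go away, since the target of the procedural variable is only known at run time.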
--
Darek
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal