On Mon, 05 Jan 2004 15:43, Nigel Sandever wrote:

> I accept that it may not be possible on all platforms, and it may
> be too expensive on some others. It may even be undesirable in the
> context of Parrot, but I have seen no argument that goes to
> invalidate the underlying premise.
I think you missed this:

LT> Different VMs can run on different CPUs. Why should we make atomic
LT> instructions out of these? We have a JIT runtime performing at 1
LT> Parrot instruction per CPU instruction for native integers. Why
LT> should we slow down that by a magnitude of many tenths?
LT> We have to lock shared data, then you have to pay the penalty, but
LT> not for each piece of code.

and this:

LT> I think, that you are missing multiprocessor systems totally.

You are effectively excluding true parallelism by blocking other processors from executing Parrot ops while one has the lock. You may as well skip the thread libraries altogether and multi-thread the ops in a runloop, like Ruby does.

But let's carry the argument through, restricting it to UP systems, with hyperthreading switched off, and running Win32. Is it even true that masking interrupts is enough on these systems? Win32 `Critical Sections' must be giving the scheduler hints not to run other pending threads whilst a critical section is running. Maybe it uses the CPU sti/cli flags for that, to avoid the overhead of setting a memory word somewhere (bad enough) or calling into the system (crippling). In that case, setting STI/CLI might only incur a ~50% performance penalty for integer operations.

but then there's this:

NS> Other internal housekeeping operations, memory allocation, garbage
NS> collection etc. are performed as "sysopcodes", performed by the VMI
NS> within the auspices of the critical section, and thus secured.

UG> there may be times when a GC run needs to be initiated DURING a VM
UG> operation. if the op requires an immediate large chunk of ram it
UG> can trigger a GC pass or allocation request. you can't force those
UG> things to only happen between normal ops (which is what making
UG> them into ops does). so GC and allocation both need to be able to
UG> lock all shared things in their interpreter (and not just do a
UG> process global lock) so those things won't be modified by the
UG> other threads that share them.

I *think* this means that even if we *could* use critical sections for each op, where this works and isn't terribly inefficient, GC throws a spanner in the works. This could perhaps be worked around. In any case, it won't work on the fastest known threading implementations (Solaris, Linux NPTL, etc.), as they won't know to block all the other threads in a given process just because one of them set a CPU flag cycles before it was pre-empted.

So, in summary: it won't work on MP, and on UP it couldn't possibly be as overhead-free as the other solutions.

Clear as mud? :-)
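To put the overhead in concrete terms, here is a rough sketch of what "wrap every op in a critical section" amounts to on Win32. It is not Parrot's actual runloop; the interpreter struct, op table and op signature are invented purely for illustration. The point is just that every single VM instruction pays for an EnterCriticalSection/LeaveCriticalSection pair:

    /* A rough sketch, not Parrot's actual internals: the interpreter
     * struct, op table and op signature below are invented purely to
     * show where the per-op locking cost lands. */
    #include <windows.h>

    typedef struct Interp Interp;
    typedef int (*op_func_t)(Interp *);    /* an op returns the next pc */

    struct Interp {
        op_func_t        *op_table;        /* hypothetical dispatch table */
        CRITICAL_SECTION  vm_lock;         /* one lock guarding every op  */
    };

    /* The cost being argued about: two lock operations around every VM
     * instruction, whether or not it touches any shared data. */
    static void runloop(Interp *interp, int pc)
    {
        while (pc >= 0) {
            EnterCriticalSection(&interp->vm_lock);   /* paid on every op */
            pc = interp->op_table[pc](interp);
            LeaveCriticalSection(&interp->vm_lock);   /* paid on every op */
        }
    }

    /* trivial demo op: do nothing and halt the runloop */
    static int op_halt(Interp *interp) { (void)interp; return -1; }

    int main(void)
    {
        op_func_t table[] = { op_halt };
        Interp    interp;

        interp.op_table = table;
        InitializeCriticalSection(&interp.vm_lock);
        runloop(&interp, 0);               /* one locked op, then halt */
        DeleteCriticalSection(&interp.vm_lock);
        return 0;
    }

Even uncontended, that pair is roughly an interlocked read-modify-write on a shared word; under contention it is a wait in the kernel. Set that against a JIT core running native-integer ops at one CPU instruction per Parrot op, and Leo's objection about the slowdown is hard to argue with.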
[back to processors]

> Do these need to apply lock on every machine level entity that
> they access?

Yes, but the only resource that matters here is memory. Locking *does* take place inside the processor, but the locks are all close enough to be inspected in under a cycle, and misses incur a penalty of several cycles (maybe dozens, depending on who has the memory locked). Registers are also "locked", by virtue of the fact that the out-of-order execution and pipelining logic will not schedule/allow an instruction to proceed until its data is ready. Any CPU with pipelining has this problem.

There is an interesting comparison to be drawn between the JIT translation that happens inside the processor on hyperthreading systems, from the bytecode being executed (x86) into a RISC core machine language (µ-ops), and Parrot's compiling of PASM to native machine code. In each case it is the µ-ops that are ordered to maximize performance and fed into the execution units. On a hyperthreading processor, the core has the luxury of knowing how long it will take to check the necessary locks for each instruction (probably under a cycle), so the µ-ops can scream along. With Parrot, it might have to contact another host over an ethernet controller to acquire a lock (e.g., threads running in an OpenMOSIX cluster). This cannot happen for every instruction!

--
Sam Vilain, [EMAIL PROTECTED]

  The golden rule is that there are no golden rules
    -- GEORGE BERNARD SHAW