On Mon, 05 Jan 2004 15:43, Nigel Sandever wrote:

  > I accept that it may not be possible on all platforms, and it may
  > be too expensive on some others. It may even be undesirable in the
  > context of Parrot, but I have seen no argument that goes to
  > invalidate the underlying premise.

I think you missed this:

LT> Different VMs can run on different CPUs. Why should we make atomic
LT> instructions out of these? We have a JIT runtime performing at 1
LT> Parrot instruction per CPU instruction for native integers. Why
LT> should we slow that down by a factor of many tens?

LT> If we have to lock shared data, then we have to pay the penalty,
LT> but not for each piece of code.

and this:

LT> I think, that you are missing multiprocessor systems totally.

You are effectively excluding true parallelism by blocking other
processors from executing Parrot ops while one has the lock.  You may
as well skip the thread libraries altogether and multi-thread the ops
in a runloop like Ruby does.

But let's carry the argument through, restricting it to UP
(uniprocessor) systems, with hyperthreading switched off, and running
Win32.  Is it even true
that masking interrupts is enough on these systems?

Win32 `Critical Sections' must be giving the scheduler hints not to
run other pending threads whilst a thread is inside a critical
section.  Maybe it uses the CPU's interrupt flag (STI/CLI) for that,
to avoid the overhead of setting a memory word somewhere (bad enough)
or calling into the kernel (crippling).  In that case, toggling
STI/CLI might only incur a ~50% performance penalty on integer
operations.
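
To make that cost concrete, here is a toy sketch of `one critical
section per op'.  None of this is Parrot's actual runloop or data
structures - the ToyVM, the bytecode format and run_ops() are invented
for illustration - and the only real Win32 calls are the
critical-section ones.  The point is that the one-CPU-instruction add
LT mentions above ends up bracketed by a lock acquire and release on
every trip round the loop:

    #include <windows.h>

    /* Toy VM, invented for illustration: opcode 0 means halt, any other
       value means "add it to the accumulator".  Not Parrot's bytecode. */
    typedef struct {
        int        acc;   /* a "native integer register" */
        const int *pc;    /* pointer into the bytecode   */
    } ToyVM;

    static CRITICAL_SECTION vm_cs;  /* one lock shared by every VM thread */

    static void run_ops(ToyVM *vm)
    {
        int op;
        for (;;) {
            EnterCriticalSection(&vm_cs);  /* entered and left for EVERY op */
            op = *vm->pc++;
            if (op == 0) {                 /* halt */
                LeaveCriticalSection(&vm_cs);
                return;
            }
            vm->acc += op;                 /* the one-instruction "JITted" add */
            LeaveCriticalSection(&vm_cs);
        }
    }

    int main(void)
    {
        static const int code[] = { 1, 2, 3, 0 };
        ToyVM vm = { 0, code };

        InitializeCriticalSection(&vm_cs);
        run_ops(&vm);                      /* vm.acc ends up as 6 */
        DeleteCriticalSection(&vm_cs);
        return vm.acc == 6 ? 0 : 1;
    }

Even where Enter/LeaveCriticalSection stay in user space on the
uncontended path, that is still a couple of interlocked (bus-locking)
operations wrapped around a single add.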

but then there's this:

  NS> Other internal housekeeping operations, memory allocation, garbage
  NS> collection etc. are performed as "sysopcodes", performed by the VMI
  NS> within the auspices of the critical section, and thus secured.

UG> there may be times when a GC run needs to be initiated DURING a VM
UG> operation. if the op requires an immediate large chunk of ram it
UG> can trigger a GC pass or allocation request. you can't force those
UG> things to only happen between normal ops (which is what making
UG> them into ops does). so GC and allocation both need to be able to
UG> lock all shared things in their interpreter (and not just do a
UG> process global lock) so those things won't be modified by the
UG> other threads that share them.

I *think* this means that even if we *could* use critical sections for
each op, where this works and isn't terribly inefficient, GC throws a
spanner in the works.  This could perhaps be worked around.
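
To sketch UG's point in code (every name here - Interp, vm_alloc(),
gc_collect() - is invented for illustration, not Parrot's API): the
allocation, and hence the GC pass, happens in the *middle* of an op,
so a critical section around each op doesn't cover it; the collector
has to take a per-interpreter lock on the shared arenas itself.

    #include <pthread.h>
    #include <stdlib.h>

    /* Invented structures and names, for illustration only. */
    typedef struct {
        pthread_mutex_t arena_lock;   /* per-interpreter, not process-global */
        size_t          free_bytes;   /* stand-in for the arena free space   */
    } Interp;

    static void gc_collect(Interp *in)
    {
        /* Caller must hold arena_lock: other threads sharing this
           interpreter may be touching the same arenas mid-op. */
        in->free_bytes += 4096;       /* pretend we reclaimed something */
    }

    /* Called from the MIDDLE of an op, not between ops. */
    static void *vm_alloc(Interp *in, size_t n)
    {
        void *p;
        pthread_mutex_lock(&in->arena_lock);
        if (in->free_bytes < n)
            gc_collect(in);           /* GC triggered inside the op */
        in->free_bytes -= n;
        pthread_mutex_unlock(&in->arena_lock);
        p = malloc(n);                /* stand-in for a real arena allocator */
        return p;
    }

    int main(void)
    {
        Interp in = { PTHREAD_MUTEX_INITIALIZER, 1024 };
        free(vm_alloc(&in, 2048));    /* 2048 > 1024 forces the mid-op GC path */
        return 0;
    }

The shape is what matters: the lock is per interpreter pool, so two
interpreters that share nothing never contend, while the threads that
do share data stay safe even when the GC fires halfway through an op.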

In any case, it won't work on the fastest known threading
implementations (Solaris, Linux NPTL, etc), as they won't know to
block all the other threads in a given process just because one of
them set a CPU flag cycles before it was pre-empted.

So, in summary - it won't work on MP, and on UP, it couldn't possibly
be as overhead-free as the other solutions.

Clear as mud ?  :-)

[back to processors]
> Do these need to apply a lock to every machine-level entity that
> they access?

Yes, but the only resource that matters here is memory.  Locking
*does* take place inside the processor, but the locks are all close
enough to be inspected in under a cycle.  And misses incur a penalty
of several cycles - maybe dozens, depending on who has the memory
locked.

Registers are also "locked" by virtue of the fact that the
out-of-order execution and pipelining logic will not schedule/allow an
instruction to proceed until its data is ready.  Any CPU with
pipelining has this problem.

There is an interesting comparison to be drawn between the JIT
translation happening inside the processor - from the bytecode being
executed (x86) into a RISC-core machine language (µ-ops) - on
hyperthreading systems, and Parrot's compiling PASM to native machine
code.  In each case it is the µ-ops that are ordered to maximize
performance and fed into the execution units.

A hyperthreading processor has the luxury of knowing how long it will
take to check the necessary locks for each instruction, probably
under a cycle, so the µ-ops can scream along.

With Parrot, acquiring a lock might mean contacting another host over
an ethernet controller (e.g. threads running in an OpenMOSIX
cluster).  That cannot happen for every instruction!
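
To put rough numbers on it: assume a lock request over the wire costs
something like 100 µs round trip; that is on the order of 300,000
cycles on a 3 GHz CPU, so even one such lock per thousand ops would
dwarf the cost of the ops themselves.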
-- 
Sam Vilain, [EMAIL PROTECTED]

  The golden rule is that there are no golden rules
GEORGE BERNARD SHAW

