Hi Sam:

(becomes off-topic here, but for the sake of argument)

On 19 Jan 2011, at 04:14, Sam Vilain wrote:

> On 19/01/11 10:50, Stefan Marr wrote:
>> On 18 Jan 2011, at 22:16, Sam Vilain wrote:
>>> there doesn't seem to
>>> be an interpreter under the sun which has successfully pulled off
>>> threading with shared data.
>> Could you explain what you mean with that statement?
>> 
>> Sorry, but that's my topic, and the most well know interpreters that 'pulled 
>> off' threading with shared data are for Java. The interpreter I am working 
>> on is for manycore systems (running on a 64-core Tilera chip) and executes 
>> Smalltalk (https://github.com/smarr/RoarVM).
> 
> You raise a very good point.  My statement is too broad and should
> probably apply only to dynamic languages, executed on reference counted
> VMs.  Look at some major ones - PHP, Python, Ruby, Perl, most JS engines
> - none of them actually thread properly.
Ok, but the reason here is that building such VMs is inherently complex.
And it has nothing to do with dynamic or not, with typed or what ever.
The mentioned languages happen to be very successful in the domain of web 
applications, and as others already mentioned, the need for fine-grained 
shared-memory parallelism here is not clear. So, why don't we have Python 
without the GIL? Because nobody cared enough. However, there is still JRuby...

> Well, Perl's "threading" does
> run full speed, but actually copies every variable on the heap for each
> new thread, massively bloating the process.
Cutting corners is the only way, if you do not have a great team of engineers.
For the RoarVM we also have to cut more corners than we would like.

> So the question is why should this be so, if C++ and Java, even
> interpreted on a JVM, can do it?
JVMs suffer from the same complexity. And C++, well, last time I checked there 
is just no threading model.
There will be a memory model in C++0x, but there is nothing which makes it 
inherently hard to implement.
Since you don't get any guarantees (beside the memory model semantics) and you 
don't have any GC either.

> In general, Java's basic types typically correspond with types that can
> be dealt with atomically by processors, or are small enough to be passed
> by value.  This already makes things a lot easier.
I don't think that buys you anything. Which basic types can be pass by copy?
Ints, and bools perhaps. That takes a bit pressure from the GC, but does not 
really help with making things safe. Smalltalk does not know basic types. 
However, it knows an implementation technique called tagged pointers/tagged 
integers. This allows you to have 31-bit integers since pointer are aligned and 
do not need all bits. However, that really helps only with GC pressure.  

> 
> I've had another reason for the differences explained to me.  I'm not
> sure I understand it fully enough to be able to re-explain it, but I'll
> try anyway.  As I grasped the concept, the key to making VMs fully
> threadable with shared state, is to first allow reference addresses to
> change, such as via generational garbage collection.
Hm, there is usually the wish that you can run your GC threads in parallel with 
mutator threads, here it is indeed helpful to support moving GCs. But how does 
it help with threads working in parallel on some shared object? Any point were 
an object is allowed to move requires synchronization. So, either someone has 
to change the pointer you own to that object, or you need an additional level 
of indirection.

I guess you are talking here about having such an additional indirection, 
object handles?

> This allows you to
> have much clearer "stack frames", perhaps even really stored on the
> thread-local/C stack, as opposed to most dynamic language interpreters
> which barely use the C stack at all.
Why does having object handles give you a better stack frame layout?
Using the C stack can be helpful for performance, well, makes other languages 
features harder to implement.
For instance what about closures?
Other techniques like recycling you stack-frame-objects is usually a simpler 
optimization without making it harder to stuff like closures.


>  Then, when the long-lived objects
> are discovered at scope exit time they can be safely moved into the next
> memory pool,
Ui ui ui. Slooow. I don't follow. Ok, there are things like escape analysis.
And then there are techniques like on-stack-allocation. Both usually done in 
JIT compilers, not so much in interpreters. Are we still talking about 
interpreters?
Or are you implying a incremental GC that is triggered on the return of method 
calls?


> as well as letting access to "old" objects be locked (or
> copied, in the case of Software Transactional Memory).
There are to many things here discussed in a single sentence. Sorry, I am lost.

>  Access to
> objects in your own frame can therefore be fast, and the number of locks
> that have to be held reduced.
Ok, on-stack-allocation and biased locking? 

> - memory allocation: object references' timeline and garbage collection
> - call stack frames and/or return continuations - the C stack or the heap?
> - atomicity of functions (that's the "synchronized" keyword?)
> - timely object destruction
> 
> I put it forward that the overall design of the interpreter, and
> therefore what is possible in terms of threading, is highly influenced
> by these factors.
Sure, but neither the fact that it is implemented in an interpreter is highly 
relevant (it is in terms of performance, and whether some of these techniques 
are actually relevant (overhead vs. performance benefit)) nor whether the 
language is dynamic or not.


> When threading in C or C++ for instance (and this includes HipHop-TBB),
> the call stack frame is on the C stack, so shared state is possible so
> long as you pass heap pointers around and synchronise appropriately. 
Nobody prevents you from handing out a pointer to a stack object in C/C++.
Nobody prevents you from not synchronizing properly.
How is that related to the implementation of an interpreter? If you are 
satisfied with those kind of guarantees than it is pretty easy to implement 
such an interpreter, no? (minus the GC question of course)


> The "virtual" machine is of a different nature, and it can work.  For
> JVMs, as far as I know references are temporary and again the nature of
> the execution environment is different.
References are temporary?


> For VMs where there is basically nothing on the stack, and everything on
> the heap, it becomes a lot harder.
What exactly becomes harder?


>  To talk about a VM I know better,
> Perl has about 6 internal stacks all represented on the heap; a function
> call/return stack, a lexical scope stack to represent what is in scope,
> a variable stack (the "tmps" stack) for variables declared in those
> scopes and for timely destruction, a stack to implement local($var)
> called the "save" stack, a "mark" stack used for garbage collection, ok
> well only 5 but I think you get my point.  From my reading of the PHP
> internals so far there are similar set there too, so comparisons are
> quite likely to be instructive.  It's a bit hard figuring out everything
> that is going on internally (all these internal void* types don't help
> either), and whether or not there is some inherent property of reference
> counting, or whether it just makes a shared state model harder, is a
> question I'm not sure is easy to answer.
GC is always a problem as you already pointed out earlier.
But the fact that reference counting is used should actually make the model 
more simple.


Anyway, back to the original question. I think it is by now: What are the 
fundamentally hard things when it comes to implement threaded shared-memory 
VMs? Well, and the answer is: GC.
If you don't have GC and you don't give any fancy guarantees then you only have 
to care about following what your memory model promises, and probably restrict 
yourself to insert a memory fence here and there, no? 


Best regards
Stefan


-- 
Stefan Marr
Software Languages Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://soft.vub.ac.be/~smarr
Phone: +32 2 629 2974
Fax:   +32 2 629 3525


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to