A reply from the anonymous colleague.
I pass it along as presented to me, unaltered except for this prelude.
Note that while these are not *my* questions, I find both the
original questions and the followups compelling and in need of
answering.
I hope that we can get some of the design team (past and present)
involved in this thread, as this is part of the documentation effort
I mentioned.
... And I hope Anonymous Colleague chooses to de-cloak soon so I can
stop all this CNP nonsense.
.end
On May 23, 2007, at 1:58 AM, Joshua Isom wrote:
> On May 21, 2007, at 5:56 PM, Will Coleda wrote:
>
> > 1. Why Parrot?
> >
> > http://www.parrotcode.org/docs/intro.html:
> >
> > "Parrot is designed with the needs of dynamically typed languages
> > (such as Perl and Python) in mind, and should be able to run
programs
> > written in these languages more efficiently than VMs developed with
> > static languages in mind (JVM, .NET). Parrot is also designed to
> > provide interoperability between languages that compile to it. In
> > theory, you will be able to write a class in Perl, subclass it in
> > Python and then instantiate and use that subclass in a Tcl
program."
> >
> > a. What, precisely, about Parrot makes possible more efficient
> > execution of a dynamically typed language than would be the case
with
> > the JVM or the CLR?
>
> Parrot is a register based machine instead of a stack based machine.
> This is the way your computer is designed. Although many
architectures
> heavily use the stack, registers are far more efficient. Using a
> register based machine makes JITing executable code far more
efficient
> to come far closer to machine compiled speeds.
>
> But that mainly affects statically typed languages, such as a parrot
> without pdd15. WRT dynamically typed languages, parrot's designed
for
> it. It's as simple as that.
I confess to not grasping the point you claim is simple. As you
understand it, what is there about a register based machine, as
opposed to a stack based machine, that specifically improves the
performance of operating on dynamically typed data, without regard to
performance differences between the two architectures that are
independent of typing models?
> > b. Whatever that is, how will it adversely impact the execution of
> > statically typed languages, including type-inferred languages?
>
> If we don't force many high level components on all languages(such
as a
> scalar is a scalar and is not an integer), and provide a capacity
for a
> language to create it's own types(new pmc's), they can provide the
> functionality they need without excessive overhead of other
operations.
> But this is where "one vm for them all" comes to hurt us, as
well. In
> trying to support all languages, and provide at least the capacity
for
> all languages, we hurt our optimization for one specific language
which
> is what many languages do.
It sounds like you are saying that languages are free to implement
their own semantics using their own code, and that they can choose not
to interoperate with predefined Parrot types or types from other
languages when that would negatively impact their goals, such as
performance. While that rings true, it seems that Parrot is not
providing that ability -- languages can already implement whatever
they want without Parrot. And if languages are free to ignore
predefined and foreign types, when what benefit will they actually get
from Parrot?
Moreover, this does not address my initial question. I am asking, to
rephrase it bluntly, "If Parrot makes dynamic typing faster, doesn't
that have to make static typing slower?" That is, is Parrot making a
tradeoff here? If it is, how large is the tradeoff and what is its
nature. If it is not, then why doesn't everyone else simply do what
you are doing and gain the same benefit?
It would seem that Parrot either has to be different from the JVM and
CLR due to design or implementation optimizations that favor a
specific typing model over others -- which is what it seems to claim --
or else it does not -- either it is not thus differently designed, or
it is not thus differently implemented. If it does not, then it seems
inappropriate for it to make the claim -- and thus would raise the
question of why Parrot should be considered a superior target for
dynamically or statically typed language compilers.
> I imagine parrot won't have a significant issue with statically type
> languages, but that it will be more of an issue of the compiler
itself.
> Parrot should be able to run java fast and efficently, so long
as it's
> compiled from java to pir, instead of running java bytecode, or
> compiling java bytecode to pir.
What tradeoffs could Parrot be making that will have a significant
benefit for dynamically typed languages -- significant enough to
justify the creation of Parrot itself -- without significant detriment
to statically typed languages? Again, if these tradeoffs are so
broadly beneficial, why would the JVM or CLR not simply implement them
themselves?
Most simply: What is being lost to gain whatever is being gained?
> > c. How will this impact the execution of statically typed code in
> > Perl, Python and other targeted languages?
>
> Most problems will be from coding styles most likely.
Interoperability
> between functional programs will probably be a non-issue, but two
> different oo languages(and thus two inheritance models) will likely
> impact performance more. But this is an issue of having one vm for
> all.
I don't understand your answer. Allow me to rephrase and expand the
question.
If Parrot is designed to benefit of dynamically typed languages, how
will Parrot handle statically typed code in those languages. Will
Parrot discourage the use of static typing features in languages like
Perl by making that code execute more slowly or inefficiently than
equivalent dynamically typed code?
> > 2. General Features
> >
> > a. How will Parrot support reflection and attributes?
> >
> > b. How will Parrot support generics types?
> >
> > c. How will Parrot support interface types?
> >
> > d. What kind of security models will Parrot support?
> >
> > e. How will Parrot support small-footprint systems?
>
> Perhaps miniparrot can help take care of this. If miniparrot's a
> miniature parrot, and perhaps supporting only those features that
that
> language needs, we might be able to get a parrot suited for embedded
> systems. PMC's not needed won't be compiled in, the runcores other
> than the default could be left out, and parrot's size could shrink
> dramatically.
While many things are perhaps true, this answer sounds like "There is
no definite plan for supporting this."
> > f. How will Parrot support direct access to "unmanaged" resources?
>
> Is this like UnmanagedStruct?
I mean supporting direct access to the underlying address space and
support for determining the sizes of data within that memory. For
example, direct access to a framebuffer.
> > g. How will Parrot facilitate distributed processing?
>
> With native threading support.
I think you misunderstood my question. By "distributed", I meant the
execution of code in multiple address spaces, or the non-concurrent
execution of code. What support will Parrot provide for migrating
data or code between environment with different byte orders. How will
Parrot support capturing execution state into a preservable or
transportable form?
> > 3. Parrot PMC Issues
> >
> > The Parrot PMC vtable provides a large number of optional
functions,
> > which PMCs can either implement or not. If not implemented, they
will
> > throw an exception at runtime.
> >
> >
> > a. What support will Parrot provide a compiler to interrogate a
PMC at
> > compile time to know what it actually implements?
> >
> > All of these functions appear to be predefined because there is no
> > mechanism for extending this functionality at runtime. It
appears that
> > compilers will be limited to implementing functionality that is
> > defined in the vtable. The vtable contains the common operations
> > required by certain languages.
>
> The only extendibility that I know of is via PIR, or a dnypmc
library.
> But the vtables are primarily for interoperability with everything.
> Methods can still be addded to a pmc to provide additional needs.
Again, this does not seem to be clear, so I will provide an
example. If a Perl compiler is compiling Perl code, and that code is
written to increment the result of a call into some Python code that
returns a PythonString, how can the compiler ask the PythonString PMC
if it implements the "increment", so that it can detect at compile
time what the behavior of the statement will be?
More broadly, how can statically typed code determine if the values
produced by an operation will conform to the type requirements?
> > b. How will Parrot handle languages with operations that are not
> > provided?
> >
> > http://www.parrotcode.org/docs/vtables.html:
> >
> > "To be perfectly honest, this is a slightly flawed example,
since it's
> > unlikely that there will be a distinct "Python scalar" PMC
class. The
> > Python compiler could well type-inference variables such that a
would
> > be a PythonString and b would be a PythonNumber. But the point
remains
> > - incrementing a PythonString is very different from incrementing a
> > PerlScalar."
> >
> > c. How will Parrot address cross-language semantics?
> >
>
> The purpose of a common calling convention, and vtables, are to
address
> cross language semantics. All languages will implement the basic
> things in the same way. It's not a "our way or the high way" but
> rather a "our way is the best way for parrot."
What are "basic things"? What if a language inherently differs in how
it handles those things? For example, incrementing a scalar would
seem to be a basic operation in Perl, but Python will not implement
that basic thing in the same way. It would seem that one or both
sides of this cross-language exchange of very basic types of data will
be problematic.
You say "the best way for parrot" -- how can Parrot have a judgmental
reference point independent from the languages that target it and the
users of those languages?
> > d. Will each language have to provide its own support for
interacting
> > with PMCs for other languages?
> >
>
> No, the PMC's will do that themselves. Getting the PMC's is another
> story. A language is reponsible for it's cross language semantics.
> But parrot is designed for the widest possible case. Many languages
> limit valid characters that a subroutine can use, but parrot does
not.
> But as long as "common" cases are adhered to, most problems will not
> exist, e.g. no unicode whitespace in a subroutine name.
You say "No" initially, but then go on to say "yes" in substance. If
the PMCs are responsible for this, and if languages provide the PMCs,
then the languages are responsible for this.
To explicitly state what is implied by this question. If every
language must provide PMCs that understand how to interact with types
of other languages, then languages will only be able to interact with
each other to the degree that one or both of those languages provides
support. For Perl to use data returned from Python code, either Perl
will have to recognize Python types or Python will have to know to
produce Perl types. Then for Perl to call Tcl code, Perl and/or Tcl
will have to be taught about each other. And then for Python to call
Tcl, yet additional code will need to be created. Indeed, it could be
necessary for Python code to call Perl code that calls Tcl code,
because Perl might understand how to handle a Tcl type that Python
does not. And the more languages that are added, the more types each
language will be asked to implement code to interact with.
This seems like a scalability problem.
One possible approach would be to tell every language that when they
wish to interact, they must produce Parrot-provided types, like String
or Number. Another possible approach would be for Parrot to forcibly
convert language-specific data to Parrot-provided types. Both of
these approaches have issues.
Incidentally, the JVM/CTS approach is to tell every language to use
the same primitive types all the time and to use the same object types
as close to all the time as possible. (I am only aware of one case of
this, being the separate 'String' type in Rhino, needed to provide
both Java and JavaScript String semantics. In that case, Java code
returns a Java String object and the caller must explicitly convert it
to a JavaScript string with an operation like 'string+""'.)
> > e. How will a PerlScalar interact with a PythonString?
>
> The best method would probably convert both down to a String, do
> whatever operation, and convert up to whatever is request. But, for
> optimization, multimethod vtables could be used to provide custom
> behavior. I know src/pmc/complex.pmc has some examples of
multimethod
> vtables.
See above. The intent of this question is not so much "What could
someone happen to do in this situation", but rather "What exactly will
Parrot enforce, require or provide?"
> > f. What will happen when a PythonString is incremented in Perl
code?
>
> Parrot call's PythonString's increment vtable. Perl doesn't have an
> increment, but PerlScalar does. Python doesn't have an increment,
but
> PythonString does. Now, if the PMC doesn't implement that vtable
> function, an exception is thrown, but Parrot still tries to call it.
This would mean that any cross-language code could generate runtime
exceptions in operations that otherwise are generally considered not to
be able to fail. Indeed, it would seem that every possible operation
would possibly fail at runtime when handling foreign data.
This would seem to strongly discourage multi-language programming --
to the point of it never happening.
What will Parrot do to make this acceptable? Will end-users be forced
to write their own test cases that attempt all valid combinations of
all data between all languages they wish to use?
> > Comparing the vtable for a PMC to the JVM and CLR base Object
classes,
> > the PMC is essentially an "abstract" class with dozens of
> > "unimplemented" methods, while Java's Object provides (and
implements)
> > the following public methods:
> >
> > equals getClass hashCode notify notifyAll toString wait
> >
> > Discounting the methods related to Java's peculiar threading
> > implementation, that's:
> >
> > equals getClass hashCode toString
> >
> > Similarly, the CLR's CTS Object provides:
> >
> > Equals ReferenceEquals GetType GetHashCode ToString
> >
> > g. Why is it a good thing that PMCs essentially non-contractual
> > abstract base classes that define a lot of functionality without
> > implementing it?
>
> In some instances, this is a benefit. Suppose you want an
> auto-iterating string array. For the most part, it's an array with
> normal array properties. But if you get it's string value, it
iterates
> over the next one. If you set it's string value, maybe it splices
that
> value into the array. Having both array and string properties is
> beneficial in this case.
I do not see the benefit. You could implement exactly that without
having an undefined, abstract base type. For example, with the
following code (which is clearly simplified):
Module Main
Sub Main()
Dim i As New IncrString
Console.WriteLine(i)
Console.WriteLine(i)
Console.WriteLine(i)
Console.WriteLine(i)
i.addValue = "A"
Console.WriteLine(i)
Console.WriteLine(i)
Console.WriteLine(i)
Console.ReadKey()
End Sub
End Module
Class IncrString
Private value As String = "a"
Public WriteOnly Property addValue() As String
Set(value As String)
Me.value += value
End Set
End Property
Public Overrides Function ToString() As String
Dim curValue As String = value
Dim newValue As Char() = value.ToCharArray()
Dim length As Integer = newValue.Length
newValue(length-1) = chr(asc(newValue(length-1)) + 1)
value = newValue
Return curValue
End Function
End Class
This code produces:
a
b
c
d
eA
eB
eC
Now, this was not the best of examples in the first place, because I
would not argue that 'ToString' is not the kind of really-useful thing
you want in a core data type. The essential meaning of the routine
being "make something a human can read" -- and humans are the people
using the machines. But, as you can see, there was no need for the
core data type to provide me with an implemented 'addValue' -- it can
simply be layered on using a more primitive and extensible runtime
support for properties.
> But the downside is most things, such as an Integer, don't need
many of
> the vtables provided. In fact, if you look at the c output of a pmc
> file, you'll see that every vtable is created. I imagine it's
more for
> simplicity and speed than for memory(both executable and ram) than
> anything else.
I don't see the simplicity or the speed benefit. I do see the memory
cost. If anything, I suspect that these larger objects will fill a
CPU cache faster and be slower to load because of this increased size,
leading to slower runtime performance.
> > h. Why is there no first-tier depth in Parrot's type system,
such as:
> >
> > PMCString, PMCIntger, PMCNumber, ...
> >
>
> You mean like String, Integer, Number, Array, etc?
No, I mean why is the type-specific functionality not pushed down into
the next tier where it is actually needed, like the JVM and CTS do,
leaving the base PMC with only the same four or five methods those
systems have?
> > 4. Parrot VM Issues
> >
> > Parrot provides what it calls "registers" with no guarantee that
these
> > map to hardware registers.
> >
> > a. Will any registers ever map, in a Parrot-controlled way, to
hardware
> > registers?
>
> Yes. If a subroutine uses less than or equal to the number of
> registers on an architecture, the entire subroutine can be converted
> into native code, leaving parrot out entirely. Using a PPC system is
> better than an i386 system in this case, since it has more registers.
> Even if the subroutine isn't compiled entirely to native code,
portions
> of that code will be compiled to native code as best as possible.
>
> The very basis of parrot's jit system is that both parrot and the
> native system use registers, and that keeping data in registers helps
> to improve speed greatly.
Without opening a can of bees, this sounds like Parrot's performance
will vary greatly, depending on the quantity of variables in scope in
subroutines. While it is generally true for most languages that a
large number of variables can trigger load/store operations when the
register capacity is exceeded, Parrot will switch from JIT code to
purely interpreted code? While most people don't worry about
incurring a few load/store operations, this kind of variation may
cause programmers to alter their programming style significantly in
order to avoid unacceptable performance.
As you say, i386 has fewer registers, but it is a very common
platform. Given that, many programmers may consider it necessary to
write code that will be JIT-able on that platform, leading to a rather
awkward programming style, encouraging the use of a larger number of
subroutines, thus more calling, and ultimately a lot of register
shuffling anyway.
> > b. How can a compiler efficiently allocate registers if it does not
> > know which ones will map to hardware registers?
>
> I don't believe there's a capacity for doing this at the moment.
It's
> up to parrot to decide how much is jitted and how much isn't.
When I asked this question, I thought I was asking if the compiler
could suggest which variables should map to registers and which ones
should be loaded/stored. But it seems this is a question of which
subroutines will use registers at all. In that case, I wonder what
mechanisms Parrot will provide to inform a compiler how JIT-able a
subroutine is -- both on the current platform and on other
architectures -- to enable the compiler to know when it would make
sense to either automatically modify the code into JIT-able form, or
to warn the developer.
> > 5. Parrot Design Issues
> >
> > Parrot has many operators and number of Core PMC types for them to
> > operate on. Parrot has so many operators that it appears to be
using
> > them instead of having a standard library. This is markedly
different
> > than the CLR and JVM systems.
> >
> > a. Why was this done this way?
>
> If you look at the number of ops x86 has with an FPU, there are a
> massive number. The x87 cpu has an opcode for sine, just like parrot
> does. Many of parrot's opcodes are for accessing features of parrot
> through pir. Many of parrot's operations can't be easily taken away.
> One of the likely reasons is speed, but what things are the
questioner
> curious about?
Frankly, this is not much of an answer. I am not asking if CISC
architectures exist, but rather I am asking why you are choosing to
create one.
Moreover, I am not questioning your choices in terms of design options
and tradeoffs. I am simply looking for the description of why what
you have was done the way you did it.
> > b. What is the basis for deciding what will be an operator?
> >
> > c. How can substantial quantities of additional functionality be
added
> > to this design cleanly?
>
> New vtable's can be added by editting vtable.tbl, new ops can be
added
> by adding to src/ops/experimental.ops, new pmc's can just be added to
> src/pmc afaik. New charsets in src/charset, new jit architectures
> under src/jit(just add --jitcapable and it'll try to compile it in).
> I'd say it's a fairly clean layout for expanding things. There's
even
> the capacity for adding a new garbage collected.
It is not sufficient to say that one can write the code. How will
Parrot inform an existing compiler that the new operation exists (or
does not exist if the version of Parrot is older). Will compilers
have to themselves be recompiled even if they do not use the new
operators?
Also, this seems, as a design, to simply be a bag of operations.
Finally, I would like to add some additional questions.
2.h. Will Parrot support inline assembly language?
2.i. Will Parrot support primitive types?
4.c. How will registers benefit PMCs (e.g. PerlScalar), which are not
primitive types and cannot be stored in a hardware register?