On May 23, 2007, at 8:06 PM, Will Coleda wrote:
On May 23, 2007, at 1:58 AM, Joshua Isom wrote:
I confess to not grasping the point you claim is simple. As you
understand it, what is there about a register based machine, as
opposed to a stack based machine, that specifically improves the
performance of operating on dynamically typed data, without regard to
performance differences between the two architectures that are
independent of typing models?
I don't believe there's a benefit for a certain type, but for all types.
Less time is spent moving data around, and the same data stays in the
same places. The less you move data around, the more you improve
speed. Proven techniques for compiling to a register based machine
will also be easier to apply. And since speed is often due in large
part to the compiler, a better compiler will create better programs.
So speed will be a combination of parrot version and compiler version.
When it comes to which is better for a type of language, both Java and
Perl 5 use a stack based machine, but Python can be run as native code
or on a JVM, so it's hard to truly say what is best for a type of
language. To me personally, a virtual machine that simulates real
hardware as much as is practical is a great benefit.
It sounds like you are saying that languages are free to implement
their own semantics using their own code, and that they can choose not
to interoperate with predefined Parrot types or types from other
languages when that would negatively impact their goals, such as
performance. While that rings true, it seems that Parrot is not
providing that ability -- languages can already implement whatever
they want without Parrot. And if languages are free to ignore
predefined and foreign types, then what benefit will they actually get
from Parrot?
A language can use whatever semantics it so chooses. But
interoperability between languages will likely be broken, and they
won't be able to utilize future improvements in parrot. Part of the
whole point of parrot is to provide one portable virtual machine for
languages. Languages will be able to benefit from improvements in
parrot without needing to spend all their time on their own virtual
machine. If you have an idea for a new language, you can use parrot as
a base and deal solely with designing and compiling, instead of
running.
Moreover, this does not address my initial question. I am asking, to
rephrase it bluntly, "If Parrot makes dynamic typing faster, doesn't
that have to make static typing slower?" That is, is Parrot making a
tradeoff here? If it is, how large is the tradeoff and what is its
nature? If it is not, then why doesn't everyone else simply do what
you are doing and gain the same benefit?
What about static typing is so fast? Compiler optimizations help, but
those can still exist. I don't see any reason why parrot can't still
run statically typed code faster than dynamically typed code (as is
often the case anyway).
It would seem that Parrot either is different from the JVM and CLR,
due to design or implementation optimizations that favor a specific
typing model over others -- which is what it seems to claim -- or else
it is not, in either design or implementation. If it is not, then it
seems inappropriate for it to make the claim, which raises the
question of why Parrot should be considered a superior target for
dynamically or statically typed language compilers.
Parrot has the benefit of starting from scratch. We aren't trying to
compile java to perl 5's vm or perl 5 to java's vm; instead we're
building a vm that will work well for them all without bolting on slow
hacks late in the game. A language won't need to use all of parrot's
features. Most won't, but they won't need weird hacks to get the
functionality they need either. They can use a subset of parrot.
Personally, I rarely use some of the higher level features of parrot,
but parrot having the additional features doesn't seem to slow down my
code at all. Sometimes I want to mix regular expressions and jitable
assembly.
What tradeoffs could Parrot be making that will have a significant
benefit for dynamically typed languages -- significant enough to
justify the creation of Parrot itself -- without significant detriment
to statically typed languages? Again, if these tradeoffs are so
broadly beneficial, why would the JVM or CLR not simply implement them
themselves?
Most simply: What is being lost to gain whatever is being gained?
I'm not sure anything is lost. Perhaps when a Java to Parrot compiler
and a Perl 5 to Parrot compiler are both finished, we can see how they
compare to their "original" VMs.
I don't understand your answer. Allow me to rephrase and expand the
question.
If Parrot is designed to benefit dynamically typed languages, how will
Parrot handle statically typed code in those languages? Will Parrot
discourage the use of static typing features in languages like Perl by
making that code execute more slowly or less efficiently than
equivalent dynamically typed code?
There's no reason for parrot to hinder one language's performance. Can
you give an example of code that would help others understand your
question? Vague questions get vague answers, or no answers. Clarify
for us, please.
> Perhaps miniparrot can help take care of this. If miniparrot's a
> miniature parrot, and perhaps supporting only those features that
> that language needs, we might be able to get a parrot suited for
> embedded systems. PMC's not needed won't be compiled in, the
> runcores other than the default could be left out, and parrot's size
> could shrink dramatically.
While many things are perhaps true, this answer sounds like "There is
no definite plan for supporting this."
There is no immediate plan for embedded systems. But the groundwork is
already laid.
> > f. How will Parrot support direct access to "unmanaged" resources?
>
> Is this like UnmanagedStruct?
I mean supporting direct access to the underlying address space and
support for determining the sizes of data within that memory. For
example, direct access to a framebuffer.
> > g. How will Parrot facilitate distributed processing?
>
> With native threading support.
I think you misunderstood my question. By "distributed", I meant the
execution of code in multiple address spaces, or the non-concurrent
execution of code. What support will Parrot provide for migrating
data or code between environments with different byte orders? How will
Parrot support capturing execution state into a preservable or
transportable form?
All PIR is compiled to PBC, a portable format (endianness included) for
execution. Parrot can run it directly, or compile and run it in
memory. You can take a PBC file written on a PowerPC and run it on
IA64 (theoretically -- I don't believe it's been tested lately), and
vice versa.
Again, this does not seem to be clear, so I will provide an
example. If a Perl compiler is compiling Perl code, and that code is
written to increment the result of a call into some Python code that
returns a PythonString, how can the compiler ask the PythonString PMC
if it implements the "increment", so that it can detect at compile
time what the behavior of the statement will be?
More broadly, how can statically typed code determine if the values
produced by an operation will conform to the type requirements?
I believe there's some confusion in your understanding of parrot.
Perl doesn't increment a string. Python doesn't increment a string. A
PerlString increments itself, and a PythonString increments itself.
It's possible that you're integrating code for a language that you
don't even have installed, so asking at compile time may not be
possible. Cross language handling probably isn't the most explored
part of parrot, but most issues will be a matter of assumptions about
handling two languages at once rather than of parrot's handling. And
don't forget, most people who use parrot will only use one language at
a time.
What are "basic things"? What if a language inherently differs in how
it handles those things? For example, incrementing a scalar would
seem to be a basic operation in Perl, but Python will not implement
that basic thing in the same way. It would seem that one or both
sides of this cross-language exchange of very basic types of data will
be problematic.
Even if one doesn't implement increment on a string, it doesn't mean
that the "slot" for increment is at the same place.
You say "the best way for parrot" -- how can Parrot have a judgmental
reference point independent from the languages that target it and the
users of those languages?
If a language is being compiled for parrot, the compiler author is
obviously aware of parrot's abilities and potential. A language may be
implemented differently, but if it ignores the concepts that parrot is
built upon, it won't use parrot's capabilities and potential to the
fullest. Using Intel's ABI for a compiler may not be the best method
for a given language, but it helps provide interoperability with other
languages that use it, and lets you benefit from advances Intel makes
in improving that method.
> > d. Will each language have to provide its own support for
> > interacting with PMCs for other languages?
> >
>
> No, the PMC's will do that themselves. Getting the PMC's is another
> story. A language is responsible for its cross language semantics.
> But parrot is designed for the widest possible case. Many languages
> limit valid characters that a subroutine can use, but parrot does
> not. But as long as "common" cases are adhered to, most problems
> will not exist, e.g. no unicode whitespace in a subroutine name.
You say "No" initially, but then go on to say "yes" in substance. If
the PMCs are responsible for this, and if languages provide the PMCs,
then the languages are responsible for this.
I can write a program that would benefit greatly from cross language
communication. Say I wish to combine Perl with AppleScript to control
my computer. If AppleScript doesn't support it but Perl does, I'll
have to use Perl to send data to AppleScript and to retrieve it. If my
AppleScript program has weird subroutine names in it that Perl doesn't
like, I can't call those subroutines. But if I can call those
subroutines and send them a PerlScalar, I can then do things with that
variable in AppleScript as well as Perl. Parrot will handle the
PerlScalar, even in AppleScript, but I have to get it across the
boundary between languages. I will be able to join together two Perl
strings from within AppleScript (although it'll probably return a Perl
string instead of an AppleScript string).
This is only an example, and someone else can probably come up with a
more valid example.
To explicitly state what is implied by this question. If every
language must provide PMCs that understand how to interact with types
of other languages, then languages will only be able to interact with
each other to the degree that one or both of those languages provides
support. For Perl to use data returned from Python code, either Perl
will have to recognize Python types or Python will have to know to
produce Perl types. Then for Perl to call Tcl code, Perl and/or Tcl
will have to be taught about each other. And then for Python to call
Tcl, yet additional code will need to be created. Indeed, it could be
necessary for Python code to call Perl code that calls Tcl code,
because Perl might understand how to handle a Tcl type that Python
does not. And the more languages that are added, the more types each
language will be asked to implement code to interact with.
This seems like a scalability problem.
Vtables are the type. It all comes down to being a Parrot type, using
parrot's interface.
To create the scenario you are envisioning, parrot would need to be
extremely minimalistic, and all opcodes and PMC's would need to be
loaded via a dynpmc library and a dynop library, with no set definition
of how to implement things. Fortunately, "rules" exist! Rules are why
Windows NT was a POSIX system as well as all the other things it
is/was. They aid in the "it just works" idea that helps so many
programmers in their code.
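To make that concrete, here's a rough C sketch of the dispatch idea --
hypothetical code, not parrot's actual structures or API: every value
carries a table of function pointers in a fixed layout, unimplemented
slots fall through to a default that raises an error, and the
interpreter only ever goes through the table.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical sketch only -- NOT parrot's real vtable layout. */
    typedef struct PMC PMC;

    typedef struct {
        void (*increment)(PMC *self);
    } VTable;

    struct PMC {
        const VTable *vtable;   /* the "type" of the value */
        long          ivalue;
    };

    /* Default slot: behaves like throwing an exception. */
    static void default_increment(PMC *self) {
        (void)self;
        fprintf(stderr, "increment not implemented for this type\n");
        exit(EXIT_FAILURE);
    }

    static void integer_increment(PMC *self) {
        self->ivalue++;
    }

    static const VTable integer_vtable = { integer_increment };
    static const VTable string_vtable  = { default_increment };

    /* The interpreter's "inc" op: it never needs to know which
       language created the PMC, it just goes through the table. */
    static void op_inc(PMC *p) {
        p->vtable->increment(p);
    }

    int main(void) {
        PMC n = { &integer_vtable, 41 };
        PMC s = { &string_vtable,  0  };

        op_inc(&n);                 /* fine: slot is implemented    */
        printf("%ld\n", n.ivalue);  /* prints 42                    */

        op_inc(&s);                 /* "exception": the default slot */
        return 0;
    }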
One possible approach would be to tell every language that when they
wish to interact, they must produce Parrot-provided types, like String
or Number. Another possible approach would be for Parrot to forcibly
convert language-specific data to Parrot-provided types. Both of
these approaches have issues.
Incidentally, the JVM/CTS approach is to tell every language to use
the same primitive types all the time and to use the same object types
as close to all the time as possible. (I am only aware of one case of
this, being the separate 'String' type in Rhino, needed to provide
both Java and JavaScript String semantics. In that case, Java code
returns a Java String object and the caller must explicitly convert it
to a JavaScript string with an operation like 'string+""'.)
> > e. How will a PerlScalar interact with a PythonString?
>
> The best method would probably convert both down to a String, do
> whatever operation, and convert up to whatever is requested. But,
> for optimization, multimethod vtables could be used to provide
> custom behavior. I know src/pmc/complex.pmc has some examples of
> multimethod vtables.
See above. The intent of this question is not so much "What could
someone happen to do in this situation", but rather "What exactly will
Parrot enforce, require or provide?"
Crossing the language boundary is up to the language; handling the
language boundary is largely up to parrot and the programmer.
Don't increment a PythonString from Perl. If you write your code with
cross language use in mind, you won't have a problem. But if you try
to use cross language abilities with a function that doesn't expect
it, you might have trouble.
> > f. What will happen when a PythonString is incremented in Perl
> > code?
>
> Parrot calls PythonString's increment vtable. Perl doesn't have an
> increment, but PerlScalar does. Python doesn't have an increment,
> but PythonString does. Now, if the PMC doesn't implement that vtable
> function, an exception is thrown, but Parrot still tries to call it.
This would mean that any cross-language code could generate runtime
exceptions in operations that otherwise are generally considered not to
be able to fail. Indeed, it would seem that every possible operation
would possibly fail at runtime when handling foreign data.
Asking for the third element of a PerlString isn't possible, but it's
perfectly normal in C. There's the potential for a problem, but in
many instances you'll be passing an array to something expecting an
array rather than passing a string to something expecting an array.
This would seem to strongly discourage multi-language programming --
to the point of it never happening.
The reason programming languages exist is to aid development, namely
speed of development. The language used for a program is often chosen
for its features. PGE was written in PIR because it was easier than
writing it in C. This, at least at present, comes with a speed hit,
but it made it easier to implement.
Suppose you want to mix Fortran with Perl 6. Both have their
advantages for different aspects of the coding, so you choose to write
different parts of your program in each. You get quicker development
overall, and fewer lines of code to debug. Suppose you want Perl's
regexes combined with Java's IO libraries; with Parrot it becomes
possible.
What will Parrot do to make this acceptable? Will end-users be forced
to write their own test cases that attempt all valid combinations of
all data between all languages they wish to use?
I'll remember to require a string argument to every Perl subroutine I
write and increment it just to forbid Python from using it.
If you don't intend it for cross language use, then don't worry about
it. If you do intend it for cross language use, be more prudent about
your choices for a language, and work with users of other languages to
improve your library.
> > Comparing the vtable for a PMC to the JVM and CLR base Object
> > classes, the PMC is essentially an "abstract" class with dozens of
> > "unimplemented" methods, while Java's Object provides (and
> > implements) the following public methods:
> >
> > equals getClass hashCode notify notifyAll toString wait
> >
> > Discounting the methods related to Java's peculiar threading
> > implementation, that's:
> >
> > equals getClass hashCode toString
> >
> > Similarly, the CLR's CTS Object provides:
> >
> > Equals ReferenceEquals GetType GetHashCode ToString
> >
> > g. Why is it a good thing that PMCs are essentially non-contractual
> > abstract base classes that define a lot of functionality without
> > implementing it?
>
> In some instances, this is a benefit. Suppose you want an
> auto-iterating string array. For the most part, it's an array with
> normal array properties. But if you get its string value, it
> iterates over the next one. If you set its string value, maybe it
> splices that value into the array. Having both array and string
> properties is beneficial in this case.
I do not see the benefit. You could implement exactly that without
having an undefined, abstract base type. For example, with the
following code (which is clearly simplified):
[...]
Now, this was not the best of examples in the first place, because I
would not argue that 'ToString' is anything other than the kind of
really useful thing you want in a core data type -- the essential
meaning of the routine being "make something a human can read", and
humans are the people using the machines. But, as you can see, there
was no need for the core data type to provide me with an implemented
'addValue' -- it can simply be layered on using more primitive and
extensible runtime support for properties.
But you forget "zz"++ == "aaa" as in perl!
Anyway, wouldn't you much rather write i += "A"? I know I would.
There may be no need for the vtables, but without them you MUST know
how a language implements a given function. With vtables, you don't;
parrot does.
> But the downside is most things, such as an Integer, don't need many
> of the vtables provided. In fact, if you look at the C output of a
> pmc file, you'll see that every vtable is created. I imagine it's
> more for simplicity and speed than for memory (both executable and
> RAM).
I don't see the simplicity or the speed benefit. I do see the memory
cost. If anything, I suspect that these larger objects will fill a
CPU cache faster and be slower to load because of this increased size,
leading to slower runtime performance.
Perhaps some confusion was caused. Parrot has an Integer PMC, like
java's Integer object. Parrot also has an int type, which is quite
simply a native integer and has no more overhead than in C. If you're
worried about your speed and memory footprint in java, you'll use int
far more than Integer.
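As a rough analogy in C (hypothetical, not parrot's actual Integer
PMC), it's the difference between a plain machine integer and a
heap-allocated box around one:

    #include <stdlib.h>

    /* Hypothetical sketch of "native" vs. "boxed" integers -- an
       analogy for int vs. an Integer PMC, not the real thing. */
    typedef struct {
        long value;   /* a real PMC would also carry a vtable pointer, flags, ... */
    } BoxedInt;

    /* Native ints: no allocation, no indirection -- same cost as C. */
    long add_native(long a, long b) {
        return a + b;
    }

    /* Boxed ints: heap allocation and pointer chasing per operation. */
    BoxedInt *add_boxed(const BoxedInt *a, const BoxedInt *b) {
        BoxedInt *r = malloc(sizeof *r);
        if (!r) abort();
        r->value = a->value + b->value;
        return r;
    }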
No, I mean why is the type-specific functionality not pushed down into
the next tier where it is actually needed, like the JVM and CTS do,
leaving the base PMC with only the same four or five methods those
systems have?
We have a Default PMC, which the others utilize. Granted, parrot still
uses a "macro" approach, but it works. As far as I know this is more
implementation than design; parrot could have just one tiny base
class.
Without opening a can of bees, this sounds like Parrot's performance
will vary greatly depending on the number of variables in scope in a
subroutine. While it is generally true for most languages that a
large number of variables can trigger load/store operations when the
register capacity is exceeded, will Parrot switch from JIT code to
purely interpreted code? While most people don't worry about
incurring a few load/store operations, this kind of variation may
cause programmers to alter their programming style significantly in
order to avoid unacceptable performance.
As you say, i386 has fewer registers, but it is a very common
platform. Given that, many programmers may consider it necessary to
write code that will be JIT-able on that platform, leading to a rather
awkward programming style, encouraging the use of a larger number of
subroutines, thus more calling, and ultimately a lot of register
shuffling anyway.
Your computer behaves the same way, most likely. Currently I use
(although not as my "desktop") FreeBSD on AMD64, with 16 integer
registers and 16 floating point registers in total. If I look at the
disassembled output of a function with a large number of variables,
I'm amazed at how few registers it uses, instead opting to constantly
move data in and out of the stack (where local variables are stored).
In fact, this is how Parrot has been able to achieve "faster than C"
status before: because data is kept in registers.
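To see it yourself, compile something like this made-up C function
without optimization and look at the disassembly; the locals live in
the stack frame and get loaded and stored around nearly every use:

    /* A made-up function with more live locals than most ISAs have
       registers. Compile with -O0 and disassemble: the compiler keeps
       the locals in the stack frame and shuttles them through
       registers only long enough to do each operation. */
    long many_locals(long x) {
        long a = x + 1,  b = x + 2,  c = x + 3,  d = x + 4;
        long e = x + 5,  f = x + 6,  g = x + 7,  h = x + 8;
        long i = x + 9,  j = x + 10, k = x + 11, l = x + 12;
        long m = x + 13, n = x + 14, o = x + 15, p = x + 16;
        long q = x + 17, r = x + 18, s = x + 19, t = x + 20;
        return a*b + c*d + e*f + g*h + i*j + k*l + m*n + o*p + q*r + s*t;
    }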
If a programmer really wishes to ensure that the code is jitable, then
they'll need to look at the compiler output and the jit implementation.
But if you're looking at how to write your code to aid jitability, you
shouldn't forget how well your compiler optimizes certain features, and
how much certain assumptions about optimization can hurt your program,
such as arrays of arrays.
When I asked this question, I thought I was asking if the compiler
could suggest which variables should map to registers and which ones
should be loaded/stored. But it seems this is a question of which
subroutines will use registers at all. In that case, I wonder what
mechanisms Parrot will provide to inform a compiler how JIT-able a
subroutine is -- both on the current platform and on other
architectures -- to enable the compiler to know when it would make
sense to either automatically modify the code into JIT-able form, or
to warn the developer.
The tricky thing is, if it's compiled to PBC, parrot's "ELF" as it
were, you can't optimize for a particular platform. Other than being
open source, parrot doesn't provide any capacity for aiding a compiler
with this at the moment, and a design hasn't been implemented, other
than some odd (and probably unportable) use of NCI (the native call
interface) and ManagedStructs.
Frankly, this is not much of an answer. I am not asking if CISC
architectures exist, but rather I am asking why you are choosing to
create one.
Moreover, I am not questioning your choices in terms of design options
and tradeoffs. I am simply looking for the description of why what
you have was done the way you did it.
Why not ask AMD why their processors are RISC processors with a CISC
interpreter, as it were?
But consider a common function in cryptography: bitwise rotation.
Both PPC and x86 have a rotate opcode, but C does not. Simple code
such as "(a << b) | (a >> (sizeof(int)*8 - b))" will be compiled to
the shifts and OR that it specifies. With a CISC vm, this can perhaps
be JIT'ed to one opcode, because Parrot does have a rot opcode.
Instead of trying to match a sequence of ops to turn into one, parrot
provides that one opcode.
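For reference, here's that idiom written out as a self-contained C
function (for a 32-bit word, adjusted so a shift count of zero is
safe); a native compiler has to pattern-match the shift-and-or back
into a single rotate instruction, while a VM with a rot opcode can be
handed the intent directly:

    #include <stdint.h>

    /* The rotate-left idiom from above, for 32-bit words. The
       (32 - b) & 31 keeps the right shift in range when b == 0. */
    static inline uint32_t rotl32(uint32_t a, unsigned b) {
        b &= 31;
        return (a << b) | (a >> ((32 - b) & 31));
    }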
> > b. What is the basis for deciding what will be an operator?
> >
> > c. How can substantial quantities of additional functionality be
> > added to this design cleanly?
>
> New vtables can be added by editing vtable.tbl, new ops can be added
> by adding to src/ops/experimental.ops, new PMCs can just be added to
> src/pmc afaik. New charsets in src/charset, new jit architectures
> under src/jit (just add --jitcapable and it'll try to compile it in).
> I'd say it's a fairly clean layout for expanding things. There's
> even the capacity for adding a new garbage collector.
It is not sufficient to say that one can write the code. How will
Parrot inform an existing compiler that the new operation exists (or
does not exist, if the version of Parrot is older)? Will compilers
themselves have to be recompiled even if they do not use the new
operators?
Your question was about extending parrot's features, not seeing if they
exist. The best method would probably be finding the current version
of parrot and going from there. Once parrot reaches version 1.0, the
bytecode should remain relatively constant. New opcodes may be added,
but older files will work regardless. An older parrot may not run a
new bytecode file, but as long as a new opcode or feature isn't
expected, it should work fine. Given that parrot isn't at version 1.0
yet, these issues are theoretical.
Also, this seems, as a design, to simply be a bag of operations.
Finally, I would like to add some additional questions.
2.h. Will Parrot support inline assembly language?
Inline native assembly, doubtful. But a language could support inline
PIR, which could be jitted, which is somewhat close. With NCI, which
allows calling a C function, the power of parrot can become massive.
2.i. Will Parrot support primitive types?
It does, namely integers, floats, strings, and PMCs, which are much
like pointers.
4.c. How will registers benefit PMCs (e.g. PerlScalar), which are not
primitive types and cannot be stored in a hardware register?
The same way C does, as a pointer.
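In the same spirit as the earlier C sketches (hypothetical types, not
parrot's actual API), the PMC itself lives on the heap and only its
address sits in a register:

    #include <stddef.h>

    /* Hypothetical stand-in for a PerlScalar-like PMC. The struct
       lives on the heap; only its address is held in a register. */
    typedef struct {
        long        ivalue;
        const char *svalue;
    } Scalar;

    long sum_ivalues(Scalar **items, size_t n) {
        long total = 0;
        for (size_t i = 0; i < n; i++) {
            Scalar *s = items[i];   /* 's' is just a pointer: register-sized */
            total += s->ivalue;     /* the data it points at stays on the heap */
        }
        return total;
    }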