Re: [RFC] imcc calling conventions

2003-02-25 Thread Jerome Vouillon
On Tue, Feb 25, 2003 at 08:47:55AM +0100, Leopold Toetsch wrote:
> >Um... no. tail call optimization implies being able to replace *any*
> >tail call, not just a recursive one with a simple goto. Consider the
> >following calling sequence:
> 
> 
> >   b(arg) -> Push Continuation A onto the continuation chain
> 
> 
> Continuations are different anyway. They store the context of the parent 
> at the point they were created, and on invoke they swap context.

You don't mean the same thing by continuation.  For Piers, a
continuation is just a place where one can jump.  So, for a function
call, you push the return continuation (the place where you must jump
to on return) onto the stack and jump to the continuation corresponding
to the function body.  What you mean by a continuation is what one may
call a "first-class continuation", that is, a continuation which can be
manipulated like any other "object": you can pass it as an argument to
a function, return it from a function, put it in a variable, and so on.

-- Jerome


Re: [RFC] imcc calling conventions

2003-02-25 Thread Jerome Vouillon
On Mon, Feb 24, 2003 at 05:29:26PM +, Piers Cawley wrote:
[Tail-call optimization]
> Under caller saves, this is easy to do. Under callee saves, b's second
> call to c() cannot simply return to Continuation A but must unwind the
> call stack via b to make sure that the right things are restored.

Note that we have the same situation with exceptions.  Under caller
saves, we just need to consider the deepest stack frame, while under
callee saves one needs to unwind the stack.

-- Jerome


A couple easy questions...

2003-02-25 Thread David
How do you determine the datatype of a PMC? For example, if I create the 
following array:

new P0, .PerlArray
set P0[1], "cat"
set P0[2], 123
set P0[3], 456.789

and then grab a value from the array:

set P1, P2[1]

how can I test to determine the datatype of the object in P1?

Also, are there any pre-built Windows binaries of Parrot available?

Thanks!


This week's Perl 6 Summary

2003-02-25 Thread Piers Cawley
The Perl 6 summary for the week ending 20030223
Another week, another Perl 6 Summary, in which you'll find gratuitous
mentions of Leon Brocard, awed descriptions of what Leopold Tötsch got
up to and maybe even a summary of what's been happening in Perl 6 design
and development.

Kicking off with perl6-internals as usual.

  Strings and header reuse
Dan responded to prompting from Leo Tötsch about the use and reuse of
string headers. The problem is that most of the string functions that
produce modified strings do it in new string headers; there's no way of
reusing existing string headers. This can end up generating loads of
garbage. Dan's going through the various string handling ops and PMC
interfaces working out what needs to do what, and documenting them, as
well as adding in versions of the ops that take their destination string
headers as an argument. Dan hopes that 'we can make the changes quickly
and get this out of the way once and for all', leading Robert Spier to
mutter something about 'famous last words'.



  PXS help
Tupshin Harper has been trying to use an XML parser from within Parrot
and started off by looking at the PXS example (in examples/pxs) but had
problems following the instructions given there as his compiler spat out
errors by the bucket load. Leo Tötsch thought that PXS was probably
deprecated and the native call interface (NCI) was the thing to use.
Being Leo, he provided a port of the PXS Qt example to NCI. Although PXS
appears to be out of sync with the parrot core, nobody was entirely sure
whether it should be removed.



  Bit rot in parrot/language/*
Tupshin Harper had some problems with some of the language examples not
working well with the most recent versions of Parrot. Leo Tötsch
addressed most of the issues he raised, but there are definitely issues
with the interpreter and the languages getting out of sync.





  Macros in IMCC (part 2)
Jürgen Bömmels extended the macro support in IMCC, implementing
".constant" and adding some docs. The patch was promptly applied.



  [RFD] IMCC calling conventions
Leo Tötsch posted an RFD covering his understanding of the various
calling conventions that IMCC would have to deal with which sparked some
discussion. I'm now confused as to whether function calls in Parrot will
be caller saves, callee saves, or some unholy mixture of the two.



  Parrot performance vs the good, the bad and the ugly
Tupshin Harper decided to port primes.pasm to C and Perl 5 to compare
results. Parrot came out very quick indeed (close to C). For bonus
points he then took a python primes algorithm that had been ported to C
and ported that to Parrot as well. In full on, all stops pulled out,
knobs turned to 11 mode, Parrot came in at about 50% slower than C and
around 14 times faster than Python. There was some muttering about the
demo being rigged. However, Jim Meyer redid the Perl and Python
implementations to use a loop that duplicated the algorithm used in
primes.pasm and, whilst it improved their performance somewhat, Parrot
was still substantially faster.

This probably won't be the test that's run when Dan and Guido face the
possibility of custard pies at 10 paces or whatever the performance
challenge stake stands at now.



  Mmm... spaceships...
Leon Brocard patched examples/assembly/life.pasm to use a small
spaceship as its starting pattern. Apparently because it 'can provide
hours of enjoyment at conferences if projected onto a big screen while
answering parrot questions.' Nobody thought to ask him how a spaceship
was going to answer Parrot questions, but Leo Tötsch applied the patch.



  Using IMCC as JIT optimizer
Apparently, Leo Tötsch finds it unbearable that 'optimized compiled C is
still faster than parrot -j' so he's been experimenting with adding
smarts to IMCC, making it add hardware register allocation hints to its
emitted bytecode. Sean O'Rourke liked the basic idea, but reckoned that
the information generated by IMCC should really be platform-independent,
suggesting that it'd be okay to pass a control flow graph to the JIT,
but that hardware register allocation for a specific number of registers
would be iffy. He suggested that another option would be for IMCC to 'just
rank the Parrot registers in order of decreasing spill cost', then the
JIT could just move the most important parrot registers into
architectural registers.
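
As a rough illustration of O'Rourke's suggestion (my sketch, not from
the thread; using raw usage counts as a stand-in for spill cost is an
assumption, real spill-cost estimates are subtler):

```python
# Toy version of "rank the Parrot registers in order of decreasing
# spill cost": count how often each I-register is touched, then map the
# most-used ones to hardware registers first.
from collections import Counter

code = ["set I1, 1", "add I1, I1, 1", "print I1",
        "set I4, 9", "print I4", "print I1"]

score = Counter()
for line in code:
    for tok in line.replace(",", " ").split():
        if tok[0] == "I" and tok[1:].isdigit():
            score[tok] += 1          # every read or write bumps the score

ranked = [reg for reg, _ in score.most_common()]
print(ranked)        # -> ['I1', 'I4']: give I1 a hardware register first
```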

Dan thought the id

Re: Configure.pl --cgoto=0 doesn't work

2003-02-25 Thread Juergen Boemmels
Juergen Boemmels <[EMAIL PROTECTED]> writes:

> > we can use imcc rather than assemble.pl as the default assembler for the
> > regression tests? imcc is a lot (>20 times) faster.
> 
> make test IMCC=1
> Leo had introduced this several days ago, shortly after the
> macro-support for imcc went in.

Arggh, first testing, then posting. The correct command line for the
quick test is:

make test IMCC=languages/imcc/imcc

This makes the testsuite really fast, even more so than replacing
assemble.pl with exec languages/imcc/imcc.

> bye
> boe


Re: Using imcc as JIT optimizer

2003-02-25 Thread Leopold Toetsch
Leopold Toetsch wrote:


- do register allocation for JIT in imcc
- use the first N registers as MAPped processor registers


I have committed the next bunch of changes and an updated jit.pod.
- it should now be platform independent, *but* other platforms have to
  define what they consider preserved (callee-saved) registers and
  put these first in the mapped register lists.
- for testing, enable JIT_IMCC_OJ in jit.c and, for platforms != i386,
  copy the MAP macro at the bottom of jit/i386/jit_emit.h to your jit_emit.h
- run programs like so:
  imcc -Oj -d8 primes.pasm (-d8 shows the generated instructions)
It now runs ~95% of the parrot tests on i386, but YMMV.

Have fun,

leo




Re: A couple easy questions...

2003-02-25 Thread Leon Brocard
David sent the following bits through the ether:

> how can I test to determine the datatype of the object in P1?

You'd be wanting "typeof". The following prints out "PerlString", for
example:

new P0, .PerlArray
set P0[1], "cat"
set P0[2], 123
set P0[3], 456.789
set P1, P0[1]
typeof S0, P1
print S0
print "\n"
end

Leon

ps i fixed your code
-- 
Leon Brocard.http://www.astray.com/
scribot.http://www.scribot.com/

... Useless invention no. 404: Inflatable anchor 


Re: [RFC] imcc calling conventions

2003-02-25 Thread Piers Cawley
Leopold Toetsch <[EMAIL PROTECTED]> writes:

> Piers Cawley wrote:
>
>> Steve Fink <[EMAIL PROTECTED]> writes:
>>
>>>... I didn't follow about how that interferes with tail-call
>>>optimization. (To me, "tail call optimization" == "replace recursive
>>>call with a goto to the end of the function preamble")
>>>
>> Um... no. tail call optimization implies being able to replace *any*
>> tail call, not just a recursive one with a simple goto. Consider the
>> following calling sequence:
>
>
>>b(arg) -> Push Continuation A onto the continuation chain
>
>
> Continuations are different anyway. They store the context of the
> parent at the point they were created, and on invoke they swap context.

But the 'return target' of the current subroutine can be thought of as
a continuation. It often makes sense to do so.
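
Python has no goto, but Piers's point that *any* tail call can be
replaced by a jump can be sketched with a trampoline (my illustration;
all names are made up). Note that b tail-calls a *different* function,
c, so the optimization is not limited to recursion:

```python
# A tail call means "replace my frame with the callee's": instead of
# pushing a new return continuation, reuse the caller's.  Here a tail
# call returns a thunk, and the driver loop "goto"s into it, so the
# stack never grows no matter how long the tail-call chain is.

def c(n):
    return n + 1                  # ordinary return, straight to a's caller

def b(n):
    return lambda: c(n * 2)       # tail call to c: a jump, no frame kept

def a(n):
    return lambda: b(n + 3)       # tail call to b

def trampoline(f, n):
    r = f(n)
    while callable(r):            # keep jumping while thunks come back
        r = r()
    return r

print(trampoline(a, 4))           # -> 15  ((4 + 3) * 2 + 1)
```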

-- 
Piers


Re: A couple easy questions...

2003-02-25 Thread Leopold Toetsch
David wrote:

How do you determine the datatype of a PMC? For example, if I create the 
following array:


From docs/core_ops.pod (built from core.ops):

=item B<typeof>(out STR, in PMC)

=item B<typeof>(out INT, in PMC)

Return the type of PMC in $2.

The string result is the class name; the int result is the enum value
of the class, which might vary when new classes are added.



leo







Re: invoke

2003-02-25 Thread Leopold Toetsch
Steve Fink wrote:

On Feb-23, Leopold Toetsch wrote:

I think I kind of have a grasp on what's going on, now. So I've
attached two possible patches. 


I'm currently on the imcc stuff and could have a more detailed look at
both patches afterwards. But I think both have the same effect on the
generated code, and since the second seems to address the issue at the
root of the problem, you could just commit it.

leo




Re: Using imcc as JIT optimizer

2003-02-25 Thread Angel Faus
I explained it very badly. The issue is not spilling (at the parrot
level).

The problem is: if you only pick the highest-priority parrot registers
and put them in real registers, you lose opportunities where copying
the data once will save you from copying it many times. You are, in
some sense, underspilling.

Let's see an example. Imagine you are compiling this imc, to be run
on a machine which has 3 registers free (after temporaries):

set $I1, 1
add $I1, $I1, 1
print $I1

set $I2, 1
add $I2, $I2, 1
print $I2

set $I3, 1
add $I3, $I3, 1
print $I3

set $I4, 1
add $I4, $I4, 1
print $I4

set $I5, 1
add $I5, $I5, 1
print $I5

print $I1
print $I2
print $I3
print $I4
print $I5

Very silly code indeed, but you get the idea.

Since we have only 5 vars, imcc would turn this into:

set I1, 1
add I1, I1, 1
print I1

set I2, 1
add I2, I2, 1
print I2

set I3, 1
add I3, I3, 1
print I3

set I4, 1
add I4, I4, 1
print I4

set I5, 1
add I5, I5, 1
print I5

print I1
print I2
print I3
print I4
print I5

Now, assuming you put registers I1-I3 in real registers, what would it
take to execute this code in JIT?

It would have to move the values of I4 and I5 between memory and
registers a total of 10 times (4 saves and 6 restores, if you assume
the JIT is smart).

[This particular example could be improved by making the jit look at
whether the same parrot register is going to be used in the next op,
but that's not the point]

But, if IMCC knew that there were really only 3 registers in the 
machine, it would generate:

set I1, 1
add I1, I1, 1
print I1

set I2, 1
add I2, I2, 1
print I2

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 1

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 2

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 3

print I1
print I2
fast_restore I3, 3
print I3
fast_restore I3, 2
print I3
fast_restore I3, 1
print I3

When running this code in the JIT, it would only require 6 moves (3
saves, 3 restores): exactly the ones generated by imcc.

In reality this would be even better, because as you have the guarantee
of having the data already in real registers, you need fewer
temporaries and so have more machine registers free.
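
The two accountings above can be checked with a small tally (my
sketch, not from the thread; the cost model, one restore per distinct
in-memory source and one save per in-memory destination, is my
reconstruction of Angel's "smart JIT" assumption):

```python
# Under the "map the top N parrot registers" scheme (N = 3, mapping
# I1-I3), every op touching the in-memory registers I4/I5 must restore
# each distinct source from memory and save a written destination back.
# Under the "allocate for N registers in imcc" scheme, the only memory
# traffic is the explicit fast_save/fast_restore ops (3 of each).

def memory_moves(code, in_memory):
    """Count (saves, restores) for registers that live in memory."""
    saves = restores = 0
    for dest, srcs in code:
        restores += len({s for s in srcs if s in in_memory})
        if dest in in_memory:
            saves += 1
    return saves, restores

# set/add/print for each of I1..I5, as in the first listing above.
code = []
for v in ["I1", "I2", "I3", "I4", "I5"]:
    code += [(v, []),        # set   v, 1
             (v, [v]),       # add   v, v, 1
             (None, [v])]    # print v
code += [(None, [v]) for v in ["I1", "I2", "I3", "I4", "I5"]]  # final prints

saves, restores = memory_moves(code, in_memory={"I4", "I5"})
print(saves, restores)        # -> 4 6   (10 moves total)
print(3 + 3)                  # -> 6 moves in the second listing
```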

> So the final goal could be, to emit these load/stores too, which
> then could be optimized to avoid duplicate loading/storing. Or imcc
> could emit a register move, if in the next instruction the parrot
> register is used again.

Yes, that's the idea: making imcc generate the loads/stores, using the 
info about how many registers are actually available in the real 
machine _and_ its own knowledge about the program flow.

An even better goal would be to have imcc know how many temporaries 
every JITed op requires, and use this information during register 
allocation.

All this is obviously machine-dependent: the code generated should
only run on the machine it was compiled for. So we should always keep
the original imc code in case we copy the pbc file to another
machine.

-angel




Re: Using imcc as JIT optimizer

2003-02-25 Thread Leopold Toetsch
Angel Faus wrote:

Saturday 22 February 2003 16:28, Leopold Toetsch wrote:

With your approach there are three levels of parrot "registers":

- The first N registers, which in JIT will be mapped to physical
registers.

- The other 32 - N parrot registers, which will be in memory.

- The "spilled" registers, which are also in memory, but will have to
be copied to a parrot register (which may be a memory location or a
physical register) before being used.


Spilling is really rare; you have to work hard to get a test case :-)
But when it comes to spilling, we should do some register renumbering
(which is the case for processor registers too). The current allocation
is per basic block. When we start spilling, new temp registers are
created, so that the register life range is limited to the usage of the
new temp register and the spill code.
This is rather expensive, as for one spilled register the whole life
analysis has to be redone.


I believe it would be smarter if we instructed IMCC to generate code
that only uses N parrot registers (where N is the number of machine
registers available). This way we avoid the risk of having to copy
the data twice.


I don't think so. When we have all 3 levels of registers, using fewer
parrot registers would just produce more spilled registers.

Actually, I'm currently generating code that uses 32+N registers. The
processor registers are numbered -1, -2, ... for the top used parrot
registers 0, 1, ... But the processor registers are only fixed mirrors
of the parrot registers.


This is also interesting because it gives the register allocation
algorithm all the information about the actual structure of the
machine we are going to run on. I am quite confident that code
generated this way would run faster.


All the normal operations boil down basically to 2 different machine
instruction types, e.g. for some binop:

   binop_rm or binop_rr (i386)
   binop_rrr (RISC arch)

These are surrounded by mov_rm / mov_mr to load/store non-mapped
processor registers from/to parrot registers; the reg(s) are some
scratch registers then, like %eax on i386 or r11/r12 for ppc.

s. e.g. jit/{i386,ppc}/core.jit

So the final goal could be to emit these load/stores too, which then
could be optimized to avoid duplicate loading/storing. Or imcc could
emit a register move, if in the next instruction the parrot register is
used again.
Then processor-specific hints could come in, like:
  shr_rr_i for i386 has to have the shift count in %ecx.

We also need to have a better procedure for saving and restoring
spilled registers, especially in the case of JIT compilation, where it
could be translated to a machine save/restore.


I don't see much here. Where should the spilled registers be stored then?


What do you think about it?


I think when it comes to spilling we should divide the basic block to
get shorter life ranges, which would then allow register renumbering.


leo



Re: Using imcc as JIT optimizer

2003-02-25 Thread Phil Hassey
On Tuesday 25 February 2003 08:51, Leopold Toetsch wrote:
> Angel Faus wrote:
> > Saturday 22 February 2003 16:28, Leopold Toetsch wrote:
> >
> > With your approach there are three levels of parrot "registers":
> >
> > - The first N registers, which in JIT will be mapped to physical
> > registers.
> >
> > - The others 32 - N parrot registers, which will be in memory.
> >
> > - The "spilled" registers, which are also on memory, but will have to
> > be copied to a parrot register (which may be a memory location or a
> > physical registers) before being used.
>
> Spilling is really rare, you have to work hard, to get a test case :-)
> But when it comes to spilling, we should do some register renumbering
> (which is the case for processor registers too). The current allocation
> is per basic block. When we start spilling, new temp registers are
> created, so that the register life range is limited to the usage of the
> new temp register and the spill code.
> This is rather expensive, as for one spilled register, the whole life
> analysis has to be redone.

Not knowing much about virtual machine design... here's a question:
why do we have a set number of registers? Particularly since JITed code
ends up setting the register constraints again, I'm not sure why parrot
should set up register limit constraints first. Couldn't each code block say
"I need 12 registers for this block" and then the JIT system would go on to
do its appropriate spilling magic with the system registers...

I suspect the answer has something to do with optimized C and not making
things hairy, but I had to ask anyway.  :)

...

Phil


Re: A couple easy questions...

2003-02-25 Thread David
Leopold Toetsch wrote:

>  From docs/core_ops.pod (built from core.ops):

Thanks. I better upgrade my version, I'm not seeing it in 0.0.9.

-- David Cuny


Re: A couple easy questions...

2003-02-25 Thread David
Leon Brocard wrote:

> You'd be wanting "typeof". 

Thanks.

> ps i fixed your code

Thanks again. :-)

Anyone know about a Parrot Windows binary?

-- David Cuny



Re: [RFC] imcc calling conventions

2003-02-25 Thread Piers Cawley
Jerome Vouillon <[EMAIL PROTECTED]> writes:

> On Tue, Feb 25, 2003 at 08:47:55AM +0100, Leopold Toetsch wrote:
>> >Um... no. tail call optimization implies being able to replace *any*
>> >tail call, not just a recursive one with a simple goto. Consider the
>> >following calling sequence:
>> 
>> 
>> >   b(arg) -> Push Continuation A onto the continuation chain
>> 
>> 
>> Continuations are different anyway. They store the context of the parent 
>> at the point they were created, and on invoke they swap context.
>
> You don't mean the same thing by continuation.  For Piers, a
> continuation is just a place where one can jump. 

No, a continuation is just a place to which one might return.

> So, for a function call, you push the return continuation (the place
> where you must jump on return) onto the stack and jump to the
> continuation corresponding to the function body.  What you mean by a
> continuation is what one may call "first-class continuation", that
> is a continuation which can be manipulated like any other "object"
> (you can pass it as argument of a function, return it from a
> function, put it in a variable, ...)

Which can also be thought of as 'just a place to which one can
return', but you can explicitly alter what one returns.

-- 
Piers


0.1.0

2003-02-25 Thread Leon Brocard
David sent the following bits through the ether:

> Thanks. I better upgrade my version, I'm not seeing it in 0.0.9.

It's been a while since 0.0.9 (errr, 20th Dec). A lot has changed
since then. Maybe it's time for a 0.1.0 release. What are we waiting
for? And why do we have so many version numbers? It'd be nice to have
objects, otherwise we're restricted to toy languages.

Leon
-- 
Leon Brocard.http://www.astray.com/
scribot.http://www.scribot.com/

... Komputors nefer maik erers


Re: 0.1.0

2003-02-25 Thread Jerome Quelin
Leon Brocard wrote:
> It's been a while since 0.0.9 (errr, 20th Dec). A lot has changed
> since then. Maybe it's time for a 0.1.0 release. What are we waiting
> for? 

Dan said: "either exceptions or objects". Once we have one, we'll go to
0.1.0, and when the second is implemented (order does not matter),
we'll go to 0.2.0. Or am I wrong?
Btw, a proper I/O layer would be nice to have, too...

> And why do we have so many version numbers? It'd be nice to have
> objects, otherwise we're restricted to toy languages.

And even toy languages may benefit from objects (yes, I really need
objects in order to implement the -98 version of Befunge, especially
since I want to include concurrent-funge support). Well, I could use my
own hand-crafted objects as a list of whatever, but the fun would be
much greater with objects.

Jerome
-- 
[EMAIL PROTECTED]



Re: 0.1.0

2003-02-25 Thread Simon Glover

On Tue, 25 Feb 2003, Jerome Quelin wrote:

> I want to include concurrent-funge support.

 I'm not even going to ask :-)

 Simon



Re: Using imcc as JIT optimizer

2003-02-25 Thread Angel Faus
Saturday 22 February 2003 16:28, Leopold Toetsch wrote:
> Gopal V wrote:
> > If memory serves me right, Leopold Toetsch wrote:
> >
> >
> > Ok .. well I sort of understood that the first N registers will
> > be the ones MAPped ?. So I thought re-ordering/sorting was the
> > operation performed.
>
> Yep. Register renumbering, so that the top N used (in terms of
> score) registers are I0, I1, ..In-1

With your approach there are three levels of parrot "registers":

- The first N registers, which in JIT will be mapped to physical
registers.

- The other 32 - N parrot registers, which will be in memory.

- The "spilled" registers, which are also in memory, but will have to
be copied to a parrot register (which may be a memory location or a
physical register) before being used.

I believe it would be smarter if we instructed IMCC to generate code
that only uses N parrot registers (where N is the number of machine
registers available). This way we avoid the risk of having to copy
the data twice.

This is also interesting because it gives the register allocation
algorithm all the information about the actual structure of the
machine we are going to run on. I am quite confident that code
generated this way would run faster.

We also need to have a better procedure for saving and restoring
spilled registers, especially in the case of JIT compilation, where it
could be translated to a machine save/restore.

What do you think about it?

-angel



Re: Using imcc as JIT optimizer

2003-02-25 Thread Jason Gloudon
On Tue, Feb 25, 2003 at 07:18:11PM +0100, Angel Faus wrote:
> I believe it would be smarter if we instructed IMCC to generate code 
> that only uses N parrot registers (where N is the number of machine 
> register available). This way we avoid the risk of having to copy 
> twice the data.

It's not going to be very good if I compile code to pbc on an x86 where there
are about 3 usable registers and try to run it on any other CPU with a lot more
registers.

-- 
Jason


Re: Using imcc as JIT optimizer

2003-02-25 Thread Leopold Toetsch
Phil Hassey wrote:

Not knowing much about virtual machine design... here's a question:
why do we have a set number of registers? Particularly since JITed code
ends up setting the register constraints again, I'm not sure why parrot
should set up register limit constraints first. Couldn't each code block say
"I need 12 registers for this block" and then the JIT system would go on to
do its appropriate spilling magic with the system registers...


This is roughly the approach the current optimizer in jit.c takes. The
optimizer looks at a section (a JITed part of a basic block), checks
register usage and then assigns the top N registers to processor registers.

This has 2 disadvantages:
- it's done at runtime, always. It's pretty fast, but could have
non-trivial overhead for big programs
- as each section and therefore each basic block has its own set of
mapped registers, on almost every boundary of a basic block, and when
calling out to non-JITed code, processor registers have to be saved to
parrot's registers and restored back again. These memory accesses slow
things down, so I want to avoid them where possible.


leo




Re: 0.1.0

2003-02-25 Thread Dan Sugalski
At 4:52 PM + 2/25/03, Leon Brocard wrote:
David sent the following bits through the ether:

 Thanks. I better upgrade my version, I'm not seeing it in 0.0.9.
It's been a while since 0.0.9 (errr, 20th Dec). A lot has changed
since then. Maybe it's time for a 0.1.0 release. What are we waiting
for?
Objects or exceptions. (Or a full I/O layer, or events)

 And why do we have so many version numbers?
Major, minor, point. Reasonably standard, as these things go.

It'd be nice to have
objects, otherwise we're restricted to toy languages.
While I'll call C many things (not all of them repeatable) I'm not 
sure "toy" is one of them. Nor Forth, Fortran, APL, COBOL, Lisp, or 
Basic... :)

Objects are coming, though I've been too pressed for time recently. 
String rework first, then objects.
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: 0.1.0

2003-02-25 Thread Leon Brocard
Dan Sugalski sent the following bits through the ether:

> While I'll call C many things (not all of them repeatable) I'm not 
> sure "toy" is one of them. Nor Forth, Fortran, APL, COBOL, Lisp, or 
> Basic... :)

Granted, but those aren't the languages we're interested in. Parrot is
for dynamic languages, and that gives away the fact that objects would
help in their implementation.

> Objects are coming, though I've been too pressed for time recently. 
> String rework first, then objects.

Excellent.

Leon
-- 
Leon Brocard.http://www.astray.com/
scribot.http://www.scribot.com/

... What's brown and sticky? A stick!


Re: Using imcc as JIT optimizer

2003-02-25 Thread Nicholas Clark
On Wed, Feb 26, 2003 at 02:21:32AM +0100, Angel Faus wrote:

[snip lots of good stuff]

> All this is obviously machine dependent: the code generated should 
> only run in the machine it was compiled for. So we should always keep 
> the original imc code in case we copy the pbc file to another 
> machine.

Er, but doesn't that mean that imc code has now usurped the role of parrot
byte code?

I'm not sure what is a good answer here. But I thought that the intent of
parrot's bytecode was to be the same bytecode that runs everywhere. Which
is slightly incompatible with compiling perl code to something that runs
as fast as possible on the machine that you're both compiling and running
on. (These two being the same machine most of the time).

Maybe we're starting to get to the point of having imcc deliver parrot bytecode
if you want to be portable, and something approaching native machine code
if you want speed. Or maybe if you want the latter we save "fat" bytecode
files, that contain IMC code, bytecode and JIT-food for one or more
processors.

And is this all premature optimisation, given that we haven't got objects,
exceptions, IO or a Z-code interpreter yet?

Nicholas Clark


Re: 0.1.0

2003-02-25 Thread Piers Cawley
Jerome Quelin <[EMAIL PROTECTED]> writes:
> And even toy languages may benefit from objects (yes, I really need
> objects in order to implement -98 version of Befunge, especially
> since I want to include concurrent-funge support). Well, I could use
> my own hand-crafted objects as a list of whatever, but fun would be
> much greater with objects.

Yeah, I'm waiting for objects before I have a crack at a scheme
interpreter in parrot. Yeah, I *know* there's a scheme compiler in
parrot but I want to do it a different way for giggles.

-- 
Piers


Re: [RFC] imcc calling conventions

2003-02-25 Thread Benjamin Goldberg
Piers Cawley wrote:
[snip]
> Um... no. tail call optimization implies being able to replace *any*
> tail call, not just a recursive one with a simple goto.
[snip]

In perl5, doing a tail call optimization can be done with just a simple
goto... well, 'goto &subname', anyway.  (Well, first you'd assign
something to @_, then goto &subname).

Since (supposedly) there's going to be a perl5->parrot compiler, there
needs to be support for perl5's goto &subname.  ISTM that once we have
figured out an efficient way of implementing that, we'll also have an
efficient way of doing native tail call optimization.

As a wild-ass-guess, an optimized tail call will look something like:

 .sub _foo   # sub foo(int a, int b)
   saveall
   .param int a # a = pop @_
   .param int b # b = pop @_
   ...

   .arg int x # push @_, x
   .arg int u # push @_, u
   .arg int q # push @_, q
   restoreall
   jnsr _bar  # goto &_bar
 .end

 .sub _bar  # sub bar(int q, int u, int x) {
   saveall
   .param int q # q = pop @_
   .param int u # u = pop @_
   .param int x # x = pop @_
   ...

   .return int pl # push @_, pl
   .return int ml # push @_, ml
   restoreall
   ret
 .end

The 'jnsr' opcode (JumpNoSaveReturn) might be spelled as 'j' or as
'goto', or something else entirely, depending on what's least confusing,
and most aesthetically pleasing.
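The jump-no-save-return idea can be modelled outside Parrot with a trampoline, a standard trick for tail calls in languages without goto. This is an illustrative Python sketch (the names `tail`, `trampoline`, `foo`, `bar` are invented for the example, not Parrot APIs): a tail call returns a thunk instead of calling directly, so no return frame is pushed.

```python
# Trampoline sketch: a "tail call" hands back a packaged call instead of
# invoking it, so the caller's frame is gone before the callee runs --
# analogous to jnsr's jump-without-pushing-a-return-address.

def tail(fn, *args):
    # Mark a call as a tail call: package it up instead of invoking it.
    return ("tail", fn, args)

def trampoline(fn, *args):
    result = fn(*args)
    while isinstance(result, tuple) and result and result[0] == "tail":
        _, fn, args = result
        result = fn(*args)   # the "goto": no frames accumulate
    return result

def foo(a, b):
    # like: .arg ...; restoreall; jnsr _bar
    return tail(bar, a + b, a, b)

def bar(q, u, x):
    # like: .return ...; restoreall; ret
    return q + u + x

print(trampoline(foo, 1, 2))  # 6
```

Here `foo`'s frame is fully unwound before `bar` starts, which is exactly the property the jnsr opcode is after.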

-- 
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print


Re: Using imcc as JIT optimizer

2003-02-25 Thread Leopold Toetsch
Nicholas Clark wrote:

> On Wed, Feb 26, 2003 at 02:21:32AM +0100, Angel Faus wrote:
>
> [snip lots of good stuff]
>
> > All this is obviously machine dependent: the code generated should
> > only run in the machine it was compiled for. So we should always keep
> > the original imc code in case we copy the pbc file to another
> > machine.
>
> Er, but doesn't that mean that imc code has now usurped the role of parrot
> byte code?


No. It's like another runtime option. Run "imcc -Oj the.pasm" and you 
get what you want, a differently optimized piece of JIT code, that might 
run faster than "imcc -j the.pasm".
And saying "imcc -Oj -o the.pbc the.pasm" should spit out the fastest 
bytecode possible, for your very machine.


> I'm not sure what is a good answer here. But I thought that the intent of
> parrot's bytecode was to be the same bytecode that runs everywhere.


Yep

> ... Which
> is slightly incompatible with compiling perl code to something that runs
> as fast as possible on the machine that you're both compiling and running
> on. (These two being the same machine most of the time).


At PBC level, imcc already has "-Op" which does parrot register 
renumbering (modulo NCI and such, where fixed registers are needed, and 
this is -- hmmm suboptimal then :) and imcc can write out CFG 
information in some machine independent form, i.e. at basic block level. 
But no processor specific load/store instructions and such.
This can help JIT optimizer to do the job faster, though it isn't that 
easy, because there are non-JITted code sequences interspersed.

I think some difficulties arise when looking at what imcc now is: it's
the assemble.pl, generating PBC files. But it's also parrot: it can run
PBC files - and it's both: it can run PASM (or IMC) files immediately.
And the latter can always be as fast as the $arch allows. Generating
PBC doesn't have to use the same compile options - just as you
wouldn't use them when cross-compiling with "gcc -b machine".


> Maybe we're starting to get to the point of having imcc deliver parrot bytecode
> if you want to be portable, and something approaching native machine code
> if you want speed.


IMHO yes: the normal options produce a plain PBC file, more or less
optimized at the PASM level. The -Oj option is definitely a machine
optimization option, which will create a PBC that runs only on a
machine with equally many or fewer mapped registers and the same
external (non-JITted) instructions, i.e. on the same $arch.
But the normal case is, that I compile the source for my machine and run 
it here - with all possible optimizations.
I never did do any cross compilation here. Shipping the source is 
enough. Plain PBC is still like an unoptimized executable running 
everywhere - not a machine specific cross compile EXE.

> ... Or maybe if you want the latter we save "fat" bytecode
> files, that contain IMC code, bytecode and JIT-food for one or more
> processors.


There is really no need for a fat PBC. Though - as already stated - I 
could imagine some cross compile capabilities for -Oj PBCs.


> And is this all premature optimisation, given that we haven't got objects,
> exceptions, IO or a Z-code interpreter yet?


It is a different approach to JIT register allocation. The current
optimizer allocates registers per JITed section, with no chance (IMHO)
to reuse registers after a branch, because the optimizer lacks the
information to know that this branch target will only be reached from
here and that the registers are the same - so it can't tell that
saving/loading processor registers to memory could be avoided.

OTOH imcc has almost all this info already at hand (coming out of 
CFG/life information needed for allocating parrot regs from $temps). So 
the chance for generating faster code is there, IMHO.

Premature optimization - partly, of course, yes/no:
My copy here now runs all parrot tests except op/interp_2 (obvious:
this compares traced instructions, where -Oj inserted some register
load/saves) and the pmc/nci tests, where just the fixed parameter/return
result registers are messed up - the "imcc calling conventions" thread
has a proposal for this.
And yes: we don't have exceptions and threads yet. The other items
don't matter (IMHO).
But we will come to a point, where for certain languages, we will 
optimize P-registers, or mix them with I-regs, reusing same processor 
regs. :-)


leo



Re: Using imcc as JIT optimizer

2003-02-25 Thread Leopold Toetsch
[ you seem to be living some hours ahead in time ]

Angel Faus wrote:

> I explained very badly. The issue is not spilling (at the parrot
> level)


The problem stays the same: spilling processor registers to parrot
registers, or parrot registers to an array.

[ ... ]


> set I3, 1
> add I3, I3, 1
> print I3
> fast_save I3, 1
>
> set I3, 1


Above's "fast_save" is spilling at the parrot register level; moving
processor regs out to parrot registers is spilling at the processor
register level. Actual machine code could be:

mov 1, %eax # first write to a parrot register
inc %eax# add I3, I3, 1 => (*) add I3, 1 => inc I3
mov %eax, I3# store reg to parrot registers mem
print I3# print is external
*) already done now
Above sequence of code wouldn't consume any mapped register - for the 
whole sequence originally shown.


> So the final goal could be, to emit these load/stores too, which
> then could be optimized to avoid duplicate loading/storing.

> An even better goal would be to have imcc know how many temporaries
> every JITed op requires, and use this information during register
> allocation.


As shown above, yep.


> All this is obviously machine dependent: the code generated should
> only run in the machine it was compiled for. So we should always keep
> the original imc code in case we copy the pbc file to another
> machine.


I'll answer this part in my reply to Nicholas's mail.


leo



Re: [RFC] imcc calling conventions

2003-02-25 Thread Piers Cawley
Benjamin Goldberg <[EMAIL PROTECTED]> writes:

> Piers Cawley wrote:
> [snip]
>> Um... no. tail call optimization implies being able to replace *any*
>> tail call, not just a recursive one with a simple goto.
> [snip]
>
> In perl5, doing a tail call optimization can be done with just a simple
> goto... well, 'goto &subname', anyway.  (Well, first you'd assign
> something to @_, then goto &subname).

Ah... this discussion has been done in p5p and elsewhere; whilst goto
&sub could, in theory, do tail call optimization, in practice it seems
to be as slow as any other function call.

>
> Since (supposedly) there's going to be a perl5->parrot compiler, there
> needs to be support for perl5's goto &subname.  ISTM that once we have
> figured out an efficient way of implementing that, we'll also have an
> efficient way of doing native tail call optimization.
>
> As a wild-ass-guess, an optimized tail call will look something like:
>
>  .sub _foo   # sub foo(int a, int b)
>    saveall
>    .param int a # a = pop @_
>    .param int b # b = pop @_
>    ...
>
>    .arg int x # push @_, x
>    .arg int u # push @_, u
>    .arg int q # push @_, q
>    restoreall
>    jnsr _bar  # goto &_bar
>  .end
>
>  .sub _bar  # sub bar(int q, int u, int x) {
>    saveall
>    .param int q # q = pop @_
>    .param int u # u = pop @_
>    .param int x # x = pop @_
>    ...
>
>    .return int pl # push @_, pl
>    .return int ml # push @_, ml
>    restoreall
>    ret
>  .end
>
> The 'jnsr' opcode (JumpNoSaveReturn) might be spelled as 'j' or as
> 'goto', or something else entirely, depending on what's least confusing,
> and most aesthetically pleasing.

The problem here is that you've pushed two loads of registers onto the
register stack, and the return is only going to pop one set off. And
it'll be the wrong set at that. And you can't add an extra
'restoreall' to _bar because _bar could easily be called normally as
well as via a tailcall.
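The imbalance is easy to see if you model the register save stack directly. This is a toy Python sketch of the concern (the `saveall`/`restoreall` functions here are invented stand-ins, not Parrot's actual register-frame code): when both subs execute saveall on the way in but control returns straight past the tail-caller, only one restoreall ever runs.

```python
# Toy model of the register save stack (illustrative only).
reg_stack = []

def saveall(regs):
    # Push a snapshot of the full register set, like Parrot's saveall.
    reg_stack.append(dict(regs))

def restoreall():
    # Pop the most recent snapshot, like Parrot's restoreall.
    return reg_stack.pop()

# A tail call where both _foo and _bar execute saveall, but control
# returns from _bar directly to _foo's caller: only _bar's single
# restoreall ever runs.
saveall({"I0": 1})      # _foo's saveall
saveall({"I0": 99})     # _bar's saveall
regs = restoreall()     # _bar's restoreall pops _bar's set

print(len(reg_stack))   # 1
print(regs["I0"])       # 99
```

One set is stranded on the stack, and the caller's subsequent pop would restore the wrong registers, which is exactly why _bar can't just grow an extra restoreall: called normally, it would then pop its caller's set too.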

-- 
Piers