An overview of the Parrot interpreter

2001-09-03 Thread Simon Cozens

Here's the first of a bunch of things I'm writing which should give you
practical information to get you up to speed on what we're going to be doing
with Parrot so we can get you coding away. :) Think of them as having an
Apocalypse->Exegesis relationship to the PDDs.

I haven't finished writing the other documents this refers to yet, but
I'll do those as soon as I can.

As usual, this is a draft; it is not sacred or inviolate. If it raises
more questions than it answers, ask them, and I'll put the answers in the
next release.

---

=head1 An overview of the Parrot interpreter

This document is an introduction to the structure of and the concepts 
used by the Parrot shared bytecode compiler/interpreter system. We will
primarily concern ourselves with the interpreter, since this is the
target platform for which all compiler frontends should compile their
code.

=head1 The Software CPU

Like all interpreter systems of its kind, the Parrot interpreter is
a virtual machine; this is another way of saying that it is a software
CPU. However, unlike other VMs, the Parrot interpreter is designed to
more closely mirror hardware CPUs.

For instance, the Parrot VM will have a register architecture, rather
than a stack architecture. It will also have extremely low-level
operations, more similar to Java's than the medium-level ops of Perl and
Python and the like.

The reasoning for this decision is primarily that by resembling the
underlying hardware to some extent, it's possible to compile down Parrot
bytecode to efficient native machine language. It also allows us to make
use of the literature available on optimizing compilation for hardware
CPUs, rather than the relatively slight volume of information on
optimizing for macro-op based stack machines.

To be more specific about the software CPU, it will contain a large
number of registers. The current design provides for four groups of 32
registers; each group will hold a different data type: integers,
floating-point numbers, strings, and PMCs. (Parrot Magic Cookies,
detailed below.)

Registers will be stored in register frames, which can be pushed and
popped onto the register stack. For instance, a subroutine or a block
might need its own register frame.
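A minimal sketch of that register-frame arrangement in C. All of the
names and the layout here are illustrative guesses, not taken from any
actual Parrot source:

```c
#include <assert.h>
#include <string.h>

#define NUM_REGS 32

/* One frame holding the four register groups described above.
   The field names are invented for illustration. */
struct reg_frame {
    long   int_regs[NUM_REGS];     /* integer registers  */
    double num_regs[NUM_REGS];     /* floating-point registers */
    void  *str_regs[NUM_REGS];     /* string registers   */
    void  *pmc_regs[NUM_REGS];     /* PMC registers      */
    struct reg_frame *prev;        /* frame below this one on the stack */
};

/* Push a fresh, zeroed frame for a sub or block, remembering the
   caller's frame underneath it. */
static struct reg_frame *push_frame(struct reg_frame *top,
                                    struct reg_frame *fresh) {
    memset(fresh, 0, sizeof *fresh);
    fresh->prev = top;
    return fresh;
}

/* Pop back to the caller's frame on scope exit. */
static struct reg_frame *pop_frame(struct reg_frame *top) {
    return top->prev;
}
```

A subroutine call would push a frame, leaving the caller's registers
untouched underneath; returning pops back to them.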

=head1 The Operations

The Parrot interpreter has a large number of very low level
instructions, and it is expected that high-level languages will compile
down to a medium-level language before outputting pure Parrot machine
code.

Operations will be represented by several bytes of Parrot machine code;
the first value will specify the operation number, and the remaining
arguments will be operator-specific. Operations will usually be targeted
at a specific data type and register type; so, for instance, an
integer-decrement op takes two integer arguments, and decrements the
contents of the integer register designated by the first argument by the
value in the second. Naturally, operations which act on other register
types will use constants of the corresponding type; however, since the
first argument is almost always a register B<number> rather than actual
data, even operations on string and PMC registers will take an integer
as the first argument.

As in Perl, Parrot ops will return the pointer to the next operation in
the bytecode stream. Although ops will have a predetermined number and
size of arguments, it's cheaper to have the individual ops skip over
their arguments returning the next operation, rather than looking up in
a table the number of bytes to skip over for a given opcode. 
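The return-the-next-op convention can be sketched like this. The opcode
numbers, the three-register argument layout, and the register file are
all invented for illustration; only the dispatch shape follows the text:

```c
#include <assert.h>

typedef long opcode_t;

enum { OP_END = 0, OP_ADD = 1 };   /* hypothetical opcode numbers */

static long regs[32];              /* toy integer register file */

/* Each op knows its own size, so it returns the address of the next
   op itself; the dispatch loop never consults a length table. */
static opcode_t *op_add(opcode_t *pc) {
    regs[pc[1]] = regs[pc[2]] + regs[pc[3]];
    return pc + 4;                 /* opcode plus three register numbers */
}

static opcode_t *op_end(opcode_t *pc) {
    (void)pc;
    return 0;                      /* a null pointer stops the loop */
}

typedef opcode_t *(*op_func_t)(opcode_t *);
static op_func_t op_table[] = { op_end, op_add };

/* The whole interpreter loop: look up the op, call it, follow the
   pointer it hands back. */
static void run(opcode_t *pc) {
    while (pc)
        pc = op_table[*pc](pc);
}
```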

There will be global and private opcode tables; that is to say, an area
of the bytecode can define a set of custom operations that it will use.
These areas will roughly map to compilation units of the original
source; each precompiled module will have its own opcode table.

For a closer look at Parrot ops, see the separate document on the
opcode set.

=head1 PMCs

PMCs are roughly equivalent to the C<SV>, C<AV> and C<HV> (and more
complex types) defined in Perl 5, and almost exactly equivalent to
object types in Python. They are a completely abstracted data
type; they may be string, integer, code or anything else. As we will see
shortly, they can be expected to behave in certain ways when instructed
to perform certain operations - such as incrementing by one, converting
their value to an integer, and so on.

The fact of their abstraction allows us to treat PMCs as, roughly
speaking, a standard API for dealing with data. If we're executing Perl
code, we can manufacture PMCs that behave like Perl scalars, and the
operations we perform on them will do Perlish things; if we execute
Python code, we can manufacture PMCs with Python operations, and the
same underlying bytecode will now perform Pythonic activities.

=head1 Vtables

The way we achieve this abstraction is to assign to each PMC a set of
function pointers that determine how it ought to behave when asked to do
various things. In a sense, you can regard a PMC as an object in an
abstract virtual class; the PMC needs a set of methods to specify how
it behaves.
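The mechanism described here can be sketched as a struct of function
pointers. The field names and the PMC layout are invented for
illustration, not taken from Parrot source:

```c
#include <assert.h>

/* A cut-down vtable holding just the operations mentioned above. */
struct pmc;
struct vtable {
    long (*get_integer)(struct pmc *);   /* "convert your value to an int" */
    void (*increment)(struct pmc *);     /* "increment yourself by one" */
};

struct pmc {
    struct vtable *vt;   /* behaviour lives here... */
    void *data;          /* ...the value lives here */
};

/* A Perl-scalar-ish PMC holding a number. */
static long scalar_get_integer(struct pmc *p) { return *(long *)p->data; }
static void scalar_increment(struct pmc *p)   { ++*(long *)p->data; }

static struct vtable scalar_vtable = { scalar_get_integer, scalar_increment };

/* The interpreter never looks inside the PMC; it only calls through
   the vtable. */
static long as_integer(struct pmc *p) { return p->vt->get_integer(p); }
```

Swapping in a different vtable would make the same bytecode do
Python-flavoured (or anything-flavoured) arithmetic instead.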

Re: Source/Program metadata from within a program

2001-09-03 Thread Michael G Schwern

On Mon, Sep 03, 2001 at 04:56:28PM +0100, Nick Ing-Simmons wrote:
> >The problem is, it appears DATA is only opened if there's an __END__
> >or __DATA__ tag.  I don't remember it working this way...
> >
> >*shrug*  We can fix that easy. :)
> 
> No you can't - you run out of fd's pretty quick if every .pm file you touch
> leaves one open to be seekable...

Simple, just tie it so it only opens upon being used.


OR, and I have no idea why I never thought of this before, instead of
magic filehandles, just peek in %INC and open that file.


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>   http://www.pobox.com/~schwern/
Perl6 Quality Assurance <[EMAIL PROTECTED]>   Kwalitee Is Job One
That which stirs me, stirs everything.
-- Squonk Opera, "Spoon"



Re: An overview of the Parrot interpreter

2001-09-03 Thread Dan Sugalski

On Sun, 2 Sep 2001, Simon Cozens wrote:

> For instance, the Parrot VM will have a register architecture, rather
> than a stack architecture. It will also have extremely low-level
> operations, more similar to Java's than the medium-level ops of Perl and
> Python and the like.

For those of you worrying that parrot will be *just* low-level ops,
don't. There will be medium and high level ops in the set as well. (call
method, start thread, and start dead-object detection, for example) We'll
compile down to what we need to do things fastest.

Dan




Re: An overview of the Parrot interpreter

2001-09-03 Thread Dave Mitchell

Simon Cozens <[EMAIL PROTECTED]> wrote:
> Firstly, a magic number is presented to identify the bytecode file as
> Parrot code.

Hopefully this is followed by a header that has a version number and
lots of useful stuff like flags and offsets and things, just like wot
real object files have :-)

> Next comes the fixup segment, which contains pointers to
> global variable storage and other memory locations required by the main
> opcode segment. On disk, the actual pointers will be zeroed out, and
> the bytecode loader will replace them by the memory addresses allocated
> by the running instance of the interpreter.

Er, are we really having a section of the file full of zeros? Surely
we just need a "BSS size" field in the header that tells the loader how
many bytes to mmap("/dev/zero")?
 
> Similarly, the next segment defines all string and PMC constants used in
> the code. The loader will reconstruct these constants, fixing references
> to the constants in the opcode segment with the addresses of the newly
> reconstructed data.

Will the constants in the file be stored as serialised PMCs or as raw values?
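The sort of header Dave is asking for might look something like the
following; every field here is speculative, just the kind of thing "wot
real object files have":

```c
#include <assert.h>
#include <stdint.h>

/* A hypothetical Parrot bytecode file header. No field here is from
   any real design; it only illustrates the shape being proposed. */
struct parrot_header {
    uint32_t magic;          /* identifies the file as Parrot bytecode */
    uint16_t major, minor;   /* bytecode format version */
    uint32_t flags;
    uint32_t fixup_offset;   /* where the fixup segment starts */
    uint32_t bss_size;       /* bytes the loader should allocate zeroed,
                                instead of storing zeros on disk */
    uint32_t const_offset;   /* string/PMC constant segment */
    uint32_t code_offset;    /* main opcode segment */
};
```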




Re: LangSpec: Statements and Blocks

2001-09-03 Thread Davíð Helgason

A few, hopefully relevant thoughts (some of them).



Bryan C. Warnock wrote:
>--
>
>Perl 6 Reference - Statements and Blocks
>(0.1/2001-09-01)


A beauty to behold, this!

>Syntax Overview
>
>Keywords
>continue, do, else, elsif, for, foreach, given, goto, grep, if, last,
>map, next, redo, sort, sub, unless, until, when, while

We will be adding 'try' & 'catch'. 'finally' also? (or 'finalize' :-)


>Conditional Statement Modifiers
>
> 6. [ LABEL: ] expr if expr;
> 7. [ LABEL: ] expr until expr;

Should it be impossible to chain these? I wouldn't much want to
maintain a program containing many of them, but I miss chaining
occasionally when hacking away, and I keep adding modifiers and checks.
Eventually I want to modify a modified statement, get annoyed, and make
an enclosing block.

>Conditional Block Constructs
[...]
>16. [ LABEL: ] when expr : { block }   # Note 5

>[Note 5. 'when' is only a valid construct when directly within a 'given'
>   construct.]

Someone talked about given(expr) getting to be the same as for $_ (scalar
expr). (Couldn't find that mail in my mailbox; weird.) Perhaps with an
added check that expr is actually a reference (though I can't really see
why, unless for a bit of good old B&D).

Wouldn't that be nice? That kind of simplicity I find particularly
attractive. I remember, when learning Perl several years ago, how glad I
was to discover how flat and simple everything was: that inheritance was
just qw(-> @ISA) (at least it seemed like that back then) and that
parameters just got passed in a global array (gee, was I wrong!). But I
digress...

>Looping Block Constructs
>
>17. [ LABEL: ] while ( expr ) { block } [ continue { block } ]
>18. [ LABEL: ] until ( expr ) { block } [ continue { block } ]
>19. [ LABEL: ] for[each] ( expr; expr; expr )  # Note 4
> { block }


Makes me think how hard it would be to implement these in pure perl.
Something like:

sub my_while ((EXPR $expr) CODE $block ; CODE $continue) {
    my $scope = new Scope;
    my sub expr = $scope->eval_string($expr);
  AGAIN:
    goto END if ! expr();
    $scope->eval_code($block);
    $scope->eval_code($continue) if $continue;
    goto AGAIN;
  END:
}

sub continue (CODE $block) {
    return $block;
}

Ok, I just wanted to demonstrate a funny prototype pattern; the
semi-reimplementation of while() is there because it is Sunday and my
girlfriend is visiting her parents. (It is quite broken with regard to
lexical variable visibility. Does this absolutely require a syntax
filter or a brand new parser, or can we do something clever?)


Also I find:

sub expr = $code_ref;

pretty cute, having p5 compile-time/runtime semantics similar to:

sub expr; *expr = $code_ref;




>Subroutine Code Blocks # Note 6
>
>21. sub identifier [ ( prototype ) ] [ :properties ] { block }
>22. sub [ ( prototype ) ] { block }# Note 7

Is it on purpose that coderefs can't have properties? This goes back to
variable/value properties. I never liked value properties very much, but
while it seems that ':inline' and ':multi' don't make a lot of sense here,
':memoized' does.

So we should have:
22. sub [ ( prototype ) ] [ :value_properties ] { block }
22. sub [ ( prototype ) ] { block } [ :value_properties ]  # Or better?



best regards,

d.


ps. I can't really explain it, but it fills me with joy to see
the activity on this list!
--
davíð helgason
co-founder & -der
panmedia aps




Re: LangSpec: Statements and Blocks

2001-09-03 Thread Bryan C . Warnock

On Monday 03 September 2001 01:06 pm, Davíð Helgason wrote:

> We will be adding 'try' & 'catch'. 'finally' also? (or 'finalize' :-)

I've not heard anything definite on this.

>
> >16. [ LABEL: ] when expr : { block }   # Note 5
> >
> >[Note 5. 'when' is only a valid construct when directly within a 'given'
> >   construct.]
>
> Someone talked about given(expr) getting to be the same as for $_ (scalar
> expr). (Couldn't find that mail in my mailbox; weird.) Perhaps with an
> added check that expr is actually a reference (though I can't really see
> why, unless for a bit of good old B&D).

If I understand you correctly, the discussion was that the expression in 
'given ( expr )' will be evaluated in scalar context.

> >
> >21. sub identifier [ ( prototype ) ] [ :properties ] { block }
> >22. sub [ ( prototype ) ] { block }# Note 7
>
> Is it on purpose that coderefs can't have properties? This goes back to
> variable/value properties. I never liked value properties very much, but
> while it seems that ':inline' and ':multi' don't make a lot of sense here,
> ':memoized' does.

Yes, it does make sense to have properties there.  I omitted them because I 
couldn't think of any.  I haven't seen any decision to the contrary, so I 
may add them back in.  (They may not do anything, but they'll be there.  Much 
like the prototypes.)

>
> So we should have:
> 22. sub [ ( prototype ) ] [ :value_properties ] { block }
> 22. sub [ ( prototype ) ] { block } [ :value_properties ]  # Or
> better?

The first, as it's more consistent with # 21.  


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: An overview of the Parrot interpreter

2001-09-03 Thread Uri Guttman

> "KF" == Ken Fox <[EMAIL PROTECTED]> writes:

  KF> Simon Cozens wrote:
  >> To be more specific about the software CPU, it will contain a large
  >> number of registers.

  KF> The register frames store values, not pointers to values? If
  KF> there's another level of indirection with registers, I'm not sure
  KF> what the advantage is over just pointing into the heap. Also, heap
  KF> based activation records might be faster and more compact than
  KF> register files.

the registers are the way to make op code args be simple and small
integers which is portable and clean. with a direct heap, you need
pointers or large integers and complex code generation.

the point of registers is to make the op code tree simple to generate
and to use all the older compiler tricks for register machines. there
are no classic heap based compilers to steal from.

  >> As in Perl, Parrot ops will return the pointer to the next operation in
  >> the bytecode stream. Although ops will have a predetermined number and
  >> size of arguments, it's cheaper to have the individual ops skip over
  >> their arguments returning the next operation, rather than looking up in
  >> a table the number of bytes to skip over for a given opcode.

  KF> This seems to limit the implementation possibilities a lot. Won't a
  KF> TIL use direct goto's instead of returning the next op address?

TIL is a different beast. the code layer directly called by the op code
dispatch will be gone in TIL. how goto's and stuff will be handled is
not clear yet. and TIL is not an issue with the register machine, it is
a different back end effectively. it will use the real op functions to
do most of its work along with in line real machine code to call them.

  KF> I'd like to see a description of *just* the opcode stream and have
  KF> a second section describe the protocol for implementing the ops.
  KF> Won't we have separate implementations of the opcode interpreter
  KF> that are optimized for certain machines? (I'd at least like to see
  KF> that possibility!)

there will be support for custom op code dispatch loops. in particular a
event loop version will be made without checking the event flag between
each op. instead it will do that check when event callbacks return. it
should be somewhat faster and more tuned for event style design.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Search or Offer Perl Jobs  --  http://jobs.perl.org



Re: An overview of the Parrot interpreter

2001-09-03 Thread Sam Tregar

On Sun, 2 Sep 2001, Simon Cozens wrote:

> For instance, the Parrot VM will have a register architecture, rather
> than a stack architecture.

s/rather than/as well as/;  # we've got a stack of register frames, right?

> There will be global and private opcode tables; that is to say, an area
> of the bytecode can define a set of custom operations that it will use.
> These areas will roughly map to compilation units of the original
> source; each precompiled module will have its own opcode table.

Side note: this isn't making sense to me.  I'm looking forward to further
explanation!

> If our PMC is a string and has a vtable which implements Perl-like
> string operations, this will return the length of the string. If, on the
> other hand, the PMC is an array, we might get back the number of
> elements in the array. (If that's what we want it to do.)

Ok, so one example of a PMC is a Perl string...

> Parrot provides a programmer-friendly view of strings. The Parrot string
> handling subsection handles all the work of memory allocation,
> expansion, and so on behind the scenes. It also deals with some of the
> encoding headaches that can plague Unicode-aware languages.

Or not!  Are Perl strings PMCs or not?  Why does Parrot want to handle
Unicode?  Shouldn't that go in a specific language's string PMC vtables?

-sam





RE: An overview of the Parrot interpreter

2001-09-03 Thread Brent Dax

Note: this is my understanding of the situation; I could be wrong.

Sam Tregar:
# On Sun, 2 Sep 2001, Simon Cozens wrote:
#
# > For instance, the Parrot VM will have a register
# architecture, rather
# > than a stack architecture.
#
# s/rather than/as well as/;  # we've got a stack of register
# frames, right?

IIRC, that's mostly for when we run out of registers or we're changing
scopes or whatever.  For the most part, it's a register architecture.

# > There will be global and private opcode tables; that is to
# say, an area
# > of the bytecode can define a set of custom operations that
# it will use.
# > These areas will roughly map to compilation units of the original
# > source; each precompiled module will have its own opcode table.
#
# Side note: this isn't making sense to me.  I'm looking
# forward to further
# explanation!

In other words, when you have sub foo {} in your code, it will be
assigned an opcode number in the 'private' section.  The global section
is for things that are built-in to Parrot, while the private section is
for stuff you write.  (Right?)

# > If our PMC is a string and has a vtable which implements Perl-like
# > string operations, this will return the length of the
# string. If, on the
# > other hand, the PMC is an array, we might get back the number of
# > elements in the array. (If that's what we want it to do.)
#
# Ok, so one example of a PMC is a Perl string...

From what I've seen, PMCs will represent SVs, AVs, and HVs at the opcode
level.  (When you get down to C, it's really SVs and AVs and HVs, but in
bytecode it's all PMCs.)

# > Parrot provides a programmer-friendly view of strings. The
# Parrot string
# > handling subsection handles all the work of memory allocation,
# > expansion, and so on behind the scenes. It also deals with
# some of the
# > encoding headaches that can plague Unicode-aware languages.
#
# Or not!  Are Perl strings PMCs or not?  Why does Parrot want to handle
# Unicode?  Shouldn't that go in a specific language's string
# PMC vtables?

Perl *scalars* are PMCs.  Those PMCs may hold strings within them.
However, string manipulation is done in special string registers, which
are *not* PMCs.

--Brent Dax
[EMAIL PROTECTED]

"...and if the answers are inadequate, the pumpqueen will be overthrown
in a bloody coup by programmers flinging dead Java programs over the
walls with a trebuchet."




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Brent Dax wrote:
> Ken Fox:
> # Lexicals are fundamentally different from Perl's package (dynamically
> # scoped) variables.
> 
> *How* are they "fundamentally different"?

Perl's "local" variables are dynamically scoped. This means that
they are *globally visible* -- you never know where the actual
variable you're using came from. If you set a "local" variable,
all the subroutines you call see *your* definition.

Perl's "my" variables are lexically scoped. This means that they
are *not* globally visible. Lexicals can only be seen in the scope
they are introduced and they do not get used by subroutines you
call. This is safer and a bit easier to use because you can tell
what code does just by reading it.

> But in this case the pad is actually a full symbol table.  The
> concept is the same, the data structure is different.

The concept isn't the same. "local" variables are globals. You
only have *one* of them no matter how many times you call a sub.
For example, threading or recursion might cause the same sub to
have several copies "running" at the same time. With global
variables (aka "local" variables) all the copies share the same
global variables. With "my" variables each copy of the sub gets
a brand new set of variables. This is known as an activation
record. THIS IS COMPLETELY UNRELATED TO A SYMBOL TABLE!
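The distinction can be sketched in C, treating a dynamically scoped
variable as a single saved-and-restored global and a lexical as a
per-activation automatic. This is a loose analogy (C has neither
"local" nor "my"), and all names are invented:

```c
#include <assert.h>

/* "local" analogue: ONE global slot, saved and restored around a
   scope. Every sub called inside that scope sees the caller's value. */
static long depth = 0;

static long callee_sees_global(void) { return depth; }

static long with_local_depth(long new_value) {
    long saved = depth;                 /* like 'local $depth = ...' */
    depth = new_value;
    long seen = callee_sees_global();   /* callee sees *our* value */
    depth = saved;                      /* restored on scope exit */
    return seen;
}

/* "my" analogue: each activation record gets its own variable, so
   recursion gives every running copy of the sub a distinct one. */
static long count_down(long my_n) {
    if (my_n == 0)
        return 0;
    long below = count_down(my_n - 1);  /* doesn't disturb our my_n */
    return my_n + below;
}
```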

In both cases you have one symbol table. However, in
the case of "my" variables you have *many* values for each
variable. The particular value being used depends upon which
activation record is being used.

> There *is* run-time lookup in some contexts, such as a string eval.

String eval isn't a run-time lookup. The code is *compiled* and
then run. Also notice that string eval can't change the current
lexical scope. It can create a new inner scope, but it can't
introduce variables into the outer scope.

Basically anything that "breaks" scoping barriers goes against
the grain of lexical scoping. If an inner scope can modify its
parent, you've just destroyed one of the major advantages of
lexical scoping.

We tolerate symbol table globs with "local" variables because
we've already admitted there's no hope of understanding what
something does just by reading the code. We haven't corrupted
"my" yet -- and I don't want to start!

> In the end, all I'm doing is suggesting an alternate implementation
> which should reduce our workload and make many concepts which currently
> don't work with lexicals work correctly.

Your proposal to use "temp" with flags to implement "my" doesn't
even work, let alone achieve either of your goals.

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 05:44 PM 9/3/2001 -0400, Ken Fox wrote:
>Brent Dax wrote:
> > What I'm suggesting is that, instead of the padlist's AV containing
> > arrays, it should contain stashes, otherwise indistinguishable from
> > the ones used for global variables.
>
>Lexicals are fundamentally different from Perl's package (dynamically
>scoped) variables. Even if you could somehow hack package variables
>to implement lexicals, it would waste space duplicating a symbol table
>for each instance of a lexical scope.

No, actually, they're not.

The big difference between lexical variables and package variables is that 
lexicals are looked up by stash offset and package variables are looked up 
by name. (Okay, there are a few minor details beyond that, but that's 
really the big one) There really isn't anything special about a stash. All 
it is is a hash perl thinks contains variable names. (And it has GVs 
instead of [SHA]Vs, but once again a trivial difference, and one going away 
for perl 6)

The real question, as I see it, is "Should we look lexicals up by name?" 
And the answer is Yes. Larry's decreed it, and it makes sense. (I'm 
half-tempted to hack up something to let it be done in perl 5--wouldn't 
take much work)

The less real question, "Should pads be hashes or arrays", can be answered 
by "whichever is ultimately cheaper". My bet is we'll probably keep the 
array structure with embedded names, and do a linear search for those rare 
times you're actually looking by name.
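That array-with-embedded-names arrangement can be sketched as follows;
the structure and names are invented for illustration, not taken from
any real pad implementation:

```c
#include <assert.h>
#include <string.h>

/* A pad as an array with embedded names: compiled code uses the cheap
   index path, and the rare by-name lookup (string eval, debuggers)
   does a linear search. */
struct pad_entry { const char *name; long value; };

struct pad { struct pad_entry *slots; long n; };

/* The common, compiled-in path: direct offset access. */
static long *pad_by_index(struct pad *p, long i) {
    return &p->slots[i].value;
}

/* The rare path: linear search over the embedded names. */
static long *pad_by_name(struct pad *p, const char *name) {
    for (long i = 0; i < p->n; i++)
        if (strcmp(p->slots[i].name, name) == 0)
            return &p->slots[i].value;
    return 0;
}
```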

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Brent Dax wrote:
> What I'm suggesting is that, instead of the padlist's AV containing
> arrays, it should contain stashes, otherwise indistinguishable from
> the ones used for global variables.

Lexicals are fundamentally different from Perl's package (dynamically
scoped) variables. Even if you could somehow hack package variables
to implement lexicals, it would waste space duplicating a symbol table
for each instance of a lexical scope.

> The simple way to emulate this is to make sure that no subroutine
> can see another's MY:: stash.

Right. Sounds a lot like a pad to me -- each instance of a scope (sub)
gets its own copy of the variables. (By instance I mean activation
record, not the symbol table definition.)

> There is a possible caveat with inner blocks--how does an outer block
> get, er, blocked from accessing an inner block's my() variables?
> However, I think this isn't really that big a problem, and can easily be
> solved with properties:

You don't understand lexically scoped variables. There isn't
any run-time name lookup on a variable -- the compiler resolves all
access to a specific memory location (or offset). All your fancy
symbol table flag twiddling is slow, unreliable and unnecessary.

- Ken



Re: An overview of the Parrot interpreter

2001-09-03 Thread Dan Sugalski

At 06:37 PM 9/3/2001 -0400, Sam Tregar wrote:
>On Sun, 2 Sep 2001, Simon Cozens wrote:
>
> > For instance, the Parrot VM will have a register architecture, rather
> > than a stack architecture.
>
>s/rather than/as well as/;  # we've got a stack of register frames, right?

Well, register in the sense that most cpus are register machines. They've 
still got stacks, but...

> > There will be global and private opcode tables; that is to say, an area
> > of the bytecode can define a set of custom operations that it will use.
> > These areas will roughly map to compilation units of the original
> > source; each precompiled module will have its own opcode table.
>
>Side note: this isn't making sense to me.  I'm looking forward to further
>explanation!

Basically chunks of perl code can define opcodes on the fly--they might be 
perl subs that meet the proper critera, or opcode functions defined by C 
code with magic stuck in the parser, or wacky optimizer extensions or 
whatever. There won't be a single global table of these, since we can 
potentially be loading in precompiled code. (Modules, say) Each 
"compilation unit" has its own table of opcode number->function maps.

If you want to think of it C-ishly, each object module would have its own 
opcode table.
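Dan's C-ish framing can be sketched directly; everything below is
invented for illustration, but shows how the same private opcode number
can mean different things in different compilation units:

```c
#include <assert.h>

typedef long (*op_impl_t)(void);

/* One shared "global" op, plus a private op per module. */
static long core_op(void)     { return 1; }
static long mod_a_custom(void) { return 100; }
static long mod_b_custom(void) { return 200; }

/* Each compilation unit carries its own opcode-number -> function map. */
struct comp_unit {
    op_impl_t *ops;
};

static long dispatch(struct comp_unit *unit, long opcode) {
    return unit->ops[opcode]();
}
```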

> > If our PMC is a string and has a vtable which implements Perl-like
> > string operations, this will return the length of the string. If, on the
> > other hand, the PMC is an array, we might get back the number of
> > elements in the array. (If that's what we want it to do.)
>
>Ok, so one example of a PMC is a Perl string...

Nope. Perl scalar. Strings are lower-level, and a little different.

> > Parrot provides a programmer-friendly view of strings. The Parrot string
> > handling subsection handles all the work of memory allocation,
> > expansion, and so on behind the scenes. It also deals with some of the
> > encoding headaches that can plague Unicode-aware languages.
>
>Or not!  Are Perl strings PMCs or not?  Why does Parrot want to handle
>Unicode?  Shouldn't that go in a specific language's string PMC vtables?

Strings are a step below PMCs. And Parrot knows about Unicode because it's 
the least sucky common denominator.

We're not forcing unicode everywhere on purpose. Tried that with perl 5, 
worked OK, but it has some expense, and forces some rude cultural issues.

Most of the non-US world has their own private data encoding methods they 
like just fine. Big5 Traditional, Big5 Simplified, Shift-JIS, EBCDIC, and 
any of a half-zillion variants of ASCII (all with a different set of 
characters in the high 128 slots) work fine for people. If we abstract 
things out a bit so that perl doesn't have to care much, we win in places 
we don't know. This code:

    while (<>) {
        s/$foo/$bar/;
        print;
    }

is cheap on an ASCII machine, but imagine how expensive it'll be if ARGV 
has shift-JIS data sources. We need to transform to Unicode then back 
again, and risk mangling the data in the process. Bleah.

"Everything Unicode" more or less says to the non-Unicode world (i.e. 
everyone except maybe Western Europe and North America) "Boy you suck, but 
I guess we'll make some token accommodations for you."

You can imagine how well that one will go over...

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Internals Plan

2001-09-03 Thread Bryan C . Warnock

(From this week's summary, plus some additional info)
---
The next phase of Parrot will be a code review - for the Perl internals
community to poke and prod and make sense of what Dan and Simon have done.
The community will provide feedback, and Dan and Simon will disappear for
a brief period, before the code is opened up for development.
 
After going public, work will mostly progress according to Dan's To Do list.
---

The code review will begin once we have the necessary CVS and bug-tracking 
infrastructure in place.  (We are not going to use SourceForge.)  Once that 
is done, the code-review phase will begin with anonymous read-only access to 
the Perl CVS repository, routine snapshots on dev.perl.org, and the 
occasional build on CPAN (if and when it actually builds :).

The plan for when development has been opened up is a bridge that we'll 
cross once we build it.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: An overview of the Parrot interpreter

2001-09-03 Thread Nathan Torkington

Sam Tregar writes:
> > If our PMC is a string and has a vtable which implements Perl-like
> > string operations, this will return the length of the string. If, on the
> > other hand, the PMC is an array, we might get back the number of
> > elements in the array. (If that's what we want it to do.)
> 
> Ok, so one example of a PMC is a Perl string...

If you grok vtables, think of a PMC as the thing a vtable hangs off.

Another way to think of it is that a PMC is an object.  To the outside
(the interpreter that is manipulating data values) its contents are
opaque.  All you can do is call methods (vtable entries) on it.

So if you have an object/PMC that implements a string, the "length"
method/vtable-entry will return the length of the string. For an
object/PMC that implements an array, the "length" method/vtable-entry
will return the number of things in the array.
Nat




RE: Should MY:: be a real symbol table?

2001-09-03 Thread Brent Dax

Note: some parts of this may seem a bit like a flame.  This is
unintentional.

Ken Fox:
# Brent Dax wrote:
# > What I'm suggesting is that, instead of the padlist's AV containing
# > arrays, it should contain stashes, otherwise indistinguishable from
# > the ones used for global variables.
#
# Lexicals are fundamentally different from Perl's package (dynamically
# scoped) variables. Even if you could somehow hack package variables
# to implement lexicals, it would waste space duplicating a symbol table
# for each instance of a lexical scope.

*How* are they "fundamentally different"?  You keep saying this, but I
don't see it.  In *functionality* what is the "fundamental difference"
between package and lexical variables?

# > The simple way to emulate this is to make sure that no subroutine
# > can see another's MY:: stash.
#
# Right. Sounds a lot like a pad to me -- each instance of a scope (sub)
# gets its own copy of the variables. (By instance I mean activation
# record, not the symbol table definition.)

But in this case the pad is actually a full symbol table.  The concept
is the same, the data structure is different.

# > There is a possible caveat with inner blocks--how does an
# outer block
# > get, er, blocked from accessing an inner block's my() variables?
# > However, I think this isn't really that big a problem, and
# can easily be
# > solved with properties:
#
# You don't understand lexically scoped variables. There isn't
# any run-time name lookup on a variable -- the compiler resolves all
# access to a specific memory location (or offset). All your fancy
# symbol table flag twiddling is slow, unreliable and unnecessary.

There *is* run-time lookup in some contexts, such as a string eval.
(Currently symbolic references are not another case--symrefs only work
with globals--but I think they should work with lexicals too.)

In the end, all I'm doing is suggesting an alternate implementation
which should reduce our workload and make many concepts which currently
don't work with lexicals work correctly.  If there are big, huge
problems with the alternate implementation I'm proposing, please explain
them to me.

--Brent Dax
[EMAIL PROTECTED]

"...and if the answers are inadequate, the pumpqueen will be overthrown
in a bloody coup by programmers flinging dead Java programs over the
walls with a trebuchet."




Re: An overview of the Parrot interpreter

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> For those of you worrying that parrot will be *just* low-level ops,
> don't. There will be medium and high level ops in the set as well.

I was going to cite ,
but you guys have already read that, eh? ;)

One thing I was expecting was "bytecode subroutines are given opcodes."
Will there be a difference? Can we replace "built-in" opcodes with Perl
subs?

- Ken



Re: An overview of the Parrot interpreter

2001-09-03 Thread Ken Fox

Thanks for the info. If you guys maintain this level of documentation
as the code develops, Perl 6 will be easy for first-timers to work on.
One goal down, N-1 to go... ;)

Simon Cozens wrote:
> To be more specific about the software CPU, it will contain a large
> number of registers.

The register frames store values, not pointers to values? If
there's another level of indirection with registers, I'm not sure
what the advantage is over just pointing into the heap. Also, heap
based activation records might be faster and more compact than
register files.

> As in Perl, Parrot ops will return the pointer to the next operation in
> the bytecode stream. Although ops will have a predetermined number and
> size of arguments, it's cheaper to have the individual ops skip over
> their arguments returning the next operation, rather than looking up in
> a table the number of bytes to skip over for a given opcode.

This seems to limit the implementation possibilities a lot. Won't a
TIL use direct goto's instead of returning the next op address?

I'd like to see a description of *just* the opcode stream and have a
second section describe the protocol for implementing the ops.
Won't we have separate implementations of the opcode interpreter
that are optimized for certain machines? (I'd at least like to see
that possibility!)

> =head1 Vtables
> 
> The way we achieve this abstraction is to assign to each PMC a set of
> function pointers that determine how it ought to behave when asked to do

Damian's proposals for multi-methods have got me thinking there
should be one very fast implementation of multi-method dispatching
used at the opcode level. It might help solve some of our math and
string problems dealing with mixed operand types.

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 08:09 PM 9/3/2001 -0400, Ken Fox wrote:
>Dan Sugalski wrote:
> > At 05:44 PM 9/3/2001 -0400, Ken Fox wrote:
> > > Lexicals are fundamentally different from Perl's package (dynamically
> > > scoped) variables.
> >
> > No, actually, they're not.
>
>How can you possibly think that lexical scoping and dynamic scoping
>are not fundamentally different?!

I don't. However, stashes and pads aren't (or don't have to be) 
fundamentally different.

> > > Even if you could somehow hack package variables
> > > to implement lexicals, it would waste space duplicating a symbol table
> > > for each instance of a lexical scope.
>
> > The big difference between lexical variables and package variables is that
> > lexicals are looked up by stash offset and package variables are looked up
> > by name.
>
>Right. So how is a hash table smaller than an array?

It's not. They're bigger, pretty much by definition. So?

> > The real question, as I see it, is "Should we look lexicals up by name?"
> > And the answer is Yes. Larry's decreed it, and it makes sense. (I'm
> > half-tempted to hack up something to let it be done in perl 5--wouldn't
> > take much work)
>
>Where is the sense in this? Certainly the compiler will look things
>up by name, but the run-time doesn't need to.

The runtime does have to. For several reasons.

First, of course, runtime and compiletime are mixed in perl. String eval 
has to go walk back up the pads *at runtime* and resolve the variable names.

Second, Larry's decreed you'll be able to look up lexicals by name using 
the MY hash. And look up at outer levels. How do you plan to look up 
variables by name when you're peering outside your compilation unit? With a 
key that can be resolved only at runtime?

> > The less real question, "Should pads be hashes or arrays", can be answered
> > by "whichever is ultimately cheaper". My bet is we'll probably keep the
> > array structure with embedded names, and do a linear search for those rare
> > times you're actually looking by name.
>
>That doesn't sound like we're looking up by name at all... It
sounds like the compiler is emitting frame pointer offsets, but
>there's a pointer to the symbol table stored in the frame just
>in case something wants to see the names.

It sounds like "Pads might be arrays with names embedded in the slots, or a 
hash, whichever's more efficient". I'm not sure which would be, but I've a 
good idea. :)

>That's a huge difference over emulating "my" with "temp" like
>what was originally proposed!

That, oddly enough, is doable with enough compiler work. A silly thing, but 
doable. It's irrelevant to the question as it had evolved, namely "what's 
the difference between a stash and a pad?" We could, after all, associate a 
new stash with each level of lexical scope if we were so inclined. Or make 
stashes and pads identical under the hood and just reference them differently.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: An overview of the Parrot interpreter

2001-09-03 Thread Bryan C. Warnock

On Monday 03 September 2001 08:06 pm, Sam Tregar wrote:
> I think I understand this.  What I don't understand is how this relates to
> the next section about Parrot's special relationship with strings.  If
> Parrot has a "string" type and string handling functions, why use a PMC
> to implement a string?  What does it mean to have a PMC that "implements a
> string" and also have a "string type" in Parrot?

An opcode may call a vtable method on a PMC that is string manipulation 
intensive.  That vtable code could then use the string registers for 
efficiency.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



RE: Should MY:: be a real symbol table?

2001-09-03 Thread Brent Dax

# -Original Message-
# From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
# Sent: Monday, September 03, 2001 4:31 PM
# To: Ken Fox; Brent Dax
# Cc: Simon Cozens; [EMAIL PROTECTED]
# Subject: Re: Should MY:: be a real symbol table?
#
# >Lexicals are fundamentally different from Perl's package
...
# No, actually, they're not.
#
# The big difference between lexical variables and package
# variables is that
# lexicals are looked up by stash offset and package variables
# are looked up
# by name. (Okay, there are a few minor details beyond that, but that's
# really the big one) There really isn't anything special about
# a stash. All
# it is is a hash perl thinks contains variable names. (And it has GVs
# instead of [SHA]Vs, but once again a trivial difference, and
# one going away
# for perl 6)
#
# The real question, as I see it, is "Should we look lexicals
# up by name?"
# And the answer is Yes. Larry's decreed it, and it makes sense. (I'm
# half-tempted to hack up something to let it be done in perl
# 5--wouldn't
# take much work)
#
# The less real question, "Should pads be hashes or arrays",
# can be answered
# by "whichever is ultimately cheaper". My bet is we'll
# probably keep the
# array structure with embedded names, and do a linear search
# for those rare
# times you're actually looking by name.

Yay, someone understood what I was saying!  :^)

As far as expensiveness, I think this can be just as fast as our current
offset-into-the-pad method.

If we allocate the stash at compile time (so the HEs don't change), we
can resolve lexicals down to the HE.  In essence, the HE would be
serving the job a GV does in Perl 5 for globals, or an offset does for
lexicals on array-of-array pads--indirection.  (Obviously this would be
in the fixup section in pre-compiled code.)

For those who don't understand my ravings:

sub foo { my($bar, @baz); ... }

becomes:

CV {
    refcount --> 1
    opcodes  --> ...
    padstash ---+
    ...         |
}               |
                |
STASH {  <------+
    HE (Hash Entry) {        (0x1)
        key   --> '$bar'
        value --> SV *
        ...
    }

    HE {                     (0x2)
        key   --> '@baz'
        value --> SV *
        ...
    }
    ...
}

At compile-time, we can allocate and fill the stash.  Then, _still in
compile time_, we determine which HE will contain the value.  For
example, we know that the value slot of the hash entry at 0x1 will
contain the SV currently representing $bar.

Now, we can change the actual SV containing the current value of $bar at
will.  As long as the HE doesn't change, we're safe.

Since we're now looking up our variable names in a hash instead of an
array (remember, Perl hashes are only about 15% slower than arrays),
when we do have to look up a lexical at runtime we avoid an expensive
linear search.  (I don't know how the offsets are determined at
compile-time in Perl 5, but if they're also determined by a linear
search, we'll make compilation more efficient too.)

Obviously, the current array-of-array pads are more compact than a
stash; however, I doubt that will be a serious issue.

~~

As far as the temp() thing I mentioned earlier, compare these two pieces
of code:

sub factorial {
my($x)=shift;
return 1 if($x==1);
return $x * factorial($x-1);
}

sub factorial {
temp($x)=shift;
return 1 if($x==1);
return $x * factorial($x-1);
}

These subroutines recurse.  However, neither sub gets confused and tries
to modify another stack frame's $x.  In the second sub, *temp() is just
a mechanism to get a new $x*.  That's what I was talking about--I was
trying to draw an analogy between existing functionality and my
proposal.

If this point is still confusing, contact me privately and I can explain
it in more detail; if I get a bunch of requests I'll post it to the
group.

--Brent Dax
[EMAIL PROTECTED]

"...and if the answers are inadequate, the pumpqueen will be overthrown
in a bloody coup by programmers flinging dead Java programs over the
walls with a trebuchet."




Re: An overview of the Parrot interpreter

2001-09-03 Thread Sam Tregar

On Mon, 3 Sep 2001, Dan Sugalski wrote:

> Basically chunks of perl code can define opcodes on the fly--they might be
> perl subs that meet the proper critera, or opcode functions defined by C
> code with magic stuck in the parser, or wacky optimizer extensions or
> whatever. There won't be a single global table of these, since we can
> potentially be loading in precompiled code. (Modules, say) Each
> "compilation unit" has its own table of opcode number->function maps.
>
> If you want to think of it C-ishly, each object module would have its own
> opcode table.

Ok, I think I understand.  This is some kind of code-compression hack to
avoid using a "call" opcode all over the place, right?

Speaking of subroutines, what are Parrot's calling conventions?  Obviously
we're no longer in PUSH/POP land...

> >Or not!  Are Perl strings PMCs or not?  Why does Parrot want to handle
> >Unicode?  Shouldn't that go in a specific language's string PMC vtables?
>
> Strings are a step below PMCs.

I feel like I almost understand this.  So when you call the length()
vtable method on a PMC representing a Perl scalar, the length op is
eventually going to call another length() op, this time on an underlying
Parrot string.  Right?

I'm still not sure I understand why Parrot is doing string ops at all.  Do
all our target languages have identical semantics for string operations?
When you bring Unicode into the mix I start to wonder.

-sam





RE: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 05:30 PM 9/3/2001 -0700, Brent Dax wrote:
>As far as expensiveness, I think this can be just as fast as our current
>offset-into-the-pad method.

I was speaking in both speed and memory use when I was talking about 
expense. We'd need to maintain a hash structure for each pad, plus we'd 
need to either design the hash structure such that it didn't need absolute 
addresses (so we could build it at compile time, which could be a long time 
before runtime with a disk freeze or two and an FTP in the interim), or 
we'd need to patch the addresses up at runtime when we allocated a new pad.

I'm not convinced the memory usage, and corresponding time to clone and/or 
set up the hash-based pad, is worth the relatively infrequent by-name 
access to variables in the pad. I could be wrong, though. We'll have to try 
it and see. (Shouldn't affect the bytecode, however, so we can try 
different methods and benchmark them as need be)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Where's da code?

2001-09-03 Thread Nathan Torkington

Simon and Dan have been teasing you, I know.  I'm holding them back
from releasing what they have until:

1) We have a CVS server running.  (Ask has done this and is now
   working on anonymous access).

2) We have a bugtracking system.  (We're currently thinking of
   Bugzilla).

3) We have a build tracking system.  I believe Ask is working on
   Tinderbox.

4) We have some "bugs" to put into the bugtracking system.  That is,
   some initial projects that people can hack on.  Simon's been coming
   up with a good To Do list that we'll seed the bugtracking system
   with.

I want to avoid the situation where we announce code and then chaos
erupts.  If there's going to be lots of frenzied hacking, I want to
make sure it is appropriately focused.  Think of this as the lesson
I've learned from bootstrap (and from watching how Mono started up).

There'll be a CVS server for those who want bleeding edge source.
We'll put daily build snapshots onto an FTP/http server, and any time
we have a particularly stable/featureful build, that'll go onto CPAN.

Let me know if I've forgotten anything.  If all goes well, we should
see the first source release towards the end of this week.  Fingers
crossed!

Nat




Re: An overview of the Parrot interpreter

2001-09-03 Thread Dan Sugalski

At 08:19 PM 9/3/2001 -0400, Sam Tregar wrote:
>On Mon, 3 Sep 2001, Dan Sugalski wrote:
>
> > Basically chunks of perl code can define opcodes on the fly--they might be
> > perl subs that meet the proper critera, or opcode functions defined by C
> > code with magic stuck in the parser, or wacky optimizer extensions or
> > whatever. There won't be a single global table of these, since we can
> > potentially be loading in precompiled code. (Modules, say) Each
> > "compilation unit" has its own table of opcode number->function maps.
> >
> > If you want to think of it C-ishly, each object module would have its own
> > opcode table.
>
>Ok, I think I understand.  This is some kind of code-compression hack to
>avoid using a "call" opcode all over the place, right?

No, more a "try and leave the bytecode sections read-only" hack.

Imagine, if you will, building LWP and bytecode compiling it. It uses 
private opcodes 1024-1160. Then you later build, say, MIME::Lite, which 
uses opcodes 1024-1090. (Since they know nothing of one another there's no 
reason they should coordinate their opcode # usage, and whatever case 
someone might make for coordinating I can shoot too-large holes in, alas) 
If there's only one table, you collide. Or you renumber the opcodes, which 
means the module bytecode can't be read-only. Both are icky.

>Speaking of subroutines, what are Parrot's calling conventions?  Obviously
>we're no longer in PUSH/POP land...

Up until now, I didn't know, so consider yourself the first to find out. :)

* Integer, String, and Number registers 0-x are used to pass parameters 
when the compiler calls routines.

* PMC registers 0-x are used to pass parameters *if* the sub has a 
prototype. If the sub does *not* have a prototype, a list is created and 
passed in PMC register 0.

* Subs may have variable number, or unknown number, of PMC parameters. 
(Basically Parrot variables) They may *not* take a variable or unknown 
number of integer, string, or number parameters.

* Subs may not change prototypes

* Sub prototypes must be known at compile time. (I.e. by the end of the 
primary compilation phase, and before mainline run time. Basically the 
equivalent to the end of BEGIN or beginning of CHECK phase)

* Methods get their parameters passed in as a list in PMC register 0, 
unless we can unambiguously figure out their prototype at compilation time

* The callee is responsible for saving any registers he/she/it messes with

* If the return is a list, it goes into PMC register 0. If the return is 
prototyped it uses the same rules as for calling.

Don't consider this list final until I've had a chance to run it past 
Larry. He might be thinking of allowing prototypes to change, or spring 
into existence relatively late in the game. (In which case we probably get 
a call_in_list and call_in_registers form of sub call)

> > >Or not!  Are Perl strings PMCs or not?  Why does Parrot want to handle
> > >Unicode?  Shouldn't that go in a specific language's string PMC vtables?
> >
> > Strings are a step below PMCs.
>
>I feel like I almost understand this.  So when you call the length()
>vtable method on a PMC representing a Perl scalar, the length op is
>eventually going to call another length() op, this time on an underlying
>Parrot string.  Right?

Right. Not an op, really, just the length function in the string's table of 
functions.

>I'm still not sure I understand why Parrot is doing string ops at all.  Do
>all our target languages have identical semantics for string operations?
>When you bring Unicode into the mix I start to wonder.

All our target languages don't have identical semantics, hence the 
abstraction. Generally speaking there are a few things that can be 
abstracted (comparison and casing, say), and a lot where the details of the 
string data are mostly irrelevant (regex matching is pretty much string 
neutral--a character is a character, and all it needs is the info on how 
big each character is).

We aren't going to have a perfect interface. That much I'm positive of. 
(This is engineering--there is no perfection. You weigh the costs and 
tradeoffs, and get the mix that works as best you can for your target 
problem) We will have a reasonably OK one, though, and one that'll deal 
with string data as well as, or better than, what we have now.

I could rant about the fun inherent in string encoding, character sets, and 
language issues for a while if you like. :)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: An overview of the Parrot interpreter

2001-09-03 Thread Sam Tregar

On Mon, 3 Sep 2001, Dan Sugalski wrote:

> >avoid using a "call" opcode all over the place, right?
>
> No, more a "try and leave the bytecode sections read-only" hack.
>
> Imagine, if you will, building LWP and bytecode compiling it. It uses
> private opcodes 1024-1160. Then you later build, say, MIME::Lite, which
> uses opcodes 1024-1090.

I was referring to the practice of having compilation units create private
opcodes.  Am I wrong in thinking this is a new technique deserving of an
excuse for existence?

> Up until now, I didn't know, so consider yourself the first to find out. :)

I'm honored...

> * Integer, String, and Number registers 0-x are used to pass parameters
> when the compiler calls routines.

s/compiler/interpreter/, right?

> * Subs may have variable number, or unknown number, of PMC parameters.
> (Basically Parrot variables) They may *not* take a variable or unknown
> number of integer, string, or number parameters.

I don't understand this restriction.  Won't it make implementing variadic
functions more difficult?

> Don't consider this list final until I've had a chance to run it past
> Larry. He might be thinking of allowing prototypes to change, or spring
> into existence relatively late in the game. (In which case we probably get
> a call_in_list and call_in_registers form of sub call)

Or those other language designers you're wooing, right?  The prototype
stuff sounds pretty Perl specific.

-sam





Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> First, of course, runtime and compiletime are mixed in perl. String eval
> has to go walk back up the pads *at runtime* and resolve the variable names.

Sure, if you use eval, the symbol table for the current scope
needs to be available. There's no reason to have more than
one though -- all instances of an activation record share
the same symbol table.

For the common case when we aren't debugging and a sub doesn't
use string eval, it would be nice to use less memory and drop
the scope's symbol table.

> Second, Larry's decreed you'll be able to look up lexicals by name using
> the MY hash. And look up at outer levels. How do you plan to look up
> variables by name when you're peering outside your compilation unit? With a
> key that can be resolved only at runtime?

I've searched for the definition of %MY, but all I can find is
a reference to a pseudo class MY. I don't read those as being the
same thing. It seems like "pseudo class MY" is intended as a
compiler extension API. The %MY hash could sort of do that (if
we wrapped all the access with BEGIN blocks), but it doesn't
feel very consistent with lexical variables.

Is %MY (with attributes presumably) really going to be the API
to the symbol table? Why don't we just have *one* API to the
symbol table and use attributes to differentiate between dynamic
and lexical scoping?

> >That's a huge difference over emulating "my" with "temp" like
> >what was originally proposed!
> 
> That, oddly enough, is doable with enough compiler work. A silly thing, but
> doable. It's irrelevant to the question as it had evolved, namely "what's
> the difference between a stash and a pad?" We could, after all, associate a
> new stash with each level of lexical scope if we were so inclined. Or make
> stashes and pads identical under the hood and just reference them differently.

These things behave totally different in a closure though.
A "temp" variable must have its global binding restored when
returning to the caller. But that leaves closures referencing
the wrong variable.

Anyways, since you say "oddly enough" and "silly thing", I suspect
that you're aren't doing it this way. ;)

- Ken



Re: An overview of the Parrot interpreter

2001-09-03 Thread Dan Sugalski

At 09:26 PM 9/3/2001 -0400, Sam Tregar wrote:
>On Mon, 3 Sep 2001, Dan Sugalski wrote:
>
> > >avoid using a "call" opcode all over the place, right?
> >
> > No, more a "try and leave the bytecode sections read-only" hack.
> >
> > Imagine, if you will, building LWP and bytecode compiling it. It uses
> > private opcodes 1024-1160. Then you later build, say, MIME::Lite, which
> > uses opcodes 1024-1090.
>
>I was referring to the practice of having compilation units create private
>opcodes.  Am I wrong in thinking this is a new technique deserving of an
>excuse for existence?

Not as such, no. Larry's idea is that calling most subs should be pretty 
much the same as calling most ops, in which case there's no reason for them 
to be different. It also means that we can yank out great chunks of what 
perl currently considers ops (socket functions are the big example) into 
loadable modules (read: extensions) that can be loaded and used as needed.

It also means that, if sub calls still end up being more expensive than op 
calls, that modules which add new ops are possible. Load 'em in and they 
patch the parser and add their op functions to the op table.

I'm not entirely sure how much this'll be used, but I really, *really* want 
to be able to call any sub that qualifies as an op rather than as a sub.

> > * Integer, String, and Number registers 0-x are used to pass parameters
> > when the compiler calls routines.
>
>s/compiler/interpreter/, right?

Yup. Thinko there.

> > * Subs may have variable number, or unknown number, of PMC parameters.
> > (Basically Parrot variables) They may *not* take a variable or unknown
> > number of integer, string, or number parameters.
>
>I don't understand this restriction.  Won't it make implementing variadic
>functions more difficult?

Variadic functions that take actual integers, floats, or strings, yes. 
Variadic functions that take parrot variables (i.e. PMCs) no. That's what 
many perl functions'll be doing anyway. Those are unprototyped and just get 
a list passed as their first (and only) parameter in PMC register 0.

> > Don't consider this list final until I've had a chance to run it past
> > Larry. He might be thinking of allowing prototypes to change, or spring
> into existence relatively late in the game. (In which case we probably get 
> > a call_in_list and call_in_registers form of sub call)
>
>Or those other language designers you're wooing, right?  The prototype
>stuff sounds pretty Perl specific.

The prototype stuff's actually non-perl-specific in a lot of ways, 
'specially considering how lax perl tends to be about calling conventions 
and such. Most languages define a fixed set of named parameters rather than 
a variadic list. (Which they tend to handle badly--like, say, C, whose 
vararg handling sucks dead badgers through 12" metal conduit)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: An overview of the Parrot interpreter

2001-09-03 Thread Sam Tregar

On Mon, 3 Sep 2001, Dan Sugalski wrote:

> I'm not entirely sure how much this'll be used, but I really, *really* want
> to be able to call any sub that qualifies as an op rather than as a sub.

What would a sub have to do (be?) to qualify?

> >I don't understand this restriction.  Won't it make implementing variadic
> >functions more difficult?
>
> Variadic functions that take actual integers, floats, or strings, yes.
> Variadic functions that take parrot variables (i.e. PMCs) no.

Right, so why make the former hard?  Is there an upside to the
restriction?

-sam





Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

"Bryan C. Warnock" wrote:
> Except that Perl 6 will have the ability to inject lexical variables in its
> scope, and in any dynamic parent's scope.  (It isn't clear whether that is
> write-only access or not - which it probably should be for lexicals.)
> 
> That, invariably, forces at least some run-time lookup by name, since the
> lexicals aren't there at compile time for the early resolution.

Please, please, please say that this is not like Tcl's upvar/uplevel
stuff.

I was imagining the "injection" to happen *only* at compile time
when a "use" statement replaced the scope management object the
compiler talks to.

I *really* don't like the idea of a called sub modifying its parent's
scope at run time. This fundamentally breaks the feature of understanding
code just by reading it. When I see a lexical, I want to trust the
definition.

- Ken



Re: An overview of the Parrot interpreter

2001-09-03 Thread Dan Sugalski

At 09:41 PM 9/3/2001 -0400, Sam Tregar wrote:
>On Mon, 3 Sep 2001, Dan Sugalski wrote:
>
> > I'm not entirely sure how much this'll be used, but I really, *really* want
> > to be able to call any sub that qualifies as an op rather than as a sub.
>
>What would a sub have to do (be?) to qualify?

It'd have to have a single return value, and a fixed, known number of 
parameters, probably less than three, though I suppose there's no real 
reason not to allow any number that'd fit in the register file.

> > >I don't understand this restriction.  Won't it make implementing variadic
> > >functions more difficult?
> >
> > Variadic functions that take actual integers, floats, or strings, yes.
> > Variadic functions that take parrot variables (i.e. PMCs) no.
>
>Right, so why make the former hard?  Is there an upside to the
>restriction?

Sure. We already have an aggregate PMC data type--the list. (The four base 
PMC types are scalar, array, hash, and list. Lists can hold multiple lists, 
arrays, hashes, and scalars, while arrays and hashes can only hold scalars) 
We don't have anything similar for integer, string, or float registers, so 
we'd have to build some thing or other to do it. Might as well just promote 
the things to PMCs and pass in a list of them.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 09:58 PM 9/3/2001 -0400, Ken Fox wrote:
>"Bryan C. Warnock" wrote:
> > Except that Perl 6 will have the ability to inject lexical variables in its
> > scope, and in any dynamic parent's scope.  (It isn't clear whether that is
> > write-only access or not - which it probably should be for lexicals.)
> >
> > That, invariably, forces at least some run-time lookup by name, since the
> > lexicals aren't there at compile time for the early resolution.
>
>Please, please, please say that this is not like Tcl's upvar/uplevel
>stuff.

Okay, I won't. It is, though.

There is use for this. Right now, try writing a sub that alters the lexical 
warning state of its caller. Go ahead, I'll wait. :) At the moment you have 
to resort to XS code, since calling an XS sub doesn't get you new scope. 
(And yes, this is useful--it was how the vmsish pragma was implemented 
until the bug that allowed it to work was fixed)

>I was imagining the "injection" to happen *only* at compile time
>when a "use" statement replaced the scope management object the
>compiler talks to.
>
>I *really* don't like the idea of a called sub modifying its parent's
>scope at run time. This fundamentally breaks the feature of understanding
>code just by reading it. When I see a lexical, I want to trust the
>definition.

Oh, it gets better. Imagine injecting a lexically scoped sub into the 
caller's lexical scope. Overriding one that's already there. (Either 
because it was global, or because it was lexically defined at the same or 
higher level)

Needless to say, this makes the optimizer's job... interesting. On the 
other hand, it does allow for some really powerful things to be done by 
code at runtime.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




RE: Should MY:: be a real symbol table?

2001-09-03 Thread Brent Dax

# -Original Message-
# From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
# Sent: Monday, September 03, 2001 5:50 PM
# To: Brent Dax; Ken Fox
# Cc: Simon Cozens; [EMAIL PROTECTED]
# Subject: RE: Should MY:: be a real symbol table?
#
#
# At 05:30 PM 9/3/2001 -0700, Brent Dax wrote:
# >As far as expensiveness, I think this can be just as fast as
# our current
# >offset-into-the-pad method.
#
# I was speaking in both speed and memory use when I was talking about
# expense. We'd need to maintain a hash structure for each pad,
# plus we'd
# need to either design the hash structure such that it didn't
# need absolute
# addresses (so we could build it at compile time, which could
# be a long time
# before runtime with a disk freeze or two and an FTP in the
# interim), or
# we'd need to patch the addresses up at runtime when we
# allocated a new pad.

I assume we're willing to have more fixup time for runtime performance,
correct?  Then consider this:

array-of-array pad:
curpad => a pointer to the current pad; same as in Perl 5
offs => an offset into the current pad, representing a variable

Accessing the address of the variable:
curpad[offs]

stash pad:
hvaddr => the address of an HE, representing a variable

Accessing the address of the variable:
hvaddr->value

Is either of these likely to be faster than the other?  (Although I'm
not an assembler hacker, I can't see the first being faster than the
second.)  If so, does the possible speed benefit outweigh any increased
startup overhead?

# I'm not convinced the memory usage, and corresponding time to
# clone and/or
# set up the hash-based pad, is worth the relatively infrequent by-name
# access to variables in the pad. I could be wrong, though.
# We'll have to try
# it and see. (Shouldn't affect the bytecode, however, so we can try
# different methods and benchmark them as need be)

By using something similar to temp() (where the SV* is temporarily
replaced), cloning should only be necessary for situations in which two
threads are running the same function at the same time.  Similarly,
setting up the hash shouldn't take any more abstract operations than
setting up padlist[0]; the actual hash internals may take more time to
do their job, however.

--Brent Dax
[EMAIL PROTECTED]

"...and if the answers are inadequate, the pumpqueen will be overthrown
in a bloody coup by programmers flinging dead Java programs over the
walls with a trebuchet."




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 09:42 PM 9/3/2001 -0400, Ken Fox wrote:
>Dan Sugalski wrote:
> > First, of course, runtime and compiletime are mixed in perl. String eval
> > has to go walk back up the pads *at runtime* and resolve the variable 
> names.
>
>Sure, if you use eval, the symbol table for the current scope
>needs to be available. There's no reason to have more than
>one though -- all instances of an activation record share
>the same symbol table.

Well, we don't really keep around a single symbol table. We walk up the 
pads until we find things that resolve, or so it looked the last time I 
dove into the code.

>For the common case when we aren't debugging and a sub doesn't
>use string eval, it would be nice to use less memory and drop
>the scope's symbol table.

Sure, except for the whole "MY into the caller's package" thing. Or 
scopeless subs, though I don't know if they've been talked about yet.

>Is %MY (with attributes presumably) really going to be the API
>to the symbol table?

Yes.

>Why don't we just have *one* API to the
>symbol table and use attributes to differentiate between dynamic
>and lexical scoping?

Beats me--ask a language guy. The two (global and lexical access) don't 
feel the same, so I can't say I'd do them the same if I were designing it, 
though.

> > That, oddly enough, is doable with enough compiler work. A silly thing, but
> > doable. It's irrelevant to the question as it had evolved, namely "what's
> > the difference between a stash and a pad?" We could, after all, associate a
> > new stash with each level of lexical scope if we were so inclined. Or make
> > stashes and pads identical under the hood and just reference them 
> differently.
>
>These things behave totally different in a closure though.

Ah. Closures. Forgot about those.

Yeah, they're trickier.

>A "temp" variable must have its global binding restored when
>returning to the caller. But that leaves closures referencing
>the wrong variable.
>
>Anyways, since you say "oddly enough" and "silly thing", I suspect
>that you're aren't doing it this way. ;)

Definitely not. Global variable sets with runtime global local-equivalent 
overrides are a lame hack, and we won't go there. Doesn't mean it can't be 
done, just that it shouldn't. ;-P

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> At 05:44 PM 9/3/2001 -0400, Ken Fox wrote:
> > Lexicals are fundamentally different from Perl's package (dynamically
> > scoped) variables.
> 
> No, actually, they're not.

How can you possibly think that lexical scoping and dynamic scoping
are not fundamentally different?!

> > Even if you could somehow hack package variables
> > to implement lexicals, it would waste space duplicating a symbol table
> > for each instance of a lexical scope.

> The big difference between lexical variables and package variables is that
> lexicals are looked up by stash offset and package variables are looked up
> by name.

Right. So how is a hash table smaller than an array?

> The real question, as I see it, is "Should we look lexicals up by name?"
> And the answer is Yes. Larry's decreed it, and it makes sense. (I'm
> half-tempted to hack up something to let it be done in perl 5--wouldn't
> take much work)

Where is the sense in this? Certainly the compiler will look things
up by name, but the run-time doesn't need to.

> The less real question, "Should pads be hashes or arrays", can be answered
> by "whichever is ultimately cheaper". My bet is we'll probably keep the
> array structure with embedded names, and do a linear search for those rare
> times you're actually looking by name.

That doesn't sound like we're looking up by name at all... It
sounds like the compiler is emitting frame pointer offsets, but
there's a pointer to the symbol table stored in the frame just
in case something wants to see the names.

That's a huge difference over emulating "my" with "temp" like
what was originally proposed!

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Bryan C . Warnock

On Monday 03 September 2001 09:57 pm, Dan Sugalski wrote:
> Oh, it gets better. Imagine injecting a lexically scoped sub into the
> caller's lexical scope. Overriding one that's already there. (Either
> because it was global, or because it was lexically defined at the same or
> higher level)
>
> Needless to say, this makes the optimizer's job... interesting. On the
> other hand, it does allow for some really powerful things to be done by
> code at runtime.

This is more or less how you will be able to write your own lexically scoped 
pragmas.

And yes, I'm sure it will be abused.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: An overview of the Parrot interpreter

2001-09-03 Thread Sam Tregar

On Mon, 3 Sep 2001, Nathan Torkington wrote:

> > Ok, so one example of a PMC is a Perl string...
>
> If you grok vtables, think of a PMC as the thing a vtable hangs off.
>
> Another way to think of it is that a PMC is an object.  To the outside
> (the interpreter that is manipulating data values) its contents are
> opaque.  All you can do is call methods (vtable entries) on it.
>
> So if you have an object/PMC that implements a string, the "length"
> method/vtable-entry will return the length of the string.  An
> object/PMC that implements an array, the "length" method/vtable-entry
> will return the number of things in the array.

I think I understand this.  What I don't understand is how this relates to
the next section about Parrot's special relationship with strings.  If
Parrot has a "string" type and string handling functions, why use a PMC
to implement a string?  What does it mean to have a PMC that "implements a
string" and also have a "string type" in Parrot?

-sam





Re: Should MY:: be a real symbol table?

2001-09-03 Thread Bryan C . Warnock

On Monday 03 September 2001 08:09 pm, Ken Fox wrote:
> Dan Sugalski wrote:
> > At 05:44 PM 9/3/2001 -0400, Ken Fox wrote:
> > > Lexicals are fundamentally different from Perl's package (dynamically
> > > scoped) variables.
> >
> > No, actually, they're not.
>
> How can you possibly think that lexical scoping and dynamic scoping
> are not fundamentally different?!

Scoping is different.  Not the variables themselves.  (Or the storage of 
those variables.)

> > The real question, as I see it, is "Should we look lexicals up by name?"
> > And the answer is Yes. Larry's decreed it, and it makes sense. (I'm
> > half-tempted to hack up something to let it be done in perl 5--wouldn't
> > take much work)
>
> Where is the sense in this? Certainly the compiler will look things
> up by name, but the run-time doesn't need to.

Except that Perl 6 will have the ability to inject lexical variables in its 
scope, and in any dynamic parent's scope.  (It isn't clear whether that is 
write-only access or not - which it probably should be for lexicals.)

That, invariably, forces at least some run-time lookup by name, since the 
lexicals aren't there at compile time for the early resolution.

>
> > The less real question, "Should pads be hashes or arrays", can be
> > answered by "whichever is ultimately cheaper". My bet is we'll probably
> > keep the array structure with embedded names, and do a linear search for
> > those rare times you're actually looking by name.
>
> That doesn't sound like we're looking up by name at all... It
> sounds like the compiler is emitting frame pointer offsets, but
> there's a pointer to the symbol table stored in the frame just
> in case something wants to see the names.

As above, a variable could very well be looked up in lexical storage by name 
first, and then in the global stashes.

>
> That's a huge difference over emulating "my" with "temp" like
> what was originally proposed!

I thought the *original* proposal was simply to make MY:: an actual symbol 
table, vice its current array-based pad, to reduce the number of different 
table-lookups (whether compile- or run-time) that we'd have to code and 
maintain.  There's no reason you couldn't do that.  Yes, it'd be a waste of 
space.  Yes, it'd be slower, unless there were a lot of lexicals being 
introduced.  But there's no other reason why it couldn't work.

Brent just went wayward when trying to further explain.


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Prototypes

2001-09-03 Thread Dan Sugalski

At 10:17 PM 9/3/2001 -0400, Bryan C. Warnock wrote:
>On Monday 03 September 2001 09:30 pm, Dan Sugalski wrote:
> > A clever idea, and one I'd not thought of. That's probably the best way to
> > do it. Has some other issues, like do we allow prototypes like:
> >
> >sub foo ($$) {};
> >
> > to be called as:
> >
> >foo(@bar)
> >
> > if @bar has two elements in it?
>
>To me, that seems only a language decision.  This could certainly handle
>that.

Ah, but calling in the first way has two PMCs in as parameters, while the 
second has only one. Potentially at least. A world of difference there.


Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Prototypes

2001-09-03 Thread Bryan C . Warnock

On Monday 03 September 2001 10:27 pm, Dan Sugalski wrote:
> >To me, that seems only a language decision.  This could certainly handle
> >that.
>
> Ah, but calling in the first way has two PMCs in as parameters, while the
> second has only one. Potentially at least. A world of difference there.

A single PMC?  (A list of pointers to PMCs?)

Or, to think of it another way, how are you going to pass two scalars, or an 
array of two scalars, to a sub with *no* prototype?

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Prototypes

2001-09-03 Thread Dan Sugalski

At 10:32 PM 9/3/2001 -0400, Bryan C. Warnock wrote:
>On Monday 03 September 2001 10:27 pm, Dan Sugalski wrote:
> > >To me, that seems only a language decision.  This could certainly handle
> > >that.
> >
> > Ah, but calling in the first way has two PMCs in as parameters, while the
> > second has only one. Potentially at least. A world of difference there.
>
>A single PMC?  (A list of pointers to PMCs?)
>
>Or, to think of it another way, how are you going to pass two scalars, or an
>array of two scalars, to a sub with *no* prototype?

We create a list, stuff our bits into the list, and pass the list. @_ will 
point to the list.

Lists can access their contents as if they were a list of scalars even if 
they're not. (So we can avoid flattening in cases we don't need to)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Prototypes

2001-09-03 Thread Bryan C . Warnock

On Monday 03 September 2001 10:46 pm, Dan Sugalski wrote:
> At 10:32 PM 9/3/2001 -0400, Bryan C. Warnock wrote:
> >On Monday 03 September 2001 10:27 pm, Dan Sugalski wrote:
> > > >To me, that seems only a language decision.  This could certainly
> > > > handle that.
> > >
> > > Ah, but calling in the first way has two PMCs in as parameters, while
> > > the second has only one. Potentially at least. A world of difference
> > > there.
> >
> >A single PMC?  (A list of pointers to PMCs?)
> >
> >Or, to think of it another way, how are you going to pass two scalars, or
> > an array of two scalars, to a sub with *no* prototype?
>
> We create a list, stuff our bits into the list, and pass the list. @_ will
> point to the list.
>
> Lists can access their contents as if they were a list of scalars even if
> they're not. (So we can avoid flattening in cases we don't need to)
>

Well, then that's how.  Remember, this prototype idea, besides allowing 
automatic variable setting and such, was really just a musing on how to get 
prototype checking at runtime, where compile-time checking isn't possible.

{
    my $a = sub ($$) { code };
    gork($a);
}

sub gork {
    my ($a) = shift;
    $a->(@some_list);  # <- Here
}

The reason prototypes aren't checked at "Here" is because there really isn't 
a way to know what the prototype was.  So you're just passing the list like 
you normally would.  Having the prototype checking code as part of the code
doesn't change anything above at all.

In other words, you can't think about how a prototype would affect the 
calling convention, because, by definition, you don't know of any prototype 
unless it's a direct named sub call that you've checked at compile time.

Of course, I suppose that by default the runtime checking would be done even 
for sub calls that have been compile-time checked...  Which defeats the 
second purpose of compile-time checking (saving time during the run), but I 
think we can find a way around that...

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Prototypes

2001-09-03 Thread Dan Sugalski

At 10:11 PM 9/2/2001 -0400, Bryan C. Warnock wrote:
>On Sunday 02 September 2001 07:49 pm, Dan Sugalski wrote:
> > On Sun, 2 Sep 2001, Bryan C. Warnock wrote:
> > > Are prototypes going to be checked at runtime now?
> >
> > For methods, at least. Dunno about subs, that's Larry's call. I could make
> > a good language case for and against it. It adds overhead on sub calls,
> > which is a bad thing generally. (I'd be OK with the declaration "All
> > prototyped subs must have their prototypes known before the BEGIN phase is
> > done"... :)
>
>Well, here's a simple, yet stupid, idea.  (And I'd be content with having
>prototype declarations being a compile-time only requirement, too.)
>
>Code injection, a la Perl's -p and -n switch.
>
>sub foo (prototype) {
>    code
>}
>
>is compiled as
>
>sub foo { {
>    my $proto = prototype;
>    # prototype checking code here.  Must be non-destructive on @_
>    }
>    code
>}

A clever idea, and one I'd not thought of. That's probably the best way to 
do it. Has some other issues, like do we allow prototypes like:

   sub foo ($$) {};

to be called as:

   foo(@bar)

if @bar has two elements in it?

We'll still have the case where we use prototypes to define calling 
conventions, in which case we won't allow subs to change their calling 
signature. (No sense checking registers 0-4 if the caller thought it was 
0-3, and if we pass enough info to know how many were used, we might as 
well go all the way. I'm OK with that, though. :)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Multiple-dispatch on functions

2001-09-03 Thread Dan Sugalski

At 07:46 AM 9/2/2001 +0100, Piers Cawley wrote:
>Dan Sugalski <[EMAIL PROTECTED]> writes:
> > Nope, the cost will be paid on all sub calls. We at least need to
> > check on every sub call to see if there are multiple versions of the
> > functions. (We can't tell at compile time if it's a single or
> > multi-method sub call, since it can change at runtime) Granted, it's
> > not a huge expense for non-multi-method calls, but it does still
> > impose an overhead everywhere.
>
>Can't you do it with a scary polymorphic function object? (Handwaving
>starts which could be *completely* off base.) Then you just have to
>rely on the 'call_this_function' vtable method to DTRT, which, unless
>it's a 'multi' function, will simply do what function calls have
>always done. You only have to do the more complex stuff if the
>function object is a 'multi' function, in which case it'll have a
>different handler in the vtable.

Why yes, yes in fact we could. In fact, I think we will. (And thank you 
very much :)

Yet another change to the assembly PDD coming up...

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: My, our, thems-over-theres.....

2001-09-03 Thread Bryan C . Warnock

A while ago, I posted some inconsistencies with scope declarators.
(http:[EMAIL PROTECTED]/msg08028.html)

A few more related notions:


$a = "test";
{
    chop(my $a = our $a = "hello");
    # The lexical is chopped
    print "1-$a\n";
    # But the global is printed
}
print "2-$a\n";

# The assignments are evaluated right-to-left, resulting in the lexical 
# being chopped.  But the rightmost declaration is the one propagated 
# through the rest of the scope.

$b = "test";
{
    chop($b = my $b = "hello");
    # The global is chopped
    print "1-$b\n";
    # But the lexical is printed
}
print "2-$b\n";

# This one makes sense.  The lexical is created and initialized with 
# "hello".  That value is then passed to whatever the current $b in
# use is - the global (by default).  The lexical doesn't come fully into
# view until the next statement.

$c = "test";
{
    chop(our $c = my $c = "hello");
    # Bizarre warning... but the global is chopped.
    print "1-$c\n";
    # The lexical is printed
}
print "2-$c\n";

# Here, Perl recognizes that there were two variable declarations, 
# (since 'our' currently affects all subsequent variables on the same line)
# The global is still chopped, because it was the last one assigned, but
# the lexical takes effect, since it was the last one (in left-to-right
# order) declared.


Obviously, this is most heinous code to write.  And except for problems 
centered around when the declarations take effect, it warns appropriately.

'our' is a true accessor, meaning you can query the value, or you can set it.
'my' and 'local' aren't.  They set the value to 'undef' or the value you 
pass in.  (Perhaps it'd be easier to think of it as 'our $a = 
$__PACKAGE__::a')

In either case, that means an assignment is ultimately involved.  
Assignments are handled right-to-left, so I think the scope declarators 
should be too.  (Once all the other problems are fixed.)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Prototypes

2001-09-03 Thread Bryan C . Warnock

On Monday 03 September 2001 09:30 pm, Dan Sugalski wrote:
> A clever idea, and one I'd not thought of. That's probably the best way to
> do it. Has some other issues, like do we allow prototypes like:
>
>sub foo ($$) {};
>
> to be called as:
>
>foo(@bar)
>
> if @bar has two elements in it?

To me, that seems only a language decision.  This could certainly handle 
that.  The internals problem is figuring out how to signify all the 
different variations you could allow.  :-) (And how they could affect what 
is being passed in.)

For instance, I knew the autoreferencing of an array with a (\@) prototype 
was a bust (as in Perl 5).  But, IIRC, Perl 6 is passing everything by 
reference, with the user explicitly flattening any lists.  You can always go 
from references to lists, just not the other way around, so this still works.

There are a couple of other prototype tricks that we'd have to work out...
Unseparated bare code blocks for (&) prototypes come to mind.  

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 10:50 PM 9/3/2001 -0400, Ken Fox wrote:
>Dan Sugalski wrote:
> > Oh, it gets better. Imagine injecting a lexically scoped sub into the
> > caller's lexical scope. Overriding one that's already there. (Either
> > because it was global, or because it was lexically defined at the same or
> > higher level)
> >
> > Needless to say, this makes the optimizer's job... interesting. On the
> > other hand, it does allow for some really powerful things to be done by
> > code at runtime.
>
>Frankly this scares the hell out of me. I like my safe little
>world of predictable lexical variables. That's the main reason I
>use them. (So I'm boring. I use strict too.)

Good. It should. It's a scary feature, and hopefully that same fear will 
strike anyone else who uses it, so they think twice (or maybe three times) 
before they do it. However, it *is* a powerful feature, and one that loses 
a good deal of its power if it's restricted to compile time only.

Besides, I'm not the guy to talk to about restricting this. Take it up with 
the language guys. :)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Nathan Torkington

Dan Sugalski writes:
> Needless to say, this makes the optimizer's job... interesting. On the 
> other hand, it does allow for some really powerful things to be done by 
> code at runtime.

The big thing I want it for is so I can write nats_settings.pm:

  # nats_settings.pm - turn on strict and warnings in caller
  magically_affect_my_caller_with {
use strict;
use warnings;
  };

then at the top of my programs I can write:

  use nats_settings;

Nat




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> Good. It should. It's a scary feature, and hopefully that same fear will
> strike anyone else who uses it

But all it takes is one fool^Wbrave person to mess up somebody
else's perfectly working code. Basically I think this throws us
back to the bad old days when we only had "local".

Confining it to compile time would allow people to define
custom pragmas and import lexicals. Nat would be happy. Who would
be unhappy?

> Besides, I'm not the guy to talk to about restricting this. Take it up with
> the language guys. :)

I haven't seen an Apocalypse mention anything about it, so I'm
hoping this is just a misunderstanding. ;)

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 11:35 PM 9/3/2001 -0400, Ken Fox wrote:
>Dan Sugalski wrote:
> > Good. It should. It's a scary feature, and hopefully that same fear will
> > strike anyone else who uses it
>
>But all it takes is one fool^Wbrave person to mess up somebody
>else's perfectly working code. Basically I think this throws us
>back to the bad old days when we only had "local".

It's not like you can't now do:

   *LWP::get = return_random_function;

and screw things up in many interesting ways.

>Confining it to compile time would allow people to define
>custom pragmas and import lexicals. Nat would be happy. Who would
>be unhappy?

It's not a matter of who'd be unhappy, it'd be a matter of what couldn't we do?

Not that, speaking strictly as a compiler guy, I'd mind restricting it to 
compile-time only. But I can see nifty things to be done at runtime, so I'm 
OK with it.

> > Besides, I'm not the guy to talk to about restricting this. Take it up with
> > the language guys. :)
>
>I haven't seen an Apocalypse mention anything about it, so I'm
>hoping this is just a misunderstanding. ;)

Don't think so. Damian's been talking about it as he goes around. I think 
you'll find that this goes in, unless there are performance reasons against 
it. (I can think of a few)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: An overview of the Parrot interpreter

2001-09-03 Thread Uri Guttman

> "BD" == Brent Dax <[EMAIL PROTECTED]> writes:

  BD> # s/rather than/as well as/;  # we've got a stack of register
  BD> # frames, right?

  BD> IIRC, that's mostly for when we run out of registers or we're changing
  BD> scopes or whatever.  For the most part, it's a register architecture.

as dan has said there will be multiple stacks but the architecture is
not stack based. :)

that means the opcodes don't operate directly on data on the/a
stack. the ops (other than stack related ones) only operate directly on
registers. but there will be a call stack, a register stack (for
register usage overflow), etc. and all the stacks will be segmented so
they can grow without limit, unlike typical thread stacks which have
preallocated limits.

an official list of the stacks is not done yet. dan mentioned a list of
them a couple of times but it is not set in concrete (as nothing else is
either). i assume that the segmented stack management code would be
shared among all the stacks.

a nice side benefit of the segmented stack design is that you don't have
to reclaim any old stack frames. all you do is reset the stack pointer
to the previous frame (probably linked from the current frame) and let
the GC reclaim the memory later. since stacks are non-contiguous this
works out well.

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Search or Offer Perl Jobs  --  http://jobs.perl.org



RE: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 07:05 PM 9/3/2001 -0700, Brent Dax wrote:
># From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
># At 05:30 PM 9/3/2001 -0700, Brent Dax wrote:
># >As far as expensiveness, I think this can be just as fast as
># our current
># >offset-into-the-pad method.
>#
># I was speaking in both speed and memory use when I was talking about
># expense. We'd need to maintain a hash structure for each pad,
># plus we'd
># need to either design the hash structure such that it didn't
># need absolute
># addresses (so we could build it at compile time, which could
># be a long time
># before runtime with a disk freeze or two and an FTP in the
># interim), or
># we'd need to patch the addresses up at runtime when we
># allocated a new pad.
>
>I assume we're willing to have more fixup time for runtime performance,
>correct?

Yes. But fixup is a runtime cost, so we need to weigh what the fixup costs 
versus the return we get from it.

>   Then consider this:
>
>array-of-array pad:
> curpad => a pointer to the current pad; same as in Perl 5
> offs => an offset into the current pad, representing a variable
>
> Accessing the address of the variable:
> curpad[offs]
>
>stash pad:
> hvaddr => the address of an HE, representing a variable
>
> Accessing the address of the variable:
> hvaddr->value
>
>Is either of these likely to be faster than the other?  (Although I'm
>not an assembler hacker, I can't see the first being faster than the
>second.)  If so, does the possible speed benefit outweigh any increased
>startup overhead?

Yes, and it's not just startup overhead.

At runtime, whenever we enter a scope we need to allocate a new pad. Every 
time. (Well, OK, we can cache the last one we allocated in case we're not 
entering recursively, but that's an optimization hack and ignorable in this 
case) All the variables we might access need to be created and allocated 
and suchlike stuff. Additionally the pad structure itself needs to be created.

Now, there's nothing we can do to skip variable creation. That's a fixed 
cost. Pad creation, on the other hand...

If a pad is just an array of:

struct pad_entry {
 perl_string *name;
 IV type;
 PMC *contents;
 }

with maybe some other stuff thrown in, we can prebuild the whole pad, stick 
it in the fixup section, and do a quick allocate/memcpy when we enter the 
block. (The name is a pointer to a constant string, so copying it is safe) 
On the other hand, if the pad is a hash, there's no way around having 
pointers in it. Which means that allocating a new pad means both copying 
and fixing up the template pad. Plus the pad is bigger, since we need to 
keep the hash structure in there.

So going with the array-of-hashes means we have more data to allocate, more 
data to copy, and more data to fix up, all of which make scope entry 
slower. The only time we'd actually *access* the data by name is if we're 
looking things up via MY in some way. (read or write) That, I expect, is 
going to be a rather rare situation, in which case I can't see the extra 
cost on each scope entry being worth it. Could be wrong, of course, but I'm 
not seeing the win here.

># I'm not convinced the memory usage, and corresponding time to
># clone and/or
># set up the hash-based pad, is worth the relatively infrequent by-name
># access to variables in the pad. I could be wrong, though.
># We'll have to try
># it and see. (Shouldn't affect the bytecode, however, so we can try
># different methods and benchmark them as need be)
>
>By using something similar to temp() (where the SV* is temporarily
>replaced), cloning should only be necessary for situations in which two
>threads are running the same function at the same time.

Nope, I'm talking about recursion. When you do:

    sub foo {
        foo();
    }

we need to clone foo's pad from the template, because we need a new one. 
Otherwise that whole lexical variable/recursion thing doesn't work, which 
is A Bad Thing. :)


Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




RE: Should MY:: be a real symbol table?

2001-09-03 Thread Brent Dax


# -Original Message-
# From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
# Sent: Monday, September 03, 2001 7:25 PM
# To: Brent Dax; Ken Fox
# Cc: Simon Cozens; [EMAIL PROTECTED]
# Subject: RE: Should MY:: be a real symbol table?
#
#
# At 07:05 PM 9/3/2001 -0700, Brent Dax wrote:
# ># From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
# ># At 05:30 PM 9/3/2001 -0700, Brent Dax wrote:
# ># >As far as expensiveness, I think this can be just as fast as
# ># our current
# ># >offset-into-the-pad method.
# >#
# ># I was speaking in both speed and memory use when I was
# talking about
# ># expense. We'd need to maintain a hash structure for each pad,
# ># plus we'd
# ># need to either design the hash structure such that it didn't
# ># need absolute
# ># addresses (so we could build it at compile time, which could
# ># be a long time
# ># before runtime with a disk freeze or two and an FTP in the
# ># interim), or
# ># we'd need to patch the addresses up at runtime when we
# ># allocated a new pad.
# >
# >I assume we're willing to have more fixup time for runtime
# performance,
# >correct?
#
# Yes. But fixup is a runtime cost, so we need to weigh what
# the fixup costs
# versus the return we get from it.

But it's a one-time runtime cost, unlike, say, a string eval in a loop.

(sub-entry overhead complaints cut--they'll be addressed at the end of
the e-mail.)

# ># I'm not convinced the memory usage, and corresponding time to
# ># clone and/or
# ># set up the hash-based pad, is worth the relatively
# infrequent by-name
# ># access to variables in the pad. I could be wrong, though.
# ># We'll have to try
# ># it and see. (Shouldn't affect the bytecode, however, so we can try
# ># different methods and benchmark them as need be)
# >
# >By using something similar to temp() (where the SV* is temporarily
# >replaced), cloning should only be necessary for situations
# in which two
# >threads are running the same function at the same time.
#
# Nope, I'm talking about recursion. When you do:
#
#sub foo {
#  foo();
#}
#
# we need to clone foo's pad from the template, because we need
# a new one.
# Otherwise that whole lexical variable/recursion thing doesn't
# work, which
# is A Bad Thing. :)

Now is where the temp() stuff I was talking about earlier comes in.

sub foo {
my($bar);
foo();
}

is basically equivalent to

sub foo {
temp($MY::bar);
foo();
}

(I mentioned to Ken Fox in private that this isn't too different than
temp()ing globals when each sub is in its own package.)

If we did this, I don't think the cost would be greater to recurse than
it would be for array-of-arrays.  (Especially since we'd make sure to
optimize the hell out of temp.)  This would also lead to less code to
write and a smaller binary.  Plus a simple way to do static: don't
temp()orize the variable on entry.
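
For reference, a Perl 5 sketch of the save-and-restore semantics temp()
is standing in for here (Perl 5's local() does exactly this for globals;
whether it suffices for lexicals is the open question):

```perl
use strict;
use warnings;

# Perl 5's local() saves the global's value on scope entry and restores
# it on scope exit - the temp() behavior under discussion, and what
# makes the recursion case work without cloning a pad.
our $bar = 'outer';

sub foo {
    my $depth = shift;
    local $bar = "depth-$depth";
    foo($depth + 1) if $depth < 2;
    return $bar;               # still sees its own depth's value
}

print foo(0), "\n";            # depth-0
print "$bar\n";                # outer - restored after the call
```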

--Brent Dax
[EMAIL PROTECTED]

"...and if the answers are inadequate, the pumpqueen will be overthrown
in a bloody coup by programmers flinging dead Java programs over the
walls with a trebuchet."




Re: An overview of the Parrot interpreter

2001-09-03 Thread Uri Guttman

> "ST" == Sam Tregar <[EMAIL PROTECTED]> writes:

  >> * Integer, String, and Number registers 0-x are used to pass parameters
  >> when the compiler calls routines.

  ST> s/compiler/interpreter/, right?

nope. he means when the compiler generates op codes. the compiler will
generate code to put the op code params in registers (this is done with
other op codes). then the interpreter will get an op code and fetch the
registers and actually call the op code function via the vtable.

it is tricky to keep track of what the compiler does and generates and
then what the interpreter sees and does.

  >> * Subs may have variable number, or unknown number, of PMC parameters.
  >> (Basically Parrot variables) They may *not* take a variable or unknown
  >> number of integer, string, or number parameters.

  ST> I don't understand this restriction.  Won't it make implementing
  ST> variadic functions more difficult?

nope. that would be handled by multiple dispatch. it will compare the
subprototypes and call the best/right sub. someone else said that we
could do it just as easily for regular subs as for methods.

for variable list of args, then either all the args are in one list
which is passed (by ref) in register 0. or if the sub has a prototype
with the last arg being an array (which will slurp all the rest of the
args) then that param will be an array ref and get the rest as a single
list. this is so all calls will have a known and fixed number of args at
the opcode level.
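
a Perl 5 sketch of that calling convention (the sub name and the
rewriting shown are hypothetical illustrations, not settled Parrot
design):

```perl
use strict;
use warnings;

# Illustration: a sub whose parameter list ends in a slurpy array
# receives the trailing args as a single array ref, so every call site
# passes a fixed, known number of arguments (here: one scalar + one ref).
sub printf_like {
    my ($format, $rest) = @_;   # $rest is one array ref, not a flat list
    return sprintf $format, @$rest;
}

# The compiler would rewrite printf_like("%s-%s", 'a', 'b') as:
my $out = printf_like("%s-%s", [ 'a', 'b' ]);
print "$out\n";    # a-b
```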

uri

-- 
Uri Guttman  -  [EMAIL PROTECTED]  --  http://www.sysarch.com
SYStems ARCHitecture and Stem Development -- http://www.stemsystems.com
Search or Offer Perl Jobs  --  http://jobs.perl.org



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 09:57 PM 9/3/2001 -0400, Bryan C. Warnock wrote:
>On Monday 03 September 2001 09:57 pm, Dan Sugalski wrote:
> > Oh, it gets better. Imagine injecting a lexically scoped sub into the
> > caller's lexical scope. Overriding one that's already there. (Either
> > because it was global, or because it was lexically defined at the same or
> > higher level)
> >
> > Needless to say, this makes the optimizer's job... interesting. On the
> > other hand, it does allow for some really powerful things to be done by
> > code at runtime.
>
>This is more or less how you will be able to write your own lexically scoped
>pragmas.
>
>And yes, I'm sure it will be abused.

There are days I think wanton abuse is the sign of a useful and powerful 
feature. (Or the sign that Damian's at the keyboard again... :)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Deoptimizations

2001-09-03 Thread Dan Sugalski

At 01:57 PM 9/2/2001 -0400, Ken Fox wrote:
>"Bryan C. Warnock" wrote:
> > I think the only way you're going to be able to detect dynamic redefinitions
> > is dynamically.  :-)
>
>Not really. Does the program include string eval? That throws out
>optimizations like folding and dead code elimination that use *any*
>global. If the program re-defines a single sub using an assignment to
>a symbol table slot, then only that sub limits optimization. The
>optimizer starts with a set of assumptions and alters those assumptions
>as it sees more of the program.

The more I think about the potential for runtime redefinition, the more I 
think having an "if optimized" op is of limited use. Certainly useful for 
the "use inlined/use non-inlined" case (to some extent) but there's a lot 
of redefinition that can really screw with some of the things that perl can 
potentially do.

We do, after all, have the possibility of having *all* our source (or all 
our bytecode, which is enough for the optimizer) at hand, something most 
other languages don't have. (Try building perl 5 as one big source file. 
You get around a 5% speedup last I knew, because the compiler could see all 
the source) That means we can sort of maybe be more aggressive in our 
optimizations, since there's less entropy in the system. (Well, from the 
invisible source front, at least)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
> Oh, it gets better. Imagine injecting a lexically scoped sub into the
> caller's lexical scope. Overriding one that's already there. (Either
> because it was global, or because it was lexically defined at the same or
> higher level)
> 
> Needless to say, this makes the optimizer's job... interesting. On the
> other hand, it does allow for some really powerful things to be done by
> code at runtime.

Frankly this scares the hell out of me. I like my safe little
world of predictable lexical variables. That's the main reason I
use them. (So I'm boring. I use strict too.)

IMHO it's a bad idea to allow run-time re-definition of the
dynamic caller's scope. I see value in allowing *compile* time
changes to the current lexical scope. For example:

  sub f {
 use Database::Connection qw($db);
 ...
  }

I see the value in creating a lexical variable named "$db"
in the current scope. (By "current" I mean the scope the BEGIN
block appears in, not the BEGIN block itself.) Since this is
done at compile time, the "use" kicks off a warning telling
me that magical stuff is going to happen. (Not a literal
warning -- just a warning when I read the code.) I even like
this as an alternative to using macros.

However, contrast that with this example:

  $next = 0; # this just shuts up the compiler. I don't
 # know why. Maybe they'll fix it in Perl 7.
  sub iter {
 begin;
 while ($next->());
 end;
  }

  sub begin {
 upvar("$next") = sub { ...; upvar("$next") = sub { ... } };
  }

  sub end {
 ...
  }

Replace upvar with whatever syntax you want. You still have
a complete mess. Where did "$next" come from? Why did the code
stop working because I put the "begin" statement in a block?

This isn't power. This is TMTOWTDI gone mad. (Ok, ok. Yeah,
that's a little harsh. But everybody has their limits, right? ;)

- Ken



Re: LangSpec: Statements and Blocks [first,last]

2001-09-03 Thread Damian Conway

iVAN wrote:

   > As we read in Damian Conway- Perl6-notes, there will be

"...may be..."

(Remember, I'm only the shambling henchman ;-)


   > a var-iterator that can be used to see how many times the cycle has
   > been "traversed" i.e.
   > 
   > foreach my $el (@ary) {
   >.. do something 
   >  print $#;  <--- print the index (or print $i )
   > }

Current thinking is that this functionality may be better provided by bestowing
it on a lexical via a property/trait:

foreach my $el (@ary) {
my $i: index;
...
print $i;
}

This seems preferable since it avoids reintroducing a punctuation variable and
allows nested loop counters to work consistently:

foreach my $el (@ary) {
my $i: index;
for my $el2 (@ary2) {
my $j: index;
@table[$i][$j] = $el * $el2;
}
}


   > shall we have :
   > 
   > foreach my $el (@ary) {
   >  print $# if $#.first(); 
   >.. do something 
   >  print $# if $#.last();  
   > i.e. $#ary
   > };

I very much doubt it.

Damian



Re: Prototypes

2001-09-03 Thread Dan Sugalski

At 11:47 PM 9/3/2001 -0400, Ken Fox wrote:
>"Bryan C. Warnock" wrote:
> > {
> > my $a = sub ($$) { code };
> > gork($a);
> > }
> >
> > sub gork {
> > my ($a) = shift;
> > $a->(@some_list);  # <- Here
> > }
> >
> > The reason prototypes aren't checked at "Here" is because there really
> > isn't a way to know what the prototype was.
>
>Um, that's not true. ML can do stuff like that -- all automatically and
>without any type declarations.

We can't, at least not at compile time. Something like:

   sub foo::bar($$);
   sub baz::bar(\@);

   $ref = rand > .5 ? \&foo::bar : \&baz::bar;
   $ref->(@array);

makes it difficult. I think we're going to need a "call as list" form of 
the sub preamble for the "I can't check at compile time" cases.

>What happens is the type of gork's $a is determined, which cascades
>to the type of gork's $_[0], which cascades to your first block's $a.
>ML even has polymorphic functions where the output type depends on the
>input type.
>
>It is possible. It's just a question of whether we want to do it.

Oh, sure, but it's mostly runtime checking. Personally I'd prefer to avoid 
it where possible since we can pick up speed wins by avoiding it. I'm now 
convinced we can't always avoid it, so we'll just have two function entry 
points for prototyped functions and do runtime type 
checking/dispatch/whatever at runtime where we have to.

I'm beginning to miss the days when a function call was just a function 
call... :)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Prototypes

2001-09-03 Thread Damian Conway

   > But since the current prototyping system... has a highly positive
   > pressure gradient compared to the surrounding air,

Well...I think it's more a problem of "I do no' thin' dat word means
wha' you thin' it means".

People want prototypes to be parameter type specifiers,
when they're actually argument context specifiers.

   
   > hopefully we won't be saddled with it in Perl 6 

Hear! Hear! I'm sure for Perl 6 we can do much worse^H^H^H^H^Hbetter.

;-)

Damian



Re: Prototypes

2001-09-03 Thread Damian Conway

   > > Are prototypes going to be checked at runtime now?
   > 
   > For methods, at least. Dunno about subs, that's Larry's call. I
   > could make a good language case for and against it. It adds
   > overhead on sub calls, which is a bad thing generally.

I would strongly like to see a guarantee that any
subroutine/method/multimethod with a parameter list has that parameter
list checked before it is called -- at compile-time if possible; at
run-time, if necessary.

And, in the absence of strong typing (which I suspect will remain the
exception, rather than the norm), run-time parameter list checking seems
inevitable.

Of course, if you can't afford the overhead, you will still be able to
turn the whole process off simply by not giving a subroutine/method a
parameter list.

   
   > (I'd be OK with the declaration "All prototyped subs must have
   > their prototypes known before the BEGIN phase is done"... :)

Yeah. Right. That's gonna happen.

;-)

Damian


PS: can we please *not* refer to the Perl 6 parameter lists as "prototypes".
The use of that term causes enough problems in Perl 5.
See: http://dev.perl.org/rfc/128.html#Banishment_of_the_term_prototyp



Re: What's up with %MY?

2001-09-03 Thread Damian Conway


   > I haven't seen details in an Apocalypse, but Damian's
   > Perl 6 overview has a bit about it. The Apocalypse
   > specifically mentions *compile-time* scope management,
   > but Damian is, uh, Damian. (DWIMery obviously. ;)

Hmm.

It would seem *very* odd to allow every symbol table *except*
%MY:: to be accessed at run-time.


   > Is stuff like:
   > 
   >   %MY::{'$lexical_var'} = \$other_var;
   > 
   > supposed to be a compile-time or run-time feature?

Run-time.


   > Modifying the caller's environment:
   > 
   >   $lexscope = caller().{MY};
   >   $lexscope{'&die'} = &die_hard;
   > 
   > is especially annoying because it means that I can't
   > trust lexical variables anymore.

You can't trust them now.

Between source filters and Inline I can do pretty much whatever I like
to your lexicals without your knowledge. ;-)

   
   > The one good thing about Damian's caller() example is that it
   > appears in an import() function. That implies compile-time, but
   > isn't as clear as Larry's Apocalypse.

I would envisage that mucking about with symbol tables would be no more
common in Perl 6 than it is in Perl 5. But I certainly wouldn't want to
restrict the ability to do so.

How am I expected to produce fresh wonders if you won't let me warp the
(new) laws of the Perl universe to my needs?

;-)

Damian



Re: Prototypes

2001-09-03 Thread Ken Fox

"Bryan C. Warnock" wrote:
> {
> my $a = sub ($$) { code };
> gork($a);
> }
> 
> sub gork {
> my ($a) = shift;
> $a->(@some_list);  # <- Here
> }
> 
> The reason prototypes aren't checked at "Here" is because there really
> isn't a way to know what the prototype was.

Um, that's not true. ML can do stuff like that -- all automatically and
without any type declarations.

What happens is the type of gork's $a is determined, which cascades
to the type of gork's $_[0], which cascades to your first block's $a.
ML even has polymorphic functions where the output type depends on the
input type.

It is possible. It's just a question of whether we want to do it.

- Ken



Re: What's up with %MY?

2001-09-03 Thread Dan Sugalski

At 04:11 PM 9/4/2001 +1100, Damian Conway wrote:
>I would envisage that mucking about with symbol tables would be no more
>common in Perl 6 than it is in Perl 5. But I certainly wouldn't want to
>restrict the ability to do so.
>
>How am I expected to produce fresh wonders if you won't let me warp the
>(new) laws of the Perl universe to my needs?

That's easy--you slip the pumpking or internals designer a 10-spot. Amazing 
what it'll do... :)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Prototypes

2001-09-03 Thread Bryan C . Warnock

A few more ideas to put down, lest I lie awake all night, thoughts churning...

One.

Presumably, there will be an op for the actual calling of the subroutine.
That op can take an (extra) argument, with one of three values, that the 
prototype checking can get to.

The first value indicates that these arguments were checked at compile time, 
and there is no variable wrangling that needs to be done.  The runtime 
checking would be skipped.  So foo($a,$b) calls to sub foo ($$) {} would be 
skipped, but not calls by reference, or calls where the prototype is sub foo 
(my $a, my $b) {}.

The second value indicates that these arguments have not been checked, check 
them now.  Inject those variables, etc., etc.  This is for calls by 
reference, calls where magic assignments need to be made, etc.

The third value is a "peek" value.  Do the runtime checking, but don't do 
any magic variable stuff.  As a matter of fact, don't run any user-code at 
all.  Simply return a true or false value if the arguments *would* match.
(This allows us to check incoming coderefs, to see that they take the 
arguments that *they* expect.  Similar to the whole "pointer to a function 
that takes a pointer to a function, and an int."  Of course, no checking the 
return value.  But they're supposed to handle your want()s.)
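
The three-valued scheme above could be sketched like this (the constant
names, the call_sub dispatcher, and the check/code pairing are all
hypothetical illustrations, not Parrot design):

```perl
use strict;
use warnings;

# Hypothetical three-valued argument-check flag on the call op.
use constant { CHECKED => 0, CHECK_NOW => 1, PEEK => 2 };

sub call_sub {
    my ($sub, $mode, @args) = @_;
    if ($mode == CHECKED) {
        return $sub->{code}->(@args);          # vetted at compile time: just call
    }
    my $ok = $sub->{check}->(@args);           # run the prototype check
    return $ok ? 1 : 0 if $mode == PEEK;       # peek: report, run no user code
    die "prototype mismatch\n" unless $ok;     # check-now: enforce, then call
    return $sub->{code}->(@args);
}

my %add = (
    check => sub { @_ == 2 },                  # "($$)" - exactly two scalars
    code  => sub { $_[0] + $_[1] },
);
print call_sub(\%add, PEEK, 1, 2, 3), "\n";    # 0 (would not match)
print call_sub(\%add, CHECK_NOW, 1, 2), "\n";  # 3
```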

Two.

There's another issue with the prototyping code - magic.  Specifically, 
tied variables (which some will say aren't magical.  They're pretty magical 
to me.)

Anyway, to do some checking, one could conceivably call FETCH on a variable 
for which the FETCH itself has side effects.  (It depends on how specific 
prototyping can go - you could say something as detailed as (my $a < 4;) to 
handle bounds checking on your input as well.)  A FETCH done just to check 
the arguments could change the actual value that would have been used later 
on.  Like a self-incrementing counter.

Anyway, the tie interfaces need to add a PEEK, which would return what the 
next FETCH will return, when it is called.   That will allow us to safely 
test tied variables without screwing things up.
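
A Perl 5 sketch of the idea, using the existing tie interface; the PEEK
method and the Counter class are hypothetical illustrations (PEEK is not
an existing tie hook):

```perl
use strict;
use warnings;

# A tied scalar whose FETCH has side effects (a self-incrementing
# counter), plus the proposed PEEK: report what the next FETCH would
# return without actually advancing the counter.
package Counter;
sub TIESCALAR { my $class = shift; my $n = 0; return bless \$n, $class }
sub FETCH     { my $self = shift; return ++$$self }    # destructive read
sub PEEK      { my $self = shift; return $$self + 1 }  # non-destructive
sub STORE     { }

package main;
tie my $c, 'Counter';
print tied($c)->PEEK, "\n";   # 1 - a prototype checker could look here
print $c, "\n";               # 1 - first real FETCH
print $c, "\n";               # 2
```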

(Generally speaking, that should probably be done now for exactly the same 
reason.)

Three.

We need to decide what happens when someone attempts to replace a prototyped 
sub with another sub - whether that sub is not-prototyped, prototyped 
exactly the same, or prototyped differently.  Multiple dispatch on functions 
could alter our approach to the third.  Direct calls have already been 
attested to at compile time.  The call has just changed...

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



What's up with %MY?

2001-09-03 Thread Ken Fox

I haven't seen details in an Apocalypse, but Damian's
Perl 6 overview has a bit about it. The Apocalypse
specifically mentions *compile-time* scope management,
but Damian is, uh, Damian. (DWIMery obviously. ;)

Is stuff like:

  %MY::{'$lexical_var'} = \$other_var;

supposed to be a compile-time or run-time feature?

Modifying the caller's environment:

  $lexscope = caller().{MY};
  $lexscope{'&die'} = &die_hard;

is especially annoying because it means that I can't
trust lexical variables anymore. The one good thing
about Damian's caller() example is that it appears
in an import() function. That implies compile-time,
but isn't as clear as Larry's Apocalypse.

This feature has significant impact on all parts of
the implementation, so it would be nice if a little
more was known. A basic question: how much performance
is this feature worth?

- Ken



Re: LangSpec: Statements and Blocks

2001-09-03 Thread Damian Conway

Some feedback.

   > Syntax Overview
   > 
   > Keywords
   > continue, do, else, elsif, for, foreach, given, goto, grep, if, last,
   > map, next, redo, sort, sub, unless, until, when, while 

C and C

(C is not nearly so certain.)


   > Conditional Statement Modifiers
   > 
   >  6. [ LABEL: ] expr if expr;
   >  7. [ LABEL: ] expr unless expr;

I'm not at all sure modifiers will be stackable, as this grammar implies.


   > Iterative Block Constructs
   > 
   > 20. [ LABEL: ] for[each] [ scalar ] ( list ) { block } # Note 4

I am hoping that Larry will also give us:

 [ LABEL: ] for[each] (scalar, scalar ...) ( list ) { block }


   > Subroutine Code Blocks # Note 6
   > 
   > 21. sub identifier [ ( prototype ) ] [ :properties ] { block }
   > 22. sub [ ( prototype ) ] { block }# Note 7

Currently:

 21. sub identifier [ ( prototype ) ] [ is properties ] { block }
 22. sub [ ( prototype ) ] [ is properties] { block } [is properties]

Though I would *much* prefer to see:

 21. sub identifier [ ( prototype ) ] [ :traits ] { block }
 22. sub [ ( prototype ) ] [ :traits] { block } [is properties]

   
   > A list consists of zero or more expressions. List members may
   > either be an explicit expression, separated via a comma (','), or
   > may be interpolated from two expressions via either of the two
   > range operators ( ('..') and ('...') ). A list of zero elements
   > must be delimited by parentheses.

May also have a redundant comma after the last element.


   > A statement consists of zero or more expressions, followed by an optional
   > modifier and its expression, and either a statement terminator (';') or a
   > block closure ('}' or EOF).

Need to recast this in terms of statement separators and null statements.

   
   > A block consists of zero or more blocks and statements. A file is
   > considered a block, delimited by the file boundaries.   Semantically, I
   > will define a block only in terms of its affect on scoping.


"its effect on scoping"
(we probably don't care about its psychological demeanor ;-)

   

Damian



RE: Should MY:: be a real symbol table?

2001-09-03 Thread Sam Tregar

On Mon, 3 Sep 2001, Brent Dax wrote:

> Now is where the temp() stuff I was talking about earlier comes in.
>
>   sub foo {
>   my($bar);
>   foo();
>   }
>
> is basically equivalent to
>
>   sub foo {
>   temp($MY::bar);
>   foo();
>   }

Oh, you're pitching softballs to yourself.  Try a hard one:

  my @numbers;
  for (0 .. 10) {
my $num = $_;
push(@numbers, sub { $num });
  }
  for (0 .. 10) {
local $num = $_;
push(@numbers, sub { $num });
  }
  print join(', ', map { $_->() } @numbers), "\n";

It's in Perl 5, but the analogy to Perl 6 should be clear enough.  This is a
good example of the different natures of lexical and dynamic variables.

-sam





RE: Should MY:: be a real symbol table?

2001-09-03 Thread Dan Sugalski

At 09:12 PM 9/3/2001 -0700, Brent Dax wrote:
>From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
># At 07:05 PM 9/3/2001 -0700, Brent Dax wrote:
># ># From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
># ># At 05:30 PM 9/3/2001 -0700, Brent Dax wrote:
># ># >As far as expensiveness, I think this can be just as fast as
># ># our current
># ># >offset-into-the-pad method.
># >#
># ># I was speaking in both speed and memory use when I was
># talking about
># ># expense. We'd need to maintain a hash structure for each pad,
># ># plus we'd
># ># need to either design the hash structure such that it didn't
># ># need absolute
># ># addresses (so we could build it at compile time, which could
># ># be a long time
># ># before runtime with a disk freeze or two and an FTP in the
># ># interim), or
># ># we'd need to patch the addresses up at runtime when we
># ># allocated a new pad.
># >
># >I assume we're willing to have more fixup time for runtime
># performance,
># >correct?
>#
># Yes. But fixup is a runtime cost, so we need to weigh what
># the fixup costs
># versus the return we get from it.
>
>But it's a one-time runtime cost, unlike, say, a string eval in a loop.

People who do string eval in a loop deserve what they get. Probably even 
more than that. (If you invoke the compiler that often at runtime, well, 
tough--your performance is probably going to suck... :)

># ># I'm not convinced the memory usage, and corresponding time to
># ># clone and/or
># ># set up the hash-based pad, is worth the relatively
># infrequent by-name
># ># access to variables in the pad. I could be wrong, though.
># ># We'll have to try
># ># it and see. (Shouldn't affect the bytecode, however, so we can try
># ># different methods and benchmark them as need be)
># >
># >By using something similar to temp() (where the SV* is temporarily
># >replaced), cloning should only be necessary for situations
># in which two
># >threads are running the same function at the same time.
>#
># Nope, I'm talking about recursion. When you do:
>#
>#sub foo {
>#  foo();
>#}
>#
># we need to clone foo's pad from the template, because we need
># a new one.
># Otherwise that whole lexical variable/recursion thing doesn't
># work, which
># is A Bad Thing. :)
>
>Now is where the temp() stuff I was talking about earlier comes in.

No. Doesn't work. Closures are screwed--you *need* a separate pad for each 
scope entry, because you need to keep a handle on those pads as far back as 
we have to for things to function properly.
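
A minimal Perl 5 sketch of that closure problem (in the spirit of Sam
Tregar's earlier example): each entry to the scope must get its own pad,
or the two counters below would be sharing one $n:

```perl
use strict;
use warnings;

# Each call to make_counter enters the scope afresh; the returned
# closure keeps its own $n alive. With one shared pad plus save/restore
# (the temp() scheme), both counters would stomp on a single $n.
sub make_counter {
    my $n = 0;
    return sub { return ++$n };
}

my $a = make_counter();
my $b = make_counter();
$a->(); $a->();
print $a->(), " ", $b->(), "\n";    # 3 1
```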

This also makes scope entry and exit costlier, since you need to make a 
savestack entry and restore, respectively, for each lexical. I don't think 
it'd be a win, even if closures weren't getting in your way.

It also means more per-thread data, since then the pointer to "the scope's 
pad" (and each scope *still* needs a pad--you can't have a single global 
structure here) would need to be tied to a single fixed location, which 
means one more pointer for the sub in threadspace. (Though that's less of a 
big deal, I expect)

>If we did this, I don't think the cost would be greater to recurse than
>it would be for array-of-arrays.  (Especially since we'd make sure to
>optimize the hell out of temp.)  This would also lead to less code to
>write and a smaller binary.  Plus a simple way to do static: don't
>temp()orize the variable on entry.

Nope, just won't work. Not to mention that peeking back outside your 
current scope via MY tricks wouldn't work right when recursing. We'd either 
need to walk back out the savestack (ick) or you'd end up peering at 
yourself when you wanted the previous recursion.

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: What's up with %MY?

2001-09-03 Thread Damian Conway

Dan revealed:

   > > How am I expected to produce fresh wonders if you won't let me
   > > warp the (new) laws of the Perl universe to my needs?

   > That's easy--you slip the pumpking or internals designer a 10-spot.
   > Amazing what it'll do... :)

And how do you think I got five of my modules into the 5.8 core???

;-)

Damian



Re: What's up with %MY?

2001-09-03 Thread Ken Fox

Damian Conway wrote:
> It would seem *very* odd to allow every symbol table *except*
> %MY:: to be accessed at run-time.

Well, yeah, that's true. How about we make it really
simple and don't allow any modifications at run-time to
any symbol table?

Somehow I get the feeling that "*very* odd" can't be
fixed by making the system more normal. ;)

>> Is stuff like:
>>
>>   %MY::{'$lexical_var'} = \$other_var;
>>
>> supposed to be a compile-time or run-time feature?
> 
> Run-time.

A definition of run-time would help too since we have
things like BEGIN blocks. I consider it run-time if the
compiler has already built a symbol table and finished
compiling code for a given scope. Is that an acceptable
definition of run-time? This allows BEGIN blocks to
modify their caller's symbol tables even if we prohibit
changes at run-time.

Can we have an example of why you want run-time
symbol table manipulation? Aliases are interesting,
but symbol table aliases don't seem very friendly.
It would be simple to write:

  %MY::{'@point'} = [ $x, $y ];

But that probably won't work and using [ \$x, \$y ]
doesn't make sense either. What seems necessary is:

  %MY::{'$x'} = \$point[0];
  %MY::{'$y'} = \$point[1];

If the alias gets more complicated, I'm not sure the
symbol table approach works well at all.
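
For comparison, Perl 5 can already do this style of aliasing for
*package* variables via glob assignment (globals only - lexical pads
have no such interface, which is exactly what %MY:: would change):

```perl
use strict;
use warnings;

# Alias the package variable $main::x to the array element $point[0].
# After this, both names refer to the same underlying scalar.
our @point = (3, 4);
*x = \$point[0];
our $x;
print "$x\n";       # 3
$point[0] = 7;
print "$x\n";       # 7 - same underlying scalar
```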

>> Modifying the caller's environment:
>> 
>>   $lexscope = caller().{MY};
>>   $lexscope{'&die'} = &die_hard;

This only modifies the caller's scope? It doesn't modify
all instances of the caller's scope, right? For example,
if I have a counter generator, and one of the generated
closures somehow has its symbol table modified, only that
*one* closure is affected even though all the closures
were cloned from the same symbol table.

What about if the symbol doesn't exist in the caller's scope
and the caller is not in the process of being compiled? Can
the new symbol be ignored since there obviously isn't any
code in the caller's scope referring to a lexical with that
name?

> Between source filters and Inline I can do pretty much whatever I like
> to your lexicals without your knowledge. ;-)

Those seem more obvious. There will be a "use" declaration
I wrote and I already know that "use" can have side-effects on
my current name space. IMHO this could become a significant problem
as we continue to make Perl more expressive. Macros, filters,
self-modifying code, mini-languages ... they all make expressing
a solution easier, and auditing code harder. Do we favor
expression too much over verification? I'm not qualified to
answer because I know I'm biased towards expression. (The %MY
issues I'm raising mostly because of performance potential.)

> I would envisage that mucking about with symbol tables would be no more
> common in Perl 6 than it is in Perl 5. But I certainly wouldn't want to
> restrict the ability to do so.

We also want Perl 6 to be fast and cleanly implemented.

This particular issue is causing trouble because it has a big
impact on local variable analysis -- which then causes problems
with optimization. I'd hate to see lots of pragmas for turning
features on/off because it seems like we'll end up with a more
fragmented language that way.

> How am I expected to produce fresh wonders if you won't let me warp the
> (new) laws of the Perl universe to my needs?

You constantly amaze me and everyone else. That's never
been a problem.

One of the things that I haven't been seeing is the exchange
of ideas between the implementation side and the language side.
I've been away for a while, so maybe it's just me.

It vaguely worries me though that we'll be so far down the
language side when implementation troubles arise that it will
be hard to change the language. Are we going to end up with
hacks in the language because certain Very Cool And Important
Features turned out too hard to implement?

- Ken