date:20040901

Re: Pipeline Performance

2004-09-01 Thread Sean O'Rourke

At Tue, 31 Aug 2004 13:23:04 -0400,
[EMAIL PROTECTED] (Aaron Sherman) wrote:
> I would think you actually want to be able to define grep, map, et al.
> in terms of the mechanism for unraveling, and just let the optimizer
> collapse the entire pipeline down to a single map.

Even for map and grep this is a bit trickier, since map can produce
zero or more values for each input value, and calls its body in list
context, whereas grep produces zero or one value, and gets called in
scalar context.  So you'd need something like a full call and return
prototype for each mapping function, e.g.:

Function            Return context       Argument context
--------            --------------       ----------------
-> $a { $a + 2 }    ($y)                 ($x)
grep(&block)        ($y is optional)     ($x)
map(&block)         ([EMAIL PROTECTED])  ($arg)

Then your loop merging macro could deconstruct these into the
appropriate kind of loop (using foreach and pushing single items only
to make intention clear):

@a ==> map &b ==> @c
==>
foreach $a (@a) { foreach $b (map_item(&b, $a)) { push @c, $b } }

@a ==> (&b = -> $a { $a + 2 }) ==> @c
==>
foreach $a (@a) { push @c, b($a) }

@a ==> grep \&b ==> @c
==>
foreach $a (@a) { foreach $b (grep_item(&b, $a)) { push @c, $b } }

where "map_item" and "grep_item" are the single-element mapper
functions defining map and grep.  I think that both the context and
the number of items consumed/produced could be gathered from
prototypes, so the only restrictions for mapping functions would be
(1) having a prototype available at definition time, and (2) being
side-effect-free.
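
A rough sketch of the loop merging described above, written in Python rather than Perl 6 and not taken from any Parrot or Perl 6 source. Each pipeline stage is reduced to a per-item function that returns a list of zero or more results, which is the role "map_item" and "grep_item" play above. The names fuse and run are hypothetical, and the sketch assumes the stage bodies are side-effect-free.

```python
def map_item(fn):
    """Per-item form of map: fn returns zero or more values for one input."""
    return lambda x: list(fn(x))


def grep_item(pred):
    """Per-item form of grep: zero or one value, pred used as a boolean."""
    return lambda x: [x] if pred(x) else []


def fuse(*stages):
    """Collapse a chain of per-item stages into a single per-item stage."""
    def fused(x):
        items = [x]
        for stage in stages:
            # Feed every value produced so far through the next stage.
            items = [out for item in items for out in stage(item)]
        return items
    return fused


def run(source, stage):
    """The merged loop: one pass over the source, pushing single items."""
    result = []
    for a in source:
        for b in stage(a):
            result.append(b)
    return result


# @a ==> map ==> grep ==> @c, collapsed into one loop over @a.
pipeline = fuse(map_item(lambda a: [a, a + 10]),
                grep_item(lambda a: a % 2 == 0))
c = run([1, 2, 3], pipeline)
```

The fused stage still builds a small list per input item, but no full-length temporary list is created between the map and the grep.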

/s

Re: Compile op with return values

2004-09-01 Thread Leopold Toetsch

Steve Fink <[EMAIL PROTECTED]> wrote:

> ... Leo's @ANON implementation of
> your scheme works great for me (I have no problem wrapping that around
> my code.) All this does raise the question of garbage collection for
> packfile objects; is there any?

Not yet. We basically have two kinds of dynamically compiled code:

1) loaded modules - persistent code used until end of program
2) evaled "statements" - volatile code, maybe used once only

But the current implementation doesn't know about that difference. The
compiled code is always appended to the list of code segments. There is
no interface yet to manipulate packfile segments.

Ultimately we need a packfile PMC that owns the packfile segments.
If that PMC goes out of scope, the compiled code structures can be freed.
This packfile PMC would also largely eliminate the difference between 1)
and 2), all the more so once there is an interface for appending
newly compiled code to existing code segments, so that you can e.g. dump
the combined code to disk.

But it would still be useful to differentiate between 1) and 2). For 1)
we could do global constant folding (if a constant already exists in the
main constant table, just use it; if not, append it to the main constant
table).

For 2) a distinct constant table is needed.
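
The folding rule for case 1) can be sketched in a few lines of Python. This is only an illustration of "reuse the slot if the constant exists, otherwise append"; intern_constant and the list-based table are hypothetical and say nothing about how Parrot's packfile constant tables are actually laid out.

```python
def intern_constant(main_table, value):
    """Return the slot of value in main_table, appending it if missing."""
    key = (type(value), value)
    for slot, existing in enumerate(main_table):
        # Compare type as well as value so that 1 and 1.0 stay distinct.
        if (type(existing), existing) == key:
            return slot
    main_table.append(value)
    return len(main_table) - 1


main_constants = ["hello", 42]
module_constants = [42, "world", "hello"]

# Map each slot of the loaded module's table to a slot in the main table.
remap = [intern_constant(main_constants, c) for c in module_constants]
```

For case 2) nothing is merged: the evaled code simply keeps its own table, which can be dropped together with the code.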

leo

Cross Compiling parrot?

2004-09-01 Thread Robert Schwebel

Hi, 

Did anybody try to crosscompile parrot? It doesn't seem to work. I tried
it with parrot_2004-08-26_23 by setting

--cc=arm-softfloat-linux-gnu-gcc
--ld=arm-softfloat-linux-gnu-gcc

on configure, but that fails with: 

--8<--
[EMAIL PROTECTED]:~/tmp/parrot> perl Configure.pl --cc=arm-softfloat-linux-gnu-gcc --ld=arm-softfloat-linux-gnu-gcc
Parrot Version 0.1.0 Configure 2.0
Copyright (C) 2001-2003 The Perl Foundation.  All Rights Reserved.

Hello, I'm Configure.  My job is to poke and prod your system to figure out
how to build Parrot.  The process is completely automated, unless you passed in
the `--ask' flag on the command line, in which case it'll prompt you for a few
pieces of info.

Since you're running this script, you obviously have Perl 5--I'll be pulling
some defaults from its configuration.

Checking MANIFEST...done.
Setting up Configure's data structures...done.
Tweaking settings for miniparrot...done.
Loading platform and local hints files...done.
Enabling optimization...done.
Determining nongenerated header files...done.
Determining what C compiler and linker to use...done.
Determining what types Parrot should use...done.
Determining what opcode files should be compiled in...done.
Setting up experimental systems...done.
Determining what pmc files should be compiled in...done.
Determining your minimum pointer alignment...C compiler failed (see test.cco) at lib/Parrot/Configure/Step.pm line 332
Parrot::Configure::Step::cc_build() called at config/auto/alignptrs.pl line 37
Configure::Step::runstep('undef', 'undef') called at lib/Parrot/Configure/RunSteps.pm line 110
Parrot::Configure::RunSteps::runsteps('Parrot::Configure::RunSteps', 'cc', 'arm-softfloat-linux-gnu-gcc', 'ld', 'arm-softfloat-linux-gnu-gcc', 'debugging', 1) called at Configure.pl line 376
[EMAIL PROTECTED]:~/tmp/parrot>
--8<--

Robert
-- 
 Dipl.-Ing. Robert Schwebel | http://www.pengutronix.de
 Pengutronix - Linux Solutions for Science and Industry
   Handelsregister:  Amtsgericht Hildesheim, HRA 2686
 Hornemannstraße 12,  31137 Hildesheim, Germany
Phone: +49-5121-28619-0 |  Fax: +49-5121-28619-4

Re: NCI test 2 failing - but I know why

2004-09-01 Thread Joshua Gatcomb

--- Bernhard Schmalhofer
<[EMAIL PROTECTED]> wrote:
> Printing an initialised ParrotLibrary currently gives
> you the '_filename' 
> property. This is highly platform dependent, and
> therefore hard to test.
> 
> I could rewrite the test and check only, that the
> stringified 
> ParrotLibrary contains the substring 'nci'. My guess
> is, that this 
> should work on all platforms so far.
> 
> CU, Bernhard

Well, I seem to be the only one noticing it and the
good thing is that it is the test itself, and not what
is being tested that is b0rk.  I would think
portability is a good thing but don't go changing
things on my account yet.  When I get the time, I will
investigate.

Cheers
Joshua Gatcomb
a.k.a. Limbic~Region


Re: Cross Compiling parrot?

2004-09-01 Thread Dan Sugalski

At 7:32 PM +0200 9/1/04, Robert Schwebel wrote:
> Hi,
> Did anybody try to crosscompile parrot? It doesn't seem to work.

That doesn't surprise me. We still pull information out of the local 
perl install (which'll be wrong, of course, in a cross-compilation 
environment) and I'm pretty sure we don't pass in the right flags in 
the right places to cross-compile properly.

Part of the problem with this is that we just don't have people with 
a need or experience doing cross-compilation, though I'd be thrilled 
if we can find someone. (Any and all patches to make things 
cross-compile friendly, or even less cross-compile unfriendly, will 
be greatly appreciated)
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk

Re: Pipeline Performance

2004-09-01 Thread Aaron Sherman

On Tue, 2004-08-31 at 14:11, Sean O'Rourke wrote:
> At Tue, 31 Aug 2004 13:23:04 -0400,
> [EMAIL PROTECTED] (Aaron Sherman) wrote:
> > I would think you actually want to be able to define grep, map, et al.
> > in terms of the mechanism for unraveling, and just let the optimizer
> > collapse the entire pipeline down to a single map.
> 
> Even for map and grep this is a bit trickier, since map can produce
> zero or more values for each input value, and calls its body in list
> context, whereas grep produces zero or one value, and gets called in
> scalar context.

You're confusing two stages of grep's operation. grep's body is called
in boolean context, but that has NOTHING to do with the return value,
which is a list of zero or one elements like so:

    sub grep (&code, [EMAIL PROTECTED]) {
        map { if code($_) { $_ } else { () } } @list;
    }

That said, I skipped over a MAJOR point in my original reply, and it
must be said that in the case of map, you need iterators in order to do
the job correctly. map actually needs to return an object which, while
it can be treated as a list, is actually just an iterator object which
remembers the input list and transform closure.

Once you do that, you get cheap pipelining, as the major cost isn't the
construction of the temporary lists in Perl 5, it's POPULATING the
temporary lists. Constructing an iterator in Perl 6 means no population.
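
To make the iterator point concrete, here is a small Python sketch using generators in place of Perl 6 iterator objects. It is an analogy only; lazy_map, lazy_grep and the calls list are hypothetical names used for illustration. The grep is defined in terms of map exactly as in the sub above: each item becomes a list of zero or one elements.

```python
def lazy_map(fn, source):
    """Remember the source and the transform; produce values on demand."""
    for item in source:
        # fn returns zero or more values for each input value.
        yield from fn(item)


def lazy_grep(pred, source):
    """grep via map: the body is a boolean test, the result 0 or 1 items."""
    return lazy_map(lambda item: [item] if pred(item) else [], source)


calls = []


def double(x):
    calls.append(x)  # record how much of the input was actually consumed
    return [x * 2]


# Building the pipeline populates nothing, even over a large input.
pipeline = lazy_grep(lambda x: x > 2, lazy_map(double, range(1, 1_000_000)))
first = next(pipeline)
```

Only the inputs needed to produce the first result are ever touched; no temporary list is populated between the stages.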

-- 
781-324-3772
[EMAIL PROTECTED]
http://www.ajs.com/~ajs

Re: Last bits of the basic math semantics

2004-09-01 Thread Gopal V

Hi,

> fixed sizes of integer, so I'd aim some ops at low-level types of
> known size and leave it at that. 

Quite a while back, I did add a few opcodes for fixed size integer operations 
for Parrot .. But they were added for a totally different HLL :)

> matter what you do with the high bits.  I suppose another way to
> look at it is that they'll just want ops that'll JIT well, which
> usually means to make the ops work on the natural datatype sizes
> of the machine.  But that fights against the fact that most crypto
> algorithms do their commutations based on a known number of bits.

Maybe dotgnu.ops needs to be renamed to fixedsize.ops, with a few more
fixed-size int operations added? int, uint, long and ulong should
suffice for most crypto folks, but those ops are not JIT'd AFAIK.

Gopal

A question about attribute functions

2004-09-01 Thread Aaron Sherman

How do you declare attribute functions? Specifically, I was thinking
about map and what kind of object it would return, and I stumbled on a
confusing point:

    class mapper does iterator {
        has &.transform;
        ...
    }

Ok, that's fine, but what kind of accessor does it get?

    my mapper $x .= new(transform => ->{(1,2,3)});
    $x.transform()

would imply that you're calling method transform, not invoking the
accessor function which does not have a method signature. Would you have
to do this:

    class mapper does iterator {
        has Code $transform;
        ...
    }
    ...
    $x.transform.();

?


Re: perl6 garbage collector?

2004-09-01 Thread Aaron Sherman

On Mon, 2004-08-30 at 14:40, Ozgun Erdogan wrote:
> > > Currently, we're using perl-5.6.1 and are having problems with memory
> > > leaks - thanks to reference counting.
> > 
> > > You'll have to break reference loops explicitly.
> 
> If only I had known where those circular references are. I have a
> circular ref. detector tool, but it still doesn't get them. The thing
> is, you could do an SvREFCNT_inc, and boom you have a memory leak.

Ok, you're no longer talking about Perl (the language) but rather about
Perl 5's internals. Different beast.

This is not the right list for debugging that kind of thing, so I won't
go into it, but suffice to say that if you have trouble managing your
references through XS, incorporating Parrot's GC into Perl 5 would be
near impossible. That's not intended as a slight, believe me, I put
myself in the same category (reference counting in Perl 5 is very
difficult to grok from the docs, as the docs make some assumptions about
how much you know about how Perl constructs scopes).

All that aside, Ponie is your friend. As Ponie matures, it will provide
what you need, and your XS could be transitioned over into Parrot
bytecode.

For now, if I were you I would upgrade to 5.8.x and try to make sure
that every value that you move between your XS and Perl is properly
mortal (see the perlapi, perlguts and perlxs man pages).


Re: NCI test 2 failing - but I know why

2004-09-01 Thread Clayton O'Neill

On Tue, 31 Aug 2004 05:56:27 -0700 (PDT), Joshua Gatcomb
<[EMAIL PROTECTED]> wrote:
> Obviously the test is passing, but the expected result
> is different:
> loaded runtime/parrot/dynext/libnci.so
> vs
> loaded libnci.so

I'm getting the same thing on Solaris 8 using GCC 3.4.1 with solaris binutils:

 t/pmc/nci..NOK 2# Failed test (t/pmc/nci.t at line 59) 
 #  got: 'loaded libnci.so 
 # 8.00 
 # ' 
 # expected: 'loaded runtime/parrot/dynext/libnci.so 
 # 8.00 
 # ' 
 t/pmc/nci..ok 35/35# Looks like you failed 1 tests of 35. 
 t/pmc/nci..dubious 
 Test returned status 1 (wstat 256, 0x100) 
 DIED. FAILED test 2 
 Failed 1/35 tests, 97.14% okay

Re: Library loading

2004-09-01 Thread Aaron Sherman

On Sat, 2004-08-28 at 16:17, Dan Sugalski wrote:
> Time to finish this one and ensconce the API into the embedding interface.

That reminds me, I was reading P6&PE yesterday, and I came across a
scary bit on loading of shared libraries. The statement was made that
Parrot would search the current directory first.

Perhaps this was an over-simplification, but if not, PLEASE,
re-consider. Security implications aside (and they're huge), Parrot
should probably be searching its installation area (possibly overridden
by an environment variable) followed by whatever system path (e.g.
LD_LIBRARY_PATH, ldconfig or whatever your OS uses) is given to Parrot
externally, so as not to modify the behavior of a program based on the
current directory of the user running it.
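
The search order suggested here could look roughly like the following Python sketch. It is not Parrot's loader: library_search_path is a hypothetical helper, and PARROT_LIBRARY_PATH_EXAMPLE is a made-up variable name standing in for whatever override the project might choose.

```python
import os


def library_search_path(install_dir, environ):
    """Return the ordered directories to search for shared libraries."""
    path = []
    # 1. The installation area, possibly overridden by an environment
    #    variable (hypothetical name, for illustration only).
    override = environ.get("PARROT_LIBRARY_PATH_EXAMPLE")
    path.append(override if override else install_dir)
    # 2. Whatever system path was handed to the process externally.
    for entry in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        # Empty entries are dropped in this sketch so the current
        # directory is never added implicitly.
        if entry:
            path.append(entry)
    return path


example = library_search_path(
    "/usr/local/lib/parrot",
    {"LD_LIBRARY_PATH": "/opt/lib" + os.pathsep + os.pathsep + "/usr/lib"},
)
```

The current directory appears in the result only if the user explicitly names it in the externally supplied path.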


Re: Library loading

2004-09-01 Thread Dan Sugalski

At 11:00 AM -0400 9/1/04, Aaron Sherman wrote:
> On Sat, 2004-08-28 at 16:17, Dan Sugalski wrote:
> > Time to finish this one and ensconce the API into the embedding interface.
>
> That reminds me, I was reading P6&PE yesterday, and I came across a
> scary bit on loading of shared libraries. The statement was made that
> Parrot would search the current directory first.

It does? Urk. No, not by default. We need to work out some library 
loading stuff, but this is *definitely* not going to be the default.
--
Dan


Re: Synopsis 2 draft 1

2004-09-01 Thread Michele Dondi

On Sat, 14 Aug 2004, Smylers wrote:

> > could reparse the result.  XXX .repr is what Python calls it, I think.
> > Is there a better name?
> 
> Yes; I've no suggestions as to what it might be, but surely there's
> _got_ to be a better name than C<.repr>.

.repr is fine for me. An alternative that springs to mind could be .dump

> > XXX We could yet replace <$foo> with $foo.more or $foo.iter or
> > $foo.shift or some such (but not $foo.next or $foo.readline),
> 
> That sounds good to me -- C<< while (<$file>) >> is one of the
> least-intuitive bits of syntax to get across to people learning Perl;

Gawd, no! It's so deeply perlish... could hardly do without it! Of course 
the above mentioned alternatives would be ok too, but only *as 
alternatives*...


Michele
-- 
Have you noticed that people whose parents did not have children, also 
tend not to have children.
- Robert J. Kolker in sci.math, "Re: Genetics and Math-Ability"

Proposal for a new PMC layout and more

2004-09-01 Thread Leopold Toetsch

Below is a pod document describing some IMHO worthwhile changes. I hope 
I didn't miss some issues that could inhibit the implementation.

Comments welcome,
leo

=head1 1. Proposal for a new PMC layout and more

=head2 1.1. Current state - PMC size and structure

PMCs are using too much memory (5 words for an Integer PMC, 11 + n
words plus two indirections for an object with n attributes). The
reduction from IIRC 9 words to the current 5 words almost doubled
execution speed for not too small amounts of allocated PMCs.

OTOH PMCs are rather rigid structures. There is only a fixed amount of
data fields available. If the data the PMC should hold won't fit,
custom malloced extensions (or Buffers) have to be added to the PMC,
which take up more memory and (at least) one more indirection to get
at these data.

=head2 1.2. Current state - STRING buffer headers

STRING buffer headers are kept in separate registers, have their own
opcodes and PMCs have vtable variants dealing with STRINGs. This is
using up a lot of memory and resources. There are currently 480
opcodes dealing with STRINGs and 46 vtable/MMD methods that have STRING*
arguments. But a lot of PMC opcodes and vtables dealing with strings
are still missing.

But a STRING itself is already a rather fat structure (and it'll
still need charset and encoding or such). Thus there isn't really
an advantage in having a distinct STRING type, all the more so since all
HLLs except Perl6 have no notion of such a type - they'll just have
objects aka PMCs.

Further, STRINGs will need a vtable to deal with unicode or rather with
different access levels (e.g. length in bytes, codepoints, or chars).

And finally: we currently have STRING* hash keys only, a scheme that
doesn't work for hashing arbitrary objects. In Python a hash key is
just an object that provides a I vtable method.

=head2 1.3. Current state - STRING memory

In a multi-threaded parrot string memory could move beyond the
interpreter at any time due to the copying collection of variable
sized memory. E.g.:

  Thread 1                          Thread 2

  cursor = s->strstart
                                    collect string memory
                                    s->strstart moved
  end = s->strstart + s->bufused
  while (cursor < end)  // boom

Such code snippets are used all over the place in F.
This would need either a lock for reading too or, better, a non-copying
collection of string memory.

The same problems currently arise with hashes and list-based arrays,
which both are using buffer headers and movable memory.

=head1 2. Proposed changes

=over 4

=item PMCs are variable sized.

We just allocate as much as needed to accommodate the object. This
vastly reduces memory usage for small types, simplifies more complex PMCs
and eliminates the additional indirections to access the object's
data.

=item STRINGs become PMCs

This will reduce opcode count by almost one third and simplify the
whole interpreter.

=item Buffers become PMCs

The final unification of Buffers and PMCs.

=back

=head2 2.1 The new PMC layout

A simple PMC consists of a vtable pointer and a data portion.

=head2 2.2 Example: Integer, Float, Object with 2 attributes, Ref

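       +----------------+        +----------------+
  pmc->|  vtable        |   pmc->|  vtable        |
       +----------------+        +----------------+
       |  INTVAL        |        |  FLOATVAL      |
       +----------------+        |                |
                                 |                |
                                 +----------------+

       +----------------+        +----------------+
  pmc->|  vtable        |   pmc->|  vtable        |
       +----------------+        +----------------+
       |  attrib_count  |        |  pmc_pointer   |
       +----------------+        +----------------+
       |  attribute #1  |
       +----------------+
       |  attribute #2  |
       +----------------+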

A String PMC is a similar structure with all the needed data items
just like the current STRING structure. So eliminating the STRING type
doesn't impose any overhead at all, except the vtable access - but
that will be needed anyway.

=head2 2.3. Where are the flags?

Flags currently take one word per PMC. This is a lot of overhead. But
we basically only need two types of flags:

=over 4

=item What kind is that PMC

This information is just the vtable or additional information (flags)
in the vtable. All PMCs of one kind share one vtable anyway, so it's
much cheaper to use the vtable for that information than to provide
one additional word per PMC. Some flags are also eliminated by getting
rid of the distinction between PMCs and Buffers.

=item Flags used during garbage collection

GC flags are garbage collector-specific. A stop-the-world allocator
can e.g. use the 2 low bits of the vtable pointer for the I and
I flags. With I a nibble in a separate
memory region is used. An implicit reclamation GC scheme has additional
pointers to manage the g

Re: A question about attribute functions

2004-09-01 Thread Larry Wall

On Wed, Sep 01, 2004 at 10:41:37AM -0400, Aaron Sherman wrote:
: How do you declare attribute functions? Specifically, I was thinking
: about map and what kind of object it would return, and I stumbled on a
: confusing point:
: 
:   class mapper does iterator {
:   has &.transform;
:   ...
:   }
: 
: Ok, that's fine, but what kind of accessor does it get?
: 
:   my mapper $x .= new(transform => ->{(1,2,3)});
:   $x.transform()
: 
: would imply that you're calling method transform, not invoking the
: accessor function which does not have a method signature. Would you have
: to do this:
: 
:   class mapper does iterator {
:   has Code $transform;
:   ...
:   }
:   ...
:   $x.transform.();
: 
: ?

That might not work either.  This will, though:

($x.transform)();

Larry

Minor makefile fix

2004-09-01 Thread Jonathan Worthington

Hi,

Here's a small fix to the root.in makefile; this fix is needed to get Parrot
building again on Win32 and probably in some other places too.

Jonathan



makefile.diff
Description: Binary data

Re: Proposal for a new PMC layout and more

2004-09-01 Thread Dan Sugalski

At 5:17 PM +0200 9/1/04, Leopold Toetsch wrote:
> Below is a pod document describing some IMHO worthwhile changes. I 
> hope I didn't miss some issues that could inhibit the implementation.

Interesting. But... no. Things are the way they are on purpose -- a 
lot of thought, a not-inconsiderable amount of pain, and a lot of 
harsh experience went into precursor designs, the current design, and 
the current implementation.

We're going to leave it as-is.
--
Dan

Re: Proposal for a new PMC layout and more

2004-09-01 Thread Aaron Sherman

On Wed, 2004-09-01 at 11:17, Leopold Toetsch wrote:

> Comments welcome,

Honestly, much of this goes beyond my meager understanding of Parrot
internals, but I've read it, and most of it seems reasonable. Just one
point where you may not have considered a logical alternative:

> =head2 2.6. Morphing Undefs
> 
> Currently all binary (and other) opcodes need an existing destination
> PMC. The normal sequence a compiler emits is something like this:
> 
>   $P0 = new Undef
>   $P0 = a + b

Since you've lopped a lot of space off of PMCs, Undefs could be made
large enough to fit a basic buffer PMC (3 words). In that case, they
could always be upgraded in-place to integer PMCs, float PMCs, very
simple objects, references and buffers. Everything else would need to go
through a copy-upgrade.

The trade-off is that all PMCs would be 3 words unless special code was
emitted that avoided this for smaller (integer, float, reference) PMCs.

I'm not saying that this is a BETTER plan, just an idea to think about
and a different set of trade-offs.


Re: Proposal for a new PMC layout and more

2004-09-01 Thread Nicholas Clark

On Wed, Sep 01, 2004 at 05:17:55PM +0200, Leopold Toetsch wrote:

> PMCs are using too much memory (5 words for an Integer PMC, 11 + n
> words plus two indirection for an object with n attributes). The
> reduction of IIRC 9 words to the current 5 words almost doubled
> execution speed for not too small amounts of allocated PMCs.

I would be much happier if we got to a functionally complete implementation
of parrot with stable, useful APIs first.

And then put effort into optimising the implementation behind the scenes.
Based on more complete knowledge of how things performed on code generated
from real language compilers.

It may well turn out that your proposals make sense then, as well as now.
But I feel what's holding things up is not lack of speed, but lack of
completeness.

Nicholas Clark

Re: A question about attribute functions

2004-09-01 Thread Larry Wall

On Wed, Sep 01, 2004 at 08:02:33AM -0700, Larry Wall wrote:
: That might not work either.  This will, though:
: 
: ($x.transform)();

So will

$x.transform()();

for that matter...

Larry

Re: A question about attribute functions

2004-09-01 Thread Juerd

Larry Wall skribis 2004-09-01  8:02 (-0700):
> : $x.transform.();
> That might not work either.  This will, though:
> ($x.transform)();

This is surprising. Can you please explain why .() won't work? I have
methods return subs quite often, and like that I can just attach ->() to
it to make them work for me. 

I dislike parens.  If $object.method.() will really not work, is there a
way to call it without adding parens? Adding parens for someone who
doesn't plan an entire line of code before typing it, means going back
(for me, this is the most important reason for using statement
modifiers; it's not just linguistically pleasing).


Juerd

Re: Proposal for a new PMC layout and more

2004-09-01 Thread Steve Fink

On Sep-01, Leopold Toetsch wrote:
> Below is a pod document describing some IMHO worthwhile changes. I hope 
> I didn't miss some issues that could inhibit the implementation.

Overall, I like it, although I'm sure I haven't thought of all of the
repercussions.

The one part that concerns me is the loss of the flags -- flags just
seem generally useful for a number of things. In the limit, each flag
could be replaced by an equivalent vtable entry that just returned true
or false, but that will only work for rarely-used flags due to the extra
levels of indirection. I suppose we could also have a large class of
PMCs that contained a flag word, and only the primitive PMCs would lack
it, but then the flags cannot be used without knowing the type of PMC.

It all comes down to the specific current and future uses of flags.
You've dealt with the GC flags; what about the rest?

The proposal would also expand the size of the vtable by a bit due to
the string vtable stuff. I don't know how much that is, percentage-wise.
And I suppose that increase is dwarfed by the decrease due to
eliminating the S variants. (Although that's another part of the
proposal that makes me nervous -- will MMD really take care of all of
the places where we care that we're going to a string, specifically,
rather than any other random PMC type? Strings are a pretty widespread
concept throughout the code base, and this is the only highly
user-visible part of the change.)

I also view the proposal as being comprised of several fairly
independent pieces. Something like:

 * Merging PMCs and Buffers
 * Merging STRINGs and PMCs
 * Removing GC-related flags and moving them to GC implementations
 * Removing the rest of the flags
 * Using Null instead of Undef
 * Moving "extra" stuff to before the PMC pointer
 * Using Refs to expand PMCs
 * Using DOD to remove the Ref indirection
 * Shrinking the base PMC size

..and whatever else I forgot. Not all of these are dependent on each
other, and could be implemented separately. And some are only dependent
in the sense that you'll make space or time performance worse until you
make the rest of the related changes. You could call those
design-dependent, rather than implementation-dependent.

Re: Proposal for a new PMC layout and more

2004-09-01 Thread Dan Sugalski

At 5:17 PM +0200 9/1/04, Leopold Toetsch wrote:
> Below is a pod document describing some IMHO worthwhile changes. I 
> hope I didn't miss some issues that could inhibit the implementation.

Okay, the "No" warrants more explanation.

First off, the current structure of PMCs, Buffers, and Strings is 
definitely a mess, what with the multiple nested structs, semi-shared 
data, and weird smallobject overlap. A lot of stuff that is, in 
retrospect, crap has been layered on, so if this gets beaten up and 
cleaned out I won't mind in the least.

The PMC scheme -- where PMCs are an immovable header with a vtable 
slot, cache slot, and flag slot -- stays. It's this way on purpose, 
and matches normal usage patterns (nicely efficiently) for perl 5 as 
well as (oddly) most python and ruby  usage. (Where there's a 
preponderance of low-level types)

Buffers and strings are special-purpose constructs, or at least they 
*should* be. They're segregated off for GC purposes. While they could 
be unified with PMCs, I don't want them to be. They've specific, 
special purposes, and as such they're staying the way they are.

Strings, FWIW, are *not* a perl 6 specific thing. The current string 
design is sufficient, and *will* be used, for perl 5, python, and 
ruby, as well as any other language that wants to live on parrot and 
handle string data. While there's stuff to be added still, there's no 
reason that I can see to mess with them.

Finally, Nicholas is right -- this is messing around with stuff that 
already works. We're better off working on things that don't exist 
yet, and leave this to later.

If you want, we can hash out the changes to sub calling (with the 
swapping interpreter structs we've been arguing over), moving the 
return continuation/calling object/called sub into the interp 
structure, and fixing up the JIT and exception handling stuff to deal 
with it. That, at least, will be visible to bytecode programs and 
worth getting done.
--
Dan


[perl #31419] PATCH: Fix for solaris platform asctime_r argument number mismatch

2004-09-01 Thread via RT

# New Ticket Created by  [EMAIL PROTECTED] 
# Please include the string:  [perl #31419]
# in the subject line of all future correspondence about this issue. 
# http://rt.perl.org:80/rt3/Ticket/Display.html?id=31419


On Solaris 8, asctime_r takes 3 parameters instead of 2.  The
prototype looks like this:

char *asctime_r(const struct tm *tm, char *buf, int buflen);

This patch adds a time.c file to the solaris platform directory.  The
only change in this time.c file and the stock generic/time.c is that
the call to asctime_r passes 26 as the buflen.

The Solaris man page for asctime_r says that the result will always be
26 characters exactly.  I'm guessing the POSIX asctime assumes that
the buffer is at least 26  characters, so assuming the buffer that
Parrot_asctime_r gets is at least 26 characters isn't a new risk.
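
The figure of 26 can be checked with a quick Python calculation. Python's time.asctime produces the same fixed-width layout as the C function but without the trailing newline, so the C buffer size is the text plus a newline plus the terminating NUL. This only illustrates the arithmetic for four-digit years; it is not part of the patch.

```python
import time

# Same layout as C asctime(), minus the trailing newline.
text = time.asctime(time.gmtime(0))

# C asctime() appends "\n" and a terminating NUL byte.
c_buffer_size = len(text) + len("\n") + 1
```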

Without this patch, the latest version of parrot from CVS will not
compile on Solaris 8.


parrot.solaris-asctime_r.patch
Description: Binary data

Semantics for regexes

2004-09-01 Thread Dan Sugalski

I promised Patrick this a while back but never got it, so here it is.

This is a list of the semantics that I see as needed for a regex 
engine. When we have 'em, we'll map them to string ops, and may well 
add in some special-case code for faster access.

*) extract substring
*) exact string compare
*) find string in string
*) find first character of class X in string
*) find first character not of class X in string
*) find boundary between X and not-X
*) Find boundary defined by arbitrary code (mainly for word breaks)
*) create new class X
*) add or subtract character to class X
*) create union|intersection|difference of two classes

I think this about does it, and we do some of this already. Are there 
semantics people see as missing, or need more explanation? If so, 
pipe up, we'll nail them down, then get the op mapping (with 
implementation) and go from there.
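
For discussion purposes, most of the list above can be mocked up in a few lines of Python, with character classes modelled as frozensets. This is a sketch of the semantics only, not a proposal for the ops themselves; the function names are hypothetical and the boundary defined by arbitrary code is left out.

```python
def find_first_in(text, cls, start=0):
    """Index of the first character of class cls at or after start, or -1."""
    for i in range(start, len(text)):
        if text[i] in cls:
            return i
    return -1


def find_first_not_in(text, cls, start=0):
    """Index of the first character not of class cls at or after start, or -1."""
    for i in range(start, len(text)):
        if text[i] not in cls:
            return i
    return -1


def find_boundary(text, cls, start=0):
    """First index i > start where text[i-1] and text[i] differ in cls membership."""
    for i in range(max(start, 0) + 1, len(text)):
        if (text[i - 1] in cls) != (text[i] in cls):
            return i
    return -1


# Create new classes, then combine and modify them.
digits = frozenset("0123456789")
lower = frozenset("abcdefghijklmnopqrstuvwxyz")
alnum = digits | lower                       # union
hex_letters = lower & frozenset("abcdefAB")  # intersection
non_hex = lower - hex_letters                # difference
nonzero = digits - {"0"}                     # subtract a character
with_underscore = alnum | {"_"}              # add a character

text = "abc123def"
substring = text[3:6]           # extract substring
same = substring == "123"       # exact string compare
position = text.find("123")     # find string in string
```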
--
Dan


Re: A question about attribute functions

2004-09-01 Thread Larry Wall

On Wed, Sep 01, 2004 at 07:08:57PM +0200, Juerd wrote:
: Larry Wall skribis 2004-09-01  8:02 (-0700):
: > :   $x.transform.();
: > That might not work either.  This will, though:
: > ($x.transform)();
: 
: This is surprising. Can you please explain why .() won't work? I have
: methods return subs quite often, and like that I can just attach ->() to
: it to make them work for me. 

Because in Perl 5, we haven't defined ->() to be the long form of
a "subscripty" ().  But in Perl 6, we have

    $a[$x]      $a .[$x]        # same thing
    $a{$x}      $a .{$x}        # same thing
    $a($x)      $a .($x)        # same thing

That says to me that we also have

    $a.b($x)    $a.b .($x)      # same thing

In the particular case of an attribute, there are no arguments, so the
parens are optional.  But the fact that they're optional means that if
you do put parens, they belong to the method call, not the returned
value.  (And we have to make them optional rather than mandatorily
missing, since we require the parens to interpolate an attribute
in double-quote context.)
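
As a loose analogy only, the same shape exists in Python, where an accessor method that returns a code object needs one pair of parens for the method call and a second pair to invoke the result. Python always requires the parens on the method call, so this mirrors the $x.transform()() form and nothing more; the Mapper class is hypothetical.

```python
class Mapper:
    def __init__(self, transform):
        self._transform = transform

    def transform(self):
        """Accessor: return the stored code object without invoking it."""
        return self._transform


x = Mapper(lambda: (1, 2, 3))

accessor_result = x.transform()  # these parens belong to the method call
values = x.transform()()         # the second pair invokes the returned code
```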

: I dislike parens.  If $object.method.() will really not work, is there a
: way to call it without adding parens? Adding parens for someone who
: doesn't plan an entire line of code before typing it, means going back
: (for me, this is the most important reason for using statement
: modifiers; it's not just linguistically pleasing).

Well, it's still extra parens, but my other solution:

$object.method()()

has the benefit of not forcing you to (horrors!) plan in advance.

I suppose you're one of those clever people who prefer to rewrite
the OS by typing

cat >/dev/kmem
[EMAIL PROTECTED]@[EMAIL PROTECTED]@7¥&$^@ [EMAIL PROTECTED]@...
^D

:-)

Larry

[perl #31423] [PATCH] two tests for NCI

2004-09-01 Thread via RT

# New Ticket Created by  Bernhard Schmalhofer 
# Please include the string:  [perl #31423]
# in the subject line of all future correspondence about this issue. 
# http://rt.perl.org:80/rt3/Ticket/Display.html?id=31423


Hi,

this patch adds two tests to t/pmc/nci.t.

The first new test should be a platform-independent check of get_string() of
the ParrotLibrary PMC.

The second new test is a callback test ported from PASM to PIR.

CU, Bernhard

-- 
/* [EMAIL PROTECTED] */

nci_tests_20040901.patch
Description: Binary data

Re: Semantics for regexes

2004-09-01 Thread Larry Wall

On Wed, Sep 01, 2004 at 01:57:32PM -0400, Dan Sugalski wrote:
: I promised Patrick this a while back but never got it, so here it is.
: 
: This is a list of the semantics that I see as needed for a regex 
: engine. When we have 'em, we'll map them to string ops, and may well 
: add in some special-case code for faster access.
: 
: *) extract substring

Mostly you don't want to go to the trouble of extracting substrings
until you're forced to, because you're continually creating and
destroying backrefs into your string, and you don't want to be
copying characters around for that.  As long as there's some kind
of COWish semantics to keep around an original copy of the string
being searched, that's probably all the regex engine itself wants.
It's generally the outer code that wants to make a copy of $1 et al.

: *) exact string compare
: *) find string in string

And maybe case insensitive variants of these, unless it turns out to
be better to combine it with other required composition/decomposition.
But then matching against the contents of a variable repeatedly is
likely to induce repeated canonicalizations.

: *) find first character of class X in string
: *) find first character not of class X in string

We're gonna run into the "what's a character?" issue here, especially
at higher Unicode levels where what the user thinks of as a single
character is really a sequence of codepoints.  From the perspective of a
positive match, the character lengths can be largely self-defining.
From the perspective of a negative match, the engine has to know what
"." means so you can skip one when the class doesn't match.  (And the
length of "." doesn't necessarily map to the length of all the entries
in the class...)
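
To make the negative-match problem concrete, here is a rough sketch in Python rather than Parrot ops. It assumes, as a simplification, that a "character" is a base codepoint plus any combining marks that follow it; real grapheme clustering has more rules. The names `char_len_at` and `match_not_in_class` are made up for illustration and are not part of any Parrot or Perl 6 API.

```python
import unicodedata


def char_len_at(text, pos):
    """Length in codepoints of one user-visible character at pos.

    Simplifying assumption: a base codepoint plus any following
    combining marks. Full grapheme clustering has more rules.
    """
    if pos >= len(text):
        return 0
    end = pos + 1
    while end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return end - pos


def match_not_in_class(text, pos, members):
    """Negated class: succeed if the character at pos is not a member.

    Returns how many codepoints to skip, or 0 on failure.
    """
    n = char_len_at(text, pos)
    if n == 0 or text[pos:pos + n] in members:
        return 0
    return n


text = "e\u0301x"    # "e" + COMBINING ACUTE ACCENT, then "x"
vowels = {"a", "e", "i", "o", "u"}
skip = match_not_in_class(text, 0, vowels)
```

With this definition the accented "e" is not in the class of plain vowels, and the negated match has to skip two codepoints, not one. A codepoint-level engine would have seen a bare "e" at the same offset and failed, which is exactly the ambiguity described above.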

In particular, \n in Perl 6 has to match a set of weird sequences.
Which arguably could be matched by a routine if character classes
aren't up to it.

(Also, the Perl 6 parser may be based on the notion of a set of hash keys
being treated a bit like a lex grammar, and if we can use character
class lookup for that, it might need to have some "longest token
first" semantics.  Which you might need for ordinary classes anyway
as soon as you admit sequences.)

(Also, minor nit, I'm not sure "find" is the right verb here and
elsewhere.  Mostly the regex engine just wants to check the "current"
location.  The rest is control flow.)

: *) find boundary between X and not-X
: *) Find boundary defined by arbitrary code (mainly for word breaks)

We might have to use arbitrary code to match arrays and hashes as well,
if the opcodes support only scalar string matches.

: *) create new class X
: *) add or subtract character to class X
: *) create union|intersection|difference of two classes

Not sure you really need opcodes for those if character classes are
just real objects with a particular interface.

: I think this about does it, and we do some of this already. Are there 
: semantics people see as missing, or need more explanation? If so, 
: pipe up, we'll nail them down, then get the op mapping (with 
: implementation) and go from there.

I think that most of the other issues revolve around control flow
and remembering your current state, and being able to backtrack out
of that state, all of which Parrot can presumably handle with existing
ops, though perhaps not as efficiently as we might like.

I see one other potential gotcha with respect to backtracking and
closures.  In P6, a closure can declare a hypothetical variable
that is restored only if the closure exits "unsuccessfully".  Within
a rule, an embedded closure is unsuccessful if it is backtracked over.
But that implies that you can't know whether you have a successful
return until the entire regex is matched, all the way down, and all the
way back out the top, or at least out far enough that you know you
can't backtrack into this closure.  Abstractly, the closure doesn't
return until the entire rest of the match is decided.  Internally,
of course, the closure probably returns as soon as you run into the
end of it.  So we may have to jimmy the meaning of hypotheticality in
that context to defer undoing such variables until we hit a failure
continuation of some sort.  That's *probably* doable with the current
opcodes, but maybe not optimally.  In any event, we have to do all that
anyway for $1, $2, et al. whether they're inside or outside of closures.

Larry

Re: Semantics for regexes

2004-09-01 Thread Larry Wall

On Wed, Sep 01, 2004 at 01:07:49PM -0700, Larry Wall wrote:
: We might have to use arbitrary code to match arrays and hashes as well,
: if the opcodes support only scalar string matches.

I really wasn't being very clear about this.  For efficiency we may
need "trie" support (or something like it) to match various strings
in parallel.  My point is that it could very well be the case that
character classes are just a specific application of this.
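
A minimal sketch of that idea, in Python rather than Parrot ops: a class whose members may be multi-codepoint sequences is stored as a trie and checked at the current position with longest-token-first semantics. The names `build_trie` and `longest_match_at` are hypothetical, and the nested-dict layout is only one possible representation.

```python
# Sketch only: a trie used as a "character class" whose members may be
# multi-codepoint sequences. All names here are illustrative.

END = object()  # marker key: a member of the class ends at this node


def build_trie(members):
    """Build a nested-dict trie from an iterable of strings."""
    root = {}
    for member in members:
        node = root
        for ch in member:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def longest_match_at(trie, text, pos):
    """Return the length of the longest member matching text at pos, or 0.

    The text is inspected in place; no substring is copied.
    """
    node = trie
    best = 0
    i = pos
    while i < len(text) and text[i] in node:
        node = node[text[i]]
        i += 1
        if END in node:
            best = i - pos
    return best


# A "newline" class that has to accept several sequences.
newline = build_trie(["\n", "\r", "\r\n", "\u2028"])
```

An ordinary single-codepoint class is then just the degenerate case where every member has length one, while `longest_match_at(newline, "a\r\nb", 1)` prefers the two-codepoint member over the bare carriage return.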

Larry

Re: Semantics for regexes

2004-09-01 Thread Aaron Sherman

On Wed, 2004-09-01 at 16:07, Larry Wall wrote:

> I see one other potential gotcha with respect to backtracking and
> closures.  In P6, a closure can declare a hypothetical variable
> that is restored only if the closure exits "unsuccessfully".  Within
> a rule, an embedded closure is unsuccessful if it is backtracked over.
> But that implies that you can't know whether you have a successful
> return until the entire regex is matched, all the way down, and all the
> way back out the top, or at least out far enough that you know you
> can't backtrack into this closure.  Abstractly, the closure doesn't
> return until the entire rest of the match is decided.  Internally,
> of course, the closure probably returns as soon as you run into the
> end of it.

Let's get concrete:

rule foo { a $x:=(b*) c }
"abbabc"

So, if I understand Parrot and Perl 6 correctly (heh, fat chance), a
slight modification to the calling convention of the closure that
represents a rule (possibly even a raw .Closure) could add a pad that
the callee is expected to fill in with any hypotheticals defined during
execution. The following would happen in the example above:

store_lex "bb" into hypopad("$x") after "abb"
find "a" and fail the rule, backtracking (clear hypopad("$x"))
store_lex "b" into hypopad("$x") after backtracking over one "b"
find "b" next and fail the rule, backtracking again (clear)
store_lex "b" into hypopad("$x") after second "ab"
find "c" and succeed rule foo, return hypopad

Essentially every close-paren triggers binding, and every back-track
over a close-paren triggers clearing.
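
The bind-and-clear sequence can be mimicked with a hand-coded backtracking matcher. This is a Python sketch, not Parrot code, and it assumes the capture is (b+), which is how the trace above behaves. `hypopad` is an ordinary dict standing in for the proposed pad, and `match_foo_at` and `match_foo` are invented names.

```python
# Sketch only: hand-coded backtracking for  rule foo { a $x:=(b+) c }.
# "hypopad" is an ordinary dict standing in for the pad of hypotheticals.

def match_foo_at(text, start, hypopad, trace):
    """Try to match  a (b+) c  at `start`; return end offset or None."""
    pos = start
    if not text.startswith("a", pos):
        return None
    pos += 1
    # Greedy b+: find the longest run of "b" first.
    run_end = pos
    while text.startswith("b", run_end):
        run_end += 1
    # Try the longest capture first, then give back one "b" at a time.
    for cap_end in range(run_end, pos, -1):
        hypopad["$x"] = text[pos:cap_end]        # close-paren: bind
        trace.append(("bind", hypopad["$x"]))
        if text.startswith("c", cap_end):
            return cap_end + 1                   # success: binding is kept
        del hypopad["$x"]                        # backtrack: clear
        trace.append(("clear", None))
    return None


def match_foo(text):
    """Scan for the first position where the rule matches."""
    trace = []
    for start in range(len(text)):
        hypopad = {}
        end = match_foo_at(text, start, hypopad, trace)
        if end is not None:
            return (start, end), hypopad, trace
    return None, {}, trace


span, pad, trace = match_foo("abbabc")
```

On "abbabc" the recorded trace is bind "bb", clear, bind "b", clear, bind "b", and the pad handed back to the caller holds only the final binding.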

Because this is all part of the calling convention for a rule, there's
no difference between a rule "passing" back hypotheticals to its caller
and a sub-rule doing so to the rule which called IT.

Is that workable? Does it address your concern, Larry, or did I miss
your point?

-- 
781-324-3772
[EMAIL PROTECTED]
http://www.ajs.com/~ajs

Re: Semantics for regexes

2004-09-01 Thread Aaron Sherman

On Wed, 2004-09-01 at 16:33, Aaron Sherman wrote:

>   rule foo { a $x:=(b*) c }

In the rest of my message I acted as if that read:

rule foo { a $x:=(b+) c }

so, we may as well pretend that that's what I meant to say ;-)


Re: Semantics for regexes

2004-09-01 Thread Patrick R. Michaud

On Wed, Sep 01, 2004 at 01:07:49PM -0700, Larry Wall wrote:
> On Wed, Sep 01, 2004 at 01:57:32PM -0400, Dan Sugalski wrote:
> : I promised Patrick this a while back but never got it, so here it is.
> : 
> : This is a list of the semantics that I see as needed for a regex 
> : engine. When we have 'em, we'll map them to string ops, and may well 
> : add in some special-case code for faster access.
> : 
> : *) extract substring
> 
> Mostly you don't want to go to the trouble of extracting substrings
> until you're forced to, because you're continually creating and
> destroying backrefs into your string, and you don't want to be
> copying characters around for that.  As long as there's some kind
> of COWish semantics to keep around an original copy of the string
> being searched, that's probably all the regex engine itself wants.

Indeed, what I've been working towards in my head is that a rule
would consist of a set of "components" which can nest and can refer to
other rules, and each component would simply keep track of its
current start and end positions of what it matches in the string,
as well as what to do if that component needs to backtrack/continue 
because a successive component is unable to achieve its part of the match
(or if we're looking for an C<:exhaustive> set of matches).  As Larry
mentioned in a previous conversation, the rules engine will need to
remember its state without keeping a large stack (fortunately
there appear to be a number of possible optimizations here).

So, what a rule component really wants to do is to be able to perform
comparisons and tests of substrings in-place, as opposed to extracting
them to perform the match.  Opcodes to support that would be most
helpful (they may already be there--I just haven't looked at that part
yet).
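
As a rough illustration of testing in place, here is a Python sketch in which a component records only start and end offsets into the shared target string and copies characters out on request. The `Span` class and `match_literal` function are hypothetical names, not existing Parrot objects or opcodes.

```python
# Sketch only: a capture that stores offsets and copies characters lazily.

class Span:
    """Start/end offsets into a shared target string."""

    def __init__(self, target, start, end):
        self.target = target   # reference to the original string, not a copy
        self.start = start
        self.end = end

    def text(self):
        """Copy the characters out only when the caller asks for them."""
        return self.target[self.start:self.end]


def match_literal(target, pos, literal):
    """Compare `literal` against `target` at `pos` without slicing."""
    if target.startswith(literal, pos):
        return Span(target, pos, pos + len(literal))
    return None


target = "foo=bar"
key = match_literal(target, 0, "foo")
miss = match_literal(target, 1, "foo")
```

Here `key` carries the offsets 0 and 3 and nothing else; the substring "foo" is only materialized if outer code calls `key.text()`.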

And yes, my current state of thinking is that each rule component
is a Parrot object of some sort that knows where it fits in the
rule and can pass information and control flow to/from its neighbors 
in trying to get the rule(s) to match the target string.

> : *) find first character of class X in string
> : *) find first character not of class X in string
> We're gonna run into the "what's a character?" issue here, especially
> at higher Unicode levels where what the user thinks of as a single
> character is really a sequence of codepoints.  
> : [...]
> : *) create new class X
> : *) add or subtract character to class X
> : *) create union|intersection|difference of two classes
> Not sure you really need opcodes for those if character classes are
> just real objects with a particular interface.

Indeed, I was thinking that character classes would just be more
instances of rule component objects, and these would have methods
for building unions/intersections/differences.  Or, the rules compiler
itself can do much of this when it constructs the character class 
components, rather than try to build them dynamically in Parrot.

> (Also, the Perl 6 parser may be based on the notion of a set of hash keys
> being treated a bit like a lex grammar, 

Yes, I think this is likely.

> (Also, minor nit, I'm not sure "find" is the right verb here and
> elsewhere.  Mostly the regex engine just wants to check the "current"
> location.  The rest is control flow.)

Yes, I'm (pardon the pun) finding that advanced "string find" operations
may not be of great importance until we start looking at some special-case
optimizations.  I'd say to wait for those opcodes until we find (sorry!)
we need them.

> I see one other potential gotcha with respect to backtracking and
> closures.  In P6, a closure can declare a hypothetical variable
> that is restored only if the closure exits "unsuccessfully".  [...]

Indeed, but I'm gonna cross that bridge when I get to it.

Pm

Re: Semantics for regexes

2004-09-01 Thread Larry Wall

On Wed, Sep 01, 2004 at 04:33:24PM -0400, Aaron Sherman wrote:
: On Wed, 2004-09-01 at 16:07, Larry Wall wrote:
: 
: > I see one other potential gotcha with respect to backtracking and
: > closures.  In P6, a closure can declare a hypothetical variable
: > that is restored only if the closure exits "unsuccessfully".  Within
: > a rule, an embedded closure is unsuccessful if it is backtracked over.
: > But that implies that you can't know whether you have a successful
: > return until the entire regex is matched, all the way down, and all the
: > way back out the top, or at least out far enough that you know you
: > can't backtrack into this closure.  Abstractly, the closure doesn't
: > return until the entire rest of the match is decided.  Internally,
: > of course, the closure probably returns as soon as you run into the
: > end of it.
: 
: Let's get concrete:
: 
:   rule foo { a $x:=(b*) c }
:   "abbabc"
: 
: So, if I understand Parrot and Perl 6 correctly (heh, fat chance), a
: slight modification to the calling convention of the closure that
: represents a rule (possibly even a raw .Closure) could add a pad that
: the callee is expected to fill in with any hypotheticals defined during
: execution.

Okay, except that hypotheticality is an attribute of a variable's
value, not of the pad it's in.  As you wrote it above, $x would refer
to an external variable, which might well be in the outer lexical pad.
You can write $?x instead, which makes it automatically scoped to
the current rule (that is, it lives in the $0 object).  But again,
that's largely independent of whether it's hypothetical.  The binding
you did implies hypotheticality, but within an embedded closure it
wouldn't be hypothetical unless you said "let".  That is,

my $x;
rule foo { a $x:=(b+) c }

is shorthand for something like

my $x;
rule foo { a (b+) { let $x := $1 } c }

: The following would happen in the example above:
: 
:   store_lex "bb" into hypopad("$x") after "abb"
:   find "a" and fail the rule, backtracking (clear hypopad("$x"))
:   store_lex "b" into hypopad("$x") after backtracking over one "b"
:   find "b" next and fail the rule, backtracking again (clear)
:   store_lex "b" into hypopad("$x") after second "ab"
:   find "c" and succeed rule foo, return hypopad
: 
: Essentially every close-paren triggers binding, and every back-track
: over a close-paren triggers clearing.

Yes, that's essentially correct.  My quibble was simply that it may be
hard to keep track of what to clear out in the case of calling a
failure continuation.

: Because this is all part of the calling convention for a rule, there's
: no difference between a rule "passing" back hypotheticals to its caller
: and a sub-rule doing so to the rule which called IT.

Again, hypotheticality is (these days) independent of scope, though
a variable scoped to a rule certainly cannot live longer than its $0.

: Is that workable? Does it address your concern, Larry, or did I miss
: your point?

Well, kind of, but it's the "how" that gets interesting...

Larry

[perl #31424] PATCH: Fix for parrot linking issue on Solaris 8

2004-09-01 Thread via RT

# New Ticket Created by  [EMAIL PROTECTED] 
# Please include the string:  [perl #31424]
# in the subject line of all future correspondence about this issue. 
# http://rt.perl.org:80/rt3/Ticket/Display.html?id=31424


The attached patch fixes the solaris hints file to force the use of
'c++' for linking if Configure.pl finds gcc.  Without this patch, it
links with gcc which fails since it apparently can't find some of the
c++ symbols from icu.

This patch also changes the order of the Configure.pl tests so that
the gcc detection step is run before the hints step.  This has the
side effect of fixing the test that was already in the solaris hints
file.  As near as I can tell, this doesn't break anything, at least on
solaris.

I'm not sure this is the correct fix for this issue, but it fixes it
for me.  If anyone has suggestions for a better way to handle this,
let me know and I'll redo it.


parrot.solaris-c++-link.patch
Description: Binary data

Re: Minor makefile fix

2004-09-01 Thread Dan Sugalski

At 5:03 PM +0100 9/1/04, Jonathan Worthington wrote:
> Hi,
>
> Here's a small fix to the root.in makefile; this fix is needed to get Parrot
> building again on Win32 and probably in some other places too.

Applied, thanks.
--
Dan

Re: [perl #31419] PATCH: Fix for solaris platform asctime_r argument number mismatch

2004-09-01 Thread Dan Sugalski

At 8:37 AM -0700 9/1/04, [EMAIL PROTECTED] (via RT) wrote:
> On Solaris 8, asctime_r takes 3 parameters instead of 2.  The
> prototype looks like this:
>
> char *asctime_r(const struct tm *tm, char *buf, int buflen);
>
> This patch adds a time.c file to the solaris platform directory.  The
> only difference between this time.c file and the stock generic/time.c
> is that the call to asctime_r passes 26 as the buflen.
>
> The Solaris man page for asctime_r says that the result will always be
> 26 characters exactly.  I'm guessing the POSIX asctime assumes that
> the buffer is at least 26 characters, so assuming the buffer that
> Parrot_asctime_r gets is at least 26 characters isn't a new risk.
>
> Without this patch, the latest version of parrot from CVS will not
> compile on Solaris 8.

Applied, thanks
--
Dan

Re: [perl #31424] PATCH: Fix for parrot linking issue on Solaris 8

2004-09-01 Thread Dan Sugalski

At 4:16 PM -0700 9/1/04, [EMAIL PROTECTED] (via RT) wrote:
> The attached patch fixes the solaris hints file to force the use of
> 'c++' for linking if Configure.pl finds gcc.  Without this patch, it
> links with gcc which fails since it apparently can't find some of the
> c++ symbols from icu.
>
> This patch also changes the order of the Configure.pl tests so that
> the gcc detection step is run before the hints step.  This has the
> side effect of fixing the test that was already in the solaris hints
> file.  As near as I can tell, this doesn't break anything, at least on
> solaris.
>
> I'm not sure this is the correct fix for this issue, but it fixes it
> for me.  If anyone has suggestions for a better way to handle this,
> let me know and I'll redo it.

Good enough -- applied, thanks!
--
Dan

Re: [perl #31423] [PATCH] two tests for NCI

2004-09-01 Thread Dan Sugalski

At 12:20 PM -0700 9/1/04, Bernhard Schmalhofer (via RT) wrote:
> this patch adds two tests to t/pmc/nci.t.
>
> The first new test should be a platform independent check of get_string() of
> the ParrotLibrary PMC.
>
> The second new test is a callback test ported from PASM to PIR.

Applied, thanks.
--
Dan

Re: Test::Harness/prove: printing the test name when a test fails

2004-09-01 Thread Andy Lester

On Tue, Aug 31, 2004 at 06:24:55PM +1000, Andrew Savige ([EMAIL PROTECTED]) wrote:
> I told him to use verbose mode (prove -v) but he still complained.
> Actually, I agree with him that when a test fails (even when not
> in verbose mode) it makes sense to print out as much useful
> information as possible (including the test name).

I can't see changing it.  What if there are 1000 failed tests?

xoa

-- 
Andy Lester => [EMAIL PROTECTED] => www.petdance.com => AIM:petdance

Re: Semantics for regexes

2004-09-01 Thread Steve Fink

On Sep-01, Dan Sugalski wrote:
> 
> This is a list of the semantics that I see as needed for a regex 
> engine. When we have 'em, we'll map them to string ops, and may well 
> add in some special-case code for faster access.
> 
> *) extract substring
> *) exact string compare
> *) find string in string
> *) find first character of class X in string
> *) find first character not of class X in string
> *) find boundary between X and not-X
> *) Find boundary defined by arbitrary code (mainly for word breaks)

Huh? What do you mean by "semantics"? The only semantics needed are the
minimum necessary to answer the question "is the fred at offset i equal
to the fred X?" (Sorry, not sure if fred is actually character or
codepoint or whatever, and is probably all of them at different levels.)

We also almost certainly need to be able to do character class
comparisons, although if you assume that you can always transcode to
what the regex was compiled with, then you don't even need that --
instead, you need to be able to convert to something like a difference
list of numbered freds. But if we're talking about semantics, then yes
you need the character class manipulation.

Everything else in this list sounds like optimizations to me, and
probably not the right optimizations (I don't think it's possible to
predict what will be useful yet.)

For other things that parrot will be used for, I suspect that the first
3 will be needed.

I'm curious as to how you came up with that list; it seems to imply a
particular way of implementing the grammar engine. I would expect all of
that, barring certain optimizations, to be done directly with existing
pasm instructions.

There will be a need for saving a stack of former values of hypothetical
variables, which can also be done with pasm ops but might interact with
overloaded assignment or something wacky like that.
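
One way to picture such a stack is an undo log that is rolled back to a choice point on backtracking. The following is a Python sketch under that assumption, not a description of any existing Parrot mechanism; `Hypotheticals`, `let`, `mark` and `backtrack` are names invented here.

```python
# Sketch only: an undo log ("trail") for hypothetical variables.

_MISSING = object()   # marks "variable did not exist before this binding"


class Hypotheticals:
    def __init__(self):
        self.vars = {}
        self.trail = []       # stack of (name, previous value) pairs

    def let(self, name, value):
        """Bind hypothetically, remembering the value being replaced."""
        self.trail.append((name, self.vars.get(name, _MISSING)))
        self.vars[name] = value

    def mark(self):
        """Return a choice point: the current height of the trail."""
        return len(self.trail)

    def backtrack(self, mark):
        """Undo every binding made since `mark`, newest first."""
        while len(self.trail) > mark:
            name, old = self.trail.pop()
            if old is _MISSING:
                del self.vars[name]
            else:
                self.vars[name] = old


h = Hypotheticals()
h.let("$x", "outer")
choice = h.mark()
h.let("$x", "bb")
h.let("$y", "c")
h.backtrack(choice)
```

After the rollback `$x` holds "outer" again and `$y` is gone. An overloaded assignment would have to be routed through `let` (or bypass it deliberately) for the restore step to stay correct, which is where the interaction mentioned above would show up.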

Re: Proposal for a new PMC layout and more

2004-09-01 Thread Leopold Toetsch

Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 5:17 PM +0200 9/1/04, Leopold Toetsch wrote:
>>Below is a pod document describing some IMHO worthwhile changes. I
>>hope I didn't miss some issues that could inhibit the implementation.

> Okay, the "No" warrants more explanation.

Thanks.

> First off, the current structure of PMCs, Buffers, and Strings is
> definitely a mess, what with the multiple nested structs, semi-shared
> data, and weird smallobject overlap. A lot of stuff that is, in
> retrospect, crap has been layered on, so if this gets beaten up and
> cleaned out I won't mind in the least.

Well, that's what the proposal is for. Cleaning up and unifying existing
mess.

> The PMC scheme -- where PMCs are an immovable header with a vtable
> slot, cache slot, and flag slot -- stays. It's this way on purpose,
> and matches normal usage patterns (nicely efficiently) for perl 5 as
> well as (oddly) most python and ruby  usage. (Where there's a
> preponderance of low-level types)

I don't see that "matches normal usage patterns". Just the opposite of
it. The current PMC structure doesn't easily allow implementing e.g.
Python's "all is an object" POV. An Integer just needs 2 words of
information and not more. The rest (3 words) is just wasted. No
interpreter I've looked at has fixed sized objects. OTOH aggregates have
artificial helper structures to store needed information. A variable
sized PMC covers that all and eliminates all indirections totally to
access these data. I don't see efficiency either, neither in execution
time nor in memory usage in the current scheme.

> Buffers and strings are special-purpose constructs, or at least they
> *should* be. They're segregated off for GC purposes.

Buffers and strings are different because the current PMC structure
doesn't allow or support an arbitrary object layout. This lack of
functionality creates the need for Buffers. Which leads to more
indirection in accessing a PMC's data and more overhead during GC.

The unification into one coherent object model just simplifies all that
stuff.

> ... While they could
> be unified with PMCs, I don't want them to be. They've specific,
> special purposes, and as such they're staying the way they are.

A PMC is specific enough. The vtable makes it special. The vtable
defines the functionality of that very object. There isn't any real
difference between a Buffer structure and an array-ish PMC. Both hold
some amount of data. But we currently treat these two totally
differently for no good reason.

> Strings, FWIW, are *not* a perl 6 specific thing. The current string
> design is sufficient, and *will* be used, for perl 5, python, and
> ruby, as well as any other language that wants to live on parrot and
> handle string data. While there's stuff to be added still, there's no
> reason that I can see to mess with them.

Well, the current need for a distinct STRING type arises just because of
a lack in PMCs to deal with strings. E.g.

  $P0 = a_func_returning_a_string()
  $S0 = $P0
  $I0 = length $S0

That's the way to get at the length of a string. Python doesn't have a
notion of a STRING, I'm not aware of anything like that in Perl5 either.
So functions are returning objects aka PMCs. Almost all operations are
dealing with PMCs only. This is my experience coming from the Pie-thon
quest, but not alone.

The need for STRING opcodes and vtable/MMD functions just comes from a
lack of functionality in PMCs. Unifying or just having String PMCs
eliminates this lack and almost one third of opcodes.

STRING operations aren't the fastest anyway. I don't see any reason not to
just provide all these in PMCs (which we need anyway) and eliminate the
duplication with a distinct type.

Native integers and numbers do warrant the specialization. Processors and
JIT can support these types natively. Nothing similar can be done with STRING*s.
These are just overhead and code duplication currently.

> Finally, Nicholas is right -- this is messing around with stuff that
> already works. We're better off working on things that don't exist
> yet, and leave this to later.

That's of course true. There is a lot of stuff that needs to be done and
should be done before reworking internals deeply. OTOH a lot of
currently todo stuff could immediately be done much more easily.

We need some more PMCs e.g. to manage packfiles and code segments. The
rigid structure of the fixed-sized PMCs is always a PITA when
implementing new objects.

Unicode string vtables are another issue, although I don't know if
some/all vtable slots are usable for string operations. But we already
have e.g. concatenate or bitwise string vtables.

> If you want, we can hash out the changes to sub calling (with the
> swapping interpreter structs we've been arguing over), moving the
> return continuation/calling object/called sub into the interp
> structure,

Of course, yes. That thread is BTW lacking another answer: what's the
difference between your derived proposal and mine.

> ... and fixing up the JIT

Re: Proposal for a new PMC layout and more

2004-09-01 Thread Leopold Toetsch

Steve Fink <[EMAIL PROTECTED]> wrote:
> On Sep-01, Leopold Toetsch wrote:
>> Below is a pod document describing some IMHO worthwhile changes. I hope
>> I didn't miss some issues that could inhibit the implementation.

> Overall, I like it, although I'm sure I haven't thought of all of the
> repercussions.

> The one part that concerns me is the loss of the flags -- flags just
> seem generally useful for a number of things. In the limit, each flag
> could be replaced by an equivalent vtable entry that just returned true
> or false,

I'm not thinking about vtable entries returning a flag bit. E.g. the
presence of PObj_custom_mark_FLAG could as well be tested as:

   if (pmc->vtable->mark)   // != NULL

Generally speaking the vtable mostly holds the information, that is
needed for one kind of PMC. More specialized PMCs can have their private
flags (for example a Key PMC). But they are normally not needed. An
Integer or Float PMC doesn't need any flags to perform its operation.
The proposed scheme of course doesn't forbid private flags in the PMC's
data section. But a lot of PMCs just don't need any flags.
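
The same idea, testing for a vtable slot instead of a per-object flag bit, can be sketched outside of C. This Python version is an illustration only: `VTable`, `PMC`, `mark_children` and `gc_visit` are stand-in names and do not mirror the real Parrot structures.

```python
# Sketch only: test for a vtable slot instead of a per-object flag bit.

class VTable:
    """Per-type function table; a slot left as None means "not provided"."""

    def __init__(self, name, mark=None):
        self.name = name
        self.mark = mark


class PMC:
    def __init__(self, vtable, data=None):
        self.vtable = vtable
        self.data = data


def mark_children(pmc, live):
    """Custom mark routine for an aggregate: record each child as live."""
    for child in pmc.data:
        live.append(child)


def gc_visit(pmc, live):
    """Mark one object; call the custom routine only if the type has one."""
    live.append(pmc)
    if pmc.vtable.mark is not None:      # plays the role of the flag test
        pmc.vtable.mark(pmc, live)


integer_vt = VTable("Integer")                   # no custom mark needed
array_vt = VTable("Array", mark=mark_children)

one = PMC(integer_vt, 1)
arr = PMC(array_vt, [one])
live = []
gc_visit(arr, live)
```

The Integer object carries no flag word at all in this sketch; whether it needs custom marking is a property of its type, looked up through the vtable.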

> The proposal would also expand the size of the vtable by a bit due to
> the string vtable stuff.

No. The vtable would very likely shrink. 46 vtable (or MMD) entries are
currently used by STRING* operations. These would be just PMC
operations, which we have anyway.

>  ... will MMD really take care of all of
> the places where we care that we're going to a string, specifically,
> rather than any other random PMC type?

MMDs have to deal with that anyway. We have String PMCs. The vtables or
MMD functions that currently take STRING* ought to be optimized
shortcuts for STRING* arguments. But if a String PMC is passed, still
The Right Thing has to happen.

> I also view the proposal as being comprised of several fairly
> independent pieces. Something like:

 * allow/allocate variable sized PMCs

Then --yes.

>  * Merging PMCs and Buffers
>  * Merging STRINGs and PMCs

That's the same thing, mostly.

>  * Removing GC-related flags and moving them to GC implementations

We already have that. But it's not hidden or encapsulated.

>  * Removing the rest of the flags

Yep.

>  * Using Null instead of Undef

No. Undef is a totally different thing. There is no change here. The
Null PMC catches program errors (like using a C NULL pointer). The Undef
is just a placeholder that morphs to any other type.

>  * Moving "extra" stuff to before the PMC pointer
>  * Using Refs to expand PMCs
>  * Using DOD to remove the Ref indirection
>  * Shrinking the base PMC size

Yep. That is related, though.

> ... And some are only dependent
> in the sense that you'll make space or time performance worse until you
> make the rest of the related changes. You could call those
> design-dependent, rather than implementation-dependent.

Yes.

leo
