Re: GC design

2002-05-28 Thread Jerome Vouillon

On Mon, May 27, 2002 at 08:41:59AM -0700, Sean O'Rourke wrote:
> But there are two kinds of "available" here: available without asking the
> operating system for more; and available period.  If we're in the first
> situation, it seems reasonable to just ask the OS for a new block and keep
> going, noting that collecting soon would be a Good Thing.  Our memory
> requirements have probably increased, so we'd be asking for more soon
> anyways, and if they haven't, we'll reclaim a lot of memory on the next
> collection, and only be "wasting" 8 or 16k.  We can even try to give some
> back, though the OS probably won't take it.  If we're in the second
> situation, then either (1) we've gone far too long without collecting, or
> (2) we're in serious trouble anyways.  And even in this situation, we can
> wriggle out of it most of the time by keeping some amount of "emergency"
> memory stashed away.

I don't think (1) is true.  For performance reasons, we should let
quite a lot of garbage accumulate before triggering a collection
(something like 40% of space overhead).  So, for instance, if the heap
is full and we need to allocate a huge string (10 Mbytes), it is quite
likely that performing a collection will free enough memory.  And I'm
not sure we are ready to always keep 10 Mbytes of "emergency" free
memory to be able to handle this case without collection.
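Here is a minimal sketch of the policy under discussion. It assumes a toy heap that only does byte accounting. If there is enough free space, it uses it. Otherwise it collects first, and asks the OS for more only if the collection did not free enough. The names (`heap_t`, `toy_alloc`, `os_limit`) are hypothetical. The `live` field simulates what a collection would find reachable, and the 40% headroom is Jerome's rough figure, not a tuned value.

```c
#include <stddef.h>

/* Toy heap accounting only: no real memory is handed out. All names are
   hypothetical; `live` simulates what a collection would find reachable. */
typedef struct {
    size_t   capacity;     /* bytes obtained from the OS so far        */
    size_t   used;         /* bytes handed out: live data plus garbage */
    size_t   live;         /* bytes a collection would keep (simulated) */
    unsigned collections;  /* number of collections run                */
    unsigned grows;        /* number of times the heap was grown       */
} heap_t;

/* Reserve `size` bytes: use free space if possible, else collect, and
   grow the heap (up to `os_limit`) only if collecting was not enough.
   Returns 1 on success, 0 if memory is exhausted "period". */
int toy_alloc(heap_t *h, size_t size, size_t os_limit)
{
    if (h->capacity - h->used >= size) {   /* no GC, no OS request */
        h->used += size;
        return 1;
    }
    h->used = h->live;                     /* collect first */
    h->collections++;

    /* Aim for roughly 40% space overhead over the data kept afterwards. */
    size_t need = h->used + size;
    size_t want = need + need * 2 / 5;
    if (want > os_limit)
        want = os_limit;                   /* the OS will not give more */
    if (want > h->capacity) {              /* only now ask the OS */
        h->capacity = want;
        h->grows++;
    }
    if (h->capacity - h->used < size)
        return 0;
    h->used += size;
    return 1;
}
```

For example, with `capacity` 1000, `used` 900 and `live` 100, a 500-byte request succeeds after one collection without growing the heap. That is Jerome's point about the huge string: collecting usually beats keeping a large emergency reserve.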

[...]
> By the way, neither of string_grow()'s callers
> checks its return value now, which indicates to me that this may be
> error-prone.

Also, it seems that the return value of alloc_new_block is never
checked, even when really huge blocks are allocated, as in
compact_buffer_pool...

-- Jerome



Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision

2002-05-28 Thread Mike Lambert

Okay. I have yet another idea for solving our infant mortality problem,
which I think Dan might like. :)

The neonate idea originally was intended to be set on *all* headers
returned by the memory system, and they'd be reset by a clear_neonate op.
At least, that's how I understood it. A straightforward implementation of
the above is about 50% slower than it was before, so I think that rules
this option out.

The current code (without this patch) adds neonate wherever it discovers
that it is needed, and turns it off when it is done. This was quite
efficient, but required the user to constantly think about what functions
could cause GC, etc. It was rather error-prone.

If I understood Dan correctly on IRC yesterday, he was proposing that our
current approach of handling infant mortality everywhere it can occur is
the 'correct' approach. It definitely buys us speed, but as mentioned
above, it's somewhat error prone. The below is an attempt to try and
convince Dan that in lieu of hardcore GC-everywhere programming, there is
a middle ground. I believe we need a middle ground because forcing users
to learn the quirks of our GC system makes parrot programming less fun,
and raises Parrot's barrier to entry.

As I was working on my revised GC system, I came up with a relaxation of
the above that should be easier on programmers, and yet still be fast.
It's not revolutionary by any means, but rather grabbing bits and pieces
of different people's solutions. When you call new_*_header, the neonate
flag is automatically turned on for you. As a programmer writing a
function, you explicitly turn off the neonate flag when you attach it to
the root set, or let it die on the stack. If you return it, you don't do
anything, as it becomes the caller's job to handle.

Neonate guarantees that it won't be collected, avoiding infant mortality.
The programmer does not have to explicitly turn it on. Just turn it off.
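Mike's rules could be modeled roughly as follows. This is a toy, not Parrot code. `STRING` is a stand-in struct, and `FLAG_NEONATE`, `new_string_header`, `done_with_string`, and `gc_would_free` are hypothetical names. The sketch shows the three cases he lists. The header is born neonate. A temporary that dies in the function is un-neonated. A returned header is left neonate for the caller.

```c
#include <stdlib.h>

#define FLAG_LIVE    0x1u
#define FLAG_NEONATE 0x2u   /* hypothetical name for the neonate bit */

/* Stand-in for a Parrot string header. */
typedef struct STRING {
    unsigned flags;
    int      rooted;        /* toy: nonzero if reachable from the root set */
} STRING;

/* new_*_header turns the neonate flag on for you. */
STRING *new_string_header(void)
{
    STRING *s = calloc(1, sizeof *s);
    if (s != NULL)
        s->flags = FLAG_LIVE | FLAG_NEONATE;
    return s;
}

/* The only thing the programmer does: turn neonate off. */
void done_with_string(STRING *s)
{
    s->flags &= ~FLAG_NEONATE;
}

/* Toy sweep test: a neonate header survives even if nothing points to it. */
int gc_would_free(const STRING *s)
{
    return !(s->flags & FLAG_NEONATE) && !s->rooted;
}

/* A function following the rules: the temporary that dies here is
   un-neonated; the returned header stays neonate for the caller. */
STRING *example_op(void)
{
    STRING *tmp = new_string_header();
    STRING *result = new_string_header();
    if (tmp != NULL)
        done_with_string(tmp);   /* let it die; a real GC would reclaim it */
    return result;               /* the caller's job from here on */
}
```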

From a cursory glance over string.c, only string_concat and string_compare
create strings which die within the scope of that function, and thus need
to be modified.

This approach would complicate many of our string .ops, however. Stuff
like "$1 = s" needs to turn off the neonate flag. Perhaps we can encode
logic into the ops2c converter to turn off the neonate flag for things
that it can detect, or perhaps we can require the user to do it because
automated converters are guaranteed to fail. Core.ops requires a lot of
such modifications, however. Things like err, open, readline, print, read,
write, clone, set, set_keyed, the various string ops (substr, pack, etc),
and savec, all require modification.

I think these guidelines make it easy for non-GC-programmers to write
GC-safe code, since they do not need to be aware of what allocates memory,
and what does not.

What do people think of this approach?
Mike Lambert




Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision

2002-05-28 Thread Jerome Vouillon

On Mon, May 27, 2002 at 04:33:07PM -, Peter Gibbs wrote:
> These changes do cause a slight performance degradation, but I believe it is
> worth it for the overall simplification of transparent protection of the
> newborn.
> Performance can only be a secondary goal, after correct behaviour.

What level of performance are we aiming at?

It seems to me that memory allocation is already quite slow.  So, I
find it worrying that this change makes it noticeably slower.

Another point to consider is that if we start to write code that
assumes that newborns are not garbage collected, then it will be hard to
fix this code later if this turns out to be too costly.

-- Jerome



Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Jerome Vouillon

On Tue, May 28, 2002 at 04:45:06AM -0400, Mike Lambert wrote:
> When you call new_*_header, the neonate
> flag is automatically turned on for you. As a programmer writing a
> function, you explicitly turn off the neonate flag when you attach it to
> the root set, or let it die on the stack. If you return it, you don't do
> anything, as it becomes the caller's job to handle.

Suppose your C code builds a nested datastructure.  For instance,
it creates some strings and adds them to a hash-table.  The hash-table
is then returned.  Should it clear the neonate flag of the strings?

-- Jerome



Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Jerome Vouillon


I propose the following alternate guidelines.

First, the code would look something like this:

  STRING * concat (STRING* a, STRING* b, STRING* c) {
    PARROT_start();
    PARROT_str_params_3(a, b, c);
    PARROT_str_local_2(d, e);

    d = string_concat(a, b);
    e = string_concat(d, c);

    PARROT_return(e);
  }

Then, the rules would be:
  (1)  start your functions with PARROT_start
  (2)  register all parameters of type STRING * with PARROT_str_params
  (2') register all parameters of type PMC * with PARROT_pmc_params
  (3)  declare the local variables of type STRING * with PARROT_str_local
  (3') declare the local variables of type PMC * with PARROT_pmc_local
  (4)  use PARROT_return to exit the function
  (5)  do not nest function calls
       (for instance, "e = string_concat (string_concat(a, b), c);"
       would be forbidden)

The idea is to explicitly manage a stack of parrot objects, which can
be traversed by the GC.
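Below is a self-contained sketch of that explicit stack. It reuses Jerome's macro names but makes three assumptions. It targets C99, so declarations may follow statements and the `dummy_##d` trick is not needed. It uses a one-at-a-time `PARROT_str_param` instead of `PARROT_str_params_3`. Its toy `string_concat` only counts the roots a collection would see and does not actually collect.

```c
#include <stdlib.h>
#include <string.h>

typedef struct STRING { char *buf; } STRING;            /* toy string */
typedef struct frame { STRING **ptr; struct frame *next; } frame;

static frame *stack_top = NULL;   /* head of the explicit root stack */
static int max_roots_seen = 0;    /* toy instrumentation */

#define PARROT_start()        frame *saved_top = stack_top
#define PARROT_str_param(p)                                   \
    frame frame_##p = { &p, stack_top };                      \
    stack_top = &frame_##p
#define PARROT_str_local(d)                                   \
    STRING *d = NULL;                                         \
    frame frame_##d = { &d, stack_top };                      \
    stack_top = &frame_##d
#define PARROT_return(e)                                      \
    do { stack_top = saved_top; return (e); } while (0)

/* Stand-in for a collection: a real GC would mark *f->ptr here. */
static void gc_scan_roots(void)
{
    int n = 0;
    for (frame *f = stack_top; f != NULL; f = f->next)
        n++;
    if (n > max_roots_seen)
        max_roots_seen = n;
}

/* Toy allocator: assume every allocation may trigger a collection. */
static STRING *string_concat(const STRING *x, const STRING *y)
{
    gc_scan_roots();
    size_t lx = strlen(x->buf), ly = strlen(y->buf);
    STRING *s = malloc(sizeof *s);
    if (s == NULL || (s->buf = malloc(lx + ly + 1)) == NULL)
        abort();                    /* toy: no out-of-memory handling */
    memcpy(s->buf, x->buf, lx);
    memcpy(s->buf + lx, y->buf, ly + 1);
    return s;
}

STRING *concat(STRING *a, STRING *b, STRING *c)
{
    PARROT_start();
    PARROT_str_param(a);
    PARROT_str_param(b);
    PARROT_str_param(c);
    PARROT_str_local(d);
    PARROT_str_local(e);

    d = string_concat(a, b);   /* d is rooted before e's allocation */
    e = string_concat(d, c);

    PARROT_return(e);          /* pops all five frames at once */
}
```

Each collection inside `concat` sees five roots (a, b, c, d, e). After `PARROT_return`, `stack_top` is back where the caller left it.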

These rules leave a lot of freedom to the garbage collector:
- it can garbage collect anything which is not rooted;
- it can move objects (strings and PMCs) around;
- objects do not need any additional field/flag;
- exceptions can be implemented using longjmp; if an exception is
  raised, temporary allocated objects will be properly freed by the GC.

Do you think that these rules would be too error-prone, or too
cumbersome?

-- Jerome



RE: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Brent Dax

Jerome Vouillon:
# I propose the following alternate guidelines.
# 
# First, the code would look something like this:
# 
#   STRING * concat (STRING* a, STRING* b, STRING* c) {
#     PARROT_start();
#     PARROT_str_params_3(a, b, c);
#     PARROT_str_local_2(d, e);
# 
#     d = string_concat(a, b);
#     e = string_concat(d, c);
# 
#     PARROT_return(e);
#   }
# 
# Then, the rules would be:
#   (1)  start your functions with PARROT_start
#   (2)  register all parameters of type STRING * with PARROT_str_params
#   (2') register all parameters of type PMC * with PARROT_pmc_params
#   (3)  declare the local variables of type STRING * with PARROT_str_local
#   (3') declare the local variables of type PMC * with PARROT_pmc_local

I assume the lack of any mention of Buffers is an oversight.

#   (4)  use PARROT_return to exit the function
#   (5)  do not nest function calls
#        (for instance, "e = string_concat (string_concat(a, b), c);"
#        would be forbidden)

I don't understand the reasoning behind (5).  Would you care to
elaborate?

# The idea is to explicitly manage a stack of parrot objects, 
# which can be traversed by the GC.
# 
# These rules leave a lot of freedom to the garbage collector:
# - it can garbage collect anything which is not rooted;
# - it can move objects (strings and PMCs) around;
# - objects do not need any additional field/flag;
# - exceptions can be implemented using longjmp; if an exception is
#   raised, temporary allocated objects will be properly freed by the GC.

Anything that lets us use longjmp is fine by me.  ;^)

# Do you think that these rules would be too error-prone, or 
# too cumbersome?

May I suggest an alternate version?

STRING * concat (STRING* a, STRING* b, STRING* c) {
  Enter_sub();  /* The macros don't really need a Parrot_ prefix, do they? */
  Str_params_3(a, b, c);
  Decl_str(d);  /* These aren't needed anymore, actually... */
  Decl_str(e);

  Return(
    string_concat(
      string_concat(a, b),
      c
    )
  );
}

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

blink:  Text blinks (alternates between visible and invisible).
Conforming user agents are not required to support this value.
--The W3C CSS-2 Specification




Re: [netlabs #629] [PATCH] Memory manager/garbage collector - major revision

2002-05-28 Thread Mike Lambert

>  STRING * concat (STRING* a, STRING* b, STRING* c) {
>PARROT_start();
>PARROT_str_params_3(a, b, c);
>PARROT_str_local_2(d, e);
>
>d = string_concat(a, b);
>e = string_concat(d, c);
>
>PARROT_return(e);
>  }

Yet more ideas. Woohoo! :)

I considered this kind of approach myself, but
discarded it due to the ton of extraneous code you have to write to do the
simplest of things. :( I'm not sure if the other people have considered
it, discarded it, or are still considering it.

As far as the pros/cons...

First, it requires you write in a pseudo-language to define your local PMC
headers and how to return data. I'm sure the macro freaks that have been
scarred by perl5 will jump on here and beat you down in a few hours or so.
:)

Can you provide an implementation of the macros you described above? I
have a few concerns which I'm not sure if they are addressed. For example:

PARROT_str_local(d)
I'm assuming it puts a reference to d onto the rooted stack. It would also
need to initialize d to NULL to avoid pointing at garbage buffers.

PARROT_str_params_3(a, b, c);
What's the point of this? With rule 5 that prevents function call nesting,
you're guaranteed of all your arguments being rooted. I think you can lose
either the nesting requirement or the str_params requirement.

PARROT_return(e);
I'm assuming this backs the stack up to the place pointed to by
PARROT_start(), right? This means during a longjmp, the stack won't be
backed up properly until another PARROT_return() is called, somewhere
farther up the chain, right?

Also, I think Dan has already outlawed longjmp due to problems with
threading, but he'll have to elaborate on that. I agree my most recently
stated approach is not longjmp safe since it could leave neonate set on
certain buffers/pmcs.


Finally, in response to my original post, you asked:

> Suppose your C code builds a nested datastructure.  For instance,
> it creates some strings and add them to a hash-table.  The hash-table is
> then returned.  Should it clear the neonate flag of the strings?

I think I'd have to say...don't do that. Ops and functions shouldn't be
building large data structures, imo. Stuff like building large hashes
and/or arrays of data should be done in opcode, in perl code, or whatever
language is operating on parrot.

If you *really* need to operate on a nested datastructure, and you're
going to hold it against my proposal, then there are two options.

a) write code like:
base = newbasepmc #neonate pmc
other = newchildpmc #also neonate
base->add(other) #even if collecting/dod'ing, can't collect above two
done_with_pmc(other) #un-neonates it, since it's attached to a root (neonate set)
repeat...

It works, and then you just need to worry about what to do with your
'base' at the end of the function (to un-neonate it or not).

b) make a done_with_children_of_pmc() style function. it hijacks
onto the tracing functionality inherent in the DOD code, and searches for
a contiguous selection of neonate buffers and pointers emanating from
the pmc we pass in, and un-neonates them, leaving the passed-in-pmc
neonated. Since everything we do in the function is neonate, everything we
construct into this base pmc should be contiguously neonate, if that
makes sense.

Granted, it's a little bit expensive to do the tracing, but you shouldn't
need to trace too deep at all, and its time is proportional to the size of
the nested data structure you are creating.
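Options (a) and (b) might look like the sketch below. The `PMC` layout, a neonate flag plus a fixed child array, is a toy assumption. `done_with_pmc` and `done_with_children_of_pmc` are Mike's proposed names, not existing functions. The walk clears each node before descending, and it stops at children that are already non-neonate, so it only touches the contiguous neonate region. `base` is cleared temporarily so that a cycle leading back to it cannot strip its protection.

```c
#include <stddef.h>

#define MAX_KIDS 4

/* Toy PMC: a neonate flag plus a fixed array of child pointers. */
typedef struct PMC {
    unsigned    neonate;            /* 1 = protected from collection */
    size_t      nkids;
    struct PMC *kids[MAX_KIDS];
} PMC;

/* Option (a): un-neonate one child once it hangs off a protected base. */
void done_with_pmc(PMC *p)
{
    p->neonate = 0;
}

/* Clear the contiguous neonate region below p. */
static void clear_region(PMC *p)
{
    for (size_t i = 0; i < p->nkids; i++) {
        PMC *k = p->kids[i];
        if (k != NULL && k->neonate) {
            k->neonate = 0;          /* clear before descending: cycle-safe */
            clear_region(k);
        }
    }
}

/* Option (b): un-neonate everything built under base, keep base neonate.
   Cost is proportional to the size of the newly built structure. */
void done_with_children_of_pmc(PMC *base)
{
    base->neonate = 0;               /* so a cycle cannot re-enter base */
    clear_region(base);
    base->neonate = 1;               /* base stays protected for the caller */
}
```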

Does that help?
Mike Lambert

PS: Oh, and I forgot to mention in my previous proposal the need for
neonating pmc headers, and to look into what functions need to un-neonate
pmc headers. That should be localized to the vtable methods, which are
sort of a mess right now anyway with the transmogrification of vtables and
have other GC problems.





Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Jerome Vouillon

On Tue, May 28, 2002 at 04:57:01AM -0700, Brent Dax wrote:
> I assume the lack of any mention of Buffers is an oversight.

Right.  It would be great if there was only one kind of parrot objects...

> #   (5)  do not nest function calls
> #        (for instance, "e = string_concat (string_concat(a, b), c);"
> #        would be forbidden)
> 
> I don't understand the reasoning behind (5).  Would you care to
> elaborate?

Actually, the example I give is safe.  But consider:

   e = string_concat (string_concat(a, b), string_concat(c,d));

Let us assume that the arguments of the outer call are evaluated
leftmost first (C leaves the order unspecified).  Then, the return
value of "string_concat(a, b)" is not "rooted" anywhere when
"string_concat(c,d)" is executed.  So, it will be wrongly freed by
the GC if a collection occurs at this time.
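The hazard can be replayed deterministically if the evaluation order is written out by hand. In this toy, all names are hypothetical and every allocation runs a collection that frees whatever is not on the root stack. That is the worst case the rules must survive. "Freed" objects stay in a static array, so the test can observe that they died.

```c
#include <stdlib.h>

#define MAX_OBJS 8

typedef struct { int dead; } obj;   /* toy object: only tracks liveness */

static obj    heap[MAX_OBJS];
static size_t nobjs = 0;
static obj   *roots[MAX_OBJS];      /* explicit root stack */
static size_t nroots = 0;

/* Worst case: a full collection frees everything not on the root stack. */
static void collect(void)
{
    for (size_t i = 0; i < nobjs; i++) {
        int rooted = 0;
        for (size_t r = 0; r < nroots; r++)
            if (roots[r] == &heap[i])
                rooted = 1;
        if (!rooted)
            heap[i].dead = 1;
    }
}

/* Stand-in for string_concat: assume every call may collect. */
static obj *alloc_obj(void)
{
    collect();
    if (nobjs == MAX_OBJS)
        abort();                     /* toy heap exhausted */
    heap[nobjs].dead = 0;
    return &heap[nobjs++];
}

/* string_concat(string_concat(a, b), string_concat(c, d)) with the left
   argument evaluated first and held only in a C temporary. */
int nested_unsafe(void)
{
    obj *ab = alloc_obj();           /* result lives only in `ab`    */
    (void)alloc_obj();               /* string_concat(c, d) collects */
    return ab->dead;                 /* 1: `ab` was freed under us   */
}

/* Un-nested: root the intermediate before making the next call. */
int unnested_safe(void)
{
    obj *ab = alloc_obj();
    roots[nroots++] = ab;            /* like declaring it PARROT_str_local */
    (void)alloc_obj();
    int dead = ab->dead;
    nroots--;                        /* like PARROT_return popping it */
    return dead;                     /* 0: `ab` survived */
}
```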

> The macros don't really need a Parrot_ prefix, do they?

Right, you can use whatever name you prefer.

-- Jerome



Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Jerome Vouillon

On Tue, May 28, 2002 at 08:30:52AM -0400, Mike Lambert wrote:
> Can you provide an implementation of the macros you described above? I
> have a few concerns which I'm not sure if they are addressed. For example:

#define PARROT_start() \
   frame * saved_top = stack_top;

> PARROT_str_local(d)
> I'm assuming it puts a reference to d onto the rooted stack. It would also
> need to initialize d to NULL to avoid pointing at garbage buffers.

#define PARROT_str_local(d)              \
   STRING * d = NULL;                    \
   frame frame_##d;                      \
   int dummy_##d = (                     \
     (frame_##d.ptr = &d),               \
     (frame_##d.next = stack_top),       \
     (stack_top = &frame_##d),           \
     0);

> PARROT_str_params_3(a, b, c);
> What's the point of this? With rule 5 that prevents function call nesting,
> you're guaranteed of all your arguments being rooted. I think you can lose
> either the nesting requirement or the str_params requirement.

Yes, you are right: we don't need this macro if all the arguments are
already rooted.

> PARROT_return(e);
> I'm assuming this backs the stack up to the place pointed to by
> PARROT_start(), right?

Right:

#define PARROT_return(e)     \
  do {                       \
    stack_top = saved_top;   \
    return e;                \
  } while (0)

> This means during a longjmp, the stack won't be
> backed up properly until another PARROT_return() is called, somewhere
> farther up the chain, right?

Right, the stack has to be backed up explicitly.  But the exception
handler can do it immediately.
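Here is a sketch of a handler that backs up the stack immediately. It uses standard `setjmp`/`longjmp` only to illustrate Jerome's point; Dan later rules them out because they are not thread-safe. `deep_function` and `run_protected` are hypothetical names. The frame layout mirrors the macros above.

```c
#include <setjmp.h>
#include <stddef.h>

typedef struct frame { void **ptr; struct frame *next; } frame;

static frame  *stack_top = NULL;   /* explicit root stack */
static jmp_buf handler;

/* Roots a local, then raises an exception before it can pop its frame. */
static void deep_function(void)
{
    void *local = NULL;
    frame f;
    f.ptr = &local;
    f.next = stack_top;
    stack_top = &f;                /* pushed ...                    */
    longjmp(handler, 1);           /* ... but never popped normally */
}

/* Returns 1 if an exception was caught; the root stack is restored. */
int run_protected(void)
{
    frame *saved_top = stack_top;  /* not modified after setjmp, so its
                                      value is reliable after longjmp */
    if (setjmp(handler) == 0) {
        deep_function();
        return 0;
    }
    /* stack_top still points into deep_function's dead frame: the
       handler backs it up right away. */
    stack_top = saved_top;
    return 1;
}
```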

> Finally, in response to my original post, you asked:
> 
> > Suppose your C code builds a nested datastructure.  For instance,
> > it creates some strings and add them to a hash-table.  The hash-table is
> > then returned.  Should it clear the neonate flag of the strings?
> 
> I think I'd have to say...don't do that. Ops and functions shouldn't be
> building large data structures, imo. Stuff like buliding large hashes
> and/or arrays of data should be done in opcode, in perl code, or whatever
> language is operating on parrot.

Still, it seems reasonable for a function to return a small
datastructure, such as a pair of strings.

> If you *really* need to operate on a nested datastructure, and you're
> going to hold it against my proposal, then there are two options.
> 
> a) write code like:
> base = newbasepmc #neonate pmc
> other = newchildpmc #also neonate
> base->add(other) #even if collecting/dod'ing, can't collect above two
> done_with_pmc(other) #un-neonates it, since it's attached to a root (neonate set)
> repeat...
> 
> It works, and then you just need to worry about what to do with your
> 'base' at the end of the function (to un-neonate it or not).

This sounds reasonable.

-- Jerome



Re: REGEX structure and Regex implementation

2002-05-28 Thread David M. Lloyd

On Sun, 26 May 2002, Steve Fink wrote:

> I implemented it that way once in my private tree. But I ended up
> replacing it with a couple of PerlArrays.
>
> I am now of the opinion that there's currently nothing for a regex PMC
> to do. At compile-time, you know what sort of beast you're matching
> against. If you want to incrementally match an input sequence of some
> sort, then you should probably be using the same continuation or
> coroutine mechanism that regular subs use.

I've done some thinking about it this weekend.  Upon reflection, what I
should have said was "Regexes should be objects", which is more of a perl6
thing.  What you say makes sense: how would the regex ops be called from
within the PMC?  It would be unnecessary overhead, and it doesn't make
much sense...

- D

<[EMAIL PROTECTED]>




Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Robert Spier


>#define PARROT_str_local(d)\
>   STRING * d = NULL;  \
>   frame frame_##d;\
>   int dummy_##d = (   \
> (frame_##d.ptr = &d), \
> (frame_##d.next = stack_top), \
> (stack_top = &frame_##d), \
> 0);

This triggers the perl5 macro-iitis alarm.  

Please don't do this :)

If you need to do things like that, then maybe we need to develop our
own function inliner -- there will still be debugging issues with the
optimizer, but fewer, if it looks like this:

 
   STRING * concat (STRING* a, STRING* b, STRING* c) {
 PARROT_str_local_2(d, e);

 foo;
   } 

Expands into:
 
#line 50 "sourcefile.pmc"
  STRING * concat (STRING* a, STRING* b, STRING* c) {
#line 50 "macros.h"
   STRING * d = NULL;
   frame frame_d;
   frame_d.ptr = &d;
   frame_d.next = stack_top;
   stack_top = &frame_d;
#line 52 "sourcefile.pmc"


BUT--

We've got enough complicated preprocessor issues right now - I'm
not sure we want to add another one.  Defining perl5ish macros
will cause too many troubles down the road.

Or... since C99 supports C function inlining (iirc), we could
just rely on a C99 compiler.
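With C99 `inline`, the root-stack push could be an ordinary function rather than a macro. This is a rough sketch; `root_push`, `root_restore`, and `root_depth` are hypothetical names. Every step is a normal statement, so with optimization off most debuggers can step into it line by line. The cost is that the frame and variable declarations become visible in the calling function.

```c
#include <stddef.h>

typedef struct STRING STRING;   /* opaque here */
typedef struct frame { STRING **ptr; struct frame *next; } frame;

static frame *stack_top = NULL;

/* Each assignment is its own statement a debugger can stop on. */
static inline void root_push(frame *f, STRING **slot)
{
    f->ptr = slot;
    f->next = stack_top;
    stack_top = f;
}

static inline void root_restore(frame *saved)
{
    stack_top = saved;
}

size_t root_depth(void)
{
    size_t n = 0;
    for (const frame *f = stack_top; f != NULL; f = f->next)
        n++;
    return n;
}

/* Usage: no macros, the declarations are spelled out in the function. */
size_t example(void)
{
    frame *saved = stack_top;
    STRING *d = NULL, *e = NULL;
    frame fd, fe;
    root_push(&fd, &d);
    root_push(&fe, &e);
    size_t depth = root_depth();   /* 2 while d and e are rooted */
    root_restore(saved);
    return depth;
}
```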

-R


 




Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread mrjoltcola

On Tue, 28 May 2002 12:50:06 +0200 Jerome Vouillon <[EMAIL PROTECTED]> wrote:

>I propose the following alternate guidelines.
>
>
>  STRING * concat (STRING* a, STRING* b, STRING* c) {
>    PARROT_start();
>    PARROT_str_params_3(a, b, c);
>    PARROT_str_local_2(d, e);
>
>    d = string_concat(a, b);
>    e = string_concat(d, c);
>
>    PARROT_return(e);
>  }

If you search the archive, you'll find
I've already proposed this type of
solution in 2 different flavors and both
times the consensus has gone back to
setting flags.

It seems we are going around in circles.

I also agree it's funny we are worrying
about performance when it's not apparent
the allocation overhead even has anything
to do with the current performance problem.

-Melvin



Re: [COMMIT] Added preprocessor layer to newasm.

2002-05-28 Thread mrjoltcola

On Tue, 28 May 2002 01:19:25 -0400 Jeff <[EMAIL PROTECTED]> wrote:

>newasm now handles constants, macros, and local labels within. Here's a

Great work!


>expansion. Also, they don't expand recursively. '.constant FOO
>"blah"\n.constant BAR "Hey, .FOO"' won't do what you want, sadly.

That's exactly what I want. I don't think
the assembler should do any sort of interpolation
with string constants at all.

>that's what I'll work on next.

When Simon first committed it, I tested
newasm and noticed 2-3x speedup on assembly
speed. Is this still the case?

I've been running some tests with executing
Cola on the fly, and the assembly phase (slow)
is the big bug in the soup right now, however
I've been using the old assembler.

PS: Thanks (Simon and Jeff) for a lot of hard
work on newasm.

-Melvin



Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Jerome Vouillon

On Tue, May 28, 2002 at 03:45:58PM +0200, Jerome Vouillon wrote:
> On Tue, May 28, 2002 at 08:30:52AM -0400, Mike Lambert wrote:
> > PARROT_str_params_3(a, b, c);
> > What's the point of this? With rule 5 that prevents function call nesting,
> > you're guaranteed of all your arguments being rooted. I think you can lose
> > either the nesting requirement or the str_params requirement.
> 
> Yes, you are right: we don't need this macro if all the arguments are
> already rooted.

Well, actually we need this macro if we want to allow the garbage
collector to move objects: then, the values of the arguments may need
to be updated.
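The reason is that a moving collector has to write each object's new address back into the variables that refer to it. A shadow stack stores the address of each variable, so a registered parameter can be updated in place. This is a toy sketch with hypothetical names. The copy into a fixed `to_space` array stands in for a real copying collector.

```c
#include <stddef.h>
#include <string.h>

#define TO_SPACE_SLOTS 8

typedef struct STRING { char buf[16]; } STRING;          /* toy string */
typedef struct frame { STRING **ptr; struct frame *next; } frame;

static frame *stack_top = NULL;
static STRING to_space[TO_SPACE_SLOTS];
static size_t to_used = 0;

/* Toy "moving" collection: copy each rooted object into to_space and
   update the variable that referred to it through the stored slot. */
static void gc_move(void)
{
    for (frame *f = stack_top; f != NULL; f = f->next) {
        STRING *old = *f->ptr;
        if (old == NULL || to_used == TO_SPACE_SLOTS)
            continue;                 /* nothing to move, or no room */
        STRING *moved = &to_space[to_used++];
        memcpy(moved, old, sizeof *moved);
        *f->ptr = moved;              /* the variable now sees the copy */
    }
}

/* The parameter is registered, as PARROT_str_params would do, so it
   stays valid after a collection that moves its object. */
STRING *use_after_gc(STRING *a)
{
    frame *saved = stack_top;
    frame fa = { &a, stack_top };
    stack_top = &fa;
    gc_move();                        /* `a` is rewritten to the copy */
    stack_top = saved;
    return a;
}
```

Without the registration, `a` would still hold the old address after `gc_move()` ran.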

-- Jerome



Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Jerome Vouillon

On Tue, May 28, 2002 at 07:54:49AM -0700, Robert Spier wrote:
> 
> >#define PARROT_str_local(d)\
> >   STRING * d = NULL;  \
> >   frame frame_##d;\
> >   int dummy_##d = (   \
> > (frame_##d.ptr = &d), \
> > (frame_##d.next = stack_top), \
> > (stack_top = &frame_##d), \
> > 0);
> 
> This triggers the perl5 macro-iitis alarm.  
> 
> Please don't do this :)
> 
> If you need to do things like that, then maybe we need to develop our
> own function inliner -- there will still be debugging issues with the
> optimizer, but less, if it looks like this:
[...]

What are the debugging issues you mention?  Note that this macro will
never fail: there is no pointer dereferencing, no memory allocation, ...

-- Jerome



Re: GC design

2002-05-28 Thread Jerome Vouillon

On Mon, May 27, 2002 at 08:41:59AM -0700, Sean O'Rourke wrote:
> Since our memory-allocating routines return NULL quite a ways back up the
> call chain (all the way through e.g. string_grow()), here's another way to
> do this -- if allocation returns NULL all the way to an op function, it
> can make the things it wants to keep reachable from somewhere, do a
> collection, and retry.  By the way, neither of string_grow()'s callers
> checks its return value now, which indicates to me that this may be
> error-prone.

That's an interesting point, actually.  What is the right thing to do
when we run out of memory?
- Abort immediately.
  This is not very user-friendly.
- Return a special value.
  But then we need to check the return value of almost all functions
  (including string_compare, for instance).
- Instead of returning a special value, we can set a global variable
  to signal the error.
  Again, we need to check this variable everywhere.
  Note that this is the solution adopted by Java.
- Raise an exception using longjmp.
  But then, if we start using locks all over the place like in Java,
  we are pretty sure to leave the program in an inconsistent state.
  (It is neither safe to release the locks nor to keep them.)
- Use a small amount of emergency memory to invoke a user-defined
  handler which can either release some memory or signal the problem
  to the user.  (If the handler returns, we do another collection and
  we try to reserve some emergency memory again.  If this succeeds,
  then we resume the program.  Otherwise, we abort.)
  But how much emergency memory is enough?

-- Jerome



Re: Parrot and Mono / .Net

2002-05-28 Thread Scott Smith

I believe that the idea is to make things flexible enough that FURTHER
changes to Perl, beyond Perl 6, will be easier too.

On Sun, 2002-05-26 at 06:10, Ask Bjoern Hansen wrote:
> [EMAIL PROTECTED] (Sebastian Bergmann) writes:
> 
> > Leon Brocard wrote:
> > > Oh, this happens to be a FAQ. The main reason is:
> > >
> > > http://www.parrotcode.org/faq/
> > 
> >   I know the technical reason for a new VM, but this could've been a new
> >   VM for Perl 6 only. What I'd like to know is the motivation to open up
> >   the architecture and allow for plugable parser, compilers, bytecode
> >   generators / optimizers, ...
> 
> Because if we can support [insert random language here] then we can
> support a very flexible Perl 6 language.  Or the other way around.
> 
> 
>  - ask
> 
> -- 
> ask bjoern hansen, http://ask.netcetera.dk/   !try; do();




Re: [netlabs #629] [PATCH] Memory manager/garbage collector -major revision

2002-05-28 Thread Robert Spier

>What are the debugging issues you mention?  Note that this macro will
>never fail: there is no pointer dereferencing, no memory allocation, ...

Never is a bad word to use for anything more complicated than x=1+2.
(Which will hopefully get constant folded and optimized away anyway.)

It is impossible to single step (line by line.. statement by
statement)  through that macro in a debugger.  That is a general class
of things we want to avoid.  Computers can do weird things sometimes,
and if, for example, something is up with the stack, I'm going to want
to look at it right before it gets assigned, no matter that it
shouldn't have changed in the past 10 lines.

-R




Re: GC design

2002-05-28 Thread Sean O'Rourke

On Tue, 28 May 2002, Jerome Vouillon wrote:
> That's an interesting point, actually.  What is the right thing to do
> when we run out of memory?
> - Abort immediately.
>   This is not very user-friendly.
> - Return a special value.
>   But then we need to check the return value of almost all functions
>   (including string_compare, for instance).

I personally dislike this approach, as it requires a large amount of
programming discipline from everyone who works on the project.  The
current code indicates that if we took this approach, we would spend quite
a bit of time squashing bugs from not checking return values in weird
places.  It probably also hurts common-case performance to litter your
code with redundant null-checks as well, but I don't have any data.

> - Instead of returning a special value, we can set a global variable
>   to signal the error.
>   Again, we need to check this variable everywhere.

This by itself seems worse than the above, since it makes problems even
easier to ignore.

>   Note that this is the solution adopted by Java.

Last time I wrote in Java, errors were entirely exception-based.  Have
things changed because of the locking issues you mention below?

> - Raise an exception using longjmp.
>   But then, if we start using locks all over the place like in Java,
>   we are pretty sure to leave the program in an inconsistent state.

We're currently lock-free, which makes this sound like a good option.

> - Use a small amount of emergency memory to invoke a user-defined
>   handler which can either release some memory or signal the problem
>   to the user.

I worked on some code that had something like this for memory allocation,
and found it worked well.  If malloc failed, the code would call a handler
which free()d a chunk of emergency memory grabbed on startup, and set a
global "we're low on memory" flag.  Then it would retry the allocation,
and exit if it failed.  Whenever memory was reclaimed, the system would
check to see if it had freed the emergency chunk, and if so, try to
reallocate it and clear the flag.  The global low-mem flag could then be
used to try to reduce overall memory use.
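The scheme Sean describes might look roughly like this in C. `mem_init`, `mem_alloc`, `mem_reclaimed`, and the `fail_next` test hook are hypothetical names. The hook only simulates `malloc` failures so the path can be exercised. The reserve size is his "64k?" guess.

```c
#include <stdio.h>
#include <stdlib.h>

#define EMERGENCY_SIZE (64 * 1024)  /* Sean's "64k?" guess */

static void *emergency = NULL;      /* reserve grabbed at startup */
static int   low_memory = 0;        /* global "we're low on memory" flag */
static int   fail_next = 0;         /* test hook: simulate N failures */

static void *raw_alloc(size_t n)
{
    if (fail_next > 0) {
        fail_next--;
        return NULL;
    }
    return malloc(n);
}

int mem_init(void)
{
    emergency = raw_alloc(EMERGENCY_SIZE);
    return emergency != NULL;
}

/* On failure: free the reserve, set the flag, retry once, else exit. */
void *mem_alloc(size_t n)
{
    void *p = raw_alloc(n);
    if (p != NULL)
        return p;
    if (emergency != NULL) {
        free(emergency);
        emergency = NULL;
        low_memory = 1;
        p = raw_alloc(n);
    }
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Call whenever memory is reclaimed: try to rebuild the reserve. */
void mem_reclaimed(void)
{
    if (emergency == NULL && (emergency = raw_alloc(EMERGENCY_SIZE)) != NULL)
        low_memory = 0;
}
```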

>   But how much emergency memory is enough?

64k?  There will certainly be pathologically large allocations for which
we can never have enough, but it seems like we could experiment until we
found a size that gives us enough time to recover or exit gracefully.

/s




ICU and Parrot

2002-05-28 Thread George Rhoten

Hello all,

Hopefully I won't get too burned by flames by jumping into the middle of
the conversation like this.

I recently stumbled across your list talking about ICU and Unicode. I am
not advocating that you should or shouldn't use ICU. Each group has their
own requirements. As a person that actively works on the ICU
implementation, I thought I should clear up some of your questions and
misconceptions on ICU. I also have a question of my own for this mailing
list later on.

ICU 2.1 works on MacOS X, and has mostly worked in the past on MacOS 8 and
9 (project files for older Macs are not included). Some companies actively
use ICU on the MacOS 8 and 9. The list of supported platforms that was
quoted on your mailing list recently was old.  Please take a look here
http://oss.software.ibm.com/cvs/icu/~checkout~/icu/readme.html#HowToBuildSupported
for the latest list of supported platforms.

ICU does not work on PalmOS or VMS yet; however, ICU does work on Alpha
based Linux, and ICU has been ported to the Tru64 (OSF) platform with
Compaq's cxx compiler in the past. We do not have the resources to do this
porting effort for every platform ourselves, and so we need other people to
do the porting for us. As an open source project we always welcome
contributions to make ICU work on other platforms.

It is true that parts of ICU use C++. Some parts of ICU are written in C++
with a C wrapper. Some other parts are written in C with a C++ wrapper. It
depends on the API being used.  Most of the functionality in the common
library is written in C, and most of our i18n library uses C++. You can see
some of the C/C++ dependencies here
http://oss.software.ibm.com/icu/userguide/design.html under API
dependencies.  

The vast majority of people that we encounter do have a C++ compiler, and
we only use the most portable subset of C++. C++ features like templates,
exceptions, run-time type information, STL and multiple inheritance are NOT
used in ICU.  All of our C code is ANSI C89 compliant according to gcc.
Since ICU works with some old C compilers, I'm sure that there shouldn't be
any concerns about our usage of the C language.

Many questions about ICU can be answered on our icu4c-support list. You can
go here http://oss.software.ibm.com/icu/archives/index.html to see how to
subscribe to the list.

On a side note, we have been thinking about putting regular expressions
into ICU someday (no firm plans yet). Maybe we could do some collaboration
with a regular expression engine in ICU. Would this group be interested in
such a collaboration?

Thank you for your interest in ICU.

George



GC, exceptions, and stuff

2002-05-28 Thread Dan Sugalski

Okay, I've thought things over a bit. Here's what we're going to do 
to deal with infant mortality, exceptions, and suchlike things.

Important given: We can *not* use setjmp/longjmp. Period. Not an 
option--not safe with threads. At this point, having considered the 
alternatives, I wish it were otherwise but it's not. Too bad for us.

So, on to the rules/proclamations/exercise of unreasonable dictatorial power.

1) Functions are only responsible for ensuring liveness of 
strings/buffers/PMCs up until the point they exit. 
Strings/PMCs/Buffers that are returned to a caller are the caller's 
problem.

2) All calls to routines which may fail must check for failure and,
if the routine did fail, exit indicating an exception. They should exit as
gracefully as they can. They *may* override the exception if 
appropriate. (Potentially voiding it, or throwing a different 
exception)

3) Opcode functions which note that something they called has thrown 
an exception are responsible for posting an interpreter exception.

4) Everything that can fail *must* be checked. So no code like:

 string_foo(string_foo(), string_foo())

if string_foo can pitch an exception.

5) We're dealing with infant mortality by pushing baby strings on the 
stack. We'll add in a stack_extend and quickpush routine to 
pre-extend (guaranteed) the stack and push the potential baby 
string/PMC/Buffer respectively.
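To make rules 4 and 5 concrete, here is a minimal C sketch, not Parrot code. `Str`, `RootStack`, `str_new`, `str_concat`, and `build_greeting` are hypothetical names; `stack_extend` and `quickpush` borrow the routine names proposed above, but their signatures here are assumptions. Every call that can fail is checked, and each new string is pushed onto a root stack (pre-extended once, so the pushes themselves cannot fail) before the next allocation could trigger a collection.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes and string header, for illustration only. */
typedef enum { OK = 0, ERR_NOMEM = 1 } status_t;
typedef struct Str { size_t len; char *data; } Str;

/* Hypothetical root stack: entries would be treated as live by the
   collector, protecting "baby" strings from infant mortality. */
typedef struct RootStack { Str **slots; size_t used, cap; } RootStack;

/* stack_extend: reserve room for n more entries up front, so the
   quickpush calls that follow cannot fail. */
static status_t stack_extend(RootStack *rs, size_t n) {
    if (rs->used + n <= rs->cap)
        return OK;
    Str **p = realloc(rs->slots, (rs->used + n) * sizeof *p);
    if (!p)
        return ERR_NOMEM;
    rs->slots = p;
    rs->cap = rs->used + n;
    return OK;
}

/* quickpush: no bounds check; only valid after a successful stack_extend. */
static void quickpush(RootStack *rs, Str *s) { rs->slots[rs->used++] = s; }
static void stack_pop_n(RootStack *rs, size_t n) { rs->used -= n; }

/* A routine that can fail: joins two byte ranges into a new string. */
static status_t str_join(const char *a, size_t alen,
                         const char *b, size_t blen, Str **out) {
    Str *s = malloc(sizeof *s);
    char *d = malloc(alen + blen + 1);
    if (!s || !d) {
        free(s);
        free(d);
        return ERR_NOMEM;
    }
    memcpy(d, a, alen);
    memcpy(d + alen, b, blen);
    d[alen + blen] = '\0';
    s->len = alen + blen;
    s->data = d;
    *out = s;
    return OK;
}

static status_t str_new(const char *text, Str **out) {
    return str_join(text, strlen(text), "", 0, out);
}

static status_t str_concat(const Str *a, const Str *b, Str **out) {
    return str_join(a->data, a->len, b->data, b->len, out);
}

/* Instead of str_concat(str_new(...), str_new(...)), every call that can
   fail is checked (rule 4) and each baby is rooted right away (rule 5). */
static status_t build_greeting(RootStack *rs, Str **out) {
    Str *a, *b;
    status_t st;

    if ((st = stack_extend(rs, 2)) != OK)   /* pre-extend once */
        return st;
    if ((st = str_new("hello, ", &a)) != OK)
        return st;
    quickpush(rs, a);
    if ((st = str_new("world", &b)) != OK) {
        stack_pop_n(rs, 1);                 /* exit as gracefully as we can */
        return st;
    }
    quickpush(rs, b);
    st = str_concat(a, b, out);
    stack_pop_n(rs, 2);  /* a and b no longer need protecting */
    return st;           /* on success, *out is the caller's problem (rule 1) */
}
```

No collector runs in this sketch, so the intermediate strings simply leak; in a real interpreter they would become collectable once popped.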


Yes, these will be a pain to deal with. Alas, too bad for us.
-- 
 Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



RE: GC design

2002-05-28 Thread Brent Dax

Sean O'Rourke:
# On Tue, 28 May 2002, Jerome Vouillon wrote:
# > That's an interesting point, actually.  What is the right thing to do
# > when we run out of memory?
# > - Abort immediately.
# >   This is not very user-friendly.
# > - Return a special value.
# >   But then we need to check the return value of almost all functions
# >   (including string_compare, for instance).
# 
# I personally dislike this approach, as it requires a large 
# amount of programming discipline from everyone who works on 
# the project.  The current code indicates that if we took this 
# approach, we would spend quite a bit of time squashing bugs 
# from not checking return values in weird places.  It probably 
# also hurts common-case performance to litter your code with 
# redundant null-checks as well, but I don't have any data.
# 
# > - Instead of returning a special value, we can set a global variable
# >   to signal the error.
# >   Again, we need to check this variable everywhere.
# 
# This by itself seems worse than the above, since it makes 
# problems even easier to ignore.

OTOH, if all C functions start with a boilerplate like:

if(interpreter->flags & PARROT_exception_FLAG) {
    return NULL;
}

then this allows us to easily do things like:

string_concat(interpreter, string_concat(interpreter, a, b), c);

as long as we check for the exception immediately after.  In fact, that
even lets you delay checking for exceptions so you can centralize
things--see below.

# >   Note that this is the solution adopted by Java.
# 
# Last time I wrote in Java, errors were entirely 
# exception-based.  Have things changed because of the locking 
# issues you mention below?

I assume he means within the JVM's internals.

# > - Raise an exception using longjmp.
# >   But then, if we start using locks all over the place like in Java,
# >   we are pretty sure to leave the program in an inconsistent state.
# 
# We're currently lock-free, which makes this sound like a good option.

I think we just have to say "if you put a lock on something, make sure
to set up an exception handler to unlock it and then rethrow the
exception".
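A minimal sketch of that "unlock, then rethrow" rule using standard setjmp/longjmp. This only illustrates the shape of the handler; whether setjmp/longjmp is acceptable at all is exactly what is being debated. The handler chain, `throw_exception`, and the counter standing in for a real mutex are all hypothetical. The exception code travels in a static variable because the C standard restricts where setjmp's return value may be used.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical handler chain: each frame links to the next handler out. */
typedef struct Handler {
    jmp_buf env;
    struct Handler *outer;
} Handler;

static Handler *current_handler = NULL;
static int pending_code = 0;   /* code of the exception in flight */

static void throw_exception(int code) {
    if (current_handler == NULL)
        abort();               /* nobody to catch it */
    pending_code = code;
    longjmp(current_handler->env, 1);
}

/* Stand-in for a real mutex, so the sketch needs no threads. */
static int lock_held = 0;
static void lock_acquire(void) { lock_held++; }
static void lock_release(void) { lock_held--; }

static void risky_operation(int fail) {
    if (fail)
        throw_exception(42);   /* arbitrary example code */
}

/* Takes a lock and installs a handler whose only job is to release the
   lock and rethrow the same exception to the next handler out. */
static void locked_update(int fail) {
    Handler h;
    h.outer = current_handler;  /* set before setjmp, never changed after */
    lock_acquire();
    if (setjmp(h.env) == 0) {
        current_handler = &h;
        risky_operation(fail);
        current_handler = h.outer;
        lock_release();
    } else {
        current_handler = h.outer;  /* pop first, so the rethrow goes outward */
        lock_release();
        throw_exception(pending_code);
    }
}

/* Outermost handler, like one placed around the run loop.
   Returns the exception code, or 0 if nothing was thrown. */
static int run_protected(int fail) {
    Handler top;
    top.outer = current_handler;
    if (setjmp(top.env) == 0) {
        current_handler = &top;
        locked_update(fail);
        current_handler = top.outer;
        return 0;
    }
    current_handler = top.outer;
    return pending_code;
}
```

Calling `run_protected(1)` returns 42 with `lock_held` back at 0: the inner handler released the lock before passing the exception outward.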

I also think that any way you do it we'll have to wrap it in macros--at
least for embedders and extenders.  If I ever find a few months of pure
boredom, I'd like to try to reimplement Parrot in C++, just to see how
much easier it would make things like PMCs; if we have appropriately set
up macros, we can hide from the user the actual implementation of
exceptions:

#define PARROT_TRY   /* nothing */
#define PARROT_CATCH if(interpreter->flags & PARROT_exception_FLAG)

...

PARROT_TRY {
    d=string_concat(interpreter,
        string_concat(interpreter, a, b), c);
    ...
}
PARROT_CATCH {
    (code)
}

Just change the macros to:

#define PARROT_TRY try
#define PARROT_CATCH catch(void* PARROT_CATCH_this_is_unused)

for C++, or:

#define PARROT_TRY if(!setjmp())
#define PARROT_CATCH else

for setjmp/longjmp.  (Yes, I know that's not the way setjmp is really
used, but you get the idea.)  For short stretches of code, I'd imagine
that this would work too:

PARROT_TRY   d=string_concat(interpreter,
                 string_concat(interpreter, a, b), c);
PARROT_CATCH Parrot_fputs(interpreter, Parrot_stderr, "We're screwed!");

(assuming that C++'s try and catch can take statements instead of
blocks, anyway--although even if they don't, it's just four more
characters.)
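A sketch of the flag-based variant under stated assumptions: `struct Parrot_Interp` is cut down to a single `flags` field, `STRING` is reduced to a bare char pointer, and `simulate_oom` is a hypothetical hook for forcing an allocation failure. It shows the boilerplate check making the nested `string_concat` call safe, with a single `PARROT_CATCH` afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the real types. */
#define PARROT_exception_FLAG 0x1u
struct Parrot_Interp { unsigned flags; };
typedef struct { char *data; } STRING;

/* Flag-based versions of the macros from the post. */
#define PARROT_TRY   /* nothing */
#define PARROT_CATCH if (interpreter->flags & PARROT_exception_FLAG)

static int simulate_oom = 0;   /* hypothetical test hook to force a failure */

static STRING *string_concat(struct Parrot_Interp *interpreter,
                             STRING *a, STRING *b) {
    /* Boilerplate: a pending exception turns this call into a no-op. */
    if (interpreter->flags & PARROT_exception_FLAG)
        return NULL;

    size_t la = strlen(a->data), lb = strlen(b->data);
    STRING *s = simulate_oom ? NULL : malloc(sizeof *s);
    char *d = s ? malloc(la + lb + 1) : NULL;
    if (d == NULL) {
        free(s);
        interpreter->flags |= PARROT_exception_FLAG;  /* "throw" */
        return NULL;
    }
    memcpy(d, a->data, la);
    memcpy(d + la, b->data, lb + 1);
    s->data = d;
    return s;
}

/* An inner failure makes the outer call bail out, so the flag only has
   to be checked once, afterwards. Returns 1 if the handler ran. */
static int concat3(struct Parrot_Interp *interpreter, STRING *a, STRING *b,
                   STRING *c, STRING **out) {
    int handled = 0;
    PARROT_TRY {
        *out = string_concat(interpreter,
                             string_concat(interpreter, a, b), c);
    }
    PARROT_CATCH {
        handled = 1;
        interpreter->flags &= ~PARROT_exception_FLAG;  /* exception dealt with */
    }
    return handled;
}
```

Only the two macro definitions would change for a C++ or setjmp build; `concat3` itself stays the same, which is the point of hiding the mechanism behind macros.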

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

blink:  Text blinks (alternates between visible and invisible).
Conforming user agents are not required to support this value.
--The W3C CSS-2 Specification




Re: Parrot and Mono / .Net

2002-05-28 Thread Peter Cooper

> > >   I know the technical reason for a new VM, but this could've been a
> > >   new VM for Perl 6 only. What I'd like to know is the motivation to
> > >   open up the architecture and allow for pluggable parser, compilers,
> > >   bytecode generators / optimizers, ...
>
> I believe that the idea is to make things flexible enough that FURTHER
> changes to Perl, beyond Perl 6, will be easier too.

And, even better, (and this is stated on the Parrot site) this allows Perl 6
to be written _in Perl_. Some people I've spoken to don't see the
significance of this, but it'll be an amazing feat. No longer will anyone be
able to talk Perl down ;-)

BTW, this is my first post here, so hi!

Cheers,
Peter Cooper




Re: Perl6 currying

2002-05-28 Thread Glenn Linderman

Larry Wall wrote:
> 
> If we're going to make it a method, however, it's possible that "curry"
> is the wrong popular name, despite its being the correct technical name.
> There's really nothing about the word "curry" that suggest partial
> binding to the casual reader.  Perhaps we really want something like:
> 
> my &half = &div.prebind(y => 2);
> 
> or:
> 
> my &half = &div.rewrite(y => 2);
> 
> or even:
> 
> my &half = &div.assume(y => 2);
> 
> I think I like that last one the best.  Maybe it would read better as
> "assuming".  But that's getting a bit long for Mr Huffman.  Maybe it's
> finally time to reach into our bag of English topicalizers and pull out
> "with":
> 
> my &half = &div.with(y => 2);

"with" reads very nicely, but we already have a perl6 precedent,
perhaps... how about reusing "when" as the method name for currying? 
This may not curry favor with Damian, but I suggest

  my &half = &div.when(y => 2);

would declare the subroutine "half" to be equal to the subroutine "div"
when the parameter y is given the value 2.  The code and the English
both read very nicely IMHO, and virtually identically, and the English
version seems even clearer to me when using "when" rather than
"with". 

Further, there is a nice analogy with the usage of the keyword "when" in
a given clause and the usage of the method name "when" to perform
currying: both "when"s specify particular values that control the choice
of code to be executed.  The "given" keyword supplies the 'parameters'
for the "given" statement, and the original function declaration
supplies the parameters for the original function.


> Larry

-- 
Glenn
=
Remember, 84.3% of all statistics are made up on the spot.



Re: Perl6 currying

2002-05-28 Thread Luke Palmer

On Tue, 28 May 2002, Glenn Linderman wrote:

> "with" reads very nicely, but we already have a perl6 precedent,
> perhaps... how about reusing "when" as the method name for currying? 
> This may not curry favor with Damian, but I suggest
> 
>   my &half = &div.when(y => 2);
> 
> would declare the subroutine "half" to be equal to the subroutine "div"
> when the parameter y is given the value 2.  The code and the English
> both read very nicely IMHO, and virtually identically, and the English
> version seems even more clear to me when using "when" rather than with
> "with". 
> 
> Further, there is a nice analogy with the usage of the keyword "when" in
> a given clause and the usage of the method name "when" to perform
> currying: both "when"s specify particular values that control the choice
> of code to be executed.  The "given" keyword supplies the 'parameters'
> for the "given" statement, and the original function declaration
> supplies the parameters for the original function.

It is precisely that similarity that's going to become confusing. A big 
problem with Perl 5, as I have seen, is that it takes a lot of effort to 
learn. Overloading "when" will give learners a really hard time. "Wait, 
so when I give a variable after it then a block it does that block... but 
if I give a dot before it it returns a function?" they'd seem to say.

Wait, does this have any meaning?:

  my &half = \div(y => 2)

Is backslash even a valid operator for reference anymore? If so, this 
makes sense to me.

Luke




RE: Perl6 currying

2002-05-28 Thread Brent Dax

Luke Palmer:
# Wait, does this have any meaning?:
# 
#   my &half = \div(y => 2)

Call div() with the named parameter 'y' equal to 2, take a reference to
its return value, and store that in &half.

# Is backslash even a valid operator for reference anymore? If so, this 
# makes sense to me.

I'm sure it's still there.  Otherwise there's no way to take a reference
to a scalar.

--Brent Dax <[EMAIL PROTECTED]>




Re: [COMMIT] Added preprocessor layer to newasm.

2002-05-28 Thread Jeff

[EMAIL PROTECTED] wrote:
> 
> On Tue, 28 May 2002 01:19:25 -0400 Jeff <[EMAIL PROTECTED]> wrote:
> 
> >newasm now handles constants, macros, and local labels within. Here's a
> 
> Great work!

Thanks.

> >expansion. Also, they don't expand recursively. '.constant FOO
> >"blah"\n.constant BAR "Hey, .FOO"' won't do what you want, sadly.
> 
> That's exactly what I want. I don't think
> the assembler should do any sort of interpolation
> with string constants at all.

I wasn't crazy about recursive expansion either. Glad I don't have to
work on it now :)

> >that's what I'll work on next.
> 
> When Simon first committed it, I tested
> newasm and noticed 2-3x speedup on assembly
> speed. Is this still the case?

I haven't been tracking assembly speed at all. Keep in mind that a perl
assembler is only a temporary measure, and it'll be rewritten in C
eventually. It's only written in Perl so that we can change features
rapidly. If it were in C (as it will be, once changes settle down), the
assembler would likely crystallize and problems would get fixed much
slower, if at all.

Also, if you want more speed, then separate the Macro and Assembler
classes out of the main file, and pass your code directly to the Macro
object instead of writing tests to a file. After the XS object is
rewritten in Perl, I'll rewrite the tests to use the newasm syntax, and
newasm will become the new standard.

> I've been running some tests with executing
> Cola on the fly, and the assembly phase (slow)
> is the big bug in the soup right now, however
> I've been using the old assembler.

Well, again, once this is all redone in C that should go away. I might
even work on that next, if everyone is comfortable with the current
format.

> PS: Thanks (Simon and Jeff) for a lot of hard
> work on newasm.

Thanks on both of our behalves. (Is that a word?) I don't want to cut it
over until the XS stuff is redone to my satisfaction, which means that
it's rewritten in perl. Building the XS extension there is causing no
end of confusion in the build process.

> -Melvin
--
Jeff <[EMAIL PROTECTED]>



RE: GC, exceptions, and stuff

2002-05-28 Thread Hong Zhang

> Okay, i've thought things over a bit. Here's what we're going to do 
> to deal with infant mortality, exceptions, and suchlike things.
> 
> Important given: We can *not* use setjmp/longjmp. Period. Not an 
> option--not safe with threads. At this point, having considered the 
> alternatives, I wish it were otherwise but it's not. Too bad for us.

I think this statement is not very accurate. The real problem is
setjmp/longjmp does not work well inside signal handler.

The thread-package-compatible setjmp/longjmp can be easily implemented 
using assembly code. It does not require access to any private data 
structures. Note that Microsoft Windows "Structured Exception Handler" 
works well under thread and signal. The assembly code of __try will 
show you how to do it.

However, signal compatibility will be very difficult. It requires access
to ucontext, and most thread packages cannot provide a 100% correct
ucontext for signals. (The thread package may have the right info, but
the ucontext parameter may not.)

My basic suggestion is that if we need convenient and fast C-based exception
handling, we can write our own setjmp/longjmp in assembly code. The
functionality would be exported as magic macros, such as:

TRY {
  ...
} CATCH (EBADF) {
  ...
} CATCH (ENOMEM) {
  ...
} END;
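One possible shape for these macros, sketched with the standard setjmp/longjmp instead of the hand-written assembly version suggested above; `ExFrame`, `ex_throw`, and the frame-stack layout are assumptions. Codes are matched against POSIX errno values as in the example, and an unmatched code is rethrown to the enclosing TRY.

```c
#include <errno.h>
#include <setjmp.h>
#include <stdlib.h>

/* One frame per TRY; frames form a stack linked through ex_top. */
typedef struct ExFrame {
    jmp_buf env;
    struct ExFrame *outer;
} ExFrame;

static ExFrame *ex_top = NULL;  /* innermost active TRY */
static int ex_code = 0;         /* code of the exception in flight */

static void ex_throw(int code) {
    if (ex_top == NULL)
        abort();                /* uncaught exception */
    ex_code = code;
    longjmp(ex_top->env, 1);
}

/* TRY pushes a frame. Each CATCH pops it before comparing codes, so a
   throw from inside a handler goes outward. END rethrows unmatched codes
   and pops the frame on the normal path. Never return or goto out of a
   TRY body: the frame would be left on the stack. */
#define TRY \
    { ExFrame ex_f_; ex_f_.outer = ex_top; ex_top = &ex_f_; \
      if (setjmp(ex_f_.env) == 0)
#define CATCH(c) \
      else if ((ex_top = ex_f_.outer), ex_code == (c))
#define END \
      else { ex_top = ex_f_.outer; ex_throw(ex_code); } \
      ex_top = ex_f_.outer; }

static int classify(int code) {
    int result = 0;
    TRY {
        ex_throw(code);
    } CATCH (EBADF) {
        result = 1;
    } CATCH (ENOMEM) {
        result = 2;
    } END;
    return result;
}

/* EINVAL is not handled by classify, so it is rethrown to this TRY. */
static int nested_rethrow(void) {
    int result = 0;
    TRY {
        classify(EINVAL);
    } CATCH (EINVAL) {
        result = 3;
    } END;
    return result;
}
```

Locals such as `result` are safe here only because they are not modified between the setjmp and the longjmp; anything that is would have to be declared `volatile`.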

Hong



RE: GC, exceptions, and stuff

2002-05-28 Thread Dan Sugalski

At 5:47 PM -0700 5/28/02, Hong Zhang wrote:
> > Okay, i've thought things over a bit. Here's what we're going to do
> > to deal with infant mortality, exceptions, and suchlike things.
> >
> > Important given: We can *not* use setjmp/longjmp. Period. Not an
> > option--not safe with threads. At this point, having considered the
> > alternatives, I wish it were otherwise but it's not. Too bad for us.
>
>I think this statement is not very accurate. The real problem is
>setjmp/longjmp does not work well inside signal handler.
>
>The thread-package-compatible setjmp/longjmp can be easily implemented
>using assembly code. It does not require access to any private data
>structures. Note that Microsoft Windows "Structured Exception Handler"
>works well under thread and signal. The assembly code of __try will
>show you how to do it.

Yup, and we can use platform-specific exception handling mechanisms 
as well, if there are any. Except...

>However, signal compatibility will be very difficult. It requires access
>to ucontext, and most thread packages cannot provide a 100% correct
>ucontext for signals. (The thread package may have the right info, but
>the ucontext parameter may not.)

You hit this. And we can't universally guarantee that it'll work, either.

>My basic suggestion is if we need convenient and fast C-based exception
>handling, we can write our own setjmp/longjmp in assembly code. The
>functionality will be exported as magic macros. Such as

If we're going to do this, and believe me I dearly want to, we're 
going to be yanking ourselves out a bunch of levels. We'll be setting 
the setjmp in runops.c just outside the interpreter loop, and yank 
ourselves way the heck out. It's that multi-level cross-file jumping 
that I really worry about.
-- 
 Dan




[netlabs #632] dispatching SIGPIPE, SIGCHLD to proper threa

2002-05-28 Thread via RT

# New Ticket Created by  Rocco Caputo 
# Please include the string:  [netlabs #632]
# in the subject line of all future correspondence about this issue. 
# http://bugs6.perl.org/rt2/Ticket/Display.html?id=632 >


21:37  We'll need to interface with signals somehow. Getting them
  properly dispatched will be tough given how little info they
  carry.
21:38  You can fake sigpipe by catching errno in I/O ops.
21:39  Yep. Except in those cases where sigpipe's considered
  async and gets delivered process wide.
21:40  Well, OK, in those cases too, but you get stray SIGPIPEs
  for fun.
21:40  CHLD is not so hard if fork() knows which child PID
  was spawned in which thread.
21:40  async sigpipe seems a little bogus
21:41  Tell that to the Tru64 folks. I trust they had a reason.
21:41  Oh, right, child tracking. Good idea. Could someone send
  that to bugs6 so it doesn't get forgotten?
21:42  alarm() is going to be fun, though going full async for IO
  will reduce the number of blocking system calls.
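A sketch of the two ideas from the transcript, assuming a POSIX system: ignoring SIGPIPE so that `write` reports EPIPE to the thread doing the I/O, and a table mapping child pids to the thread that forked them. The table, its size, and the integer thread ids are hypothetical; a real one would need a lock and would be filled in by a fork() wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Faking SIGPIPE from errno: with SIGPIPE ignored, a write to a pipe
   with no reader fails with EPIPE in the calling thread instead of
   raising a signal, so the I/O op itself can post the exception. */
static int write_reports_epipe(void) {
    int fds[2];
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR || pipe(fds) != 0)
        return -1;
    close(fds[0]);                      /* drop the only reader */
    ssize_t n = write(fds[1], "x", 1);
    int saw_epipe = (n == -1 && errno == EPIPE);
    close(fds[1]);
    return saw_epipe;
}

/* Child tracking: record which thread spawned each pid, so a pid reaped
   after SIGCHLD can be routed back to that thread. */
#define MAX_CHILDREN 64
struct child_entry { pid_t pid; int thread_id; };
static struct child_entry children[MAX_CHILDREN];
static int n_children = 0;

static int child_register(pid_t pid, int thread_id) {
    if (n_children == MAX_CHILDREN)
        return -1;
    children[n_children].pid = pid;
    children[n_children].thread_id = thread_id;
    n_children++;
    return 0;
}

/* Returns the owning thread id, or -1 for an unknown pid. */
static int child_owner(pid_t pid) {
    for (int i = 0; i < n_children; i++)
        if (children[i].pid == pid)
            return children[i].thread_id;
    return -1;
}

/* Forget a reaped child by moving the last entry into its slot. */
static void child_forget(pid_t pid) {
    for (int i = 0; i < n_children; i++)
        if (children[i].pid == pid) {
            children[i] = children[--n_children];
            return;
        }
}
```

This does not help with the process-wide asynchronous SIGPIPE case mentioned above; ignoring the signal only turns it into an error return where the platform delivers it synchronously.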

-- Rocco Caputo / [EMAIL PROTECTED] / poe.perl.org / poe.sf.net




RE: [COMMIT] Added preprocessor layer to newasm.

2002-05-28 Thread Brent Dax

Jeff:
# I haven't been tracking assembly speed at all. Keep in mind 
# that a perl assembler is only a temporary measure, and it'll 
# be rewritten in C eventually. It's only written in Perl so 

C or PASM (or Perl 6)?  The latter might be better.

--Brent Dax <[EMAIL PROTECTED]>





RE: GC, exceptions, and stuff

2002-05-28 Thread Hong Zhang

> >The thread-package-compatible setjmp/longjmp can be easily implemented
> >using assembly code. It does not require access to any private data
> >structures. Note that Microsoft Windows "Structured Exception Handler"
> >works well under thread and signal. The assembly code of __try will
> >show you how to do it.
> 
> Yup, and we can use platform-specific exception handling mechanisms 
> as well, if there are any. Except...

Stack unwinding is very basic; that is why we have setjmp/longjmp.
Even though it is CPU-specific, it requires only a very small piece of asm
code, much less than the JIT. BTW, the JIT needs similar functionality;
otherwise it will not be able to handle exceptions quickly. It would
be very awkward to check every null pointer and every function
return.

> >However, signal compatibility will be very difficult. It requires access
> >to ucontext, and most thread packages cannot provide a 100% correct
> >ucontext for signals. (The thread package may have the right info, but
> >the ucontext parameter may not.)
> 
> You hit this. And we can't universally guarantee that it'll work, either.

Parrot has to handle signals, such as SIGSEGV. I believe we have to solve
this problem whether or not we use setjmp/longjmp for general exception
handling. In general, most libc functions do not work well inside a signal
handler.

> >My basic suggestion is if we need convenient and fast C-based exception
> >handling, we can write our own setjmp/longjmp in assembly code. The
> >functionality will be exported as magic macros. Such as
> 
> If we're going to do this, and believe me I dearly want to, we're 
> going to be yanking ourselves out a bunch of levels. We'll be setting 
> the setjmp in runops.c just outside the interpreter loop, and yank 
> ourselves way the heck out. It's that multi-level cross-file jumping 
> that I really worry about.

The multi-level jump should not be a problem inside Parrot code itself.
The GC discipline should have handled the problem already.

1) If Parrot code allocates anything that cannot be handled by the GC,
it must set up an exception handler to release it, as in this sample:

  void * mem = NULL;
  TRY {
      mem = malloc(sizeof(foo));
  } FINALLY {
      free(mem);
  } END;

2) If Parrot code allocates anything that is finalizable, there is
no need to release it. When the object is no longer referenced, the next GC
will finalize it. We can still use a TRY block to enforce cleanup in a
timely fashion.

However, we cannot use setjmp/longjmp (even a Parrot-specific version)
to unwind non-Parrot frames. If a third-party C application calls
Parrot_xxx, Parrot_xxx should catch any exception, translate it
into an error code, and return that code.
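A minimal sketch of that boundary translation, using standard setjmp/longjmp as a stand-in for a Parrot-specific version. `boundary_string_length`, `internal_length`, and the error codes are hypothetical. It also illustrates rule 1: the malloc'd scratch buffer is freed on both the normal path and the exception path, and is declared `volatile` because it changes after setjmp.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical internal exception state: the jump target of the
   innermost boundary call, and the code being thrown. */
static jmp_buf *boundary_env = NULL;
static int boundary_code = 0;

#define ERR_NULL_ARG  22   /* hypothetical error codes */
#define ERR_NO_MEMORY 12

static void internal_throw(int code) {
    boundary_code = code;
    longjmp(*boundary_env, 1);
}

/* Internal routine that throws instead of returning an error. */
static size_t internal_length(const char *s) {
    if (s == NULL)
        internal_throw(ERR_NULL_ARG);
    return strlen(s);
}

/* Public entry point (name hypothetical): no longjmp ever crosses into
   the embedding application's frames; exceptions become error codes. */
int boundary_string_length(const char *s, size_t *out) {
    jmp_buf env;
    jmp_buf *saved = boundary_env;       /* allow nested boundary calls */
    char *volatile scratch = NULL;       /* non-GC memory, freed on both paths */

    boundary_env = &env;
    if (setjmp(env) == 0) {
        scratch = malloc(64);            /* stands in for any malloc'd buffer */
        if (scratch == NULL)
            internal_throw(ERR_NO_MEMORY);
        *out = internal_length(s);
        free(scratch);
        boundary_env = saved;
        return 0;
    }
    free(scratch);                       /* FINALLY-style cleanup */
    boundary_env = saved;
    return boundary_code;
}
```

An embedder calling `boundary_string_length(NULL, &n)` simply gets `ERR_NULL_ARG` back, with no jump into its own stack frames.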

Implementing a Parrot-specific setjmp/longjmp will be trivial compared
to the complexity of the JIT and GC. By the time we have solved the JIT, GC,
threading, and signal handling, the problems with setjmp/longjmp should have
been solved as well. But if we only want a simple interpreter, there
is no need to take on this additional complexity.

Hong



Re: [COMMIT] Added preprocessor layer to newasm.

2002-05-28 Thread Jeff

Brent Dax wrote:
> 
> Jeff:
> # I haven't been tracking assembly speed at all. Keep in mind
> # that a perl assembler is only a temporary measure, and it'll
> # be rewritten in C eventually. It's only written in Perl so
> 
> C or PASM (or Perl 6)?  The latter might be better.

PASM is tempting, if only for the bootstrap potential...

In other news, I've rewritten the test suite, and with the most recent
checkin (adds one PMC type I missed and stops processing when it
encounters a flying argument), it passes everything but one Regex test.
The diffs are rather large, and are attached below.

If there are no objections, I might simply spruce up the documentation,
add a '.include' statement, and change assemble.pl over to this style,
after going through the rest of the languages/ directory. Of course, the
people that are responsible for their language could do it themselves,
letting me get back to converting the assembler to .pasm or maybe just
.c or .perl...
--
Jeff <[EMAIL PROTECTED]>

diff -ru parrot_foo/t/op/ifunless.t parrot/t/op/ifunless.t
--- parrot_foo/t/op/ifunless.t  Tue May 28 21:34:07 2002
+++ parrot/t/op/ifunless.t  Fri May 17 23:52:34 2002
@@ -7,18 +7,21 @@
set I1, -2147483648
set I2, 0
 
+#  if_i_ic I0, ONE
if  I0, ONE
 branch  ERROR
print   "bad\\n"
 
 ONE:
print   "ok 1\\n"
+#  if_i_ic I1, TWO
if  I1, TWO
 branch ERROR
print   "bad\\n"
 
 TWO:
print   "ok 2\\n"
+#  if_i_ic I2, ERROR
if  I2, ERROR
 branch  THREE
print   "bad\\n"
@@ -100,12 +103,14 @@
set I0, 0
set I1, -2147483648
 
+#  unless_i_ic I0, ONE
unless  I0, ONE
 branch  ERROR
print   "bad\\n"
 
 ONE:
print   "ok 1\\n"
+#  unless_i_ic I1, ERROR
unless  I1, ERROR
 branch TWO
print   "bad\\n"
diff -ru parrot_foo/t/op/integer.t parrot/t/op/integer.t
--- parrot_foo/t/op/integer.t   Tue May 28 21:36:53 2002
+++ parrot/t/op/integer.t   Fri May 17 23:52:34 2002
@@ -340,6 +340,7 @@
 output_is(<<>
 OUTPUT
 
 output_is(gentest('b', <<'CODE'), <<'OUTPUT', 'A is not B');
-   rx_literal P0, "a", ADVANCE
+   rx_literal P0, "a", $advance
 CODE
 no match
 OUTPUT
 
 output_is(gentest('a', <<'CODE'), <<'OUTPUT', 'Pattern longer than string');
-   rx_literal P0, "aa", ADVANCE
+   rx_literal P0, "aa", $advance
 CODE
 no match
 OUTPUT
 
 output_is(gentest('ba', <<'CODE'), <<'OUTPUT', 'inching through the string');
-   rx_literal P0, "a", ADVANCE
+   rx_literal P0, "a", $advance
 CODE
 <>
 OUTPUT
 
 output_is(gentest('a', <<'CODE'), <<'OUTPUT', 'character classes (successful)');
-   rx_oneof P0, "aeiou", ADVANCE
+   rx_oneof P0, "aeiou", $advance
 CODE
 <><>
 OUTPUT
 
 output_is(gentest('b', <<'CODE'), <<'OUTPUT', 'character classes (failure)');
-   rx_oneof P0, "aeiou", ADVANCE
+   rx_oneof P0, "aeiou", $advance
 CODE
 no match
 OUTPUT
 
 output_is(gentest('a', <<'CODE'), <<'OUTPUT', 'dot (success)');
-   rx_dot P0, ADVANCE
+   rx_dot P0, $advance
 CODE
 <><>
 OUTPUT
 
 output_is(gentest('\n', <<'CODE'), <<'OUTPUT', 'dot (failure)');
-   rx_dot P0, ADVANCE
+   rx_dot P0, $advance
 CODE
 no match
 OUTPUT
 
 output_is(gentest('aA9_', <<'CODE'), <<'OUTPUT', '\w (success)');
-   rx_is_w P0, ADVANCE
-   rx_is_w P0, ADVANCE
-   rx_is_w P0, ADVANCE
-   rx_is_w P0, ADVANCE
+   rx_is_w P0, $advance
+   rx_is_w P0, $advance
+   rx_is_w P0, $advance
+   rx_is_w P0, $advance
 CODE
 <><>
 OUTPUT
 
 output_is(gentest('?', <<'CODE'), <<'OUTPUT', '\w (failure)');
-   rx_is_w P0, ADVANCE
+   rx_is_w P0, $advance
 CODE
 no match
 OUTPUT
 
 output_is(gentest('0123456789', <<'CODE'), <<'OUTPUT', '\d (success)');
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
+   rx_is_d P0, $advance
 CODE
 <><0123456789><>
 OUTPUT
 
 output_is(gentest('@?#', <<'CODE'), <<'OUTPUT', '\d (failure)');
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
-   rx_is_d P0, ADVANCE
+

Re: ICU and Parrot

2002-05-28 Thread Melvin Smith

At 02:42 PM 5/28/2002 -0700, George Rhoten wrote:
>Hello all,
>
>Hopefully I won't get too burned by flames by jumping into the middle of
>the conversation like this.

Fortunately this list is very low on flammable material. :)

Thanks for the helpful info. One of the concerns with using
an external library is that it should be in ANSI C if we were to
include it with Parrot. I think there are so many unanswered and unfinished
issues other than Unicode that no one has had the time to really
define what using ICU would entail, but it appears that including ICU
would mean allowing C++ elsewhere in the project also.

I'm not saying C++ is bad, it would make life easier in some areas,
but I think it's a Pandora's box that no one wants to open, ...yet.

We also don't wish to inherit ICU development if ICU becomes unsupported.
However, I agree with your point about collaboration. I rather like the idea
of being able to mail the ICU guys a bug report as I go back to hacking
my Parrot mess.

I'm curious, is it possible to carve out an "ICU-lite" in ANSI-C89?

-Melvin




Re: Perl6 currying

2002-05-28 Thread Larry Wall

We've pretty much settled on &div.prebind(y => 2) as the most informative and
least conflictive.

Larry



GC Benchmarking Tests

2002-05-28 Thread Mike Lambert

Hey all,

After finding out that life.pasm only does maybe 1KB per collection, and
Sean reminding me that there's more to GC than life, I decided to create
some pasm files testing specific behaviors.

Attached is what I've been using to test and compare running times for
different GC systems. It's given a list of builds of parrot, a list of
tests to run, and runs each four times and takes the sum of them as the
value for that test. Then it prints out a simple table for comparing the
results. It's not really robust or easily workable in a CVS checkout
(since it operates on multiple parrot checkouts).

Included are eight tests of certain memory behaviors. They are:

gc_alloc_new.pbc
    allocates more and more memory
    checks collection speed, and the ability to grow the heap

gc_alloc_reuse.pbc
    allocates more memory, but discards the old
    checks collection speed, and the ability to reclaim the heap

gc_header_new.pbc
    allocates more and more headers
    checks DOD speed, and the ability to allocate new headers

gc_header_reuse.pbc
    allocates more headers, but discards the old
    checks DOD speed, and the ability to pick up old headers

gc_waves_headers.pbc
    total headers (contain no data) allocated is wave-like
    no data, so collection is not tested
    tests ability to handle wavelike header usage patterns

gc_waves_sizeable_data.pbc
    buffer data (pointed to by some headers) is wave-like
    a few headers, so some DOD is tested
    mainly tests ability to handle wavelike buffer usage patterns

gc_waves_sizeable_headers.pbc
    total headers (and some memory) allocated is wave-like
    sort of a combination of the previous two
    each header points to some data, so it tests the collector's
      ability to handle changing header and small-sized memory usage

gc_generations.pbc
    me trying to simulate behavior which should perform exceptionally
      well under a generational collector, even though we don't have one :)
    each memory allocation lasts either
      a long time, a medium time, or a short time


Please let me know if there are any other specific behaviors which could
use benchmarking to help compare every aspect of our GCs. Real-world
programs are too hard to come by. :) Results of the above test suite on my
machine comparing my local GC work and the current parrot GC are coming
soon...

Enjoy!
Mike Lambert

PS: If you get bouncing emails from me because my email server is down, I
apologize, and I do know about it. My email server is behind cox's
firewall which prevents port 25 access. It should be relocated and online
again in a few days.



gc_bench.zip
Description: gc_bench.zip


Re: Perl6 currying

2002-05-28 Thread Damian Conway

Larry Wall wrote:
 
> We've pretty much settled on &div.prebind(y => 2) as the most informative and
> least conflictive.

and I'll demonstrate it in my next Conway Channel diary entry later today.

Damian



[netlabs #634] GC Bench: Linked-list for free header list

2002-05-28 Thread via RT

# New Ticket Created by  Mike Lambert 
# Please include the string:  [netlabs #634]
# in the subject line of all future correspondence about this issue. 
# http://bugs6.perl.org/rt2/Ticket/Display.html?id=634 >


Peter recently submitted a patch to RT that uses a linked-list for free
headers. Here are before and after results:

                        before      after
gc_alloc_new             4.155999    4.016
gc_alloc_reuse          16.574      12.648002
gc_generations           4.025       3.975001
gc_header_new            3.686       3.986
gc_header_reuse          5.577999    4.175998
gc_waves_headers         3.815002    3.595999
gc_waves_sizeable_data   8.383002    8.381999
gc_waves_sizeable_hdrs   5.668       5.396999

We win on the header-intensive stuff. Not sure why it would be slower on
the gc_header_new tests. My best guess is that we are now touching the
contents of the buffer header, which we weren't doing before. And when we
allocate a bunch of new headers, we have to explicitly free them all, which
involves touching the first pointer of every buffer in that memory, as
opposed to one pointer in the Parrot_allocated memory we used before.

IMO, the gc_alloc_reuse and gc_header_reuse benchmarks more than
outweigh gc_header_new.
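The change replaces the array of free pointers with an intrusive linked list: each free header stores the address of the next free header in its own first word. Below is a stripped-down sketch of the idea, modeled on the `add_to_free_pool`/`get_from_free_pool` changes in the patch below but with a simplified `struct Pool` and a hypothetical `add_arena` helper. The loop in `add_arena` writes into every new unit, which is the extra memory traffic suspected of slowing gc_header_new.

```c
#include <stdlib.h>

/* Simplified pool: the free list is threaded through the free units
   themselves, so a unit must be able to hold a pointer. */
struct Pool {
    void *free_list;
    size_t free_entries;
    size_t unit_size;
};

static void pool_init(struct Pool *pool, size_t unit_size) {
    pool->free_list = NULL;
    pool->free_entries = 0;
    pool->unit_size = unit_size < sizeof(void *) ? sizeof(void *) : unit_size;
}

/* Push: store the old head in the unit's first word. */
static void add_to_free_pool(struct Pool *pool, void *to_add) {
    *(void **)to_add = pool->free_list;
    pool->free_list = to_add;
    pool->free_entries++;
}

/* Pop: the real code runs a DOD pass or allocates more when empty;
   here an empty list just yields NULL. */
static void *get_from_free_pool(struct Pool *pool) {
    void *ptr = pool->free_list;
    if (ptr == NULL)
        return NULL;
    pool->free_list = *(void **)ptr;
    pool->free_entries--;
    return ptr;
}

/* Hypothetical helper: carve a fresh block into units and free each one.
   Assumes unit_size is a multiple of the pointer alignment. */
static void *add_arena(struct Pool *pool, size_t units) {
    char *arena = malloc(units * pool->unit_size);
    if (arena == NULL)
        return NULL;
    for (size_t i = 0; i < units; i++)
        add_to_free_pool(pool, arena + i * pool->unit_size);
    return arena;
}
```

Both push and pop are O(1) and never reallocate, which is why the old `expand_free_pool` routine can be deleted entirely.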

The portion of Peter's patch to do just this change is included below.

Mike Lambert

Index: resources.c
===
RCS file: /cvs/public/parrot/resources.c,v
retrieving revision 1.60
diff -u -r1.60 resources.c
--- resources.c 26 May 2002 20:20:08 -  1.60
+++ resources.c 29 May 2002 07:08:26 -
@@ -41,28 +41,15 @@

 /* Create a new tracked resource pool */
 static struct Resource_Pool *
-new_resource_pool(struct Parrot_Interp *interpreter, size_t free_pool_size,
+new_resource_pool(struct Parrot_Interp *interpreter,
   size_t unit_size, size_t units_per_alloc,
  void (*replenish)(struct Parrot_Interp *, struct
Resource_Pool *),
   struct Memory_Pool *mem_pool)
 {
 struct Resource_Pool *pool;
-size_t temp_len;

 pool = mem_sys_allocate(sizeof(struct Resource_Pool));
-temp_len = free_pool_size * sizeof(void *);
-if (interpreter->arena_base->buffer_header_pool) {
-pool->free_pool_buffer = new_buffer_header(interpreter);
-}
-else {
-pool->free_pool_buffer = mem_sys_allocate(sizeof(Buffer));
-}
-pool->free_pool_buffer->bufstart =
-mem_allocate(interpreter, &temp_len,
- interpreter->arena_base->memory_pool);
-pool->free_pool_buffer->buflen = temp_len;
-pool->free_pool_buffer->flags = BUFFER_immune_FLAG;
-pool->free_pool_size = temp_len / sizeof(void *);
+pool->free_list = NULL;
 pool->free_entries = 0;
 pool->unit_size = unit_size;
 pool->units_per_alloc = units_per_alloc;
@@ -72,28 +59,6 @@
 return pool;
 }

-/* Expand free pool to accomdate at least n additional entries
- * Currently, the minimum expansion is 20% of the current size
-*/
-static void
-expand_free_pool(struct Parrot_Interp *interpreter,
- struct Resource_Pool *pool, size_t n)
-{
-size_t growth;
-
-if (pool->free_pool_size - pool->free_entries < n) {
-growth = (n - (pool->free_pool_size - pool->free_entries)) *
- sizeof(void *);
-if (growth < pool->free_pool_buffer->buflen / 5) {
-growth = pool->free_pool_buffer->buflen / 5;
-}
-Parrot_reallocate(interpreter, pool->free_pool_buffer,
-  pool->free_pool_buffer->buflen + growth);
-pool->free_pool_size += (growth / sizeof(void *));
-}
-}
-
-
 /* Add entry to free pool
  * Requires that any object-specific processing (eg flag setting, statistics)
  * has already been done by the caller
@@ -102,20 +67,8 @@
 add_to_free_pool(struct Parrot_Interp *interpreter,
  struct Resource_Pool *pool, void *to_add)
 {
-void **temp_ptr;
-
-if (pool->free_pool_size == pool->free_entries) {
-expand_free_pool(interpreter, pool, 1);
-}
-
-#ifdef GC_DEBUG
-Parrot_go_collect(interpreter);
-#endif
-
-/* Okay, so there's space. Add the header on */
-temp_ptr = pool->free_pool_buffer->bufstart;
-temp_ptr += pool->free_entries;
-*temp_ptr = to_add;
+*(void **)to_add = pool->free_list;
+pool->free_list = to_add;
 pool->free_entries++;
 }

@@ -127,7 +80,7 @@
 get_from_free_pool(struct Parrot_Interp *interpreter,
struct Resource_Pool *pool)
 {
-void ** ptr;
+void *ptr;

 if (!pool->free_entries) {
 Parrot_do_dod_run(interpreter);
@@ -140,9 +93,10 @@
 return NULL;
 }

-ptr = pool->free_pool_buffer->bufstart;
-ptr += --pool->free_entries;
-return *ptr;
+ptr = pool->free_list;
+pool->free_list = *(void **)ptr;
+pool->free_entries--;
+return ptr;
 }

 /* We have no more headers on the f