Re: [A-Z]+\s*\{

2002-01-21 Thread Ariel Scolnicov

Larry Wall <[EMAIL PROTECTED]> writes:

> [EMAIL PROTECTED] writes:

[...]

> : How does one enforce the no side-effects rule, and how deeply does it
> : traverse?  
> 
> Dunno.  Could warn or fail on assignment to any non-lexical or
> non-local lexical as a start.  Maybe we could warn or fail on method
> calls known to modify an object.  But I don't know how much
> technological enforcement we can achieve without an unnecessarily
> fascist declaration policy.  I expect the most important thing would be
> to encourage people to think that way.

You cannot usefully tell if something has a "side effect".  Well, you
can do stuff like warn if it obviously contains assignments, prints
and maybe gotos, or you can run it and die horribly if it tries to do
one of the above.  But I doubt this would be useful.

Here's an example.  Suppose f is a function "with no side effects".
Then I can write

  POST {
    f($x) == $y or die;  # (die is the ultimate side effect, but
                         # it's somehow still OK, right?)
  }

Now, since f is a function "with no side effects", I can Memoize it a
la MJD, right?

But I can't do both at once, because then my POST will call the
Memoize'd f, which has what is technically a side effect.
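
To make the point concrete, here is a minimal Perl 5 sketch (not from the original message). It assumes the core Memoize module and a hypothetical function f, and routes the cache through a visible hash so the "technical" side effect of a memoized call can be observed:

```perl
use strict;
use warnings;
use Memoize;

my %cache;    # externally visible cache (an assumption made for illustration)

# Hypothetical function "with no side effects".
sub f { my ($x) = @_; return $x * 2 }

# Store scalar-context results in %cache instead of Memoize's private hash.
memoize('f', SCALAR_CACHE => [HASH => \%cache]);

my $before = keys %cache;    # number of cached results before the call
my $y      = f(21);          # a "pure" call that nevertheless writes to %cache
my $after  = keys %cache;    # number of cached results after the call

print "result: $y, cache entries before: $before, after: $after\n";
```

Whether that cache write counts as a side effect that matters is exactly the judgment call described above.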

How about

  POST {
    %h{$x} == $y or die;
  }

OK, right?  And if %h is really tie'd to a file?
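
A rough Perl 5 illustration of the tie problem (not from the original message): the class name CountingHash and its read counter are hypothetical stand-ins for whatever a file-backed tie would do on each access.

```perl
use strict;
use warnings;

# Hypothetical tied-hash class: every read bumps a counter, standing in
# for the file access that a hash tied to a file would perform.
package CountingHash;
require Tie::Hash;
our @ISA   = ('Tie::StdHash');
our $reads = 0;

sub FETCH {
    my ($self, $key) = @_;
    $reads++;                 # hidden state change on a plain read
    return $self->{$key};
}

package main;

tie my %h, 'CountingHash';
$h{answer} = 42;

# Reads like a side-effect-free check, yet FETCH runs behind the scenes.
$h{answer} == 42 or die "postcondition failed";

print "reads so far: $CountingHash::reads\n";
```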

The point is that what counts as a real side effect depends on the
program, not on the language.  (Well, maybe not if you're a functional
programmer).  The definition is that an expression "has side effects"
if it can affect the (internal or external) state of the program.  But
the *useful* definition is that an expression "has side effects" if it
can affect the state of the program in a manner which matters to the
program.

I'd say warn in the documentation that PRE and POST should not contain
side effects, maybe even add a switch to disable their execution, but
not attempt to enforce anything.  It's just not useful, I think.

[...]

> : sub non_method {
> : loop {
> : PRE {
> : print "PRE\n"; # Is printing a side-effect? 
> 
> Gee, maybe warnings and failures are side effects, and should be also
> prohibited.  :-)

print is OK (for a program running on the console STDOUT not usefully
redirected, etc.), because its side effect presumably doesn't matter
to the program.  In a CGI application, print would not be side effect
free.

> Like I said earlier, I think the important thing is to make people
> think about it, not to continually frustrate them.  I'd be inclined to
> let people choose their own level of pain by setting DbC strictness
> thresholds.

Please set my threshold of side effect pain to undef...

-- 
Ariel Scolnicov|http://3w.compugen.co.il/~ariels
Compugen Ltd.  |[EMAIL PROTECTED]
72 Pinhas Rosen St.|Tel: +972-3-7658117  "fast, good, and cheap;
Tel-Aviv 69512, ISRAEL |Fax: +972-3-7658555   pick any two!"




Re: A question

2002-01-21 Thread Piers Cawley

Larry Wall <[EMAIL PROTECTED]> writes:

> Piers Cawley writes:
> : Yeah, that's sort of where I got to as well. But I just wanted to make
> : sure. I confess I'm somewhat wary of the ';' operator, especially
> : where it's 'unguarded' by brackets, and once I start programming in
> : Perl 6 then 
> : 
> : for (@aaa ; @bbb -> $a; $b) { ... }
> : 
> : will be one of my personal style guidelines.
>
> That is likely a syntax error, because the -> is not an operator, but a
> kind of unary keyword like "sub", and it binds the righthand arguments
> to the following block.  You'd have to say:
>
> for (@aaa; @bbb) -> ($a; $b) { ... }

So long as there's *some* way of 'protecting' the ; operator the
details of the syntax are almost irrelevant. And that does make a good
deal more sense.

-- 
Piers

   "It is a truth universally acknowledged that a language in
possession of a rich syntax must be in need of a rewrite."
 -- Jane Austen?




Re: A question

2002-01-21 Thread Piers Cawley

Piers Cawley <[EMAIL PROTECTED]> writes:

> Larry Wall <[EMAIL PROTECTED]> writes:
>
>> Piers Cawley writes:
>> : Yeah, that's sort of where I got to as well. But I just wanted to make
>> : sure. I confess I'm somewhat wary of the ';' operator, especially
>> : where it's 'unguarded' by brackets, and once I start programming in
>> : Perl 6 then 
>> : 
>> : for (@aaa ; @bbb -> $a; $b) { ... }
>> : 
>> : will be one of my personal style guidelines.
>>
>> That is likely a syntax error, because the -> is not an operator, but a
>> kind of unary keyword like "sub", and it binds the righthand arguments
>> to the following block.  You'd have to say:
>>
>> for (@aaa; @bbb) -> ($a; $b) { ... }
>
> So long as there's *some* way of 'protecting' the ; operator the
> details of the syntax are almost irrelevant. And that does make a good
> deal more sense.

 than my original suggestion.

-- 
Piers

   "It is a truth universally acknowledged that a language in
possession of a rich syntax must be in need of a rewrite."
 -- Jane Austen?




[PATCH] MANIFEST update

2002-01-21 Thread Simon Glover


 Please, people, if you create new files, remember to add them to the
 MANIFEST.

 Simon

--- MANIFEST.old  Mon Jan 21 12:17:34 2002
+++ MANIFEST  Mon Jan 21 12:18:47 2002
@@ -75,6 +75,7 @@
 examples/assembly/call.pasm
 examples/assembly/euclid.pasm
 examples/assembly/fact.pasm
+examples/assembly/io1.pasm
 examples/assembly/jump.pasm
 examples/assembly/life.pasm
 examples/assembly/local_label.pasm
@@ -119,6 +120,7 @@
 include/parrot/resources.h
 include/parrot/runops_cores.h
 include/parrot/rx.h
+include/parrot/rxstacks.h
 include/parrot/stacks.h
 include/parrot/string.h
 include/parrot/trace.h
@@ -202,6 +204,7 @@
 runops_cores.c
 rx.c
 rx.ops
+rxstacks.c
 stacks.c
 string.c
 t/harness
 




Re: [A-Z]+\s*\{

2002-01-21 Thread Damian Conway

Larry wrote:

> : One way to do that would be to define POST and NEXT to return their
> : own (single, closure) argument. So then you could write:
> :
> :   NEXT POST { ... }
> 
> As long as everyone realizes that that return happens at compile 
> time...

Sorry, yes, I should have been explicit about that.


Damian



Re: [PATCH] MANIFEST update

2002-01-21 Thread Melvin Smith

At 12:21 PM 1/21/2002 +, Simon Glover wrote:

>  Please, people, if you create new files, remember to add them to the
>  MANIFEST.
>
>  Simon
>
>--- MANIFEST.old  Mon Jan 21 12:17:34 2002
>+++ MANIFEST  Mon Jan 21 12:18:47 2002
>@@ -75,6 +75,7 @@
>  examples/assembly/call.pasm
>  examples/assembly/euclid.pasm
>  examples/assembly/fact.pasm
>+examples/assembly/io1.pasm

Don't ask me how it didn't get committed, it's in my copy.

-Melvin






[Possible PATCH] IO ops docs

2002-01-21 Thread Simon Glover


 While you're online: now that you've split the io ops into their
 own separate file, their documentation isn't going to core_ops.pod 
 any more. The enclosed patch fixes this by autogenerating io_ops.pod
 in the same fashion that core_ops.pod is generated, but I'm not sure
 whether this is the right thing to do - do we want every ops lib to have
 separate documentation, or should we just keep all of the documentation
 in one place, in a single file?

 Simon

--- Makefile.old  Mon Jan 21 12:34:36 2002
+++ Makefile  Mon Jan 21 12:35:05 2002
@@ -1,7 +1,7 @@
 PERL = perl
 RM_F = rm -f
 
-all: packfile-c.pod packfile-perl.pod core_ops.pod
+all: packfile-c.pod packfile-perl.pod core_ops.pod io_ops.pod
 
 packfile-c.pod: ../packfile.c
 	perldoc -u ../packfile.c > packfile-c.pod
@@ -11,6 +11,9 @@
 
 core_ops.pod: ../core.ops
 	perldoc -u ../core.ops > core_ops.pod
+
+io_ops.pod: ../io.ops
+	perldoc -u ../io.ops > io_ops.pod
 
 clean:
 	$(RM_F) packfile-c.pod packfile-perl.pod
 




Re: [Possible PATCH] IO ops docs

2002-01-21 Thread Simon Glover


 If you decide to apply the last patch, you should probably apply this
 one as well, so that people know about the new file. If not, then junk
 'em both.

 Simon

--- parrot.pod.old  Mon Jan 21 12:56:15 2002
+++ parrot.pod  Mon Jan 21 12:57:11 2002
@@ -31,6 +31,10 @@
 
 A description of the core operations in the Parrot assembly language.
 
+=item F
+
+A description of the operations used in Parrot's IO subsystem.
+
 =item F
 
 The master list of Parrot assembly operations; not all of these have




Re: Apoc4: Parentheses

2002-01-21 Thread Bryan C. Warnock

On Sunday 20 January 2002 21:00, Damian Conway wrote:
> Bryan C. Warnock asked:
> > Since the parentheses are no longer required, will the expressions
> > lose or retain their own scope level?  (I'm assuming that whatever
> > rule applies, it will hold true if you do elect to use parentheses
> > anyway.)
>
> Err. Expressions don't have their own scope level, even in Perl 5.

They do in block conditional expressions.  Try this:

#!/your/path/to/perl -w
my $x = 4;

if (my $x = 5) {
    print "$x\n";
    my $x = 6;
    print "$x\n";
} elsif (my $x = 7) {
    print "$x\n";
    my $x = 6;
    print "$x\n";
} else {
    print "$x\n";
    my $x = 6;
    print "$x\n";
}

print "$x\n"; 

"my" variable $x masks earlier declaration in same scope at Perl/demo.pl 
line 9.   # the elsif masking the if
Found = in conditional, should be == at Perl/demo.pl line 9.
Found = in conditional, should be == at Perl/demo.pl line 5.
5
6
4

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: [Possible PATCH] IO ops docs

2002-01-21 Thread Melvin Smith

At 12:54 PM 1/21/2002 +, Simon Glover wrote:

>  While you're online: now that you've split the io ops into their
>  own separate file, their documentation isn't going to core_ops.pod
>  any more. The enclosed patch fixes this by autogenerating io_ops.pod
>  in the same fashion that core_ops.pod is generated, but I'm not sure
>  whether this is the right thing to do - do we want every ops lib to have
>  separate documentation, or should we just keep all of the documentation
>  in one place, in a single file?

My personal feeling is that this makes sense (separate pod), since
they are sort of an "API" compared to the core ops. I'll see what the rest of the
guys say first, then probably apply it.

As far as IO ops, right now they are implemented as inline ops but eventually
they will be replaced by method calls on the IO object and won't
show up in the core (except maybe some bootstrap print/printerr/readline, 
etc.) ..

At least this is the way I see it, opinions may vary.

-Melvin




[PATCH] harness just the tests you want

2002-01-21 Thread Nicholas Clark

À la perl 5, it can be useful just to run 1 test script under the harness.

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- t/harness.orig  Wed Jan  2 19:19:09 2002
+++ t/harness   Mon Jan 21 11:46:54 2002
@@ -1,7 +1,9 @@
 #! perl -w
+# $Id: $
 
 use strict;
 use Test::Harness qw(runtests);
 
-my @tests = map { glob( "t/$_/*.t" ) } ( qw(op) );
+# Pass in a list of tests to run on the command line, else run all the tests.
+my @tests = @ARGV ? @ARGV : map { glob( "t/$_/*.t" ) } ( qw(op) );
 runtests(@tests);
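
The selection logic in the patch can be sketched on its own as follows. This is only an illustration: the helper name select_tests and the file names are hypothetical and are not part of t/harness.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched line: prefer tests named on the
# command line, otherwise fall back to the full default list.
sub select_tests {
    my ($argv, $default) = @_;
    return @$argv ? @$argv : @$default;
}

my @default = ('t/op/first.t', 't/op/second.t');   # made-up file names

my @all = select_tests([], \@default);                # no arguments: everything
my @one = select_tests(['t/op/first.t'], \@default);  # one script requested

print scalar(@all), " tests by default, ", scalar(@one), " when one is named\n";
```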



[PATCH] Parrot::Assembler pod clean-up

2002-01-21 Thread Simon Glover


 Enclosed patch fixes the POD brokenness in Parrot::Assembler reported
 by Steve Fink, and generally makes it more aesthetically pleasing.

 I've also supplied the missing documentation for the 
 constantize_number and constantize_integer functions - could someone
 who knows check that I've explained them correctly?

 Also enclosed is a small patch to running.pod to remove the reference
 to the brokenness.

 Simon

--- running.pod.old Mon Jan 21 15:44:20 2002
+++ running.pod Mon Jan 21 15:46:08 2002
@@ -13,8 +13,9 @@
 
   assemble.pl foo.pasm > foo.pbc
 
-Usage information: no usage message available. There is some amount of
-malformed POD visible by running C.
+Usage information: no usage message available. Documentation for the
+C module, around which C is a wrapper,
+can be viewed by running C.
 
 =item C
 
--- Assembler.pm.old  Mon Jan 21 14:05:23 2002
+++ Assembler.pm  Mon Jan 21 15:40:27 2002
@@ -67,6 +67,7 @@
 output_listing() if $options{'listing'};
 exit 0;
 
+=cut
 
 ###
 ###
@@ -85,6 +86,7 @@
 my $pf = $asm->assemble($code);
 exit $interp->run($pf);
 
+=cut
 
 ###
 ###
@@ -105,8 +107,8 @@
 
 =head2 %type_to_suffix
 
-type_to_suffix is used to change from an argument type to the suffix that
-would be used in the name of the function that contained that argument.
+This is used to change from an argument type to the suffix that would be 
+used in the name of the function that contained that argument.
 
 =cut
 
@@ -120,26 +122,26 @@
 
 =head2 @program
 
-@program will hold an array ref for each line in the program. Each array ref
-will contain:
+This holds an array ref for each line in the program. Each array ref
+contains: 
 
 =over 4
 
 =item 1
 
-The file name in which the source line was found
+The file name in which the source line was found.
 
 =item 2
 
-The line number in the file of the source line
+The line number in the file of the source line.
 
 =item 3
 
-The chomped source line without beginning and ending spaces
+The chomped source line without beginning and ending spaces.
 
 =item 4
 
-The chomped source line
+The chomped source line.
 
 =back
 
@@ -150,25 +152,17 @@
 
 ###
 
-=head2 $output
-=head2 $listing
-=head2 $bytecode
+=head2 $output 
 
-=over 4
+What is output to the bytecode file.
 
-=item $output
-
-will be what is output to the bytecode file.
-
-=item $listing
-
-will be what is output to the listing file.
+=head2 $listing
 
-=item $bytecode
+What is output to the listing file.
 
-is the program's bytecode (executable instructions).
+=head2 $bytecode
 
-=back
+The program's bytecode (executable instructions).
 
 =cut
 
@@ -177,14 +171,10 @@
 
 ###
 
-=head2 $file
-=head2 $line
-=head2 $pline
-=head2 $sline
-
-$file, $line, $pline, and $sline are used to reference information from the
-@program array.  Please look at the comments for @program for the description
-of each.
+=head2 $file, $line, $pline, $sline
+
+These variables are used to reference information from the C<@program> array.  
+Please look at the comments for C<@program> for the description of each.
 
 =cut
 
@@ -194,41 +184,31 @@
 ###
 
 =head2 %label
-=head2 %fixup
-=head2 %macros
-=head2 %local_label
-=head2 %local_fixup
-=head2 $last_label
 
-=over 4
-
-=item %label
-
-will hold each label and the PC at which it was defined.
-
-=item %fixup
+This holds each label and the PC at which it was defined.
 
-will hold labels that have not yet been defined, where they are used in
-the source code, and the PC at that point. It is used for backpatching.
+=head2 %fixup
 
-=item %macros
+This holds labels that have not yet been defined, the position they are 
+used in the source code, and the PC at that point. It is used for 
+backpatching.
 
-will map a macro name to an array of program lines with the same format
-as @program.
+=head2 %macros
 
-=item %local_label
+This maps a macro name to an array of program lines with the same format
+as C<@program>.
 
-will hold local label definitions,
+=head2 %local_label
 
-=item %local_fixup
+This holds local label definitions.
 
-will hold the occurances of local labels in the source file.
+=head2 %local_fixup
 
-=item $last_label
+This holds the occurrences of local labels in the source file.
 
-is the name of the last label seen
+=head2 $last_label
 
-=back
+This is the name of the last label seen.
 
 =cut
 
@@ -238,10 +218,12 @@
 ###
 
 =head2 $pc

Re: [A-Z]+\s*\{

2002-01-21 Thread Larry Wall

Larry Wall writes:
: This is only slightly less problematic than
: 
: NEXT $coderef;
: 
: which in turn is only slightly less problematic than
: 
: if $condition $coderef;

Actually, that'd probably have to be:

if $condition, $coderef;

Still not sure if that has any possibility of actually working.  Maybe
depends on how the regex for the C syntax is written, and whether
such syntax can fall back onto ordinary syntactic conventions.
Probably not.  Something tells me that we'd better require the block of
C et al., or we'll have difficulty detecting missing semicolons,
which would try to make an C statement parse as an C modifier.

This has slightly more chance of working:

$condition.if($ifcode, $elsecode)

But really, people will be surprised if you do that.  They'll expect
you to write this instead:

$condition ?? $ifcode() :: $elsecode();

So I'm not terribly interested in going out of my way to make statement
blocks parse exactly like terms in ordinary expressions.  If it
happens, it'll probably be by accident.
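
For comparison, a Perl 5 sketch of the expression form (not from the original message). The closures and the helper name choose are hypothetical; Perl 5 spells the conditional operator ?: and calls a code reference with ->():

```perl
use strict;
use warnings;

# Hypothetical closures standing in for $ifcode and $elsecode.
my $ifcode   = sub { return 'then-branch' };
my $elsecode = sub { return 'else-branch' };

# Perl 5 spelling of: $condition ?? $ifcode() :: $elsecode()
sub choose {
    my ($condition) = @_;
    return $condition ? $ifcode->() : $elsecode->();
}

print choose(1), "\n";
print choose(0), "\n";
```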

Larry



Re: Apoc4: Parentheses

2002-01-21 Thread Larry Wall

[EMAIL PROTECTED] writes:
: On Sunday 20 January 2002 21:00, Damian Conway wrote:
: > Bryan C. Warnock asked:
: > > Since the parentheses are no longer required, will the expressions
: > > lose or retain their own scope level?  (I'm assuming that whatever
: > > rule applies, it will hold true if you do elect to use parentheses
: > > anyway.)
: >
: > Err. Expressions don't have their own scope level, even in Perl 5.
: 
: They do in block conditional expressions.  Try this:
: 
: #!/your/path/to/perl -w
: my $x = 4;
: 
: if (my $x = 5) {
:     print "$x\n";
:     my $x = 6;
:     print "$x\n";
: } elsif (my $x = 7) {
:     print "$x\n";
:     my $x = 6;
:     print "$x\n";
: } else {
:     print "$x\n";
:     my $x = 6;
:     print "$x\n";
: }
: 
: print "$x\n"; 
: 
: "my" variable $x masks earlier declaration in same scope at Perl/demo.pl 
: line 9.   # the elsif masking the if
: Found = in conditional, should be == at Perl/demo.pl line 9.
: Found = in conditional, should be == at Perl/demo.pl line 5.
: 5
: 6
: 4

Compound statements in Perl 5 do have an implicit {} around the entire
statement, but that has nothing to do with the required parentheses
around the expressions, other than the fact that we're doing away with
both of those special rules in Perl 6.  But parentheses have always
been totally transparent to any C contained within them.  It's
the implicit {} that was protecting the C condition from getting
a warning like the C got (which got the warning because it's
at the same scope level as C's declaration).
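
The implicit {} can be pictured with a hand-written equivalent. This is a sketch for illustration only, not a description of what the Perl 5 compiler literally generates:

```perl
use strict;
use warnings;

my $x = 4;
my @seen;

# Roughly what Perl 5 does for:  if ((my $x = 5) > 0) { ... }
{
    my $x = 5;               # the condition's lexical lives in the implicit block
    if ($x > 0) {
        push @seen, $x;      # the body sees the inner $x
    }
}

push @seen, $x;              # the outer $x is untouched once the statement ends
print "@seen\n";
```

The inner declaration sits in its own scope, so it draws no "masks earlier declaration" warning against the outer $x.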

Larry



Re: on parrot strings

2002-01-21 Thread Dave Mitchell

Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> There is no string type built out of native eight-bit bytes.

In the good ol'days, one could usefully use regexes on 8-bit binary data,
eg

open G, 'myfile.gif' or die;
read G, $buf, 8192 or die;
if ($buf =~ /^GIF89a\x08\x02/) {
.

where it was clear to everyone that we are checking whether the first few
bytes of the file contain (0x47, 0x49, ..., 0x02)

Is this sort of thing now completely dead in the Brave New World of
Unicode, Locales etc etc? (yes, there's always pack, but pack is so... errr
hmm )

Dave.




Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi

On Mon, Jan 21, 2002 at 04:37:46PM +, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > There is no string type built out of native eight-bit bytes.
> 
> In the good ol'days, one could usefully use regexes on 8-bit binary data,
> eg
> 
> open G, 'myfile.gif' or die;
> read G, $buf, 8192 or die;
> if ($buf =~ /^GIF89a\x08\x02/) {
> .
> 
> where it was clear to everyone that we are checking whether the first few
> bytes of the file contain (0x47, 0x49, ..., 0x02)
> 
> Is this sort of thing now completely dead in the Brave New World of

Of course not, I do not remember forbidding \xHH.  The default of
data coming in from filehandles could still be opaque 8-bit bytes.

> Unicode, Locales etc etc? (yes, there's always pack, but pack is so... errr
> hmm )

> Dave.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



Re: on parrot strings

2002-01-21 Thread Dave Mitchell

Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > In the good ol'days, one could usefully use regexes on 8-bit binary data,
> > eg
> > 
> > open G, 'myfile.gif' or die;
> > read G, $buf, 8192 or die;
> > if ($buf =~ /^GIF89a\x08\x02/) {
> > .
> > 
> > where it was clear to everyone that we are checking whether the first few
> > bytes of the file contain (0x47, 0x49, ..., 0x02)
> > 
> > Is this sort of thing now completely dead in the Brave New World of
> 
> Of course not, I do not remember forbidding \xHH.  The default of
> data coming in from filehandles could still be opaque 8-bit bytes.

Good :-)

I'm not clear though, how binary data could get passed to parrot's
regex engine, unless there's a BINARY_8 CEF in addition to
UNICODE_CEF_UTF_8 etc in C

???




[PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover


 Enclosed patch fixes a couple of bugs in the optimizer. The first was 
 that the parser wasn't correctly recognising register names - it needs
 to check for these _before_ checking for labels, or else they're 
 incorrectly identified as labels. Strangely, this wasn't causing
 any problems with the optimized code, at least as far as I could see, 
 but this may be down to luck.

 The other bug is a misplaced ? in the regex checking for integers.
 This makes the match non-greedy, so 10000.0 (for example) gets
 split up into 1000 (which matches the regex) and 0.0 (which matches
 as a float the next time around the loop). This means that code
 such as

 set N1, 10000.0

 gets converted to

 set N1, 1000, 0.0

 which quite rightly fails to assemble. Removing the ? appears to make 
 everything work as intended.
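
 As a cross-check, the two integer patterns from the patch can be tried in isolation. This sketch is not part of the patch: the helper name leading_integer and the sample inputs are assumptions. It relies only on standard Perl 5 regex semantics, under which (?!\.) is a negative lookahead and (!\.) is a group matching a literal "!" followed by a literal ".":

```perl
use strict;
use warnings;

# Hypothetical helper: return what the integer branch captures, or undef.
sub leading_integer {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? $1 : undef;
}

my $original = qr/^(-?\d+)(?!\.)/;   # digits NOT followed by "." (lookahead)
my $patched  = qr/^(-?\d+)(!\.)/;    # digits followed by a literal "!."

for my $text ('10000.0', '42', '42!.') {
    for my $case (['original', $original], ['patched', $patched]) {
        my ($name, $pattern) = @$case;
        my $got = leading_integer($pattern, $text);
        printf "%-8s %-8s %s\n", $name, $text,
            defined $got ? "captured '$got'" : 'no match';
    }
}
```

 If this reading is right, the original pattern backtracks by one digit on a number with several digits before the point, while the patched pattern no longer matches a bare integer at all, so the patch is worth re-testing against integer constants before it is applied.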

 Simon

--- Optimizer.pm.old  Fri Dec 14 06:04:27 2001
+++ Optimizer.pm  Mon Jan 21 17:35:47 2002
@@ -53,16 +53,16 @@
 # Collect arbitrary parameters
 #
 while(/\S/) {
-  if(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
+  if(s/^([INSP]\d+\b)//) { # Register name
+push @{$line->{parameter}},{type=>'register',value=>$1};
+  }
+  elsif(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
 push @{$line->{parameter}},{type=>'label_global',value=>$1};
   }
   elsif(s/^(\$\w+)//) {# local label
 push @{$line->{parameter}},{type=>'label_local',value=>$1};
   }
-  elsif(s/^([INSP]\d+\b)//) {  # Register name
-push @{$line->{parameter}},{type=>'register',value=>$1};
-  }
-  elsif(s/^(-?\d+)(?!\.)//) {  # integer
+  elsif(s/^(-?\d+)(!\.)//) {  # integer
 push @{$line->{parameter}},{type=>'constant_i',value=>$1};
   }
   elsif(s/^(-?\d+\.\d+)//) {   # float
 






Re: Apoc4: Parentheses

2002-01-21 Thread Damian Conway

> > Err. Expressions don't have their own scope level, even in Perl 5.
> 
> They do in block conditional expressions.

But that's a property of being in a block conditional, not of being an expression.

And, yes, it's going away in Perl 6.

Damian




Re: Apoc4: The loop keyword

2002-01-21 Thread Michael G Schwern

On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
> : while( my $line =  ) {
> : ...
> : }
> 
> That still works fine--it's just that $line lives on after the while.

This creeping lexical leakage bothers me.  While it might make the
language simpler, the proliferation of left-over lexicals seems
sloppy.


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]> Kwalitee Is Job One
"Hey kids!  I'm Beefy the Elf!  Follow me to Meattart Land!"
"It's like chewing on a lemon cow!"
"I like Meattarts 'cause they're meaty!"
"I like Meattarts 'cause... wait.  These suck."
http://www.goats.com/archive/000608.html



Re: on parrot strings

2002-01-21 Thread Jarkko Hietaniemi

On Mon, Jan 21, 2002 at 05:09:06PM +, Dave Mitchell wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> wrote:
> > > In the good ol'days, one could usefully use regexes on 8-bit binary data,
> > > eg
> > > 
> > > open G, 'myfile.gif' or die;
> > > read G, $buf, 8192 or die;
> > > if ($buf =~ /^GIF89a\x08\x02/) {
> > > .
> > > 
> > > where it was clear to everyone that we are checking whether the first few
> > > bytes of the file contain (0x47, 0x49, ..., 0x02)
> > > 
> > > Is this sort of thing now completely dead in the Brave New World of
> > 
> > Of course not, I do not remember forbidding \xHH.  The default of
> > data coming in from filehandles could still be opaque 8-bit bytes.
> 
> Good :-)
> 
> I'm not clear though, how binary data could get passed to parrot's
> regex engine, unless there's a BINARY_8 CEF in addition to
> UNICODE_CEF_UTF_8 etc in C

Yes, that's somewhat problematic.  Making up "a byte CEF" would be
Wrong, though, because there is, by definition, no CCS to map, and
we would be dangerously close to conflating in CES, too...
ACR-CCS-CEF-CES.  Read the character model.  Understand the character
model.  Embrace the character model.  Be the character model.  (And
once you're it, read the relevant Unicode, XML, and Web standards.)

To highlight the difference between opaque numbers and characters,
the above should really be:

if ($buf =~ /\x47\x49\x46\x38\x39\x61\x08\x02/) { ... }

I think what needs to be done is that \xHH must not be encoded as
literals (as it is now, 'A' and \x41 are identical (in ASCII)), but
instead as regex nodes of their own, storing the code points.  Then
the regex engine can try both the "right/new way" (the Unicode code
point), and the "wrong/legacy way" (the native code point).
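
As things stand in Perl 5 on an ASCII platform, the two spellings are the same pattern, which the following sketch checks. The buffer is a hypothetical in-memory stand-in for the first bytes of a file, and the ^ anchor is kept in both patterns for a like-for-like comparison:

```perl
use strict;
use warnings;

# Hypothetical buffer standing in for bytes read from a GIF file.
my $buf = "GIF89a\x08\x02" . ("\x00" x 8);

# Header spelled with letters, as in the original question.
my $by_letters = $buf =~ /^GIF89a\x08\x02/ ? 1 : 0;

# Header spelled purely as numbers, to stress that these are opaque bytes.
my $by_numbers = $buf =~ /^\x47\x49\x46\x38\x39\x61\x08\x02/ ? 1 : 0;

print "letters: $by_letters, numbers: $by_numbers\n";
```

Both tests succeed here only because 'G' and \x47 are the same code point in ASCII, which is precisely the conflation at issue.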

String literals have the same problem.  What does "foo\x41" mean?
(Here, unlike with the regular expressions, we can't "try both",
unless we integrate Damian's quantum state variables to the core :-)
We have various options: there might be a pragma to tell what CCS
"naked codepoints" are to be understood in, or the default could be
grovelled out of environment settings (both these options could affect
the regex solution, too), and so forth.

-- 
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen



[maybe PATCH] use Term::ReadLine where possible

2002-01-21 Thread Nicholas Clark

I think that this is a good idea, but there may be arguments against it.
The stub Term::ReadLine has been in perl since pre 5.004, so it's quite safe
to use it. However, to actually get line editing one needs to have installed
either Term::ReadLine::Perl or Term::ReadLine::Gnu. Attached patch makes
Configure.pl use Term::ReadLine to give interactive editing if there's a real
Term::ReadLine present, else Configure.pl continues to use the old way.

I think that this is easier to use than cut and paste or the rem:{} add:{}
syntax that &prompt appears to offer.

Tested with Term::ReadLine::Gnu and Term::ReadLine::Perl
(and I don't know why Term::ReadLine::Perl later decided that it could do
multi-line editing when it initially was doing sideways scrolling)

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- Configure.pl.orig   Sun Jan 20 22:57:28 2002
+++ Configure.pl  Mon Jan 21 17:25:28 2002
@@ -13,10 +13,9 @@
 use Getopt::Long;
 use ExtUtils::Manifest qw(manicheck);
 use File::Copy;
-
+use Term::ReadLine; # The stub is present from earlier than 5.004
 use Parrot::BuildUtil;
 
-
 #
 # Read the array and scalar forms of the version.
 # from the VERSION file.
@@ -287,15 +286,17 @@
 # Ask questions
 #
 
-prompt("What C compiler do you want to use?", 'cc');
-prompt("How about your linker?", 'ld');
-prompt("What flags would you like passed to your C compiler?", 'ccflags');
-prompt("What flags would you like passed to your linker?", 'ldflags');
-prompt("Which libraries would you like your C compiler to include?", 'libs');
-prompt("How big would you like integers to be?", 'iv');
-prompt("And your floats?", 'nv');
-prompt("What is your native opcode type?", 'opcode_t');
+my $term = initialise_term();
 
+prompt($term, "What C compiler do you want to use?", 'cc');
+prompt($term, "How about your linker?", 'ld');
+prompt($term, "What flags would you like passed to your C compiler?",
+   'ccflags');
+prompt($term, "Which libraries would you like your C compiler to include?",
+   'libs');
+prompt($term, "How big would you like integers to be?", 'iv');
+prompt($term, "And your floats?", 'nv');
+prompt($term, "What is your native opcode type?", 'opcode_t');
 
 {
my(@ops)=glob("*.ops");
@@ -326,7 +327,7 @@
 Which opcode files would you like?
 END
 
-   prompt($msg, 'ops');
+   prompt($term, $msg, 'ops');
 }
 
 
@@ -428,7 +429,7 @@
 next unless $opt; # Ignore blank lines
 $c{cc_warn} .= " $opt";
 }
-prompt("What gcc warning flags do you want to use?", 'cc_warn');
+prompt($term, "What gcc warning flags do you want to use?", 'cc_warn');
 }
 
 #
@@ -708,21 +709,29 @@
 sub prompt {
 return if $opt_defaults;
 
-my($message, $field)=(@_);
+my($term, $message, $field)=(@_);
 my($input);
-print "$message [$c{$field}] ";
-chomp($input=);
+if ($term) {
+# Term::ReadLine::Gnu does a multiline edit just like bash.
+# Term::ReadLine::Perl does a sideways scrolling single line like ksh.
+print "$message [$c{$field}]\n";
+$input = $term->readline("", $c{$field});
+$term->addhistory($input) if /\S/ and !$term->Features->{autohistory};
+} else {
+print "$message [$c{$field}] ";
+chomp($input=);
 
-if($input =~ s/^\+//) {
-$input="$c{$field} $input";
-}
-else {
-if($input =~ s/:rem\{(.*?)\}//) {
-$c{$field} =~ s/$_//g for split / /, $1;
+if($input =~ s/^\+//) {
+$input="$c{$field} $input";
 }
+else {
+if($input =~ s/:rem\{(.*?)\}//) {
+$c{$field} =~ s/$_//g for split / /, $1;
+}
 
-if($input =~ s/:add\{(.*?)\}//) {
-$input="$c{$field} $1 $input";
+if($input =~ s/:add\{(.*?)\}//) {
+$input="$c{$field} $1 $input";
+}
 }
 }
 
@@ -816,8 +825,32 @@
 
 exit 1;
 }
-else {
-print <<"END";
+}
+
+#
+# initialise_term()
+#
+
+sub initialise_term {
+my $term = Term::ReadLine->new ('Parrot configuration');
+undef $term if $term && $term->ReadLine eq "Term::ReadLine::Stub";
+
+if ($term) {
+my $type = $term->ReadLine;
+print <<"END";
+Okay, we found everything.  Next you'll need to answer
+a few questions about your system.  You have
+${ type} installed, so I'll use that to let
+you edit your answers interactively. I'll put the
+default in square brackets, and also prime the input
+line with the default. Just hit enter straight away to
+accept the default, or edit it to suit. Like Perl 5's
+Configure you can also choose the default by entering a
+zero length line.
+
+END
+} else {
+print <<"END";
 Okay, we found everything.  Next you'll need to answer
 a few questions about your system.  Defaults are in square
 brackets, and you can hit enter to accept them.  If you
@@ -827,8 +860,8 @@
 
 END
 }
+return $term;
 }
-
 
 #

HP-UX state

2002-01-21 Thread H . Merijn Brand

l1:/pro/3gl/CPAN/parrot-current 114 > perl Configure.pl --default
Parrot Version 0.0.3 Configure
Copyright (C) 2001-2002 Yet Another Society

Since you're running this script, you obviously have
Perl 5--I'll be pulling some defaults from its configuration.

Checking the MANIFEST to make sure you have a complete Parrot kit...

Okay, we found everything.  Next you'll need to answer
a few questions about your system.  Defaults are in square
brackets, and you can hit enter to accept them.  If you
don't want the default, type a new value in.  If that new
value starts with a '+', it will be concatenated to the
default value.


Determining if your C compiler is actually gcc (this could take a while):


Your C compiler is not gcc.


Probing Perl 5's configuration to determine which headers you have (this could
take a while on slow machines)...

Determining C data type sizes by compiling and running a small C program (this
could take a while):

  Building ./test.c   from test_c.in...

Figuring out the formats to pass to pack() for the various Parrot internal
types...
Figuring out what integer type we can mix with pointers...
We'll use 'unsigned int'.

Building a preliminary version of include/parrot/config.h, your Makefiles, and
other files:

  Building include/parrot/config.h        from config_h.in...
  Building ./Makefile                     from Makefile.in...
  Building ./classes/Makefile             from classes/Makefile.in...
  Building ./docs/Makefile                from docs/Makefile.in...
  Building ./languages/Makefile           from languages/Makefile.in...
  Building ./languages/jako/Makefile      from languages/jako/Makefile.in...
  Building ./languages/miniperl/Makefile  from languages/miniperl/Makefile.in...
  Building ./languages/scheme/Makefile    from languages/scheme/Makefile.in...
  Building Parrot/Types.pm                from Types_pm.in...
  Building Parrot/Config.pm               from Config_pm.in...

Checking some things by compiling and running another small C program (this
could take a while):

  Building ./testparrotsizes.c from testparrotsizes_c.in...

Updating include/parrot/config.h:

  Building include/parrot/config.h from config_h.in...

Okay, we're done!

You can now use `make' (or your platform's equivalent to `make') to build your
Parrot. After that, you can use `make test' to run the test suite.

Happy Hacking,

The Parrot Team

l1:/pro/3gl/CPAN/parrot-current 115 > make
perl vtable_h.pl
make: *** No rule to make target `include/parrot/rxstacks.h', needed by `test_main.o'. 
 Stop.
Exit 2
l1:/pro/3gl/CPAN/parrot-current 116 > cat .timestamp
1011556802
Sun Jan 20 20:00:02 2002 UTC

(time of this cvs update)
l1:/pro/3gl/CPAN/parrot-current 117 >

-- 
H.Merijn Brand    Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.7.2 & 631 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
  WinNT 4, Win2K pro & WinCE 2.11.  Smoking perl CORE: [EMAIL PROTECTED]
http:[EMAIL PROTECTED]/   [EMAIL PROTECTED]
send smoke reports to: [EMAIL PROTECTED], QA: http://qa.perl.org




Re: HP-UX state

2002-01-21 Thread Simon Glover



On Mon, 21 Jan 2002, H.Merijn Brand wrote:

> perl vtable_h.pl
> make: *** No rule to make target `include/parrot/rxstacks.h', needed by 
>`test_main.o'.  Stop.

 This exists (and has done for a couple of days) but isn't in the MANIFEST
 at the moment (I've already sent a patch). Could that be causing the
 problem?

 Simon




Re: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover



On Mon, 21 Jan 2002, Simon Glover wrote:

>  The other bug is a misplaced ? in the regex checking for integers.
>  This makes the match non-greedy, so 1000.0 (for example) gets
>  split up into 100 (which matches the regex) and 0.0 (which matches
>  as a float the next time around the loop). This means that code
>  such as   
> 
>  set N1, 1000.0
> 
>  gets converted to
> 
>  set N1, 100, 0.0
> 
>  which quite rightly fails to assemble. Removing the ? appears to make 
>  everything work as intended.

 Forget this, this is garbage - the ? doesn't mean what I thought it
 meant. Correct patch to follow shortly.

 Simon





Re: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover


 Right: the real cause of the second bug is similar to what I thought it
 was - when it sees a float, the regex engine first checks to see if it 
 is an integer by trying the substitution:

s/^(-?\d+)(?!\.)// 
 
 The problem is that when, say, 1000.0 gets fed to this, and fails
 to match, the regex engine starts to back up, until it successfully
 matches (and erases) 100, leaving 0.0 to be parsed on the next
 iteration of the loop, and hence producing incorrect output.

 The best way that I've been able to think of to fix this is to swap
 the order of the integer and float comparisons, so that any floats
 get matched before we get to the above; if anyone else can think of a
 better way, or some reason why this won't work, I'd be glad to hear it.
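
 The effect of the ordering can be seen in isolation with a small
 sketch. The two patterns are the ones from Optimizer.pm; the
 first_token() helper and its $float_first flag are made up for this
 illustration and are not part of Parrot::Optimizer.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Parrot::Optimizer): classify the operand at
# the front of $text, trying the integer pattern first (the old order) or
# the float pattern first (the proposed order).
sub first_token {
    my ($text, $float_first) = @_;
    my @rules = (
        [ constant_i => qr/^(-?\d+)(?!\.)/ ],   # integer, as in Optimizer.pm
        [ constant_n => qr/^(-?\d+\.\d+)/  ],   # float, as in Optimizer.pm
    );
    @rules = reverse @rules if $float_first;
    for my $rule (@rules) {
        my ($type, $re) = @$rule;
        return ($type, $1) if $text =~ $re;
    }
    return;
}

my ($type, $value) = first_token('1000.0', 0);
print "integer first: $type $value\n";   # \d+ backs off to 100

($type, $value) = first_token('1000.0', 1);
print "float first:   $type $value\n";   # whole operand taken as a float
```

 With the integer rule first, \d+ gives back one digit so that the
 lookahead sees a 0 rather than a dot, and the match succeeds on a
 truncated number. With the float rule first, the integer rule never
 sees an operand that contains a decimal point.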

 Correct(?) patch enclosed below - note that it replaces the one sent
 earlier, which should be ignored. 

 Simon

--- Optimizer.pm.old	Mon Jan 21 18:12:16 2002
+++ Optimizer.pm	Mon Jan 21 18:44:05 2002
@@ -53,20 +53,20 @@
 # Collect arbitrary parameters
 #
 while(/\S/) {
-  if(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
+  if(s/^([INSP]\d+\b)//) { # Register name
+push @{$line->{parameter}},{type=>'register',value=>$1};
+  }
+  elsif(s/^([a-zA-Z][a-zA-Z0-9]+)//) {# global label
 push @{$line->{parameter}},{type=>'label_global',value=>$1};
   }
   elsif(s/^(\$\w+)//) {# local label
 push @{$line->{parameter}},{type=>'label_local',value=>$1};
   }
-  elsif(s/^([INSP]\d+\b)//) {  # Register name
-push @{$line->{parameter}},{type=>'register',value=>$1};
+  elsif(s/^(-?\d+\.\d+)//) {   # float
+push @{$line->{parameter}},{type=>'constant_n',value=>$1};
   }
   elsif(s/^(-?\d+)(?!\.)//) {  # integer
 push @{$line->{parameter}},{type=>'constant_i',value=>$1};
-  }
-  elsif(s/^(-?\d+\.\d+)//) {   # float
-push @{$line->{parameter}},{type=>'constant_n',value=>$1};
   }
   elsif(s/^("(?:[^\\"]|(?:\\(?>["tnr\\])))*")// or # single-quoted string
 s/^('(?:[^\\']|(?:\\(?>['tnr\\])))*')//) { # double-quoted string
 
  
 
 





RE: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Brent Dax

Simon Glover:
#  Right: the real cause of the second bug is similar to what I
# thought it
#  was - when it sees a float, the regex engine first checks to
# see if it
#  is an integer by trying the substitution:
#
# s/^(-?\d+)(?!\.)//
#
#  The problem is that when, say, 1000.0 gets fed to this, and fails
#  to match, the regex engine starts to back up, until it successfully
#  matches (and erases) 100, leaving 0.0 to be parsed on the next
#  iteration of the loop, and hence producing incorrect output.
#
#  The best way that I've been able to think of to fix this is to swap
#  the order of the integer and float comparisons, so that any floats
#  get matched before we get to the above; if anyone else can think of a
#  better way, or some reason why this won't work, I'd be glad
# to hear it.

If the problem is backtracking, can't you just use the (?>)
no-backtracking syntax?
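
A rough sketch of what that might look like, assuming the digits are
wrapped in the atomic group and the rest of the pattern is left alone
(the leading_int() helper is made up for the comparison):

```perl
use strict;
use warnings;

my $plain  = qr/^(-?\d+)(?!\.)/;       # pattern currently in Optimizer.pm
my $atomic = qr/^(-?(?>\d+))(?!\.)/;   # same, with the digits in (?>...)

# Hypothetical helper: return what the pattern captures, or undef.
sub leading_int {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $operand ('1000.0', '42', '-7') {
    my $got_plain  = leading_int($plain,  $operand);
    my $got_atomic = leading_int($atomic, $operand);
    printf "%-7s plain=%s atomic=%s\n", $operand,
        defined $got_plain  ? $got_plain  : 'no match',
        defined $got_atomic ? $got_atomic : 'no match';
}
```

Once (?>\d+) has taken all the digits it cannot give any back, so a
float no longer matches as a shortened integer, while plain integers
match as before.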

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

 . hawt sysadmin chx0rs
 This is sad. I know of *a* hawt sysamin chx0r.
 I know more than a few.
 obra: There are two? Are you sure it's not the same one?




RE: catching warnings

2002-01-21 Thread David Whipp

In light of Apo4, I thought I'd re-ask this question. Is the following still
the approved idiom, or will we have a nice little /[A-Z]+/ thingie:

sub foo
{
  temp $SIG{__WARN__} = sub {
warn "$(timestamp) $@\n"
  }
  warn "hello"
}
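
For comparison, a minimal sketch of the Perl 5 form of this idiom,
which uses local where the above uses temp. The timestamp() helper is a
stand-in, and the handler collects into @log instead of re-warning,
purely so that the result is easy to inspect:

```perl
use strict;
use warnings;

my @log;   # collected here rather than sent to STDERR

# Stand-in helper for the example.
sub timestamp { return scalar localtime }

sub foo {
    # local puts the previous handler back when foo() returns
    local $SIG{__WARN__} = sub {
        push @log, timestamp() . " $_[0]";
    };
    warn "hello\n";
}

foo();
print @log;
```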


Dave.
--
Dave Whipp, Senior Verification Engineer,
Fast-Chip inc., 950 Kifer Rd, Sunnyvale, CA. 94086
tel: 408 523 8071; http://www.fast-chip.com
Opinions my own; statements of fact may be in error.




Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Melvin Smith

At 12:32 PM 1/21/2002 -0500, Michael G Schwern wrote:
>On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
> > : while( my $line =  ) {
> > : ...
> > : }
> >
> > That still works fine--it's just that $line lives on after the while.
>
>This creeping lexical leakage bothers me.  While it might make the

"lives on", ... "creeping lexical", I feel the same way, we must find some
way to kill these... :)

-Melvin




RE: [PATCH] Parrot::Optimizer bugs

2002-01-21 Thread Simon Glover



On Mon, 21 Jan 2002, Brent Dax wrote:

> 
> If the problem is backtracking, can't you just use the (?>)
> no-backtracking syntax?
> 

 Didn't think of that. I'm a bit concerned at the large warning
 signs attached to it in perlre.pod, though.

 Simon





Re: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Michael G Schwern

On Mon, Jan 21, 2002 at 02:44:40PM -0500, Melvin Smith wrote:
> At 12:32 PM 1/21/2002 -0500, Michael G Schwern wrote:
> >On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
> >> : while( my $line =  ) {
> >> : ...
> >> : }
> >>
> >> That still works fine--it's just that $line lives on after the while.
> >
> >This creeping lexical leakage bothers me.  While it might make the
> 
> "lives on", ... "creeping lexical", I feel the same way, we must find some
> way to kill these... :)

If we consult the existing literature...

bullet to the head
fire
electrocution ("Day of the Dead")

Going to the foreign journals:

lawn mower ("Braindead" aka "Dead Alive")
blender ("Braindead" again)
sharpened shovel to the head ("Dellamorte Dellamore" aka "Cemetery Man")


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]> Kwalitee Is Job One
It sure is fun masturbating.
http://www.unamerican.com/



RE: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Tzadik Vanderhoof

Why all the fuss?  Often, you would *want* to access that lexical after the
loop terminates, for instance to check how it terminated.




Re: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Michael G Schwern

On Mon, Jan 21, 2002 at 03:02:06PM -0500, Tzadik Vanderhoof wrote:
> Why all the fuss?  Often, you would *want* to access that lexical after the
> loop terminates, for instance to check how it terminated.

In most cases you don't want that to happen; usually the life of the
lexical is only the block.  If you do want it to live on you can
simply predeclare it.

my $foo;
if $foo = bar {
...
}

which is much simpler than the setup currently proposed, where you'll
often have to do this to keep $foo in its proper place.

do {
if my $foo = bar {
...
}
}

The problem with having lexicals leaking out of loop/if conditions
is it defeats the point of lexicals!  Currently, if you write:

use strict;
if( my $bar = bar() ) {
..
}

$bar = baz();  # oops, meant $baz

you'll get an error from that typo.  Under the proposed setup:

use strict;
if my $bar = bar() {
...
}

$bar = baz();   # just fine, since it was declared above.

no error.
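
The Perl 5 half of that comparison can be checked directly. A small
sketch, using a string eval only so that the compile-time failure can
be caught and printed (bar() and baz() are dummies):

```perl
use strict;
use warnings;

# Compiled in a string eval so the strict failure can be caught.
my $snippet = <<'END_SNIPPET';
    sub bar { 1 }
    sub baz { 2 }
    if ( my $bar = bar() ) {
        # $bar is visible here
    }
    $bar = baz();   # outside the if, $bar is not declared in Perl 5
    1;
END_SNIPPET

my $ok    = eval $snippet;
my $error = $@;
print $ok ? "compiled and ran\n" : "rejected: $error";
```

Perl 5 refuses to compile the snippet because $bar is unknown after the
closing brace of the if.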


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]> Kwalitee Is Job One
Fuck with me and I will saw off your legs.
http://www.unamerican.com/



RE: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Melvin Smith

At 03:02 PM 1/21/2002 -0500, you wrote:
>Why all the fuss?  Often, you would *want* to access that lexical after the
>loop terminates, for instance to check how it terminated.

Why would you want to check it when the condition is typically boolean?

 while( my $line =  ) {

I think many people just "expect" it to work the other way. If I recall, C++
originally worked this way (at least some compilers) and went the
other way.

-Melvin




RE: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Tzadik Vanderhoof

It's not the condition you would want to check, it's the variable (e.g.
$line).




RE: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Melvin Smith

At 03:14 PM 1/21/2002 -0500, Melvin Smith wrote:
>At 03:02 PM 1/21/2002 -0500, you wrote:
>>Why all the fuss?  Often, you would *want* to access that lexical after the
>>loop terminates, for instance to check how it terminated.
>
>Why would you want to check it when the condition is typically boolean?
>
> while( my $line =  ) {

Well let me rephrase, I see what you mean. (If a terminating condition
was met inside the loop then you might check it); what I meant was
the typical situation is the "while not EOF" or "while still MUNCHING".

-Melvin




RE: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Melvin Smith

At 03:16 PM 1/21/2002 -0500, Tzadik Vanderhoof wrote:
>It's not the condition you would want to check, it's the variable (e.g.
>$line).

Right, I gotcha. I guess I would rather see it cater to the typical use,
not the atypical. Of course my opinion of typical might differ from yours.

I do feel this will be another point of confusion for newcomers to the
language.

-Melvin




Re: Apoc4: The loop keyword

2002-01-21 Thread Ted Ashton

Thus it was written in the epistle of Michael G Schwern,
> On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
> > : while( my $line =  ) {
> > : ...
> > : }
> > 
> > That still works fine--it's just that $line lives on after the while.
> 
> This creeping lexical leakage bothers me.  While it might make the
> language simpler, the proliferation of left-over lexicals seems
> sloppy.

. . . if not to say downright ugly.  The boolean of an if or a while is more a
part of the "inner stuff" than the "outer".  What's the chance that it could
be considered so?  

Ted
-- 
Ted Ashton ([EMAIL PROTECTED]), Info Sys, Southern Adventist University
  ==
There is no nature at an instant.
 -- Whitehead, Alfred North
  ==
 Deep thought to be found at http://www.southern.edu/~ashted



Re: Apoc4: The loop keyword

2002-01-21 Thread Casey West

On Mon, Jan 21, 2002 at 03:26:30PM -0500, Ted Ashton wrote:
:
:Thus it was written in the epistle of Michael G Schwern,
:> On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
:> > : while( my $line =  ) {
:> > : ...
:> > : }
:> > 
:> > That still works fine--it's just that $line lives on after the while.
:> 
:> This creeping lexical leakage bothers me.  While it might make the
:> language simpler, the proliferation of left-over lexicals seems
:> sloppy.
:
:. . . if not to say downright ugly.  The boolean of an if or a while is more a
:part of the "inner stuff" than the "outer".  What's the chance that it could
:be considered so?  

So you're suggesting that we fake lexical scoping?  That sounds more
icky than sticking to true lexical scoping.  A block dictates scope,
not before and not after.  I don't see ickyness about making that so.

  Casey West

-- 
Shooting yourself in the foot with Modula-2 
After realizing that you can't actually accomplish anything in this
language, you shoot yourself in the head. 



Re: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Larry Wall

Melvin Smith writes:
: At 03:16 PM 1/21/2002 -0500, Tzadik Vanderhoof wrote:
: >It's not the condition you would want to check, it's the variable (e.g.
: >$line).
: 
: Right, I gotcha. I guess I would rather see it cater to the typical use,
: not the atypical. Of course my opinion of typical might differ from yours.
: 
: I do feel this will be another point of confusion for newcomers to the
: language.

It's confusing if there are different rules for compound statements
than for simple statements.  And what about user defined statements of
indeterminate compoundhood?

mumble my $x = <$in> { process($x) };

How is the user to know whether that C<my> is limited to the block?

The proposed rule is very simple, and consistent with the way things
have always worked with simple statements:  "All C<my> variables live
on past the innermost statement in which they were declared, to the end
of the current statement sequence.  Period."

I sincerely doubt I'm going to change my mind on this one.  The dwim
arguments are inconclusive, so the simplicity argument wins.

Larry



Re: Apoc4: The loop keyword

2002-01-21 Thread Rafael Garcia-Suarez

On 2002.01.21 18:32 Michael G Schwern wrote:
> On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
> > : while( my $line =  ) {
> > : ...
> > : }
> > 
> > That still works fine--it's just that $line lives on after the while.
> 
> This creeping lexical leakage bothers me.  While it might make the
> language simpler, the proliferation of left-over lexicals seems
> sloppy.

Not really -- you're not going to use a lot of different loop variables,
are you? But:

  while ( my $line =  ) { ... }
  ...
  while ( my $line =  ) { ... }

is now probably going to issue some warning (about a lexical masking an
earlier declaration, you know.)
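
For reference, this is the warning Perl 5 already gives for two my
declarations of one name in the same scope. Whether Perl 6 will say
anything similar is only a guess; the sketch below just shows the
existing Perl 5 behaviour, with the handler and the eval added so the
message can be captured:

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Two declarations of $line in one scope, which is roughly what two
# consecutive while ( my $line = ... ) loops would amount to if the
# variable lived on after each loop.
eval q{
    my $line = 'first';
    my $line = 'second';
    1;
} or die $@;

print @warnings;
```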



Re: Apoc4: The loop keyword

2002-01-21 Thread Damian Conway

Casey wrote:

> So you're suggesting that we fake lexical scoping?  That sounds more
> icky than sticking to true lexical scoping.  A block dictates scope,
> not before and not after.  I don't see ickyness about making that so.

Exactly!

What we're cleaning up is the ickiness of having things declared outside
the braces be lexical to the braces. *That's* hard to explain to beginners.

Damian



RE: Apoc4: The loop keyword

2002-01-21 Thread David Whipp

Casey West wrote:
> So you're suggesting that we fake lexical scoping?  That sounds more
> icky than sticking to true lexical scoping.  A block dictates scope,
> not before and not after.  I don't see ickyness about making that so.

Perl is well known for its non-orthogonality. To say that "A block
dictates scope" is only one possible definition. But what are the
unintended consequences of saying that an arglist dictates scope?
bare-blocks are not allowed, so a block is always part of an arglist!

(In reality we'd probably want to keep both rules, because an arglist
can contain multiple blocks)

Dave.



Re: Apoc4: The loop keyword

2002-01-21 Thread Larry Wall

Ted Ashton writes:
: Thus it was written in the epistle of Michael G Schwern,
: > On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
: > > : while( my $line =  ) {
: > > : ...
: > > : }
: > > 
: > > That still works fine--it's just that $line lives on after the while.
: > 
: > This creeping lexical leakage bothers me.  While it might make the
: > language simpler, the proliferation of left-over lexicals seems
: > sloppy.
: 
: . . . if not to say downright ugly.  The boolean of an if or a while is more a
: part of the "inner stuff" than the "outer".

It doesn't seem that way to me.

: What's the chance that it could be considered so?  

In most other languages, you wouldn't even have the opportunity to put
a declaration into the conditional.  You'd have to say something like:

my $line = <$in>;
if $line ne "" { ... }

Since

if my $line = <$in> { ... }

is Perl shorthand for those two lines, I don't see how one can say that
the variable is more related to the inside than the outside of the block.
One can claim that the code after the C<if> may not be interested in
C<$line>, but the same is true of the block itself!  The conditional
only decides whether the block runs.  It's not part of the block.

Larry



RE: on parrot strings

2002-01-21 Thread Hong Zhang

> But e` and e are different letters man. And re`sume` and resume are
> different words come to that. If the user wants something that'll
> match 'em both then the pattern should surely be:
> 
>/r[ee`]sum[ee`]/

I disagree. The difference between 'e' and 'e`' is similar to that
between 'c' and 'C'. Unicode compatibility equivalence has a similar
effect too, such as "half width letter" and "full width letter".

It may just be my personal preference. But I don't think it is a
good idea to push this problem to the user of the regex.
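
As a rough illustration of accent-insensitive comparison done outside
the regex engine, here is a sketch using the Unicode::Normalize module.
The fold_accents() helper is made up for the example, and this is only
one possible approach, not a proposal for how Parrot should do it:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose the string, then drop the combining
# marks, so that accented and unaccented spellings compare equal.
sub fold_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

my $accented = "r\N{U+00E8}sum\N{U+00E8}";   # e-grave, written e` above
print "same word after folding\n"
    if fold_accents($accented) eq fold_accents('resume');
```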

Hong



RE: on parrot strings

2002-01-21 Thread Hong Zhang

> Yes, that's somewhat problematic.  Making up "a byte CEF" would be
> Wrong, though, because there is, by definition, no CCS to map, and
> we would be dangerously close to conflating in CES, too...
> ACR-CCS-CEF-CES.  Read the character model.  Understand the character
> model.  Embrace the character model.  Be the character model.  (And
> once you're it, read the relevant Unicode, XML, and Web standards.)
> 
> To highlight the difference between opaque numbers and characters,
> the above should really be:
> 
>   if ($buf =~ /\x47\x49\x46\x38\x39\x61\x08\x02/) { ... }
> 
> I think what needs to be done is that \xHH must not be encoded as
> literals (as it is now, 'A' and \x41 are identical (in ASCII)), but
> instead as regex nodes of their own, storing the code points.  Then
> the regex engine can try both the "right/new way" (the Unicode code
> point), and the "wrong/legacy way" (the native code point).

My suggestion would be to add a binary mode, such as //b. When binary mode
is in effect, only ASCII characters (0 - 127) still carry text properties.
\p{IsLower} will only match ASCII a to z. All of 128 - 255 always have false
text properties. Any code point must be between 0 and 255. The regcomp
can easily check this upon compilation.

A dedicated binary mode will simplify many issues. And the regex will
be very readable. We can make binary mode exclusive with text mode,
i.e. any regex must be either binary or text, but not both.
(I am not sure if it is really useful to have a mixed mode.)

Hong



RE: on parrot strings

2002-01-21 Thread Garrett Goebel

From: Hong Zhang [mailto:[EMAIL PROTECTED]]
> 
> > But e` and e are different letters man. And re`sume` and resume are
> > different words come to that. If the user wants something that'll
> > match 'em both then the pattern should surely be:
> > 
> >/r[ee`]sum[ee`]/
> 
> I disagree. The difference between 'e' and 'e`' is similar to 'c'
> and 'C'. The Unicode compatibility equivalence has similar effect
> too, such as "half width letter" and "full width letter".

German to English
 schon => already
 schön => nice

2 totally different words.

I'm sure there are words in some language where the difference between an 'e'
and an 'e`' can be the difference between an insult and a compliment.
 



Re: Apoc4: The loop keyword

2002-01-21 Thread Graham Barr

On Mon, Jan 21, 2002 at 12:50:38PM -0800, Larry Wall wrote:
> : What's the chance that it could be considered so?  
> 
> In most other languages, you wouldn't even have the opportunity to put
> a declaration into the conditional.  You'd have to say something like:
> 
> my $line = <$in>;
> if $line ne "" { ... }
> 
> Since
> 
> if my $line = <$in> { ... }
> 
> is Perl shorthand for those two lines, I don't see how one can say that
> the variable is more related to the inside than the outside of the block.
> One can claim that the code after the C<if> may not be interested in
> C<$line>, but the same is true of the block itself!  The conditional
> only decides whether the block runs.  It's not part of the block.

But are we not at risk of introducing another form of

  my $x if 0;

with

  if my $one =  {
...
  }
  elsif my $two =  {
  }

  if ($two) {
...
  }

Graham.



Re: Apoc4: The loop keyword

2002-01-21 Thread Michael G Schwern

On Mon, Jan 21, 2002 at 03:27:29PM -0500, Casey West wrote:
> So you're suggesting that we fake lexical scoping?  That sounds more
> icky than sticking to true lexical scoping.  A block dictates scope,
> not before and not after.  I don't see ickyness about making that
> so.

Perl5 already fakes lexical scoping all over the place.  A lot of that
fakery can be removed from the language, yes, but in the case of block
conditions it seems that DWIMery should win over orthogonality.


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]> Kwalitee Is Job One
And God created Cat to be a companion to Adam.  And Cat would not obey
Adam.  And when Adam gazed into Cat's eyes, he was reminded that he
was not the supreme being.  And Adam learned humility.
-- http://www.catsarefrommars.com/creationist.htm



Re: Apoc4: The loop keyword

2002-01-21 Thread Larry Wall

David Whipp writes:
: Casey West wrote:
: > So you're suggesting that we fake lexical scoping?  That sounds more
: > icky than sticking to true lexical scoping.  A block dictates scope,
: > not before and not after.  I don't see ickyness about making that so.
: 
: Perl is well known for its non-orthogonality. To say that "A block
: dictates scope" is only one possible definition. But what are the
: unintended consequences of saying that an arglist dictates scope?
: bare-blocks are not allowed, so a block is always part of an arglist!
: 
: (In reality we'd probably want to keep both rules, because an arglist
: can contain multiple blocks)

Non-orthogonality is not a virtue--shortcuts are a virtue.  As
shortcuts go, this particular one is not buying us much of anything
that offsets the difficulty in learning it, so out it goes.

Larry



Re: Apoc4: The loop keyword

2002-01-21 Thread Michael G Schwern

On Mon, Jan 21, 2002 at 03:43:07PM -0500, Damian Conway wrote:
> Casey wrote:
> 
> > So you're suggesting that we fake lexical scoping?  That sounds more
> > icky than sticking to true lexical scoping.  A block dictates scope,
> > not before and not after.  I don't see ickyness about making that so.
> 
> Exactly!
> 
> What we're cleaning up is the ickiness of having things declared outside
> the braces be lexical to the braces. *That's* hard to explain to beginners.

In this case I'll take long-term simplicity over short-term
easy-to-explain rules.  Otherwise we'll be writing this all over the
place til Kingdom come.

do {
if my $foo = bar() {
...
}
}


-- 

Michael G. Schwern   <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/
Perl Quality Assurance  <[EMAIL PROTECTED]> Kwalitee Is Job One
I sit on the floor and pick my nose
  and think of dirty things
Of deviant dwarfs who suck their toes
  and elves who drub their dings.
-- Frito Bugger, "Bored Of The Rings"



Re: Apoc4: The loop keyword

2002-01-21 Thread Melvin Smith

At 12:50 PM 1/21/2002 -0800, Larry Wall wrote:
>In most other languages, you wouldn't even have the opportunity to put
>a declaration into the conditional.  You'd have to say something like:

I grudgingly agree here. Where did this shorthand come from anyway?
The first time I ever used it was C++ for() loops, but I'm sure it has
roots older than that. I guess Perl took shorthand to the next level so
C++ arguments wouldn't work here, *sniff*!

-Melvin




Re: Apoc4: The loop keyword

2002-01-21 Thread Larry Wall

Michael G Schwern writes:
: On Mon, Jan 21, 2002 at 03:27:29PM -0500, Casey West wrote:
: > So you're suggesting that we fake lexical scoping?  That sounds more
: > icky than sticking to true lexical scoping.  A block dictates scope,
: > not before and not after.  I don't see ickyness about making that
: > so.
: 
: Perl5 already fakes lexical scoping all over the place.  A lot of that
: fakery can be removed from the language, yes, but in the case of block
: conditions it seems that DWIMery should win over orthogonality.

It may seem that way to you, but it doesn't seem that way to me, seemingly.

Larry



Re: Apoc4: The loop keyword

2002-01-21 Thread Graham Barr

On Mon, Jan 21, 2002 at 03:58:49PM -0500, Michael G Schwern wrote:
> On Mon, Jan 21, 2002 at 03:43:07PM -0500, Damian Conway wrote:
> > Casey wrote:
> > 
> > > So you're suggesting that we fake lexical scoping?  That sounds more
> > > icky than sticking to true lexical scoping.  A block dictates scope,
> > > not before and not after.  I don't see ickyness about making that so.
> > 
> > Exactly!
> > 
> > What we're cleaning up is the ickiness of having things declared outside
> > the braces be lexical to the braces. *That's* hard to explain to beginners.
> 
> In this case I'll take long-term simplicity over short-term
> easy-to-explain rules.  Otherwise we'll be writing this all over the
> place til Kingdom come.
> 
> do {
> if my $foo = bar() {
> ...
> }
> }

If that is what you want then fine. But I have lost count of the number
of times I have wanted to do

  if ((my $foo = bar()) eq 'foo') {
...
  }

  if ($foo eq 'bar') {
...
  }

Personally I really don't see it as a problem.
  
Graham.



Re: Apoc4: The loop keyword

2002-01-21 Thread Larry Wall

Graham Barr writes:
: But are we not at risk of introducing another form of
: 
:   my $x if 0;
: 
: with
: 
:   if my $one =  {
: ...
:   }
:   elsif my $two =  {
:   }
: 
:   if ($two) {
: ...
:   }

Then it's just undefined.  It's no different from how &&, ||, or ??::
work when you put a declaration in something that's conditionalized.

Larry



Re: Apoc4: The loop keyword

2002-01-21 Thread Graham Barr

On Mon, Jan 21, 2002 at 01:01:09PM -0800, Larry Wall wrote:
> Graham Barr writes:
> : But are we not at risk of introducing another form of
> : 
> :   my $x if 0;
> : 
> : with
> : 
> :   if my $one =  {
> : ...
> :   }
> :   elsif my $two =  {
> :   }
> : 
> :   if ($two) {
> : ...
> :   }
> 
> Then it's just undefined.  It's no different from how &&, ||, or ??::
> work when you put a declaration in something that's conditionalized.

Right. So we need to make sure that the implementation does that. In Perl5
my has a runtime part, so if it is not actually run then the lexical
can hold the value of the previous time it was executed.

Graham.



Re: Apoc4: The loop keyword

2002-01-21 Thread Larry Wall

Michael G Schwern writes:
: In this case I'll take long-term simplicity over short-term
: easy-to-explain rules.

I fail to see what's simpler about it.

: Otherwise we'll be writing this all over the
: place til Kingdom come.
: 
: do {
: if my $foo = bar() {
: ...
: }
: }

I predict that your prediction will not pan out.

Larry



RE: Apoc4: The loop keyword

2002-01-21 Thread David Whipp

Graham Barr wrote:
> But I have lost count of the number
> of times I have wanted to do
>
>   if ((my $foo = bar()) eq 'foo') {
> ...
>   }
> 
>   if ($foo eq 'bar') {
> ...
>   }
> 

To be contrasted with:

while (my($k, $v) = each %h1)
{
  ...
}

while (my($k, $v) = each %h2) # error?
{
  ...
}

Of course, there's an alternative now, using for, ->, and pairs.


Dave.




Re: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Uri Guttman

> "MS" == Melvin Smith <[EMAIL PROTECTED]> writes:

  MS> At 12:32 PM 1/21/2002 -0500, Michael G Schwern wrote:
  >> On Sun, Jan 20, 2002 at 10:58:34PM -0800, Larry Wall wrote:
  >> > : while( my $line =  ) {
  >> > : ...
  >> > : }
  >> >
  >> > That still works fine--it's just that $line lives on after the while.
  >> 
  >> This creeping lexical leakage bothers me.  While it might make the

  MS> "lives on", ... "creeping lexical", I feel the same way, we must find some
  MS> way to kill these... :)

well, larry looks at it differently and what he said on the cruise makes
sense. the bigger problems were firstly not supporting lexical tunneling
from the 'for my $foo' into the continue block. now that NEXT blocks are
inside the loop that is fixed. and secondly the rule about the for
variable being declared in the outer scope also solves the problem of
keeping that last value around after the loop (prematurely) exits. so
the lexical is not creeping out but being declared in the surrounding
and proper scope.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
-- Stem is an Open Source Network Development Toolkit and Application Suite -
- Stem and Perl Development, Systems Architecture, Design and Coding 
Search or Offer Perl Jobs    http://jobs.perl.org



Re: Night of the Living Lexical (sequel to Apoc4: The loop keyword)

2002-01-21 Thread Melvin Smith

At 04:12 PM 1/21/2002 -0500, Uri Guttman wrote:
>   MS> "lives on", ... "creeping lexical", I feel the same way, we must find some
>   MS> way to kill these... :)
>
>well, larry looks at it differently and what he said on the cruise makes

Well we had a go, but our kung fu powers were no match for Larry's.

-Melvin




RE: on parrot strings

2002-01-21 Thread Hong Zhang

> > But e` and e are different letters man. And re`sume` and resume are 
> > different words come to that. If the user wants something that'll 
> > match 'em both then the pattern should surely be: 
> > 
> >/r[ee`]sum[ee`]/ 
> 
> I disagree. The difference between 'e' and 'e`' is similar to 'c' 
> and 'C'. The Unicode compatibility equivalence has similar effect 
> too, such as "half width letter" and "full width letter". 

German to English 
 schon => already 
 schön => nice 

2 totally different words. 

I am talking about similar words, whereas you are talking about different
words. I don't mind if someone can search across languages. Some Chinese
search engines can do a Chinese search using English keywords (for people who
have a Chinese viewer but no Chinese input method). Of course, no one expects
a regex engine to do that.

The "re`sume`" does appear in English sentences. The "[half|full] width
letters" are in the same language.

Hong



Re: Apoc4: The loop keyword

2002-01-21 Thread Larry Wall

Graham Barr writes:
: On Mon, Jan 21, 2002 at 01:01:09PM -0800, Larry Wall wrote:
: > Graham Barr writes:
: > : But are we not at risk of introducing another form of
: > : 
: > :   my $x if 0;
: > : 
: > : with
: > : 
: > :   if my $one =  {
: > : ...
: > :   }
: > :   elsif my $two =  {
: > :   }
: > : 
: > :   if ($two) {
: > : ...
: > :   }
: > 
: > Then it's just undefined.  It's no different from how &&, ||, or ??::
: > work when you put a declaration in something that's conditionalized.
: 
: Right. So we need to make sure that the implementation does that. In Perl5
: my has a runtime part, so if it is not actually run then the lexical
: can hold the value of the previous time it was executed.

Well, true enough.  Perhaps "undefined" is too meaningful.  We could
borrow a phrase from Ada culture and just call it "erroneous".

Larry



Re: Apoc4: The loop keyword

2002-01-21 Thread Glenn Linderman

Michael G Schwern wrote:
> 
> In this case I'll take long-term simplicity over short-term
> easy-to-explain rules.  Otherwise we'll be writing this all over the
> place til Kingdom come.
> 
> do {
> if my $foo = bar() {
> ...
> }
> }

I'm surprised no one else has invented

{ unless my $foo = bar () { last };
  ...
}

during this argument :)  Of course, it doesn't scale to else clauses, so
maybe that is why.

-- 
Glenn
=
Due to the current economic situation, the light at the
end of the tunnel will be turned off until further notice.



Re: [A-Z]+\s*\{

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 11:14, Larry Wall wrote:
> So I'm not terribly interested in going out of my way to make statement
> blocks parse exactly like terms in an ordinary expressions.  If it
> happens, it'll probably be by accident.

That's fine.  (And I agree.)  All I really cared about was map, grep, and 
sort.  The rest was simply an extension to the implausible end.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: Apoc4: Parentheses

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 11:27, Larry Wall wrote:
> Compound statements in Perl 5 do have an implicit {} around the entire
> statement, but that has nothing to do with the required parentheses
> around the expressions, other than the fact that we're doing away with
> both of those special rules in Perl 6.  But parentheses have always
> been totally transparent to any C contained within them.  It's
> the implicit {} that was protecting the C condition from getting
> a warning like the C got (which got the warning because it's
> at the same scope level as C's declaration).

But the flies are spontaneously generating!  ;-)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



Re: on parrot strings

2002-01-21 Thread Russ Allbery

Hong Zhang <[EMAIL PROTECTED]> writes:

> I disagree. The difference between 'e' and 'e`' is similar to 'c'
> and 'C'.

No, it's not.

In many languages, an accented character is a completely different letter.
It's alphabetized separately, it's pronounced differently, and there are
many words that differ only in the presence of an accent.

Changing the capitalization of C does not change the word.  Adding or
removing an accent does.

> The Unicode compatibility equivalence has similar effect too, such as
> "half width letter" and "full width letter".

You'll find that the Unicode compatibility equivalence does nothing as
ill-conceived as unifying e and e', for very good reason because that
would be a horrible mistake.

-- 
Russ Allbery ([EMAIL PROTECTED]) 



Re: on parrot strings

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 16:43, Russ Allbery wrote:
> Changing the capitalization of C does not change the word. 

Er, most of the time. 

-- 
Bryan C. Warnock
[EMAIL PROTECTED]



RE: on parrot strings

2002-01-21 Thread Stephen Howard

Not to get modifier-happy, but it seems like a user-oriented solution would be to let 
the user specify a modifier:

 "caseinsensitive" =~ m/CaseInsensitive/i

 "resume" =~ m/re`sume`/d (diacritic modifier?)

-Stephen

-Original Message-
From: Hong Zhang [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 21, 2002 04:10 PM
Cc: [EMAIL PROTECTED]
Subject: RE: on parrot strings


> > But e` and e are different letters man. And re`sume` and resume are
> > different words come to that. If the user wants something that'll
> > match 'em both then the pattern should surely be:
> >
> >/r[ee`]sum[ee`]/
>
> I disagree. The difference between 'e' and 'e`' is similar to 'c'
> and 'C'. The Unicode compatibility equivalence has similar effect
> too, such as "half width letter" and "full width letter".

German to English
 schon => already
 schön => nice

2 totally different words.

I am talking about similar words, whereas you are talking about different
words. I don't mind if someone can search across languages. Some Chinese
search engines can do a Chinese search using English keywords (for people who
have a Chinese viewer but no Chinese input method). Of course, no one expects
a regex engine to do that.

The "re`sume`" does appear in English sentences. The "[half|full] width
letters" are in the same language.

Hong




Re: on parrot strings

2002-01-21 Thread Russ Allbery

Bryan C Warnock <[EMAIL PROTECTED]> writes:
> On Monday 21 January 2002 16:43, Russ Allbery wrote:

>> Changing the capitalization of C does not change the word. 

> Er, most of the time. 

No, pretty much all of the time.  There are differences between proper
nouns and common nouns, but those are differences routinely quashed as a
typesetting decision; if you write both proper nouns and common nouns in
all caps as part of a headline, the lack of distinction is not considered
a misspelling.  Similarly, if you capitalize the common noun because it
occurs at the beginning of the sentence, that doesn't transform its
meaning.

Whereas adding or removing an accent is always considered a misspelling,
at least in some languages.  It's like adding or removing random letters
from the word.

re'sume' and resume are two different words.  It so happens that in
English re'sume' is a variant spelling for one meaning of resume.  I don't
believe that regexes should try to automatically pick up variant
spellings.  Should the regex /aerie/ match /eyrie/?  That makes as much
sense as a search for /resume/ matching /re'sume'/.

-- 
Russ Allbery ([EMAIL PROTECTED]) 



[PATCH] are characters unsigned?

2002-01-21 Thread Nicholas Clark

This warning:

string.c: In function `string_transcode':
string.c:194: warning: passing arg 2 of pointer to function as unsigned due to prototype

represents a can of worms. The summary is "are characters signed or unsigned?"

I am of the opinion that they are UINTVAL, not INTVAL. (and EOF being a
negative value such as -1 is only needed for C stdio, and I seem to remember
that Dan has strong opinions on C stdio, and what C can do with it)

This is not a very considered opinion, I should add. It just feels safer with
them as unsigned, on the assumption that our code doesn't do EOF.

In which case, the following rather involved patch is needed. Or something
similar. And it's scary because it redefines chartypes, so please could
someone sanity check it.

I thought that it should be this

INTVAL (*get_digit)(UINTVAL c);

not this

UINTVAL (*get_digit)(UINTVAL c);

as I'd not be surprised if Unicode contains a glyph in some script that is
for a digit with negative value. (And if there isn't the Klingons will
invent one to be awkward)
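
A minimal sketch of the signedness point, assuming stand-in typedefs (INTVAL as
long, UINTVAL as unsigned long; the real Parrot definitions come from
configuration and may differ). The sketch_* names are hypothetical and exist
only for illustration. It shows why the digit value is converted to a signed
type before subtracting, and why the argument is range-checked before it
reaches isdigit():

```c
#include <ctype.h>

/* Stand-in typedefs: assumptions for this sketch, not Parrot's real ones. */
typedef long INTVAL;
typedef unsigned long UINTVAL;

/* Hypothetical ASCII-only digit test. isdigit() takes an int whose value
 * must be representable as unsigned char (or be EOF), so check the range
 * before narrowing the code point. */
int sketch_is_digit(UINTVAL c) {
    return (c <= 127 && isdigit((int) c)) ? 1 : 0;
}

/* Signed digit value: convert to INTVAL first, so that a code point below
 * '0' yields a negative number instead of wrapping around. */
INTVAL sketch_get_digit(UINTVAL c) {
    return ((INTVAL) c) - '0';
}

/* The same subtraction done in unsigned arithmetic, to show the wrap. */
UINTVAL sketch_get_digit_unsigned(UINTVAL c) {
    return c - '0';
}
```

With unsigned code points, a test such as "c < 0" can never be true, which is
why only the upper bound survives in a range check like the one above.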

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- include/parrot/chartype.h~  Thu Dec 27 18:50:28 2001
+++ include/parrot/chartype.h   Mon Jan 21 19:12:16 2002
@@ -13,15 +13,15 @@
 #if !defined(PARROT_CHARTYPE_H_GUARD)
 #define PARROT_ENCODING_H_GUARD
 
-typedef INTVAL (*CHARTYPE_TRANSCODER)(INTVAL c);
+typedef UINTVAL (*CHARTYPE_TRANSCODER)(UINTVAL c);
 
 typedef struct {
 const char *name;
 const char *default_encoding;
 CHARTYPE_TRANSCODER (*transcode_from)(const char *from);
 CHARTYPE_TRANSCODER (*transcode_to)(const char *to);
-BOOLVAL (*is_digit)(INTVAL c);
-INTVAL (*get_digit)(INTVAL c);
+BOOLVAL (*is_digit)(UINTVAL c);
+INTVAL (*get_digit)(UINTVAL c);
 } CHARTYPE;
 
 const CHARTYPE *
--- ../parrot/string.c  Tue Jan 15 23:14:51 2002
+++ string.c    Mon Jan 21 19:28:24 2002
@@ -186,7 +186,7 @@
 destend = deststart;
 
 while (srcstart < srcend) {
-INTVAL c = src->encoding->decode(srcstart);
+UINTVAL c = src->encoding->decode(srcstart);
 
 if (transcoder1) c = transcoder1(c);
 if (transcoder2) c = transcoder2(c);
@@ -424,7 +424,7 @@
 }
 
 if (len == 1) {
-INTVAL c = s->encoding->decode(s->bufstart);
+UINTVAL c = s->encoding->decode(s->bufstart);
 if (s->type->is_digit(c) && s->type->get_digit(c) == 0) {
 return 0;
 }
@@ -456,7 +456,7 @@
 BOOLVAL in_number = 0;
 
 while (start < end) {
-INTVAL c = s->encoding->decode(start);
+UINTVAL c = s->encoding->decode(start);
 
 if (s->type->is_digit(c)) {
 in_number = 1;
@@ -500,7 +500,7 @@
 INTVAL fake_exponent = 0;
 
 while (start < end) {
-INTVAL c = s->encoding->decode(start);
+UINTVAL c = s->encoding->decode(start);
 
 if (s->type->is_digit(c)) {
 if (in_exp) {
--- ../parrot/chartypes/unicode.c   Tue Jan 15 20:02:54 2002
+++ chartypes/unicode.c Mon Jan 21 20:06:09 2002
@@ -23,12 +23,12 @@
 }
 
 static BOOLVAL
-unicode_is_digit(INTVAL c) {
+unicode_is_digit(UINTVAL c) {
     return (BOOLVAL)(isdigit(c) ? 1 : 0); /* FIXME - Other code points are also digits */
 }
 
 static INTVAL
-unicode_get_digit(INTVAL c) {
+unicode_get_digit(UINTVAL c) {
 return c - '0'; /* FIXME - many more digits than this... */
 }
 
--- ../parrot/chartypes/usascii.c   Tue Jan 15 20:02:54 2002
+++ chartypes/usascii.c Mon Jan 21 20:10:49 2002
@@ -12,9 +12,9 @@
 
 #include "parrot/parrot.h"
 
-static INTVAL
-usascii_transcode_from_unicode(INTVAL c) {
-if (c < 0 || c > 127) {
+static UINTVAL
+usascii_transcode_from_unicode(UINTVAL c) {
+if (c > 127) {
 internal_exception(INVALID_CHARACTER, "Invalid character for US-ASCII");
 }
 return c;
@@ -30,8 +30,8 @@
 }
 }
 
-static INTVAL
-usascii_transcode_to_unicode(INTVAL c) {
+static UINTVAL
+usascii_transcode_to_unicode(UINTVAL c) {
 return c;
 }
 
@@ -46,13 +46,13 @@
 }
 
 static BOOLVAL
-usascii_is_digit(INTVAL c) {
-return (BOOLVAL)(isdigit(c) ? 1 : 0);
+usascii_is_digit(UINTVAL c) {
+return (BOOLVAL)(isdigit((int) c) ? 1 : 0);
 }
 
 static INTVAL
-usascii_get_digit(INTVAL c) {
-return c - '0';
+usascii_get_digit(UINTVAL c) {
+return ((INTVAL) c) - '0';
 }
 
 const CHARTYPE usascii_chartype = {



[PATCH] MANIFEST.SKIP

2002-01-21 Thread Nicholas Clark

This patch (context diffs mean that it's atop the Term::ReadLine patch)
adds a check for unexpected files not in the MANIFEST to Configure.pl

I'm not certain that putting the test in Configure.pl is the right place
for it, but I do believe that having an accurate MANIFEST.SKIP and the
ability to run 

perl -MExtUtils::Manifest -e ExtUtils::Manifest::fullcheck

(possibly as a Makefile target) is useful.

Currently:

Not in MANIFEST: include/parrot/rxstacks.h
Not in MANIFEST: rxstacks.c

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- /mnt/six/parrot/parrot_readline++/Configure.pl~ Mon Jan 21 17:44:03 2002
+++ Configure.pl    Mon Jan 21 19:48:37 2002
@@ -11,7 +11,7 @@
 
 use Config;
 use Getopt::Long;
-use ExtUtils::Manifest qw(manicheck);
+use ExtUtils::Manifest qw(fullcheck);
 use File::Copy;
 use Term::ReadLine; # The stub is present from earlier than 5.004
 use Parrot::BuildUtil;
@@ -810,11 +810,10 @@
 #
 
 sub check_manifest {
-print "\n";
 
-my(@missing)=manicheck();
+my($missing,$extra)=fullcheck();
 
-if(@missing) {
+if(@$missing) {
 print <<"END";
 
 Ack, some files were missing!  I can't continue running
@@ -838,6 +837,7 @@
 if ($term) {
 my $type = $term->ReadLine;
 print <<"END";
+
 Okay, we found everything.  Next you'll need to answer
 a few questions about your system.  You have
 ${ type} installed, so I'll use that to let
--- /dev/null   Mon Jul 16 22:57:44 2001
+++ MANIFEST.SKIP   Mon Jan 21 20:13:26 2002
@@ -0,0 +1,52 @@
+\.o$
+^\.cvsignore$
+/\.cvsignore$
+^include/parrot/config\.h$
+^include/parrot/platform\.h$
+^Makefile$
+/Makefile$
+^Parrot/Types\.pm$
+^Parrot/Config\.pm$
+^platform\.c$
+^config.opt$
+
+^vtable\.ops$
+^include/parrot/vtable\.h$
+^include/parrot/jit_struct\.h$
+^include/parrot/oplib/core_ops\.h$
+^include/parrot/oplib/core_ops_prederef\.h$
+
+^core_ops\.c$
+^core_ops_prederef\.c$
+^vtable_ops\.c$
+
+^Parrot/Jit\.pm$
+^Parrot/PMC\.pm$
+^Parrot/OpLib/core\.pm$
+
+^classes/default\.h$
+^classes/default\.c$
+^classes/intqueue\.h$
+^classes/intqueue\.c$
+^classes/parrotpointer\.h$
+^classes/parrotpointer\.c$
+^classes/perlarray\.h$
+^classes/perlarray\.c$
+^classes/perlhash\.h$
+^classes/perlhash\.c$
+^classes/perlint\.h$
+^classes/perlint\.c$
+^classes/perlnum\.h$
+^classes/perlnum\.c$
+^classes/perlstring\.h$
+^classes/perlstring\.c$
+^classes/perlundef\.h$
+^classes/perlundef\.c$
+
+^docs/packfile-c\.pod$
+^docs/packfile-perl\.pod$
+^docs/core_ops\.pod$
+
+^test_parrot$
+^pdump$
+^blib/
--- ../parrot/MANIFEST  Mon Jan 21 16:42:17 2002
+++ MANIFEST    Mon Jan 21 19:34:16 2002
@@ -3,6 +3,7 @@
 Configure.pl
 KNOWN_ISSUES
 MANIFEST
+MANIFEST.SKIP
 Makefile.in
 NEWS
 Parrot/Assembler.pm



[PATCH] warnings in test_main.c

2002-01-21 Thread Nicholas Clark

Before:

cc -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith 
-Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Winline -W 
-Wsign-compare -Wno-unused   -I./include  -DHAS_JIT -DI386 -o test_main.o -c 
test_main.c
test_main.c: In function `main':
test_main.c:230: warning: passing arg 4 of `PackFile_unpack' as unsigned due to prototype
test_main.c:249: warning: declaration of `time' shadows global declaration

After:

cc -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith 
-Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Winline -W 
-Wsign-compare -Wno-unused   -I./include  -DHAS_JIT -DI386 -o test_main.o -c 
test_main.c

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- test_main.c.orig    Mon Jan 14 20:32:55 2002
+++ test_main.c Mon Jan 21 17:58:38 2002
@@ -227,7 +227,7 @@
 
 pf = PackFile_new();
 if( !PackFile_unpack(interpreter, pf, (char *)program_code, 
- (opcode_t)program_size) ) {
+ (size_t)program_size) ) {
 printf( "Can't unpack.\n" );
 return 1;
 }
@@ -246,7 +246,7 @@
 unsigned int j;
 int op_count   = 0;
 int call_count = 0;
-FLOATVAL time = 0.0;
+FLOATVAL sum_time = 0.0;
 
 printf("Operation profile:\n\n");
 
@@ -257,7 +257,7 @@
 if(interpreter->profile[j].numcalls > 0) {
 op_count++;
 call_count += interpreter->profile[j].numcalls;
-time += interpreter->profile[j].time;
+sum_time += interpreter->profile[j].time;
 
 printf("  %5d  %-12s  %12ld  %5.6f  %5.6f\n", j, 
interpreter->op_info_table[j].full_name,
@@ -274,8 +274,8 @@
 op_count,
 "",
 call_count,
-time,
-time / (FLOATVAL)call_count
+sum_time,
+sum_time / (FLOATVAL)call_count
 );
 }
 }



Re: [maybe PATCH] use Term::ReadLine where possible

2002-01-21 Thread Nicholas Clark

On Mon, Jan 21, 2002 at 05:52:52PM +, Nicholas Clark wrote:
> I think that this is a good idea, but there may be arguments against it.

If it's a good idea it needs this correction

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- Configure.pl~   Mon Jan 21 17:44:03 2002
+++ Configure.pl    Mon Jan 21 20:05:37 2002
@@ -716,7 +716,8 @@
 # Term::ReadLine::Perl does a sideways scrolling single line like ksh.
 print "$message [$c{$field}]\n";
 $input = $term->readline("", $c{$field});
-$term->addhistory($input) if /\S/ and !$term->Features->{autohistory};
+$term->addhistory($input)
+if $input =~ /\S/ and !$term->Features->{autohistory};
 } else {
 print "$message [$c{$field}] ";
 chomp($input=);



Re: Apoc4: The loop keyword

2002-01-21 Thread Graham Barr

On Mon, Jan 21, 2002 at 01:38:39PM -0800, Larry Wall wrote:
> Graham Barr writes:
> : On Mon, Jan 21, 2002 at 01:01:09PM -0800, Larry Wall wrote:
> : > Graham Barr writes:
> : > : But are we not at risk of introducing another form of
> : > : 
> : > :   my $x if 0;
> : > : 
> : > : with
> : > : 
> : > :   if my $one =  {
> : > : ...
> : > :   }
> : > :   elsif my $two =  {
> : > :   }
> : > : 
> : > :   if ($two) {
> : > : ...
> : > :   }
> : > 
> : > Then it's just undefined.  It's no different from how &&, ||, or ??::
> : > work when you put a declaration in something that's conditionalized.
> : 
> : Right. So we need to make sure that the implementation does that. In Perl5
> : my has a runtime part, so if it is not actually run then the lexical
> : can hold the value of the previous time it was executed.
> 
> Well, true enough.  Perhaps "undefined" is too meaningful.  We could
> borrow a phrase from Ada culture and just call it "erroneous".

Either that, or instead of my having a runtime element, we just initialize
all lexicals at the start of the block in which they are declared. So the
initialize step is not controlled by runtime effects.

I guess whether or not this (bug?) arises again depends on how parrot
implements lexicals.

Graham.




Re: on parrot strings

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 17:11, Russ Allbery wrote:
> No, pretty much all of the time.  There are differences between proper
> nouns and common nouns, but those are differences routinely quashed as a
> typesetting decision; if you write both proper nouns and common nouns in
> all caps as part of a headline, the lack of distinction is not considered
> a misspelling.  Similarly, if you capitalize the common noun because it
> occurs at the beginning of the sentence, that doesn't transform its
> meaning.

That doesn't mitigate the fact that they are different words.  Sure, English 
is forgiving, as it's filled with heteronyms and homographs.  But it's all 
moot because regexes are character-oriented, not word-oriented.  

Given that they're character-oriented, we only need to provide character 
transformations between upper, lower, and title case.  But is that the 
dividing line?

>
> Whereas adding or removing an accent is always considered a misspelling,
> at least in some languages.  It's like adding or removing random letters
> from the word.

No, it's substituting letters in a word.  It's adding or removing random 
characters from the string representation of the word.

>
> re'sume' and resume are two different words.  It so happens that in
> English re'sume' is a variant spelling for one meaning of resume.  I don't
> believe that regexes should try to automatically pick up variant
> spellings.  Should the regex /aerie/ match /eyrie/?  That makes as much
> sense as a search for /resume/ matching /re'sume'/.

Variant spellings imply word-oriented searches.  We're talking about 
character-oriented transformations, and the question is whether or not 
there's enough justification - which I feel won't come from grammatical 
rationales, but from the 7-bit ASCII storage of words with accents - to 
provide a transformation from a base letter with accents to just the base 
letter.  
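
A minimal sketch of the kind of character-oriented transformation under
discussion, assuming ISO-8859-1 input and a deliberately tiny folding table
that covers only the accented forms of "e". The sketch_* names are hypothetical,
and whether a regex engine should offer such a folding at all is exactly the
open question in this thread:

```c
/* Fold a handful of ISO-8859-1 accented letters to their base letter.
 * The table is illustrative only; a real one would be far larger. */
unsigned char sketch_fold_accent(unsigned char c) {
    switch (c) {
    case 0xE8: case 0xE9: case 0xEA: case 0xEB:  /* lower-case e with accents */
        return 'e';
    case 0xC8: case 0xC9: case 0xCA: case 0xCB:  /* upper-case E with accents */
        return 'E';
    default:
        return c;
    }
}

/* Compare two NUL-terminated Latin-1 strings after folding.
 * Returns 1 if they match, 0 otherwise. */
int sketch_equal_ignoring_accents(const char *a, const char *b) {
    const unsigned char *p = (const unsigned char *) a;
    const unsigned char *q = (const unsigned char *) b;

    while (*p && *q) {
        if (sketch_fold_accent(*p) != sketch_fold_accent(*q))
            return 0;
        p++;
        q++;
    }
    return *p == *q;  /* both strings must end at the same position */
}
```

Note that the folding works one character at a time and knows nothing about
words, which is the distinction being drawn above.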

Do you feel that altering accented letters to better represent them within 
the facilities provided isn't done, or is wrong?  I'm not sure what 
you're typing as your example word, and whether or not it's getting munged 
in the meantime, but "résumé"  (r, e accent, s, u, m, e accent) is coming 
across as "re'sume'" (r, e, apostrophe, s, u, m, e, apostrophe).  (The incoming 
message was encoded ISO-8859-1, so presumably it should have preserved 
character 233, which is what I'm sending out.)

This isn't a ridiculous question.  Personally, I don't think that we should. 
The facilities are quickly coming into place to be able to do proper 
character encodings, and I think that we should lead from the front and 
encourage folks to be proper - not only in their searches, but in their text 
production. 


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



[PATCH] quieten many pmc warnings

2002-01-21 Thread Nicholas Clark

This eliminates many gcc warnings from pmc code by
1: changing index to idx
2: including the pmc's own header file so as to give declarations for its
   functions
3: moving the declarations of the global init functions to global_setup.h so
   that the pmc files see a declaration for their own init function (which
   otherwise gcc will warn about, on the zealous warnings we use)

Nicholas Clark
-- 
ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html

--- ./include/parrot/global_setup.h.orig    Mon Dec 31 15:58:28 2001
+++ ./include/parrot/global_setup.h Mon Jan 21 21:32:03 2002
@@ -14,6 +14,16 @@
 #if !defined(PARROT_GLOBAL_SETUP_H_GUARD)
 #define PARROT_GLOBAL_SETUP_H_GUARD
 
+/* Needed because this might get compiled before pmcs have been built */
+void Parrot_PerlUndef_class_init(void);
+void Parrot_PerlInt_class_init(void);
+void Parrot_PerlNum_class_init(void);
+void Parrot_PerlString_class_init(void);
+void Parrot_PerlArray_class_init(void);
+void Parrot_PerlHash_class_init(void);
+void Parrot_ParrotPointer_class_init(void);
+void Parrot_IntQueue_class_init(void);
+
 void
 init_world(void);
 
--- ./global_setup.c.orig   Mon Jan 14 20:32:52 2002
+++ ./global_setup.c    Mon Jan 21 21:31:50 2002
@@ -14,16 +14,6 @@
 #define INSIDE_GLOBAL_SETUP
 #include "parrot/parrot.h"
 
-/* Needed because this might get compiled before pmcs have been built */
-void Parrot_PerlUndef_class_init(void);
-void Parrot_PerlInt_class_init(void);
-void Parrot_PerlNum_class_init(void);
-void Parrot_PerlString_class_init(void);
-void Parrot_PerlArray_class_init(void);
-void Parrot_PerlHash_class_init(void);
-void Parrot_ParrotPointer_class_init(void);
-void Parrot_IntQueue_class_init(void);
-
 void
 init_world(void) {
 string_init(); /* Set up the string subsystem */
--- ./classes/pmc2c.pl.orig Fri Jan  4 02:29:18 2002
+++ ./classes/pmc2c.pl  Mon Jan 21 21:21:25 2002
@@ -185,7 +185,10 @@
   my @methods;
 
   my $OUT = '';
-  my $HOUT = '';
+  my $HOUT = <<"EOC";
+ /* Do not edit - automatically generated from '$pmcfile' by $0 */
+
+EOC
   my %defaulted;
 
   while ($classblock =~ s/($signature_re)//) {
@@ -228,9 +231,12 @@
 
   my $includes = '';
   foreach my $class (keys %visible_supers) {
-  next if $class eq $classname;
+  # No, include yourself to check your headers match your bodies
+  # (and gcc -W... is happy then)
+  # next if $class eq $classname;
   $includes .= qq(#include "\L$class.h"\n);
   }
+
 
   $OUT = [...]
 [...]cache.int_val = value->vtable->get_integer(INTERP,value);
 }
 
-void set_integer_index (INTVAL value, INTVAL index) {
+void set_integer_index (INTVAL value, INTVAL idx) {
 }
 
 void set_number (PMC * value) {
@@ -123,7 +123,7 @@
SELF->cache.num_val = (FLOATVAL)value->cache.int_val;
 }
 
-void set_number_index (FLOATVAL value, INTVAL index) {
+void set_number_index (FLOATVAL value, INTVAL idx) {
 }
 
 void set_string (PMC * value) {
@@ -148,7 +148,7 @@
string_copy(INTERP, (STRING*)value->cache.struct_val);
 }
 
-void set_string_index (STRING* value, INTVAL index) {
+void set_string_index (STRING* value, INTVAL idx) {
 }
 
 void set_value (void* value) {
--- ./classes/perlnum.pmc.orig  Mon Jan 14 20:32:57 2002
+++ ./classes/perlnum.pmc   Mon Jan 21 21:23:31 2002
@@ -50,14 +50,14 @@
return (INTVAL)SELF->cache.num_val;
 }
 
-INTVAL get_integer_index (INTVAL index) {
+INTVAL get_integer_index (INTVAL idx) {
 }
 
 FLOATVAL get_number () {
 return SELF->cache.num_val;
 }
 
-FLOATVAL get_number_index (INTVAL index) {
+FLOATVAL get_number_index (INTVAL idx) {
 }
 
 STRING* get_string () {
@@ -73,7 +73,7 @@
return s;
 }
 
-STRING* get_string_index (INTVAL index) {
+STRING* get_string_index (INTVAL idx) {
 }
 
 BOOLVAL get_bool () {
@@ -108,7 +108,7 @@
SELF->cache.int_val = value->cache.int_val;
 }
 
-void set_integer_index (INTVAL value, INTVAL index) {
+void set_integer_index (INTVAL value, INTVAL idx) {
 }
 
 void set_number (PMC * value) {
@@ -127,7 +127,7 @@
SELF->cache.num_val = value->cache.num_val;
 }
 
-void set_number_index (FLOATVAL value, INTVAL index) {
+void set_number_index (FLOATVAL value, INTVAL idx) {
 }
 
 void set_string (PMC * value) {
@@ -155,7 +155,7 @@
SELF->cache.struct_val = value->cache.struct_val;
 }
 
-void set_string_index (STRING* value, INTVAL index) {
+void set_string_index (STRING* value, INTVAL idx) {
 }
 
 void set_value (void* value) {
--- ./classes/perlint.pmc.orig  Mon Jan 14 20:32:57 2002
+++ ./classes/perlint.pmc   Mon Jan 21 21:23:42 2002
@@ -50,14 +50,14 @@
 return SELF->cache.int_val;
 }
 
-  

[PATCH] tidy up JIT temporaries

2002-01-21 Thread Nicholas Clark

On Mon, Jan 21, 2002 at 09:00:48PM +, Nicholas Clark wrote:
> I'm not certain that putting the test in Configure.pl is the right place
> for it, but I do believe that having an accurate MANIFEST.SKIP and the
> ability to run 
> 
> perl -MExtUtils::Manifest -e ExtUtils::Manifest::fullcheck
> 
> (possibly as a Makefile target) is useful.

If MANIFEST.SKIP is thought worthy, then the appended piece of tidying up is
a good idea.

Nicholas Clark
-- 
ETAXMANUNHAPPY http://www.ccl4.org/~nick/CV.html

--- Parrot/Jit/i386Generic.pm~  Sun Jan 20 20:52:23 2002
+++ Parrot/Jit/i386Generic.pm   Mon Jan 21 20:25:25 2002
@@ -110,6 +110,7 @@
 
 write_as($assembler,TMP_AS);
 assemble(TMP_AS, TMP_OBJ);
+unlink TMP_AS or warn "Could not unlink " . TMP_AS . ": $!";
 return disassemble(TMP_OBJ,\@special_arg,\@special,$ln);
 }
 



[PATCH] format warning in key.c

2002-01-21 Thread Nicholas Clark

We do mandate an ANSI conformant C compiler, don't we?

Appended patch cures these warnings:

key.c: In function `debug_key':
key.c:29: warning: int format, INTVAL arg (arg 3)
key.c:33: warning: int format, INTVAL arg (arg 3)
key.c:33: warning: int format, INTVAL arg (arg 4)
key.c:36: warning: int format, INTVAL arg (arg 3)
key.c:36: warning: int format, INTVAL arg (arg 4)
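
A minimal sketch of the format-macro approach used in the patch, assuming that
INTVAL is long and that INTVAL_FMT expands to a complete conversion
specification including the "%" (both are assumptions here; the real values are
chosen at configure time). The function name is hypothetical, and snprintf is
used only to keep the sketch easy to check:

```c
#include <stdio.h>

/* Assumptions for this sketch, not Parrot's configured values. */
typedef long INTVAL;
#define INTVAL_FMT "%ld"

/* Format a bucket description into buf. Adjacent string literals are
 * concatenated by the compiler, so the format string seen by snprintf is
 * " *** Bucket %ld type %ld". Returns what snprintf returns. */
int sketch_describe_bucket(char *buf, size_t len, INTVAL i, INTVAL type) {
    return snprintf(buf, len,
                    " *** Bucket " INTVAL_FMT " type " INTVAL_FMT,
                    i, type);
}
```

Because the macro and the typedef change together, the format always matches
the argument type, whatever width INTVAL ends up with.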


Nicholas Clark
-- 
ENOJOB http://www.ccl4.org/~nick/CV.html

--- key.c.orig  Mon Jan 14 20:32:54 2002
+++ key.c   Mon Jan 21 23:09:06 2002
@@ -26,14 +26,14 @@
 debug_key (struct Parrot_Interp* interpreter, KEY* key) {
   INTVAL i;
   fprintf(stderr," *** key %p\n",key);
-  fprintf(stderr," *** size %d\n",key->size);
+  fprintf(stderr," *** size " INTVAL_FMT "\n",key->size);
   for(i=0;i<key->size;i++) {
 INTVAL type = key->keys[i].type;
 if(type == enum_key_bucket) {
-  fprintf(stderr," *** Bucket %d type %d\n",i,type);
+  fprintf(stderr," *** Bucket " INTVAL_FMT " type " INTVAL_FMT "\n",i,type);
 }
 else if(type != enum_key_undef) {
-  fprintf(stderr," *** Other %d type %d\n",i,type);
+  fprintf(stderr," *** Other " INTVAL_FMT " type " INTVAL_FMT "\n",i,type);
 }
   }
 }



Re: tainting nums (was Re: the handiness of undef becoming NaN (when you want that))

2002-01-21 Thread Larry Wall

Nicholas Clark writes:
: On Fri, Nov 09, 2001 at 09:14:10AM -0800, Larry Wall wrote:
: > NaN is merely the floating-point representation of undef when your
: > variable is stored in a bare num.  And if you declare a variable as
: > int, there may well be no representation for undef at all!  Similarly,
: > it may be impossible to taint an int or a num, unless we can figure
: > out a way to stuff such information into 0 bits.  But I'd like an
: > array of int or num to be compact.
: 
: Probably this is rather late, and possibly this is an internals issue, but
: isn't squeezing it in 0 bits as simple as having a parallel bit array for
: storing the taint bit for each array of int or array of num?
: (when tainting is enabled)

We could certainly do that.  But it's possible we could simply deem
numbers not to be a large security threat.  Numbers don't generally
contain a lot of shell metacharacters, for instance.  And most numeric
algorithms are pretty sensible about dealing with out-of-range
numbers.  I expect troublesome numbers like SSNs and telephone numbers
would mostly remain stored as strings.  It's possible that a
maliciously large number could cause excessive memory allocation, but
tainting doesn't check for that now...
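
A minimal sketch of the parallel bit array suggested above, assuming a
hypothetical SketchIntArray container (none of these names come from Parrot).
The integers stay compact, and the one-bit-per-element taint map is allocated
only when tainting is enabled:

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical container: unboxed integers plus a separate taint bitmap. */
typedef struct {
    long *values;
    unsigned char *taint;   /* NULL when tainting is disabled */
    size_t len;
} SketchIntArray;

/* Allocate storage. One spare slot keeps zero-length arrays valid.
 * Returns 0 on success, -1 on allocation failure. */
int sketch_array_init(SketchIntArray *a, size_t len, int tainting) {
    a->len = len;
    a->values = calloc(len + 1, sizeof *a->values);
    a->taint = tainting ? calloc(len / CHAR_BIT + 1, 1) : NULL;
    if (!a->values || (tainting && !a->taint)) {
        free(a->values);
        free(a->taint);
        a->values = NULL;
        a->taint = NULL;
        return -1;
    }
    return 0;
}

/* Mark element i as tainted; a no-op when tainting is disabled. */
void sketch_set_taint(SketchIntArray *a, size_t i) {
    if (a->taint)
        a->taint[i / CHAR_BIT] |= (unsigned char) (1u << (i % CHAR_BIT));
}

/* Returns 1 if element i is tainted, 0 otherwise. */
int sketch_is_tainted(const SketchIntArray *a, size_t i) {
    return a->taint && ((a->taint[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u);
}

void sketch_array_free(SketchIntArray *a) {
    free(a->values);
    free(a->taint);
    a->values = NULL;
    a->taint = NULL;
}
```

The sketch does no bounds checking; callers are assumed to pass an index below
len.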

Larry



Re: [PATCH] quieten many pmc warnings

2002-01-21 Thread Nicholas Clark

Something Jarkko has just sent to p5p reminded me of a comment I thought of
but failed to include in the e-mail

On Mon, Jan 21, 2002 at 10:47:20PM +, Nicholas Clark wrote:
> +  # No, include yourself to check your headers match your bodies

There must be a decent Baron Munchausen quote to replace the above
(from the part of the film where they are visiting the king and queen
of the moon)

Nicholas Clark
-- 
ECOPIOUSFREETIME http://www.ccl4.org/~nick/CV.html



Re: Benchmarking regexes

2002-01-21 Thread Steve Fink

On Mon, Jan 14, 2002 at 01:49:44AM -0800, Brent Dax wrote:
> I wrote a _very_ simple benchmark program to compare Perl 5 and Parrot.
> Here's the result of a test run on my machine:
> 
> C:\brent\Visual Studio Projects\Perl 6\parrot\parrot>..\benchmark
> Benchmarking "bbcdefg" =~ /b[cde]*.f/...
>  perl: 0.03000 seconds for 10_000 iters
>parrot: 0.24100 seconds for 10_000 iters
> Best: perl, worst: parrot. Spread of 0.21100.
> 
> The program is attached; it requires my latest regex patch to work.  You
> may need to tr{\\}{/} in a few places to get it to work on Unix systems.

Are you compiling with optimization? I have my own implementation I've
been toying with, and the first time I benchmarked it, it was pretty
much identical to yours (a little surprising, considering I was
benchmarking a totally different expression!) Then I noticed that I
had compiled it without optimization and tried again with -O3, and the
gap narrowed significantly.

With mine, I am currently seeing:

Benchmarking "xxabbbxx" =~ /ab+a*b/...
 perl: 1.20323 seconds for 500_000 iters
   parrot: 2.87138 seconds for 500_000 iters

Mine doesn't yet handle character classes, so I can't do a direct
comparison. If you want, you can send me an rx.ops implementation of
/ab+a*b/ and I'll report the timings of all three. (This isn't a very
fair benchmark, though, because perl5's optimizations come into play
with this one, and neither of our engines has a "scan for exact
string" op that would let us emulate the optimized expression.)

I notice that string_ord() is taking up a pretty big chunk of time.
Which isn't too surprising, considering that string_index() is

return s->encoding->decode(s->encoding->skip_forward(s->bufstart, idx))

which is more levels of indirection than you can shake a stick at. And
that makes me wonder if we can ever compete fairly with perl5 without
implementing a binary buffer matching mode. Seems like we're always
paying a penalty for doing "proper" string matching by going through
all these levels of encoding.
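
A minimal sketch of the cost being described, assuming a much-simplified
encoding table with only the two function pointers named above (the real
Parrot structures have more members and different signatures; every sketch_*
name is hypothetical). The generic path makes two indirect calls per character,
while a "binary buffer" path reads the byte directly:

```c
#include <stddef.h>

/* Hypothetical, simplified encoding table. */
typedef struct {
    const void *(*skip_forward)(const void *ptr, size_t n);
    unsigned long (*decode)(const void *ptr);
} SketchEncoding;

typedef struct {
    const void *bufstart;
    size_t buflen;
    const SketchEncoding *encoding;
} SketchString;

/* A single-byte encoding: skipping is pointer arithmetic, decoding is a load. */
static const void *sb_skip_forward(const void *ptr, size_t n) {
    return (const unsigned char *) ptr + n;
}

static unsigned long sb_decode(const void *ptr) {
    return *(const unsigned char *) ptr;
}

const SketchEncoding sketch_singlebyte = { sb_skip_forward, sb_decode };

/* Generic path: works for any encoding, two indirect calls per lookup. */
unsigned long sketch_index_generic(const SketchString *s, size_t idx) {
    return s->encoding->decode(s->encoding->skip_forward(s->bufstart, idx));
}

/* Fast path: only valid when every character occupies exactly one byte. */
unsigned long sketch_index_bytes(const SketchString *s, size_t idx) {
    return ((const unsigned char *) s->bufstart)[idx];
}
```

Both functions return the same value for single-byte data; the difference is
only in how much work each lookup does inside a matching loop.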

My RE engine is still pretty rudimentary, but I'll mail a patch to
anyone who wants to take a look at it. The core really isn't much
different from Brent's rx stuff; I think his is slightly more
explicit. The internal wiring is likely to be rather different,
though.



Re: [dha@panix.com: Re: ^=~]

2002-01-21 Thread Damian Conway

> Hmm.  A hyperdwim operator.  So that means that
> 
> @result = @a ^=~ @b
> 
> is the same as
> 
> @result = map -> $a; $b { $a =~ $b } (@a; @b)
> 
> Or something resembling that that actually works...
> 
> Hmm.  I suppose it could be argued that a C<for> in list context:
> 
> @result = for @a; @b -> $a; $b { $a =~ $b }
> 
> should have the same effect.  I almost like that.

It would be nice to retire C<map>.
And list-context C<for> is an elegant and more-powerful solution.

However, it doesn't generalize to cover C well, or C,
or C, C, or C (or whatever you're actually
going to call the various multidimensional restructuring methods ;-).

You *could* instead consider reversing the arguments to all the list
manipulation operators:

@result = map @data { mapping() }
@result = grep @data { selector() };
@result = sort @data { comparison() };
@result = reduce @data { combinator() };

Then you would have the possibility of:

@result = map @data -> $x { mapping($_, $x) }
@result = map @data1;@data2 -> $x1;$x2 { $x1 =~ $x2 }
@result = grep @data -> $a, $b { two_at_a_time($a, $b) };

In addition, the slightly odd, right-to-left evaluation of pipelined
data would be (partially) rectified. The Perl5-ish:

@sorted_by_size = 
map { $_->[0] } 
sort { $a->[1] <=> $b->[1] } 
map { [$_, -s] } 
@files;

would become:

@sorted_by_size = 
map sort map @files
{ [$_, -s] }
{ $^a[1] <=> $^b[1] }
{ $_[0] };

Though I suppose the method call approach solves that nicely too:

@sorted_by_size = 
@files.map( { [$_, -s] })
  .sort({ $^a[1] <=> $^b[1] })
  .map( { $_[0] });

with the operations reading left-to-right, down the page.
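
For readers who have not met the idiom, the pipeline is a decorate, sort, undecorate transform. The sketch below shows the same three stages in Python, written in the order they execute. It is an illustration only: `sorted_by_size` and its `size_of` parameter are hypothetical names, with `size_of` standing in for the `-s` file test so that the example runs without touching the filesystem.

```python
def sorted_by_size(files, size_of):
    """Sort files by size, computing each size exactly once."""
    # Stage 1: decorate each item with its (possibly expensive) sort key.
    decorated = [(name, size_of(name)) for name in files]
    # Stage 2: sort on the cached key only.
    decorated.sort(key=lambda pair: pair[1])
    # Stage 3: strip the key, leaving the original items in sorted order.
    return [name for name, _ in decorated]
```

Passing something like `os.path.getsize` as `size_of` would give the behaviour of the Perl versions.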


Of course, now that I consider it, that also solves most of the original problem:

@result = @data.map(->$x { mapping($_, $x) });
@result = @data.grep(->$a,$b { two_at_a_time($a, $b) });

It would be nice if it also solved the "parallel iteration" case, but
I guess that's rare and complex enough that a real C<for> is warranted anyway.

I suppose this discussion also raises the vexed question whether ??::
can also be put out to pasture in favour of:

$val = if $x { 1 } else { 2 }

Or if the latter should perhaps be added as an alternative to:

$val = $x ?? do{ something_more_complex_than_an_expression }
  :: do{ operation_at_lower_precedence_than_ternary };

at least.

Not to mention:

@squares = while <> { $_**2 };


Damian



Re: [PATCH] format warning in key.c

2002-01-21 Thread Steve Fink

All of your last several patches look good to me. Didn't Dan give you
commit rights yet? I'm pretty sure he intended to. Dan was also going
to have a discussion of commit policy -- when should we just commit,
and when should we discuss first -- as soon as he gets more settled,
but my vote would be to commit all these cleanup patches. (Including
the unsigned characters but signed digits one.)



Re: [PATCH] format warning in key.c

2002-01-21 Thread Dan Sugalski

At 11:10 PM + 1/21/02, Nicholas Clark wrote:
>We do mandate an ANSI conformant C compiler, don't we?

Yep. If we haven't given you commit rights, go over to dev.perl.org 
and get an account. Then mail me the account name and we'll fix that.
-- 

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: [PATCH] format warning in key.c

2002-01-21 Thread Dan Sugalski

At 3:56 PM -0800 1/21/02, Steve Fink wrote:
>All of your last several patches look good to me. Didn't Dan give you
>commit rights yet? I'm pretty sure he intended to. Dan was also going
>to have a discussion of commit policy -- when should we just commit,
>and when should we discuss first -- as soon as he gets more settled,
>but my vote would be to commit all these cleanup patches. (Including
>the unsigned characters but signed digits one.)

All patches that clean up warnings, style gaffes, and add correct 
comments can just go in. Commits in areas you (the generic you, here) 
have some responsibility for (Brent with the RE code, Jeff Goff for 
PMC stuff, Melvin for IO, for example) can also go in if you're 
comfortable with them. The rest use your judgement cautiously, and if 
you're not sure pop a note to the list and we can go from there.
-- 

Dan




Re: [PATCH] format warning in key.c [APPLIED]

2002-01-21 Thread Dan Sugalski

At 11:10 PM + 1/21/02, Nicholas Clark wrote:
>We do mandate an ANSI conformant C compiler, don't we?
>
>Appended patch cures these warnings:

Oh, and applied. Thanks.
-- 

Dan




RE: Apoc4: The loop keyword

2002-01-21 Thread Michael Percy

Graham Barr wrote:
> On Mon, Jan 21, 2002 at 03:58:49PM -0500, Michael G Schwern wrote:

Case 1:
> > do {
> > if my $foo = bar() {
> > ...
> > }
> > }

Case 2:
>   if ((my $foo = bar()) eq 'foo') {
> ...
>   }
> 
>   if ($foo eq 'bar') {
> ...
>   }

Despite fear of joining in a religious argument, to me Case 1 above is much
more painful (both in readability and typing) than fixing Case 2 with a
single line would be.

More importantly, named arguments in subroutines would break lexical scoping
according to the "only inside the braces" definition. IMHO, what we
sacrifice (larger private scope + maintainability + paramed subs...) seems
more dear than what we gain (parsing purity?).

Regards,
Michael Percy




Re: [PATCH] format warning in key.c

2002-01-21 Thread Bryan C. Warnock

On Monday 21 January 2002 19:06, Dan Sugalski wrote:
> Commits in areas you (the generic you, here)
> have some responsibility for (Brent with the RE code, Jeff Goff for
> PMC stuff, Melvin for IO, for example) can also go in if you're
> comfortable with them. 

That should probably be amended with "only" in there somewhere.  Perhaps in 
multiple places.  (Commits in one area that are depending on or dictating 
design in another should probably get some sort of feedback, too.)


-- 
Bryan C. Warnock
[EMAIL PROTECTED]



[APPLIED] Re: [PATCH] are characters unsigned?

2002-01-21 Thread Alex Gough

On Mon, 21 Jan 2002, Nicholas Clark wrote:

> I thought that it should be this
>
> INTVAL (*get_digit)(UINTVAL c);
>
> not this
>
> UINTVAL (*get_digit)(UINTVAL c);
>

It seems you thought both, I've made a small modification and applied
the patch, thanks.

Alex Gough




Re: [PATCH] warnings in test_main.c

2002-01-21 Thread Alex Gough

On Mon, 21 Jan 2002, Nicholas Clark wrote:

> Before:

lots.

> After:

less.


Applied, thanks.

Alex Gough




Re: [PATCH] are characters unsigned?

2002-01-21 Thread Melvin Smith

At 09:41 PM 1/21/2002 +, Nicholas Clark wrote:
>I am of the opinion that they are UINTVAL, not INTVAL. (and EOF being a
>negative value such as -1 is only needed for C stdio, and I seem to remember
>that Dan has strong opinions on C stdio, and what C can do with it)

Specifically Dan has declared Parrot shall not include stdio by default.
This doesn't stop us from adding a stdio wrapper layer later.

I did see someone mention that they thought "miniparrot" or whatever, might
require us to use a stdio wrapper but I'm not convinced that is the case,
unless that system's _only_ API is stdio. Are there any like that? I know 
of none.

I can speak only for the systems I know (mostly UNIXish), and the low-level
file calls don't indicate EOF by returning an "EOF" char anyway;
they indicate it by returning 0 bytes on a read attempt. STDIO is the library
that implements the (int)EOF char.

If we implement getc/getchar type calls around read, then we probably
have to implement an EOF value, but there is no reason we can't cast the
unsigned to signed -1, right?
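
A minimal sketch of that idea, in Python for illustration: a getc-style wrapper over the low-level read call, which reports end of file as a zero-length result and maps it to a sentinel outside the byte range. The names `getc` and `EOF` here are local to the example and are not a proposed Parrot API.

```python
import os

EOF = -1  # sentinel outside the 0..255 byte range, as in C stdio


def getc(fd):
    """Return the next byte from fd as an int, or EOF at end of file."""
    chunk = os.read(fd, 1)
    if not chunk:
        # The low-level read reports end of file as zero bytes read.
        return EOF
    # A byte value is always 0..255, so it can never collide with EOF.
    return chunk[0]
```

Because the byte 0xFF comes back as 255 and end of file as -1, the two stay distinguishable as long as the return type is wider than one unsigned byte. A real wrapper would buffer rather than issue one system call per character.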

-Melvin




Re: [PATCH] quieten many pmc warnings

2002-01-21 Thread Alex Gough

On Mon, 21 Jan 2002, Nicholas Clark wrote:

> This eliminates many gcc warnings from pmc code by

Applied, thanks.

Alex Gough




String/null terminations

2002-01-21 Thread Melvin Smith

While a few people are active, can someone "re-clue" me in on the intentions
for string handling? I'd like to stick a couple of calls in the string lib
to:
1) Terminate a string's current buffer if there is room
2) Create a local or alloced buffer with a null terminated string.

These calls would only be used for when there were calls expecting C strings.

Else all low-level code has to do its own copying/dinking with the buffers.
I'll submit a patch but since String stuff isn't my area I'd rather whoever
is maintaining it let me know how they want to handle it.
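
To make the two proposed calls concrete, here is a small model using a Python `bytearray` as the string buffer. It is only a sketch: `terminate_in_place` and `as_c_string` are hypothetical names, and `used` stands for the number of bytes the string currently occupies in its buffer.

```python
def terminate_in_place(buf, used):
    """Write a NUL after the first `used` bytes if the buffer has room.

    Returns True on success, False if the buffer is already full.
    """
    if used < len(buf):
        buf[used] = 0
        return True
    return False


def as_c_string(buf, used):
    """Return a fresh NUL-terminated copy of the first `used` bytes."""
    return bytes(buf[:used]) + b"\x00"
```

Neither helper copes with a NUL byte embedded in the string itself; a caller expecting a C string would see the string truncated at that point.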

-Melvin




Some Apocalypse 4 exception handling questions.

2002-01-21 Thread Tony Olekshy

In Apocalypse 4, Larry Wall wrote:
|
|   In fact, a C<CATCH> of the form:
|
|   CATCH {
|   when xxx { ... }  # 1st case
|   when yyy { ... }  # 2nd case
|   ...   # other cases, maybe a default
|   }
|
|means something vaguely like:
|
|   BEGIN {
|   %MY.catcher = {
|   given current_exception() -> $! {
|
|   when xxx { ... }  # 1st case from above
|   when yyy { ... }  # 2nd case from above
|   ...   # other cases, maybe a default
|
|   die;# rethrow $! as implicit default
|   }
|   $!.markclean;   # handled cleanly, in theory
|   }
|   }

Beautiful. The synthesis of CATCH, BEGIN blocks, %MY, given, when,
break, dwim =~, die, $!, $!.clean, and $!.stack is awe-inspiring.
The way proto-exceptions, fail, and use fatal work together is also
brilliant.

I particularly enjoyed this one:

   CATCH { when @$! =~ Foo { ... } }

I do have a few questions.

   1. Does this example:

  {
  my $p = P.new;   LAST { $p and $p.Done; }
  foo();
  my $q = Q.new;   LAST { $q and $q.Done; }
  ...
  }

  effectively get compiled into something like:

  {
  my $p;  my $q;
  $p = P.new;   LAST { $p and $p.Done; }
  foo();
  $q = Q.new;   LAST { $q and $q.Done; }
  ...
  }

  If not, how can we evaluate $q in the LAST block if foo() dies?
  Or are LASTs not handled by a magic BEGIN mechanism? Or are the
  LASTs converted into a BEGIN plus some run-time state variable
  that is only set when the LAST is encountered during execution?
  Or am I missing the point entirely ;-?

   2. Consider the following example:

  for my $file ( @files ) {
  my $f is last { close } = open $file or next;
  foo($f);
  CATCH { default { print "foo($f) failed\n" } }
  }

  The last and CATCH blocks must be invoked at the end of each
  time around the for block, no? Or should I be writing:

  for my $file ( @files ) {
  try {
  my $f is last { close } = open $file or next;
  foo($f);
  CATCH { default { print "foo($f) failed\n" } }
  }
  }

   3. Would the following execute the C<CATCH>? When do I have to worry
  about "accidentally" catching control exceptions?

  sub ...
  {
  return if 1;
  fragile();
  CATCH { default { die "Couldn't fragile." } }
  }

   4. The test for "block exited successfully" is C< !$! || $!.clean >,
  for the purposes of the block-end handling code, correct?  So

  KEEP is like LAST { if ( !$! || $!.clean ) { ... } }
  and
  UNDO is like LAST { unless ( !$! || $!.clean ) { ... } }

  in which case CATCH is actually like UNDO with an implied given,
  die, and $!.markclean, except it's handled in a different end-
  block order, yes?

   5. What is the order of processing all these special blocks at the
  end of their containing block? Is it:

  1. CONTINUE
  2. CATCH
  3. KEEP
  4. UNDO
  5. LAST
  6. POST

  or some other fixed order, or is there some sort of order-of-
  encounter interleaving of some of the kinds of blocks?

   6. What is the value of

  my $x = try { "1" CATCH { default { "2" } } LAST { "3" } };

  What happens for each permutation of replacing "n" by die "n"?

   7. Is there any particular reason why multiple CATCH blocks can't
  simply be queued in some fashion like multiple LAST blocks?
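
The reading in question 4 can be modelled in a few lines of Python. This is a sketch of that one interpretation, not of what Apocalypse 4 specifies: it assumes KEEP fires on success, UNDO on failure and LAST always; it ignores CATCH and `$!.clean`; and running KEEP/UNDO before LAST is a guess, which is exactly what question 5 asks about. The function name `run_block` is hypothetical.

```python
def run_block(body, keep=None, undo=None, last=None):
    """Run body(), then fire exit handlers according to how it ended."""
    failed = False
    try:
        return body()
    except Exception:
        failed = True
        raise  # the exception still propagates after the handlers run
    finally:
        if failed:
            if undo is not None:
                undo()   # UNDO: only when the block died
        elif keep is not None:
            keep()       # KEEP: only when the block succeeded
        if last is not None:
            last()       # LAST: unconditionally
```

Under these assumptions a successful block logs KEEP then LAST, and a dying block logs UNDO then LAST before the exception reaches the caller.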

Yours, &c, Tony Olekshy





RE: Benchmarking regexes

2002-01-21 Thread Brent Dax

Steve Fink:
# On Mon, Jan 14, 2002 at 01:49:44AM -0800, Brent Dax wrote:
# > I wrote a _very_ simple benchmark program to compare Perl 5
# and Parrot.
# > Here's the result of a test run on my machine:
# >
# > C:\brent\Visual Studio Projects\Perl 6\parrot\parrot>..\benchmark
# > Benchmarking "bbcdefg" =~ /b[cde]*.f/...
# >  perl: 0.03000 seconds for 10_000 iters
# >parrot: 0.24100 seconds for 10_000 iters
# > Best: perl, worst: parrot. Spread of 0.21100.
# >
# > The program is attached; it requires my latest regex patch
# to work.  You
# > may need to tr{\\}{/} in a few places to get it to work on
# Unix systems.
#
# Are you compiling with optimization? I have my own implementation I've
# been toying with, and the first time I benchmarked it, it was pretty
# much identical to yours (a little surprising, considering I was
# benchmarking a totally different expression!) Then I noticed that I
# had compiled it without optimization and tried again with -O3, and the
# gap narrowed significantly.

I tried it once and did see the gap narrow some, but I keep forgetting
to re-enable it as I modify things and rebuild.  BTW, it's probably
better to use -O, which will let the compiler choose the best
optimization level.  -O3 forces it to optimize to level 3 or give up
completely.

# With mine, I am currently seeing:
#
# Benchmarking "xxabbbxx" =~ /ab+a*b/...
#  perl: 1.20323 seconds for 500_000 iters
#parrot: 2.87138 seconds for 500_000 iters
#
# Mine doesn't yet handle character classes, so I can't do a direct
# comparison.

What a shame.  Character classes are the funnest part of it!  ;^)

(To be fair, rx_oneof sat empty for a very long time.  It's the hardest
matching op to implement.)

# If you want, you can send me an rx.ops implementation of
# /ab+a*b/ and I'll report the timings of all three. (This isn't a very
# fair benchmark, though, because perl5's optimizations come into play
# with this one, and neither of our engines has a "scan for exact
# string" op that would let us emulate the optimized expression.)

Once Parrot gets an index() op based on a fast string search algorithm,
that will become a non-issue.  Also, I seem to remember that somebody
was at least trying to figure out what would be necessary to disable
regex optimizations in Perl 5.

Untested implementation of {"xxabbbxx"=~/ab+a*b/ for(I0=500_000; I0;
I0--)}:

set I0, 500000
set S0, "xxabbbxx"
rx_allocinfo P0, S0
time N0
print N0
$top:
bsr RX_0
rx_clearinfo P0, S0
dec I0
if I0, $top

time N0
print N0
rx_freeinfo P0
end

RX_0:
rx_setprops "", 3
branch $start
$advance:
rx_advance P0, $fail
$start:
rx_literal P0, "ab", $advance
rx_pushmark P0
$top1:
rx_literal P0, "b", $next1
rx_pushindex P0
branch $top1
$back1:
rx_popindex P0, $advance
$next1:
rx_pushmark P0
$top2:
rx_literal P0, "a", $next2
rx_pushindex P0
branch $top2
$back2:
rx_popindex P0, $back1
$next2:
rx_literal P0, "b", $back2
rx_success P0
ret
$fail:
rx_fail P0
ret
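
For anyone following the control flow, the same backtracking strategy can be sketched in Python with explicit stacks of fallback positions. This is an illustration of the general technique, not a translation of the rx ops: it assumes the pushed indices serve as "positions to retry from", and the function name is hypothetical.

```python
def match_ab_plus_a_star_b(s):
    """Backtracking match of /ab+a*b/ anywhere in s.

    Returns (start, end) of the match, or None if there is none.
    """
    for start in range(len(s)):            # try each start position in turn
        if not s.startswith("ab", start):  # literal "ab": "a" plus first "b" of b+
            continue
        pos = start + 2
        # Greedy b*: remember every position we could fall back to.
        b_stack = [pos]
        while s.startswith("b", pos):
            pos += 1
            b_stack.append(pos)
        while b_stack:
            pos = b_stack.pop()            # give back one "b" per retry
            # Greedy a*, with its own fallback positions.
            a_stack = [pos]
            while s.startswith("a", pos):
                pos += 1
                a_stack.append(pos)
            while a_stack:
                pos = a_stack.pop()        # give back one "a" per retry
                if s.startswith("b", pos):  # final literal "b"
                    return (start, pos + 1)
    return None
```

On "xxabbbxx" the greedy b* first consumes all three "b"s, the final literal fails, and one "b" is handed back, which is the step the $back labels handle in the assembly.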

# I notice that string_ord() is taking up a pretty big chunk of time.
# Which isn't too surprising, considering that string_index() is
#
# return
# s->encoding->decode(s->encoding->skip_forward(s->bufstart, idx))

It could also be that we're calling it so damn much.  Even a function
that's just {return a+b;} will take a lot of time if it's called
eleventy jillion times.

# which is more levels of indirection than you can shake a stick at. And
# that makes me wonder if we can ever compete fairly with perl5 without
# implementing a binary buffer matching mode. Seems like we're always
# paying a penalty for doing "proper" string matching by going through
# all these levels of encoding.

I really don't want to start mucking around in string internals.  OTOH,
I'm planning on forcing everything to utf32 Normalization Form KC, so it
may not be too big of a problem.

# My RE engine is still pretty rudimentary, but I'll mail a patch to
# anyone who wants to take a look at it. The core really isn't much
# different from Brent's rx stuff; I think his is slightly more
# explicit. The internal wiring is likely to be rather different,
# though.

Send me a copy.  There's sure to be at least a few things in it that are
better implemented, if not the whole thing.

--Brent Dax
[EMAIL PROTECTED]
Parrot Configure pumpking and regex hacker

 . hawt sysadmin chx0rs
 This is sad. I know of *a* hawt sysamin chx0r.
 I know more than a few.
 obra: There