date:20050512

C<::> in rules

2005-05-12 Thread Patrick R. Michaud

I have a couple of questions regarding C< :: > in perl 6 rules.
First, a question of verification -- in

$rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;

"travel by plane jet train tgv today" ~~ $rule

I think the match should fail outright, as opposed to matching "train tgv".
In other words, it acts as though one had written

$rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;

and not

$rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Does this sound right?

Next on my list, S05 says "It is illegal to use :: outside of 
an alternation", but A05 has

/[:w::foo bar]/

which leads me to believe that :: isn't illegal here even though there's
no alternation.  I'd like to strike that sentence from S05.

Also, A05 proposes incorrect alternatives to the above 

/[:w[]foo bar]/# null pattern illegal, use 
/[:w()foo bar]/# null capture illegal, and probably undesirable
/[:w\bfoo bar]/# not exactly the same as above

I'd like to remove those from A05, or at least put an "Update:"
note there that doesn't lead people astray.  One option not
mentioned in A05 that we can add there is

/[:wfoo bar]/  

which is admittedly ugly.

So, now then, on to the item that got me here in the first place.
The upshot of all of the above is that 

rx :w /foo bar/

is not equivalent to

rx /:w::foo bar/

which may surprise a few people.  The :: at the beginning of
the pattern effectively anchors the match to the beginning of
the string or the current position -- i.e., it eliminates the
implicit C< .*? > at the start of the match.  To put the :w
inside the rule (e.g., in a variable or subrule), one would
have to write it as

rx /[:w::foo bar]/
rx /:wfoo bar/

Now then, I don't have a problem at all with this outcome -- 
but I wanted to let p6l verify my interpretation of things and
make sure it's okay for me to adjust S05/A05 accordingly.

Pm

Re: split /(..)*/, 1234567890

2005-05-12 Thread TSa (Thomas Sandlaß)

Autrijus Tang wrote:
> pugs> split /(..)*/, 1234567890
> ('', '12', '34', '56', '78', '90')
>
> Is this sane?

Why the empty string match at the start?
--
$TSa == all( none( @Larry ), one( @p6l ))

Re: split /(..)*/, 1234567890

2005-05-12 Thread Autrijus Tang

On Thu, May 12, 2005 at 04:53:06PM +0200, "TSa (Thomas Sandlaß)" wrote:
> Autrijus Tang wrote:
> >pugs> split /(..)*/, 1234567890
> >('', '12', '34', '56', '78', '90')
> >
> >Is this sane?
> 
> Why the empty string match at the start?

I don't know, I didn't invent that! :-)

$ perl -le 'print join ",", split /(..)/, 123'
,12,3

Thanks,
/Autrijus/



Re: split /(..)*/, 1234567890

2005-05-12 Thread TSa (Thomas Sandlaß)

Autrijus Tang wrote:
> I don't know, I didn't invent that! :-)
>
> $ perl -le 'print join ",", split /(..)/, 123'
> ,12,3

Hmm,

perl -le 'print join ",", split /(..)/, 112233445566'
,11,,22,,33,,44,,55,,66

For longer strings it makes every other match an empty string.
With the "Positions between chars" interpretation the above
string is, with '.' indicating a position:

.1.1.2.2.3.3.4.4.5.5.6.6.
0 1 2 3 4 5 6 7 8 9 1 1 1
                    0 1 2

There are two matches each at 0, 2, 4, 6, 8 and 10.
The empty match at the end seems to be skipped because
position 12 is after the string? And for odd numbers of
chars the next-to-last position doesn't produce an empty
match:

perl -le 'print join ",", split /(..)/, 11223'
,11,,22,3

Am I the only one who finds that inconsistent?
--
TSa (Thomas Sandlaß)

Re: split /(..)*/, 1234567890

2005-05-12 Thread David Storrs

On May 12, 2005, at 11:59 AM, Autrijus Tang wrote:
> On Thu, May 12, 2005 at 04:53:06PM +0200, "TSa (Thomas Sandlaß)" wrote:
>> Autrijus Tang wrote:
>>> pugs> split /(..)*/, 1234567890
>>> ('', '12', '34', '56', '78', '90')
>>>
>>> Is this sane?
>>
>> Why the empty string match at the start?
>
> I don't know, I didn't invent that! :-)
>
> $ perl -le 'print join ",", split /(..)/, 123'
> ,12,3

This makes sense when I think about what split is doing, but it is  
surprising at first glance.  Perhaps this should be included as an  
example in the docs?

--Dks

Re: C<::> in rules

2005-05-12 Thread Aaron Sherman

My take, based on S05:

On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote:
> I have a couple of questions regarding C< :: > in perl 6 rules.
> First, a question of verification -- in
> 
> $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
> 
> "travel by plane jet train tgv today" ~~ $rule
> 
> I think the match should fail outright, as opposed to matching "train tgv".

Correct, that's the meaning of ::

S05: "Backtracking over a double colon causes the surrounding group of
alternations to immediately fail:"

Your surrounding group is the entire rule, and thus you fail at that
point.

> In other words, it acts as though one had written
> 
> $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> 
> and not
> 
> $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Your two examples fail in the same way because of the fact that the
group IS the whole rule.

> Next on my list, S05 says "It is illegal to use :: outside of 
> an alternation", but A05 has
> 
> /[:w::foo bar]/

I can't even figure out what that means. :w turns on word mode
(lexically scoped per S05) and "::" is a group-level commit. What are we
committing exactly? Looks like a noop to me, which actually might not be
so bad. However, you're right: this is an error as there are no
alternations.

> which leads me to believe that :: isn't illegal here even though there's
> no alternation.  I'd like to strike that sentence from S05.

I don't think it should be removed. You can always use ::: if that's
what you wanted.

> Also, A05 proposes incorrect alternatives to the above 
> 
> /[:w[]foo bar]/# null pattern illegal, use 

Correct.

> /[:w()foo bar]/# null capture illegal, and probably undesirable

Correct.

> /[:w\bfoo bar]/# not exactly the same as above

No, I think that's exactly the same.

> So, now then, on to the item that got me here in the first place.
> The upshot of all of the above is that 
> 
> rx :w /foo bar/
> 
> is not equivalent to
> 
> rx /:w::foo bar/

If we feel strongly, it could be special-cased, but your  solution
seems fine to me.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

Re: split /(..)*/, 1234567890

2005-05-12 Thread Aaron Sherman

On Thu, 2005-05-12 at 12:22, David Storrs wrote:
> On May 12, 2005, at 11:59 AM, Autrijus Tang wrote:
> > On Thu, May 12, 2005 at 04:53:06PM +0200, "TSa (Thomas Sandlaß)"  
> > wrote:
> >> Autrijus Tang wrote:
> >>
> >>>pugs> split /(..)*/, 1234567890
> >>>('', '12', '34', '56', '78', '90')

> >> Why the empty string match at the start?

> > I don't know, I didn't invent that! :-)

> > $ perl -le 'print join ",", split /(..)/, 123'
> > ,12,3
> 
> This makes sense when I think about what split is doing, but it is  
> surprising at first glance.  Perhaps this should be included as an  
> example in the docs?

perldoc -f split says:

"Splits a string into a list of strings and returns that list.
By default, empty leading fields are preserved, and empty
trailing ones are deleted [...] If PATTERN is also omitted,
splits on whitespace (after skipping any leading whitespace).
[...] Empty leading (or trailing) fields are produced when there
are positive width matches at the beginning (or end) of the
string [...] As a special case, specifying a PATTERN of space ('
') will split on white space just as "split" with no arguments
does. Thus, "split(' ')" can be used to emulate awk's default
behavior, whereas "split(/ /)" will give you as many null
initial fields as there are leading spaces [...]"

And there you have it.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

Re: split /(..)*/, 1234567890

2005-05-12 Thread Jonathan Scott Duff

On Thu, May 12, 2005 at 06:29:49PM +0200, "TSa (Thomas Sandlaß)" wrote:
> Autrijus Tang wrote:
> >I don't know, I didn't invent that! :-)
> >
> >$ perl -le 'print join ",", split /(..)/, 123'
> >,12,3
> 
> Hmm,
> 
> perl -le 'print join ",", split /(..)/, 112233445566'
> ,11,,22,,33,,44,,55,,66
> 
> For longer strings it makes every other match an empty string.

Not quite. The matching part are the strings "11", "22", "33", etc.
And since what matches is what we're splitting on, we get the empty
string between pairs of characters (including the leading empty
string). The only reason you're getting the string that was matched
in the output is because that's what you've asked split to do by
placing parens around the pattern.  (Type "perldoc -f split" at your
command prompt and read all about it)

To bring this back to perl6, autrijus' original query was regarding

$ pugs -e 'say join ",", split /(..)*/, 1234567890'

which currently generates a list of ('','12','34','56','78','90')
In perl5 it would generate a list of ('','90') because only the last
pair of characters matched is kept (such is the nature of quantifiers
applied to capturing parens). But in perl6 quantified captures put all
of the matches into an array such that "abcdef" ~~ /(..)*/ will make
$0 = ['ab','cd','ef']. 
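
For anyone who wants to see the "only the last pair is kept" behaviour outside Perl, here is a small Python sketch. It is only an analogue: Python's re module happens to treat a quantified capturing group the way perl5 does, and the variable names are mine, not anything from pugs or S05.

```python
import re

text = "abcdef"

# As in perl5, a repeated capturing group remembers only its final iteration.
last_only = re.match(r"(..)*", text).group(1)

# Getting every iteration (what perl6 would put into $0 as an array)
# has to be done explicitly here, e.g. with findall.
all_pairs = re.findall(r"..", text)

print(last_only)
print(all_pairs)
```

The first value is the single pair a perl5-style engine keeps; the second list is the shape that the perl6 array capture is described as having.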

I think that the above split should generate a list like this:

('', [ '12','34','56','78','90'])

Or, another example:

$ pugs -e 'say join ",", split /(<[abc]>)*/, "xabxbxbcx"'
# ('x', ['a','b'], 'x', ['b'], 'x', ['b','c'], 'x')

But that's just MHO.

> With the "Positions between chars" interpretation the above
> string is, with '.' indicating a position:
> 
> .1.1.2.2.3.3.4.4.5.5.6.6.
> 0 1 2 3 4 5 6 7 8 9 1 1 1
>                     0 1 2
> 
> There are two matches each at 0, 2, 4, 6, 8 and 10.
> The empty match at the end seems to be skipped because
> position 12 is after the string? 

No, the empty match at the end is skipped because that's the default
behaviour of split.  Preserve leading empty fields and discard empty
trailing ones.

> And for odd numbers of
> chars the before last position doesn't produce an empty
> match:
> perl -le 'print join ",", split /(..)/, 11223'
> ,11,,22,3

There's an empty field between the beginning of the string and "11",
there's an empty field between the "11" and the "22", and finally
there's a field at the end containing only "3"

> Am I the only one who finds that inconsistent?

Probably.  :-)

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]

Re: split /(..)*/, 1234567890

2005-05-12 Thread Uri Guttman

> "JSD" == Jonathan Scott Duff <[EMAIL PROTECTED]> writes:

  JSD> To bring this back to perl6, autrijus' original query was regarding

  JSD>  $ pugs -e 'say join ",", split /(..)*/, 1234567890'

  JSD> which currently generates a list of ('','12','34','56','78','90')
  JSD> In perl5 it would generate a list of ('','90') because only the last
  JSD> pair of characters matched is kept (such is the nature of quantifiers
  JSD> applied to capturing parens). But in perl6 quantified captures put all
  JSD> of the matches into an array such that "abcdef" ~~ /(..)*/ will make
  JSD> $0 = ['ab','cd','ef']. 

  JSD> I think that the above split should generate a list like this:

  JSD>  ('', [ '12','34','56','78','90'])

i disagree. if you want complex tree results, use a rule. split is for
creating a single list of elements from a string. it is better to keep
split simple for it is commonly used in this domain. tree results are
more for real parsing (which split is not intended to do) so use a
parsing rule for that.

also note the coding style rule (i think randal created it) which is to
use split when you want to throw things away (the delimiters) and m//
when you want to keep things.

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: split /(..)*/, 1234567890

2005-05-12 Thread Jody Belka

On Thu, May 12, 2005 at 06:29:49PM +0200, "TSa (Thomas Sandlaß)" wrote:
> perl -le 'print join ",", split /(..)/, 112233445566'
> ,11,,22,,33,,44,,55,,66
[snipped]
> perl -le 'print join ",", split /(..)/, 11223'
> ,11,,22,3
> 
> Am I the only one who finds that inconsistent?

Maybe, but it's because you're misunderstanding what split does (i can
heartily recommend TFM in this case).

Let's start with a simpler case (inside debugger for help):

x split /../, 112233445566, -1   [ -1 to preserve all found fields ]

0  ''
1  ''
2  ''
3  ''
4  ''
5  ''
6  ''

Split uses the regular expression to find "separators" in the text, and
then returns the contents of the fields between them. The above case looks
like this:

      sep    sep    sep    sep    sep    sep
       |      |      |      |      |      |
       11     22     33     44     55     66
    |      |      |      |      |      |
  field  field  field  field  field  field

Ok, let's try that with your second example:

x split /../, 11223, -1

0 ''
1 ''
2 3

      sep    sep
       |      |
       11     22  3
    |      |      |
  field  field  field

Now, if the regular expression contains parentheses, additional list
elements are created from each matching substring (quoted almost verbatim
from TFM). So:

x split /(..)/, 112233445566, -1

0  ''
1  11
2  ''
3  22
4  ''
5  33
6  ''
7  44
8  ''
9  55
10  ''
11  66
12  ''

x split /(..)/, 11223, -1

0  ''
1  11
2  ''
3  22
4  3

And of course, if we remove the LIMIT from the equation, then any trailing
fields will be removed. Ergo the results quoted at the top of this email.
Hope this helps you (and anyone else who might have been confused) understand
what is going on.
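
The same separator/field picture can be reproduced in Python, as a rough sketch rather than an emulation of Perl's split. Python's re.split always keeps every field (like a LIMIT of -1), so the helper perl_like_split below, a hypothetical name of my own, strips trailing empty fields when asked to mimic the default behaviour.

```python
import re

def perl_like_split(pattern, text, keep_trailing=False):
    """Split text on pattern; captured separators are included in the result.

    keep_trailing=True keeps every field (like a LIMIT of -1 above);
    the default drops trailing empty fields, as described in the thread.
    """
    fields = re.split(pattern, text)
    if not keep_trailing:
        while fields and fields[-1] == "":
            fields.pop()
    return fields

# No parentheses: only the fields between separators are returned.
print(perl_like_split(r"..", "11223", keep_trailing=True))

# Parentheses: each matched separator is added to the list as well.
print(perl_like_split(r"(..)", "11223", keep_trailing=True))

# Default behaviour: the trailing empty field is dropped.
print(",".join(perl_like_split(r"(..)", "112233445566")))
```

The last line reproduces the ",11,,22,,33,,44,,55,,66" output quoted at the top of this message.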

J

-- 
Jody Belka
knew (at) pimb (dot) org

Re: C<::> in rules

2005-05-12 Thread Jonathan Scott Duff

On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote:
> > Next on my list, S05 says "It is illegal to use :: outside of 
> > an alternation", but A05 has
> > 
> > /[:w::foo bar]/
> 
> I can't even figure out what that means. :w turns on word mode
> (lexically scoped per S05) and "::" is a group-level commit. What are we
> committing exactly? Looks like a noop to me, which actually might not be
> so bad. However, you're right: this is an error as there are no
> alternations.

I think the definition of :: needs to be changed slightly.  You even
used a phrase that isn't exactly true according to spec but would be
if :: meant what I think it should mean.   That phrase is ":: is a
group-level commit".  This isn't how I read S05 (and apparently how
you and others read it as well, hence your comment to Pm that there
are no alternations).  S05 says:

Backtracking over a double colon causes the surrounding group of
alternations to immediately fail:

I think it should simply read:

Backtracking over a double colon causes the surrounding group to
immediately fail:

In other words, the phrase "of alternations" is a red herring.

> > which leads me to believe that :: isn't illegal here even though there's
> > no alternation.  I'd like to strike that sentence from S05.
> 
> I don't think it should be removed. You can always use ::: if that's
> what you wanted.

I too think it should be stricken.

> > /[:w\bfoo bar]/# not exactly the same as above
> 
> No, I think that's exactly the same.

What does \b mean again?  I assume it's no longer backspace?

> > So, now then, on to the item that got me here in the first place.
> > The upshot of all of the above is that 
> > 
> > rx :w /foo bar/
> > 
> > is not equivalent to
> > 
> > rx /:w::foo bar/
> 
> If we feel strongly, it could be special-cased, but your  solution
> seems fine to me.

If :: were to fail the surrounding group we can say that a rule
without [] or () is an implicit group for :: purposes.

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]

Re: C<::> in rules

2005-05-12 Thread Patrick R. Michaud

On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> My take, based on S05:
> 
> > In other words, it acts as though one had written
> > 
> > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> > 
> > and not
> > 
> > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
> 
> Your two examples fail in the same way because of the fact that the
> group IS the whole rule.

False.  In the first case the group is the whole rule.  In the second
case the group would not include the (implied) '.*?' at the start of
the rule.  Perhaps it helps to see the difference if I write it this way:

$rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/;

Note that the rule is *unanchored*, thus it tries at the first character,
if it fails then it goes to the second character, if that fails it goes
to the third, etc.  Thus, given:

  $rule1 = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
  $rule2 = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

  "travel by plane jet train tgv today" ~~ $rule1;   # fails
  "travel by plane jet train tgv today" ~~ $rule2;   # matches "train tgv"

They're not equivalent.

> > Next on my list, S05 says "It is illegal to use :: outside of 
> > an alternation", but A05 has
> > 
> > /[:w::foo bar]/
> 
> I can't even figure out what that means. :w turns on word mode
> (lexically scoped per S05) and "::" is a group-level commit. What are we
> committing exactly? Looks like a noop to me, which actually might not be
> so bad. 

Yes, the point is that it's a no-op, because

/[:wfoo bar:]/

is something entirely different.

> > /[:w\bfoo bar]/# not exactly the same as above
> 
> No, I think that's exactly the same.

Nope.  Consider:  

 $foo = rx /[:w::foo bar]/
 $baz = rx /[:w\bfoo bar]/

 "myfoo bar" ~~ $foo  # matches
 "myfoo bar" ~~ $baz  # fails, foo is not on a word boundary
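
The word-boundary half of this is easy to check in any regex engine. Here is a Python sketch; note that Python has no :w modifier and no ::, so it only illustrates what \b adds, using an ordinary literal space in the pattern.

```python
import re

# Without a boundary assertion the pattern may start in the middle of a word.
plain_hit = re.search(r"foo bar", "myfoo bar") is not None

# \b requires a word boundary before "foo"; inside "myfoo" there is none.
bounded_hit = re.search(r"\bfoo bar", "myfoo bar") is not None
bounded_hit_spaced = re.search(r"\bfoo bar", "my foo bar") is not None

# Inside a character class, \b means backspace instead of word boundary.
backspace_hit = re.fullmatch(r"[\b]", "\x08") is not None

print(plain_hit, bounded_hit, bounded_hit_spaced, backspace_hit)
```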

Pm

Re: C<::> in rules

2005-05-12 Thread Patrick R. Michaud

On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote:
> 
> > > /[:w\bfoo bar]/# not exactly the same as above
> > 
> > No, I think that's exactly the same.
> 
> What does \b mean again?  I assume it's no longer backspace?

For as long as I can remember \b has meant "word boundary" in
regular expressions.  :-) :-)

Pm

Re: C<::> in rules

2005-05-12 Thread Jonathan Scott Duff

On Thu, May 12, 2005 at 12:48:16PM -0500, Patrick R. Michaud wrote:
> On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote:
> > 
> > > > /[:w\bfoo bar]/# not exactly the same as above
> > > 
> > > No, I think that's exactly the same.
> > 
> > What does \b mean again?  I assume it's no longer backspace?
> 
> For as long as I can remember \b has meant "word boundary" in
> regular expressions.  :-) :-)

Doh!  See how the shiny new perl6 confuses?  ;-)

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]

Re: split /(..)*/, 1234567890

2005-05-12 Thread Jonathan Scott Duff

On Thu, May 12, 2005 at 01:12:26PM -0400, Uri Guttman wrote:
> > "JSD" == Jonathan Scott Duff <[EMAIL PROTECTED]> writes:
> 
>   JSD> To bring this back to perl6, autrijus' original query was regarding
> 
>   JSD>$ pugs -e 'say join ",", split /(..)*/, 1234567890'
> 
>   JSD> which currently generates a list of ('','12','34','56','78','90')
>   JSD> In perl5 it would generate a list of ('','90') because only the last
>   JSD> pair of characters matched is kept (such is the nature of quantifiers
>   JSD> applied to capturing parens). But in perl6 quantified captures put all
>   JSD> of the matches into an array such that "abcdef" ~~ /(..)*/ will make
>   JSD> $0 = ['ab','cd','ef']. 
> 
>   JSD> I think that the above split should generate a list like this:
> 
>   JSD>('', [ '12','34','56','78','90'])
> 
> i disagree. if you want complex tree results, use a rule.

Well ... we *are* using a rule; it just doesn't have a name.

So, would you advocate too that 

my @a = "foofoofoobarbarbar" ~~ /(foo)+ (bar)+/;

should flatten? thus @a = ('foo','foo','foo','bar','bar','bar')
rather than (['foo','foo','foo'],['bar','bar','bar']) ?

This may have even been discussed before, but whether or not we keep
the delimiters should probably be determined by something other than
the presence or absence of parentheses in the pattern.  Perhaps the
flattening/non-flattening behavior could be modulated the same way,
probably as a modifier to split.

> split is for creating a single list of elements from a string. it is
> better keep split simple for it is commonly used in this domain.

I'll wager that splits with non-capturing patterns are far and away the
most common case. :-)

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]

Re: C<::> in rules

2005-05-12 Thread Uri Guttman

> "PRM" == Patrick R Michaud <[EMAIL PROTECTED]> writes:

  PRM> On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote:
  >> 
  >> > > /[:w\bfoo bar]/# not exactly the same as above
  >> > 
  >> > No, I think that's exactly the same.
  >> 
  >> What does \b mean again?  I assume it's no longer backspace?

  PRM> For as long as I can remember \b has meant "word boundary" in
  PRM> regular expressions.  :-) :-)

except in char classes where it gets its backspace meaning back.

:-)

uri

-- 
Uri Guttman  --  [EMAIL PROTECTED]   http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs    http://jobs.perl.org

Re: C<::> in rules

2005-05-12 Thread Aaron Sherman

On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:

> > > In other words, it acts as though one had written
> > > 
> > > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> > > 
> > > and not
> > > 
> > > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
> > 
> > Your two examples fail in the same way because of the fact that the
> > group IS the whole rule.
> 
> False.  In the first case the group is the whole rule.  In the second
> case the group would not include the (implied) '.*?' at the start of
> the rule.

That cannot be true. If it were, then:

s/[a]//

and

s/a//

would replace different things, and they MUST NOT. If I've missed some
fundamental way in which rx:p5/(?:...)/ is different from rx/[...]/,
then please let me know. Otherwise, we can simply demonstrate this with
P5:

perl -le '"abcaabbcc" =~ /(?:aa)/;print $&'

and unshockingly, that prints "aa", not "abcaa"

> Note that the rule is *unanchored*, thus it tries at the first character,
> if it fails then it goes to the second character, if that fails it goes
> to the third, etc.  

Yes, you're correct, but when you step forward over input in order to
find a start for your unanchored expression, you do NOT consume that
input, grouping or not. To say:

$foo ~~ /unanchored/

is something like

for 0..length($foo)-1 -> $i {
substr($foo,$i) ~~ /^unanchored/;
}

and always has been. Unless I'm unaware of some subtlety of [], it is
just the same as P5's (?:...), which behaves exactly this way.
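
The loop above can be written out as runnable Python, purely as a sketch of the idea. The function name unanchored_search is made up for this example, and the range runs one position further than the loop above so that a pattern able to match the empty string can still succeed at the very end.

```python
import re

def unanchored_search(pattern, text):
    """Try an anchored match at each start position in turn; leftmost wins."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        m = compiled.match(text, start)   # match() is anchored at `start`
        if m:
            return m
    return None

m = unanchored_search(r"(?:aa)", "abcaabbcc")
print(m.group(0), m.start())
```

The skipped characters are not part of the match: the result is just "aa", found at the first start position where the anchored attempt succeeds.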

I'll skip the rest of your post for now, except for the last bit, since
I think we need to resolve which universe we're in before we can give
each other street directions ;-)

> > > /[:w\bfoo bar]/# not exactly the same as above
> > 
> > No, I think that's exactly the same.
> 
> Nope.  Consider:  
> 
>  $foo = rx /[:w::foo bar]/
>  $baz = rx /[:w\bfoo bar]/
> 
>  "myfoo bar" ~~ $foo  # matches
>  "myfoo bar" ~~ $baz  # fails, foo is not on a word boundary

You're correct, sorry about that.

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

Question on "is chomped"

2005-05-12 Thread Joshua Gatcomb

While E02 states that "is chomped" sets the chomped property of
a filehandle, I didn't find any detailed specifications in any of the
As or Ss.

So - is "is chomped" always the equivalent of:

while (  ) {
chomp;
}

For instance - you have opened the file rw

Or, the idea of having mutator and non-mutator versions of chomp (and
other functions) has been kicked around the list.

Any definitive word yet?

Cheers,
Joshua Gatcomb
a.k.a. L~R

Re: split /(..)*/, 1234567890

2005-05-12 Thread Larry Wall

On Thu, May 12, 2005 at 12:03:55PM -0500, Jonathan Scott Duff wrote:
: I think that the above split should generate a list like this:
: 
:   ('', [ '12','34','56','78','90'])

Yes, though I would think of it more generally as

('', $0, '', $0, '', $0, ...)

where in this case it just happens to be

('', $0)

and $0 expands to ['12','34','56','78','90'] if you treat it as an array.

Larry

S29: punt

2005-05-12 Thread Rod Adams

It looks like I'm going to have to punt on finishing S29.
I'm finding myself in a perpetual state of either no time to work on it, 
or when there is time, having insufficient brain power left to properly 
assimilate everything that needs to be considered to do any of the 
functions justice. Looking ahead, I do not see this state changing for 
the better in the foreseeable future.

It's my hope that someone none(@Larry) can and will pick this effort up. 
I will give whatever assistance I can to anyone choosing to do so. Drop 
me a line.

-- Rod Adams

Numification of captured match

2005-05-12 Thread Autrijus Tang

This has led to surprising results in Pugs's Net::IRC:

if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
my $socket = connect($0, $1);
}

If $1 is a match object here, and connect() assumes Int on its second
argument, then it will connect to port 1, as the match object numifies
to 1 (indicating a successful match).

I "fixed" this for 6.2.3 by flattening $0, $1, $2 into plain scalars
(for nonquantified matches), and use $/[0] etc to store match objects,
but I'm not sure this treatment is right.

Is it really intended that we get into habit of writing this?

if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
my $socket = connect(~$0, +$1);
}

It looks... weird. :)

Thanks,
/Autrijus/



Re: C<::> in rules

2005-05-12 Thread Patrick R. Michaud

$rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
$rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> > On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> > > Your two examples fail in the same way because of the fact that the
> > > group IS the whole rule.
> > 
> > False.  In the first case the group is the whole rule.  In the second
> > case the group would not include the (implied) '.*?' at the start of
> > the rule.
> 
> That cannot be true. If it were, then:
>   s/[a]//
> and
>   s/a//
> would replace different things, and they MUST NOT. 

No, /[a]/ is still the same as /a/ here -- I'm not discussing that at
all, nor am I implying any special [] or rule semantics.   I'm simply
referring to the fact that the rule is free to step across the
characters in the string, same as you pointed out.

Let me backtrack(!) and try a slightly different example, 
first using a group and (::)

$r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/;

"abcdef"  ~~ $r1 # matches "abcdef"
"xyzghijkl" ~~ $r1   # matches "ghijkl"
"xyzabcghijkl" ~~ $r1# matches "ghijkl"

Why does the last one match?  Because it fails the group but
doesn't fail the rule -- i.e., the rule is still free to advance
its initial pointer to the next character and try again.  Contrast
this with:

$r2 = rx /abc ::: def | ghi ::: jkl | mn ::: op/;

"abcdef"  ~~ $r2 # matches "abcdef"
"xyzghijkl" ~~ $r2   # matches "ghijkl"
"xyzabcghijkl" ~~ $r2# fails!

This one fails, because once we match the "abc", we're committed
to completing the match or failing the rule altogether.

Does this work to convince you that the two expressions are indeed 
different?
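
To make the difference concrete, here is a toy matcher in Python. It is a sketch of the interpretation described in this message only, not of any real rules engine: match_alternation and its "group"/"rule" argument are names invented for the example, and each alternative is reduced to a literal prefix, a commit point, and a literal suffix.

```python
def match_alternation(text, alternatives, commit):
    """Toy matcher for  prefix <commit> suffix | prefix <commit> suffix | ...

    commit="group" models '::'  (failing after the commit abandons the
    alternation at this start position only);
    commit="rule"  models ':::' (failing after the commit fails the match).
    Returns the matched substring, or None.
    """
    for start in range(len(text) + 1):
        for prefix, suffix in alternatives:
            if not text.startswith(prefix, start):
                continue                      # try the next alternative
            # The commit point has now been passed.
            if text.startswith(suffix, start + len(prefix)):
                return prefix + suffix
            if commit == "rule":
                return None                   # ':::' fails the whole rule
            break                             # '::' fails just this group
    return None

alts = [("abc", "def"), ("ghi", "jkl"), ("mn", "op")]
for s in ("abcdef", "xyzghijkl", "xyzabcghijkl"):
    print(s, match_alternation(s, alts, "group"), match_alternation(s, alts, "rule"))
```

Only the third string tells the two apart: with the group-level commit the matcher moves on to the next start position and finds "ghijkl", while with the rule-level commit it gives up as soon as "abc" is not followed by "def".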

Pm

Re: Numification of captured match

2005-05-12 Thread Patrick R. Michaud

On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> Is it really intended that we get into habit of writing this?
> 
> if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
>   my $socket = connect(~$0, +$1);
> }
> 
> It looks... weird. :)

And it would have to be

 if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
my $socket = connect(~$0, ~$1);
 }

because +$1 still evaluates to 1.  (The ~ in front of $0 is 
probably optional.)

My suggestion is that a match object in numeric context is the
same as evaluating its string value in a numeric context.  If
we need a way to find out the number of match repetitions (what
the numeric context was intended to provide), it might be better
done with an explicit C<.matchcount> method or something like that.
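
A small Python sketch of the two behaviours, for illustration only. The Capture class and its numify_as_text flag are hypothetical stand-ins invented here; they are not the Pugs match object or any proposed API.

```python
class Capture:
    """Hypothetical stand-in for a captured match; not a real Pugs API."""

    def __init__(self, text, numify_as_text):
        self.text = text
        self.numify_as_text = numify_as_text

    def __str__(self):
        return self.text

    def __int__(self):
        if self.numify_as_text:
            return int(self.text)   # suggested: numify the matched string
        return 1                    # surprising: 1 only means "matched once"

port_surprising = int(Capture("80", numify_as_text=False))
port_suggested = int(Capture("80", numify_as_text=True))
print(port_surprising, port_suggested)
```

With the first behaviour a caller expecting a port number gets 1; with the suggested one it gets the number that was actually matched.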

Pm

Re: split /(..)*/, 1234567890

2005-05-12 Thread Jonathan Scott Duff

On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 12:03:55PM -0500, Jonathan Scott Duff wrote:
> : I think that the above split should generate a list like this:
> : 
> : ('', [ '12','34','56','78','90'])
> 
> Yes, though I would think of it more generally as
> 
> ('', $0, '', $0, '', $0, ...)
> 
> where in this case it just happens to be
> 
> ('', $0)
> 
> and $0 expands to ['12','34','56','78','90'] if you treat it as an array.

Exactly so.  Principle of least surprise wins again! ;)

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]

Re: split /(..)*/, 1234567890

2005-05-12 Thread Autrijus Tang

On Thu, May 12, 2005 at 02:56:37PM -0500, Jonathan Scott Duff wrote:
> On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> > Yes, though I would think of it more generally as
> > 
> > ('', $0, '', $0, '', $0, ...)
> > 
> > where in this case it just happens to be
> > 
> > ('', $0)
> > 
> > and $0 expands to ['12','34','56','78','90'] if you treat it as an array.
> 
> Exactly so.  Principle of least surprise wins again! ;)

Thanks, implemented as such.

pugs> map { ref $_ } split /(..)*/, 1234567890
(::Str, ::Array::Const)

Thanks,
/Autrijus/



BEGIN and lexical variables inside subroutines

2005-05-12 Thread Benjamin Smith

  sub foo { my $x; BEGIN { $x = 3 }; say $x }
  foo; foo; foo;

Currently in perl5 and pugs this prints "3\n\n\n".

Should BEGIN blocks be able to modify values in lexical variables that
don't really exist yet? (People can use state after all to get a
variable which does exist early enough for them to modify.)


Is there some kind of "prototype pad" (and lexicals) available inside
the BEGIN block, rather than a full runtime pad?

-- 
Benjamin Smith <[EMAIL PROTECTED], [EMAIL PROTECTED]>



Re: C<::> in rules

2005-05-12 Thread Aaron Sherman

On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote:
> $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
> 
> On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote:
> > On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:

> > > False.  In the first case the group is the whole rule.  In the second
> > > case the group would not include the (implied) '.*?' at the start of
> > > the rule.

This was a very unfortunate choice of explanations, since an implied
".*?" would change the semantics of the match deeply. However, your
later explanation:

> $r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/;
> 
> "abcdef"  ~~ $r1 # matches "abcdef"
> "xyzghijkl" ~~ $r1   # matches "ghijkl"
> "xyzabcghijkl" ~~ $r1# matches "ghijkl"
> 
> Why does the last one match?  Because it fails the group but
> doesn't fail the rule -- i.e., the rule is still free to advance
> its initial pointer to the next character and try again. 

... is very understandable. Now I'm just left with a vague sense that I
never want to see anyone use :: :-)

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

Re: BEGIN and lexical variables inside subroutines

2005-05-12 Thread Dave Mitchell

On Thu, May 12, 2005 at 09:06:48PM +0100, Benjamin Smith wrote:
>   sub foo { my $x; BEGIN { $x = 3 }; say $x }
>   foo; foo; foo;
> 
> Currently in perl5 and pugs this prints "3\n\n\n".
> 
> Should BEGIN blocks be able to modify values in lexical variables that
> don't really exist yet? (People can use state after all to get a
> variable which does exist early enough for them to modify.)
> 
> 
> Is there some kind of "prototype pad" (and lexicals) available inside
> the BEGIN block, rather than a full runtime pad?

In perl5, the first instance of a lexical exists from the moment of
compilation through till first exit from the enclosing scope during
execution. If this wasn't the case then lots of closure-related stuff
wouldn't work, eg

{
my $count = 0;
sub inc { $count++ }
sub dec { $count-- }
}
...

-- 
print+qq&$}$"$/$s$,[EMAIL PROTECTED],$:$.$q$^$,[EMAIL 
PROTECTED];$.$q$m&if+map{m,^\d{0\,},,${$::{$'}}=chr($"+=$&||1)}q&10m22,42}6:[EMAIL
 PROTECTED];^2dg3q/s"&=~m*\d\*.*g

Re: S29: punt [pwned!]

2005-05-12 Thread Sam Vilain

Rod Adams wrote:
> It looks like I'm going to have to punt on finishing S29.

On behalf of pugs committers, we will gladly adopt this task, which is in
the pugs repository already at docs/S29draft.pod, as well as having a set
of foundation classes that correspond to all these object methods in
docs/src/ (of course, most of the actual code is in src/Pugs/Prim.hs etc)

> I'm finding myself in a perpetual state of either no time to work on it,
> or when there is time, having insufficient brain power left to properly
> assimilate everything that needs to be considered to do any of the
> functions justice. Looking ahead, I do not see this state changing for
> the better in the foreseeable future.

Drop the feeling of guilt for not having written enough and it will already
be better.  Thanks for what you have done, it is an outstanding achievement!

> It's my hope that someone none(@Larry) can and will pick this effort up.
> I will give whatever assistance I can to anyone choosing to do so. Drop
> me a line.

If you could make sure your last revision corresponds to what is in the
pugs repository, that will be more than enough...

Sam.

Re: C<::> in rules

2005-05-12 Thread Patrick R. Michaud

On Thu, May 12, 2005 at 05:15:55PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote:
> > False.  In the first case the group is the whole rule.  In the second
> > case the group would not include the (implied) '.*?' at the start of
> > the rule.
> 
> This was a very unfortunate choice of explanations, since an implied
> ".*?" would change the semantics of the match deeply. 

I agree, my wording on this wasn't all that clear--I haven't found
a good phrase for "the stepping that takes place at the beginning
of an unanchored match".  And in earlier versions of PGE, the
stepping was actually performed by a '.*?' node at the beginning
of the expression tree that didn't participate in the captured
result.  
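
The difference between failing the group and failing the rule can be modelled roughly as below. This is a Python sketch, not PGE code: it assumes every alternative is a literal "prefix :: rest" pair, and the names match_rule, ALTS and cut_scope are invented for illustration. The outer loop plays the part of the stepping at the start of an unanchored match.

```python
def match_rule(text, alternatives, cut_scope):
    """Model an unanchored match over alternatives of the form  prefix :: rest.

    cut_scope="group": when `rest` fails after `prefix` matched, only the
    bracketed group fails, so the rule may step its start position and retry.
    cut_scope="rule": the same failure aborts the whole match (like :::).
    """
    for start in range(len(text) + 1):           # stepping of the start pointer
        for prefix, rest in alternatives:
            if not text.startswith(prefix, start):
                continue                          # try the next alternative
            if text.startswith(rest, start + len(prefix)):
                return text[start:start + len(prefix) + len(rest)]
            # prefix matched but rest failed: the cut forbids trying the
            # remaining alternatives at this start position
            if cut_scope == "rule":
                return None                       # whole rule fails, no stepping
            break                                 # group fails, rule steps on
    return None


# Hypothetical encoding of  [abc :: def | ghi :: jkl | mn :: op]
ALTS = [("abc", "def"), ("ghi", "jkl"), ("mn", "op")]
```

With cut_scope="group", "xyzabcghijkl" still yields "ghijkl", because the failure at "abc" only ends the attempt at that position. With cut_scope="rule" the same input fails outright.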

Anyway, we're in agreement as to what :: and ::: do, so I'll propose
changes to S05/A05 and we can go from there.  Thanks! :-)

Pm

Re: split /(..)*/, 1234567890

2005-05-12 Thread Rick Delaney

On Fri, May 13, 2005 at 04:05:23AM +0800, Autrijus Tang wrote:
> > On Thu, May 12, 2005 at 12:01:59PM -0700, Larry Wall wrote:
> > > Yes, though I would think of it more generally as
> > > 
> > > ('', $0, '', $0, '', $0, ...)
> > > 
> > > where in this case it just happens to be
> > > 
> > > ('', $0)
> > > 
> > > and $0 expands to ['12','34','56','78','90'] if you treat it as an array.
> 
> Thanks, implemented as such.
> 
> pugs> map { ref $_ } split /(..)*/, 1234567890
> (::Str, ::Array::Const)

Sorry if I'm getting ahead of the implementation but if it is returning
$0 then shouldn't ref($0) return ::Rule::Result or somesuch?  It would
just look like an ::Array::Const if you treat it as such.

-- 
Rick Delaney
[EMAIL PROTECTED]

Re: S29: punt [pwned!]

2005-05-12 Thread Aaron Sherman

On Fri, 2005-05-13 at 12:07 +1200, Sam Vilain wrote:
> Rod Adams wrote:
> > It looks like I'm going to have to punt on finishing S29.
> 
> On behalf of pugs committers, we will gladly adopt this task, which is in
> the pugs repository already at docs/S29draft.pod, as well as having a set
> of foundation classes that correspond to all these object methods in
> docs/src/ (of course, most of the actual code is in src/Pugs/Prim.hs etc)

I'll be your resident firehose drinker. Feel free to send me your
comments, concerns and death threats where they are relevant to S29
and / or the above mentioned foundation classes. I will summarize and
compile and forward on to the relevant parties as needed.

Re: split /(..)*/, 1234567890

2005-05-12 Thread Autrijus Tang

On Thu, May 12, 2005 at 08:33:40PM -0400, Rick Delaney wrote:
> On Fri, May 13, 2005 at 04:05:23AM +0800, Autrijus Tang wrote:
> > pugs> map { ref $_ } split /(..)*/, 1234567890
> > (::Str, ::Array::Const)
> 
> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch?  It would
> just look like an ::Array::Const if you treat it as such.

Er, where does this ::Rule::Result thing come from?

I was basing my implementation on Damian's:

Quantifiers (except C<?> and C<??>) cause a matched subrule or
subpattern to return an array of C<Match> objects, instead of just a
single object.

As well as the PGE's implementation of treating the quantified capture as a
simple PerlArray PMC.

Thanks,
/Autrijus/


Re: split /(..)*/, 1234567890

2005-05-12 Thread Autrijus Tang

On Thu, May 12, 2005 at 08:33:40PM -0400, Rick Delaney wrote:
> Sorry if I'm getting ahead of the implementation but if it is returning
> $0 then shouldn't ref($0) return ::Rule::Result or somesuch?  It would
> just look like an ::Array::Const if you treat it as such.

...also note that the $0 here is $/[0], also known as Perl 5's $1...

Indeed, the entire match result, that is $/, will always be a
single ::Match object if a match succeeds.

Thanks,
/Autrijus/


Re: Numification of captured match

2005-05-12 Thread Jonathan Scott Duff

On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
> On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> > Is it really intended that we get into habit of writing this?
> > 
> > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> > my $socket = connect(~$0, +$1);
> > }
> > 
> > It looks... weird. :)
> 
> And it would have to be
> 
>  if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
>   my $socket = connect(~$0, ~$1);
>  }
> 
> because +$1 still evaluates to 1.  

That's some subtle evil.

> My suggestion is that a match object in numeric context is the
> same as evaluating its string value in a numeric context.  

While I agree that this would be the right behavior it still feels
special-casey, hackish and wrong.  

If, as an optimization, you could tell PGE that you didn't need Match
objects and only cared about the string results of your captures, that
might be better. For instance,

if 'localhost:80' ~~ m:s/^(.+)\:(\d+)$/ {
my $socket = connect($0, $1);
}
:s for :string  (assuming that hasn't already been taken)

> If
> we need a way to find out the number of match repetitions (what
> the numeric context was intended to provide), it might be better
> done with an explicit C<.matchcount> method or something like that.

Surely that would just be [EMAIL PROTECTED]  Or have I crossed the perl[56]
streams again?

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]

Re: Numification of captured match

2005-05-12 Thread Rob Kinyon

On 5/12/05, Jonathan Scott Duff <[EMAIL PROTECTED]> wrote:
> On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
> > On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> > > Is it really intended that we get into habit of writing this?
> > >
> > > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> > > my $socket = connect(~$0, +$1);
> > > }
> > >
> > > It looks... weird. :)
> >
> > And it would have to be
> >
> >  if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> >   my $socket = connect(~$0, ~$1);
> >  }
> >
> > because +$1 still evaluates to 1.
> 
> That's some subtle evil.
> 
> > My suggestion is that a match object in numeric context is the
> > same as evaluating its string value in a numeric context.
> 
> While I agree that this would be the right behavior it still feels
> special-casey, hackish and wrong.
> 
> If, as an optimization, you could tell PGE that you didn't need Match
> objects and only cared about the string results of your captures, that
> might be better. For instance,
> 
> if 'localhost:80' ~~ m:s/^(.+)\:(\d+)$/ {
> my $socket = connect($0, $1);
> }
> :s for :string  (assuming that hasn't already been taken)

What about the fact that anything matching (\d+) is going to be an Int
and anything matching (.+) is going to be a String, and so forth.
There is sufficient information in the regex for P6 to know that $0
should smart-convert into a String and $1 should smart-convert into a
Int. Can't we just do that?

Rob

Re: Numification of captured match

2005-05-12 Thread Larry Wall

On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
: On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
: > Is it really intended that we get into habit of writing this?
: > 
: > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
: > my $socket = connect(~$0, +$1);
: > }
: > 
: > It looks... weird. :)
: 
: And it would have to be
: 
:  if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
:   my $socket = connect(~$0, ~$1);
:  }
: 
: because +$1 still evaluates to 1.  (The ~ in front of $0 is 
: probably optional.)
: 
: My suggestion is that a match object in numeric context is the
: same as evaluating its string value in a numeric context.  If
: we need a way to find out the number of match repetitions (what
: the numeric context was intended to provide), it might be better
: done with an explicit C<.matchcount> method or something like that.

I think we already said something like that once some number of
months ago.  +$1 simply has to be the numeric value of the match.
It's not as much of a problem as a Perl 5 programmer might think,
since ?$1 is still true even if +$1 is 0.  Anyway, while we could have
a method for the .matchcount, +$1[] should work fine too.  And maybe
even [EMAIL PROTECTED], presuming that "a match object can function as an array"
actually means "a match object knows when it's being asked to supply
an array reference".

Actually, it's not clear to me offhand why @1 shouldn't mean $1[]
and %1 shouldn't mean $1{}.
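
The semantics described here (boolean truth means "the match succeeded", numeric value means "the number the matched string spells", and the array side carries the count) can be sketched with a toy class. This is a Python illustration only; the class and attribute names are hypothetical and are not taken from PGE or pugs.

```python
import re


class Match:
    """Toy stand-in for a capture's match object."""

    def __init__(self, text, subcaptures=()):
        self.text = text
        self.subcaptures = list(subcaptures)

    def __bool__(self):       # ?$1 : a successful match is always true
        return True

    def __str__(self):        # ~$1 : the matched substring
        return self.text

    def __int__(self):        # +$1 : numeric value of the matched string
        return int(self.text)

    def __len__(self):        # +$1[] : number of array elements
        return len(self.subcaptures)


# Python groups are numbered from 1, unlike the $0/$1 numbering in the thread.
m = re.match(r"^(.+):(\d+)$", "localhost:80")
host, port = Match(m.group(1)), Match(m.group(2))
```

Here int(port) is 80 rather than 1, and a capture of "0" is still true in boolean context even though its numeric value is 0.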

Larry

Re: Numification of captured match

2005-05-12 Thread Damian Conway

Larry Wall wrote:
> I think we already said something like that once some number of
> months ago.  +$1 simply has to be the numeric value of the match.

Agreed.

> Anyway, while we could have
> a method for the .matchcount, +$1[] should work fine too.

Yep.

> Actually, it's not clear to me offhand why @1 shouldn't mean $1[]
> and %1 shouldn't mean $1{}.

It *does*. According to the recent capture semantics document:

   > Note that, outside a rule, C<@1> is simply a shorthand for C<@{$1}>,

and:

   > And, of course, outside the rule, C<%1> is a shortcut for C<%{$1}>:

Damian

Re: Question on "is chomped"

2005-05-12 Thread Larry Wall

On Thu, May 12, 2005 at 02:52:01PM -0400, Joshua Gatcomb wrote:
: While E02 states that "is chomped" sets the chomped property of
: a filehandle, I didn't find any detailed specifications in any of the
: As or Ss.
: 
: So - is "is chomped" always the equivalent of:
: 
: while (  ) {
: chomp;
: }
: 
: For instance - you have opened the file rw
: 
: Or, the idea of having mutator and non-mutator versions of chomp (and
: other functions) has been kicked around the list.
: 
: Any definitive word yet?

There isn't really an "is chomped".  If you set the line terminator
for a filehandle, it marks the input strings as to where it thinks
the line should be chomped (which can vary from line to line).
Then you can either add an autochomp to the filehandle to automatically
remove that, or a subsequent call to ordinary chomp() removes that
sequence.  If a line is not marked by an input layer, then chomp()
will presumably try to remove any standard line terminator.

In early Perl it was important to leave the newline on and chop/chomp
it later as a separate operation, but that's because of use of
while(<>) for input (at least until we snuck in the definedness hack).
In Perl 6 we're switching everyone to for =<> instead, which does
not depend on whether =<> returns a true/defined value, but only on
whether the generator has run out, so it's perfectly fine for the
input layer to autochop boring line terminators for you.  It's really
only if you're interested in varying line terminators that you would
want to do your own chomp.
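
A small Python sketch of that idea follows. It assumes an input layer that recognizes "\r\n", "\r" and "\n" as terminators; the function name lines and the autochomp flag are hypothetical. Because the loop ends when the generator is exhausted, not when a value is false, an empty chomped line is delivered like any other.

```python
import re

# Terminators this sketch's input layer recognizes; they may vary per line.
TERMINATOR = re.compile(r"\r\n|\r|\n")


def lines(data, autochomp=False):
    """Yield successive lines of data, optionally without their terminator."""
    pos = 0
    while pos < len(data):
        m = TERMINATOR.search(data, pos)
        if m:
            end, term_end = m.start(), m.end()
        else:                                   # last line has no terminator
            end = term_end = len(data)
        body, term = data[pos:end], data[end:term_end]
        yield body if autochomp else body + term
        pos = term_end


chomped = list(lines("a\r\n\nb\n", autochomp=True))
raw = list(lines("a\r\n\nb\n"))
```

The second line comes back as an empty string when autochomp is on, which a truth-tested loop would have mistaken for end of input.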

I call it "autochomp" on a handle, but it's probably just :chomp or
some such.  We really haven't worked out how the IO layers interact
yet, or what their arguments are.  Almost need a perl6-io list
to work all that out...

Larry

Re: Numification of captured match

2005-05-12 Thread Larry Wall

On Fri, May 13, 2005 at 02:00:10PM +1000, Damian Conway wrote:
: >Actually, it's not clear to me offhand why @1 shouldn't mean $1[]
: >and %1 shouldn't mean $1{}.
: 
: It *does*. According to the recent capture semantics document:
: 
:> Note that, outside a rule, C<@1> is simply a shorthand for C<@{$1}>,
: 
: and:
: 
:> And, of course, outside the rule, C<%1> is a shortcut for C<%{$1}>:

In that case it's very much less clear to me why it shouldn't mean that.  :-)

Larry

Re: C<::> in rules

2005-05-12 Thread Larry Wall

On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote:
: Also, A05 proposes incorrect alternatives to the above 
: 
: /[:w[]foo bar]/        # null pattern illegal, use <?null>
: /[:w()foo bar]/        # null capture illegal, and probably undesirable
: /[:w\bfoo bar]/        # not exactly the same as above
: 
: I'd like to remove those from A05, or at least put an "Update:"
: note there that doesn't lead people astray.  One option not
: mentioned in A05 that we can add there is
: 
: /[:w<?null>foo bar]/
: 
: which is admittedly ugly.

I would just like to point out that you are misreading those.
The [] and () above are part of pair syntax, not rule syntax.
Likewise your :w<?null> should be taken to :w('?null').  We used to
try to distinguish modifiers like :w that don't take an argument,
but that's a bad plan.  All colon pairs parse alike wherever they
occur.  That's why we've required space before bracket delimiters
outside, but the same constraint holds inside rules.

Which means, of course, that we should probably try to figure
what :w($x) actually means...  :-)

Speaking of which, it seems to me that :p and :c should allow an
argument that says where to start relative to the current position.
In other words, :p means :p(0) and :c means :c(0).  I could also see
uses for :p(-1) and :p(+1).

We could also pass positions as opaque objects, which is another
reason not to consider positions as mere numbers.

Larry

Re: C<::> in rules

2005-05-12 Thread Patrick R. Michaud

On Thu, May 12, 2005 at 08:56:39PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote:
> : Also, A05 proposes incorrect alternatives to the above 
> : 
> : /[:w[]foo bar]/        # null pattern illegal, use <?null>
> : /[:w()foo bar]/        # null capture illegal, and probably undesirable
> : /[:w\bfoo bar]/        # not exactly the same as above
> : 
> 
> I would just like to point out that you are misreading those.

Ouch, you're right!  I've been looking at patterns too long, I
guess -- thanks for the correction.  

> Speaking of which, it seems to me that :p and :c should allow an
> argument that says where to start relative to the current position.
> In other words, :p means :p(0) and :c means :c(0).  I could also see
> uses for :p(-1) and :p(+1).

Sounds good to me.

Pm

Re: character classes in p6 rules

2005-05-12 Thread Larry Wall

On Wed, May 11, 2005 at 08:00:20PM -0500, Patrick R. Michaud wrote:
: Somehow I'd like to get rid of those inner angles, so 
: that we always use  <+alpha>, <+digit>, <-sp>, <-punct> to 
: indicate named character classes, and specify combinations 
: with constructions like  <+alpha+punct-[aeiou]>  and  <+word-[_]>.  
: We'd still allow <[abc]> as a shortcut to <+[abc]>.

I like it.

: I haven't thought far ahead to the question of whether
: character classes would continue to occupy the same namespace
: as rules (as they do now) or if they become specialized kinds
: of rules or what.  I'll just leave it at this for now and
: see what the rest of p6l thinks.

Hmm, well, positive matches can be defined to traverse whatever the
longest sequence matched is, even if it's actually multiple characters
by some reckoning or other.  On the other hand, negative matches
can really only skip one character in the current view regardless of
how long the sequences in the class are, which function as a negative
lookahead for the subsequent character skip.  In other words, <-alpha>
really means something like [ <!alpha> . ]
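
The "negative lookahead followed by a one-character skip" reading can be tried out with Python's re module. The class below is a made-up one whose members include the two-character sequence "ch"; MEMBER, positive and negative are names invented for this sketch.

```python
import re

# Hypothetical class: the sequence "ch" or any single lowercase ASCII letter.
# "ch" is listed first so that a positive match takes the longest member.
MEMBER = r"(?:ch|[a-z])"

# Positive match: traverses whatever member sequence matched.
positive = re.compile(MEMBER)

# Negative match: assert that no member starts here, then skip exactly
# one character, however long the members of the class may be.
negative = re.compile(rf"(?!{MEMBER}).", re.DOTALL)
```

So positive.match("chx") consumes two characters, while negative.match("1ch") consumes only the "1", and negative.match("ch") fails.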

But then it's not entirely clear how character class set theory works.
Another thing we have to work out.  Obviously + and - are ordered,
and we probably want & and | for actual set operations.  But does
<-[a]> negate only a preceding 'a' or all characters that use 'a'
as the base character along with subsequent combining characters?
We're almost getting into a wildcarding situation there...

In any event, the takehome message here is that characters cannot
be assumed to be constant width any more.

I think this argues that character classes really are rules of a sort.

Larry

Re: Numification of captured match

2005-05-12 Thread Jonathan Scott Duff

On Thu, May 12, 2005 at 08:10:42PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
> : My suggestion is that a match object in numeric context is the
> : same as evaluating its string value in a numeric context.  If
> : we need a way to find out the number of match repetitions (what
> : the numeric context was intended to provide), it might be better
> : done with an explicit C<.matchcount> method or something like that.
> 
> I think we already said something like that once some number of
> months ago.  +$1 simply has to be the numeric value of the match.
> It's not as much of a problem as a Perl 5 programmer might think,
> since ?$1 is still true even if +$1 is 0.  Anyway, while we could have
> a method for the .matchcount, +$1[] should work fine too.  And maybe
> even [EMAIL PROTECTED], presuming that "a match object can function as an
> array" actually means "a match object knows when it's being asked to
> supply an array reference".

So the "counting" idiom in S05 becomes one of:

$match_count += @{m:g/pattern/};
$match_count += list m:g/pattern/;
$match_count += m:g/pattern/.matchcount;
$match_count += (m:g/pattern/)[];   # maybe

???

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]

Re: Numification of captured match

2005-05-12 Thread Patrick R. Michaud

On Thu, May 12, 2005 at 08:10:42PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 02:55:36PM -0500, Patrick R. Michaud wrote:
> : On Fri, May 13, 2005 at 03:23:20AM +0800, Autrijus Tang wrote:
> : > Is it really intended that we get into habit of writing this?
> : > 
> : > if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> : >   my $socket = connect(~$0, +$1);
> : > }
> : > 
> : > It looks... weird. :)
> : 
> : And it would have to be
> : 
> :  if 'localhost:80' ~~ /^(.+)\:(\d+)$/ {
> : my $socket = connect(~$0, ~$1);
> :  }
> : 
> : because +$1 still evaluates to 1.  (The ~ in front of $0 is 
> : probably optional.)
> : 
> : My suggestion is that a match object in numeric context is the
> : same as evaluating its string value in a numeric context.  If
> : we need a way to find out the number of match repetitions (what
> : the numeric context was intended to provide), it might be better
> : done with an explicit C<.matchcount> method or something like that.
> 
> I think we already said something like that once some number of
> months ago.  

I guess I've been led astray (or downright confused) by the capture 
specs then, when it says:

A successful match returns a C<Match> object whose boolean value is
true, whose integer value is typically 1 (except under the C<:g> or
C<:x> flags; see L), whose string
value is the complete substring that was matched by the entire rule,
whose array component contains all subpattern (unnamed) captures, and
whose hash component contains all subrule (named) captures.

and later

If a named scalar alias is applied to a set of non-capturing
brackets:
m:w/ $:=[ (<[A-E]>) (\d**{3..6}) (X?) ] /;
then the corresponding entry in the rule's hash is assigned a
C<Match> object whose:
* Boolean value is true,
* Integer value is 1,
* String value is the complete substring matched by the 
  contents of the square brackets,
* Array and hash are both empty.

and under the :g option...

 if $text ~~ m:words:globally/ (\S+:)  / {
 say "Matched {+$/} different ways";
 say 'Full match context is:';
 say $/;
 }

So, are the Match objects returned from subpattern captures 
treated differently in numeric context than the Match objects
coming from named scalar aliases or the match itself... ?

> It's not as much of a problem as a Perl 5 programmer might think,
> since ?$1 is still true even if +$1 is 0.  Anyway, while we could have
> a method for the .matchcount, +$1[] should work fine too.  

With .matchcount I wasn't concerned about the number of repetitions
stored in $1 -- I was trying to get at the numeric value that $/
would've returned under the :g option.  But in re-reading the draft
of the :globally option I see we already have one --  
C< $/.matches > in numeric context should supply it for us.

So I'm guessing that we're all in agreement that +$/, +$1, and 
+$ all refer to the numeric value of the string matched, 
as opposed to what's currently written about their values in the 
draft...?  Or am I still missing the picture entirely?

Pm

Re: Numification of captured match

2005-05-12 Thread Damian Conway

Patrick surmised:
> So I'm guessing that we're all in agreement that +$/, +$1, and
> +$ all refer to the numeric value of the string matched,
> as opposed to what's currently written about their values in the
> draft...?

Yes. The semantics proposed in the draft have proved to be too orthogonal for
practical use. ;-)

Damian
