Re: Does a string remember all Unicode levels?

2009-08-12 Thread Helmut Wollmersdorfer

Moritz Lenz wrote:

> t/spec/S02-builtin_data_types/unicode.t has tests like this:
>
> # LATIN CAPITAL LETTER A, COMBINING GRAVE ACCENT
> my Str $u = "\x[0041,0300]";
> is $u.bytes, 3, 'combining À is three bytes as utf8';
> is $u.codes, 2, 'combining À is two codes';
> is $u.graphs, 1, 'combining À is one graph';
>
> Which seems to imply that a Str remembers its codepoints, even if it is
> in grapheme mode (because that's the default).

IMHO it's necessary to store the original assertion. Conversion to NFG
should be lazy.

> Is this correct? I don't really think that's sensible. I'd expect a
> compiler to store strings in composed normalization (+ NFG), so $u.codes
> would be 1.


If a string always stores NFG only - where can we store the result of a 
decomposition (NFD)?


Also it would be very confusing if a developer just reads a file, 
filters the lines, and writes them back, if the result is in another 
normalization form.
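
As a rough illustration of the three counts in the quoted test, here is a minimal sketch in Python rather than Perl 6, chosen only because it can be run as-is. It uses the standard `unicodedata` module. The helper name `count_levels` is made up for this example, and the grapheme count is an approximation (code points that are not combining marks), not full Unicode grapheme segmentation.

```python
import unicodedata

def count_levels(s):
    """Return (UTF-8 bytes, code points, approximate graphemes) for s."""
    n_bytes = len(s.encode("utf-8"))
    n_codes = len(s)
    # Approximation: count code points whose canonical combining class is 0.
    n_graphs = sum(1 for ch in s if unicodedata.combining(ch) == 0)
    return n_bytes, n_codes, n_graphs

u = "\u0041\u0300"  # LATIN CAPITAL LETTER A, COMBINING GRAVE ACCENT
nfc = unicodedata.normalize("NFC", u)    # composed form, U+00C0
nfd = unicodedata.normalize("NFD", nfc)  # decomposed again

print(count_levels(u))    # (3, 2, 1)
print(count_levels(nfc))  # (2, 1, 1)
print(nfd == u)           # True
```

In this case decomposing the composed string gives back the original code points, but that only holds because the input was already in NFD. An input that is in neither normal form cannot be recovered from a composed copy alone, which is the round-trip concern raised above.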


Helmut Wollmersdorfer


Should @x be defined after only "my @x"? (RT #64968)

2009-08-12 Thread Kyle Hasselbacher
use v6;

my $s;   #  ! $s.defined
my @a;  # @a.defined

That's the current Rakudo behavior.  RT #64968 suggests that this is a
bug.  In Perl 5, @a would not be defined until something was put into
it.  Which should it be?  I'd like to write a test for this.

Thanks.

Kyle.


Re: Should @x be defined after only "my @x"? (RT #64968)

2009-08-12 Thread Uri Guttman
> "KH" == Kyle Hasselbacher writes:

  KH> use v6;
  KH> my $s;   #  ! $s.defined
  KH> my @a;  # @a.defined

  KH> That's the current Rakudo behavior.  RT #64968 suggests that this is a
  KH> bug.  In Perl 5, @a would not be defined until something was put into
  KH> it.  Which should it be?  I'd like to write a test for this.

i am not even sure why defined is a method on aggregates. it is a known
issue in p5 that you shouldn't test aggregates with defined since it
tests whether anything has ever been allocated for it vs being
empty. this comes from newbies seeing undef $foo and then some do undef
@bar and then think defined @bar makes sense. so maybe there is a new
reason to support defined on arrays and hashes but i think it should be
disallowed.
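
The difference between "undefined" and "empty" can be sketched in Python. This is only an analogy, not Perl semantics: `None` stands in for an undefined value and an empty list for a declared but empty aggregate. The function name `describe` is hypothetical.

```python
def describe(value):
    """Classify a value as undefined, empty, or non-empty."""
    if value is None:
        return "undefined"
    if len(value) == 0:
        return "defined but empty"
    return "defined and non-empty"

print(describe(None))    # undefined
print(describe([]))      # defined but empty
print(describe([1, 2]))  # defined and non-empty
```

For an array or hash, the emptiness check is usually the question one actually wants answered.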

uri

-- 
Uri Guttman  --  u...@stemsystems.com    http://www.sysarch.com --
-  Perl Code Review , Architecture, Development, Training, Support --
- Free Perl Training --- http://perlhunter.com/college.html -
-  Gourmet Hot Cocoa Mix    http://bestfriendscocoa.com -


Re: comments as preserved meta-data (was Re: Embedded comments ...)

2009-08-12 Thread Timothy S. Nelson
	I had an interesting idea I wanted to put out there.  If I'm being a 
good boy and commenting my code, I do things like the following pseudocode:


# Get the stuff and do other stuff with it
@lines = slurp("file");
@otherlines = map { s/foo/bar/ } @lines
putfile("file", @lines);

$t = 3;


	Anyway, my point is: how much of the code does the comment apply to?
I was thinking it might be a good idea to have a comment that indicates when 
the code that the previous comment applies to ends.  So you could have 
something like this:


#{ Get the stuff and do other stuff with it
@lines = slurp("file");
@otherlines = map { s/foo/bar/ } @lines
putfile("file", @lines);
#}

$t = 3;


	...and then both people and parsers would be able to determine which 
part of the code your comment applies to, and we'd be able to automatically 
determine the comment coverage of our code, as well as the test coverage.  I'm 
perfectly well aware that the syntax I suggested above conflicts with the 
existing Perl 6 spec, and I'd love it if people suggested a better 
alternative.


	Just to make myself clear, in the above example, I'm not commenting 
out any of the code, merely indicating which code the comment applies to.
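
To show how a tool might use such markers, here is a small Python sketch. It assumes the hypothetical `#{` ... `#}` syntax from the example above, treats every other line starting with `#` as a plain comment, and does not handle nesting. The function names `comment_spans` and `comment_coverage` are invented for this illustration.

```python
def comment_spans(lines):
    """Return (comment_text, first_code_line, last_code_line) tuples.

    Line numbers are 1-based. '#{' opens a span and '#}' closes it.
    """
    spans = []
    open_text = None
    open_line = None
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("#{"):
            open_text = stripped[2:].strip()
            open_line = number
        elif stripped == "#}" and open_line is not None:
            spans.append((open_text, open_line + 1, number - 1))
            open_text = open_line = None
    return spans

def comment_coverage(lines):
    """Fraction of non-blank, non-comment lines that lie inside a span."""
    spans = comment_spans(lines)
    code = [n for n, text in enumerate(lines, start=1)
            if text.strip() and not text.strip().startswith("#")]
    covered = [n for n in code
               if any(first <= n <= last for _, first, last in spans)]
    return len(covered) / len(code) if code else 0.0

source = [
    "#{ Get the stuff and do other stuff with it",
    '@lines = slurp("file");',
    "@otherlines = map { s/foo/bar/ } @lines",
    'putfile("file", @lines);',
    "#}",
    "",
    "$t = 3;",
]
print(comment_spans(source))     # [('Get the stuff and do other stuff with it', 2, 4)]
print(comment_coverage(source))  # 0.75
```

Three of the four code lines fall inside the marked span, so the coverage is 0.75; the trailing `$t = 3;` is the uncovered line.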


Anyway, just some thoughts...

:).


-
| Name: Tim Nelson | Because the Creator is,|
| E-mail: wayl...@wayland.id.au| I am   |
-

BEGIN GEEK CODE BLOCK
Version 3.12
GCS d+++ s+: a- C++$ U+++$ P+++$ L+++ E- W+ N+ w--- V- 
PE(+) Y+>++ PGP->+++ R(+) !tv b++ DI D G+ e++> h! y-

-END GEEK CODE BLOCK-



Re: comments as preserved meta-data (was Re: Embedded comments ...)

2009-08-12 Thread Darren Duncan
Timothy, you raise a good point that I had been thinking about earlier as a 
consequence of my proposal about comments being preserved and attached as 
meta-data to what is most appropriate contextually.


I'm thinking that it should be formally defined somewhere (maybe as an extra 
section in the synopsis about POD, or maybe not as parsing POD is supposed to be 
agnostic to Perl versus other languages AFAIK) as to where exactly one should 
put comments or how they should be formatted such that they would be 
unambiguously attached to something specific.


In my mind, any of the following could have a comment meta-data associated with 
it specifically:


- A single statement in a routine.
- A single sub-expression within a statement.
- A contiguous sequence of statements.
- A single code block (which is brace-bounded).
- A single parameter or other trait of a routine.
- A single routine.
- A single class/role/etc attribute declaration.
- A single statement outside a routine (misc things inside package decl), or a
  contiguous sequence of said.
- A single package.

Now in my experience, a comment for something either appears just before that 
thing, or just after that thing, or in the case of a brace/bracket/etc-delimited 
thing, just inside the opening brace/bracket/etc of the thing.  Or thanks to 
Perl's unspace or embedded comment (name?) feature, otherwise in the middle of 
the thing it describes.


Generally speaking we want to support whatever commenting styles people are 
already accustomed to doing, and the formal definition of what to attach a 
comment to should work with all the common cases where possible, and have a 
reasonable fallback in the case of ambiguity.


For some examples, of varying ambiguity:

  @lines = slurp("file"); # attach to single stmt this follows
    # also attach to single stmt this follows, due to indent

  @foo = # attach to single stmt which is assign stmt
    map { ... } # attach to 'map' sub-expr
    grep { ... } # attach to 'grep' sub-expr
    map { ... } # attach to 'map' sub-expr
    ;

  # attach to routine 'foo', same indent level
  sub foo () {
    # also attach to routine 'foo'
  }

  # attach to the contiguous seq of stmts until next same
  # kind of comment
  @lines = slurp("file");
  @otherlines = map { s/foo/bar/ } @lines
  putfile("file", @lines);

  # attach to whole if-elsif-else block etc
  if 1 {
    # attach to the innermost code block this is inside
    # which is the 'if' option block
    ...
  }
  else {
    # attach to the 'else' option block
    ...
  }

Well there is still much to think about, and I could probably develop proposal 
details further, but some things could be made more explicit, eg, the comment 
just inside the opening 'if' block could be further tersely annotated, say with 
a differing punctuation char immediately following the #, to distinguish say 
whether it was for the 'if' block it was inside or for the set of statements it 
immediately precedes within said block.
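
As a very rough sketch of the two simplest rules above (a trailing comment attaches to the statement on its own line, and a comment on a line by itself attaches to the next line holding code), here is a Python illustration. It is deliberately naive: it ignores indentation and block nesting, and it treats every `#` as a comment start, so a `#` inside a string or regex would be misread. The function name `attach_comments` is hypothetical.

```python
def attach_comments(lines):
    """Pair each comment with the 1-based line number of the code it describes.

    Rule 1: a comment that follows code on the same line attaches to that line.
    Rule 2: a comment on its own line attaches to the next line that holds
            code, or to None if no such line follows.
    """
    result = []
    for number, line in enumerate(lines, start=1):
        code, sep, comment = line.partition("#")
        if not sep:
            continue  # no comment on this line
        if code.strip():
            target = number
        else:
            target = None
            for later, text in enumerate(lines[number:], start=number + 1):
                if text.partition("#")[0].strip():
                    target = later
                    break
        result.append((comment.strip(), target))
    return result

source = [
    '@lines = slurp("file"); # read it',
    "# describe foo",
    "sub foo () {",
    "}",
]
print(attach_comments(source))  # [('read it', 1), ('describe foo', 3)]
```

Distinguishing the harder cases, such as a comment just inside an opening brace, would need indentation and block structure from a real parser.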


I think this can be made to work without much fuss, and it will be valuable. 
Both for introspection of op-tree as well as the ability to regenerate the 
original or as-if-original Perl code from the op-tree with more pleasing results.


-- Darren Duncan
