Re: r27106 - docs/Perl6/Spec

2009-06-18 Thread Aaron Crane
pugs-comm...@feather.perl6.nl writes:
> +The C type is derived from C, with the additional constraint
> +that it may only contain validly encoded UTF-8.  Likewise, C is
> +derived from C, and C from C.

What does "validly encoded UTF-8" mean in this context?  The following
questions come to mind:

1.  Four-byte UTF-8 sequences are enough to handle any Unicode
character.  Are the obvious five- and six-byte extensions
permitted?  If so, how about a seven-byte extension (needed to
allow any 32-bit value to be encoded)?

Whichever sequence length is chosen, is there an additional
constraint on the maximum permitted codepoint?  For example,
four-byte UTF-8 sequences can easily represent values up to
0x1f_ffff, but Unicode stops at 0x10_ffff.  Or if seven-byte
sequences are permitted, are codepoints limited to 2**32-1?

2.  Are over-wide encoded sequences (0xC1 0x81 for U+0041, and so on)
permitted?  (I hope not.)

3.  Are encoded codepoints corresponding to UTF-16 surrogates permitted?

4.  Are noncharacter codepoints (0xFFFE, 0xFFFF, etc.) permitted?

5.  Are unallocated codepoints permitted?  If so, that doesn't seem
very "valid"; but if not, a program's behaviour might change under
a newer version of Unicode.  Perhaps programs should be given the
opportunity to declare which Unicode version's list of allocated
characters they want.

6.  Are values that begin with combining characters permitted?

Of those, question (3) applies to UTF-32, and questions (4), (5), and
(6) to both UTF-16 and UTF-32.  Further, a variant of (1) applies to
UTF-32: are code units greater than 0x10_ffff permitted?
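
To make questions (1) to (4) concrete, here is a minimal sketch in Python
(not Perl 6) of one possible strict reading of "validly encoded UTF-8".
The policy is an assumption, not something the spec text above states: at
most four bytes per character, shortest form only, no surrogates, nothing
above U+10FFFF.  The names decode_utf8_strict and is_noncharacter are made
up for illustration.

```python
MAX_CODEPOINT = 0x10FFFF


def decode_utf8_strict(data: bytes) -> list:
    """Decode UTF-8 by hand and return a list of codepoints.

    Assumed policy: 1-4 byte sequences, shortest form required,
    no surrogates, nothing above U+10FFFF.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            extra, cp, smallest = 0, lead, 0
        elif 0xC0 <= lead < 0xE0:
            extra, cp, smallest = 1, lead & 0x1F, 0x80
        elif 0xE0 <= lead < 0xF0:
            extra, cp, smallest = 2, lead & 0x0F, 0x800
        elif 0xF0 <= lead < 0xF8:
            extra, cp, smallest = 3, lead & 0x07, 0x10000
        else:
            # Stray continuation byte, or a five-/six-byte style lead byte.
            raise ValueError(f"bad lead byte 0x{lead:02X} at offset {i}")
        tail = data[i + 1:i + 1 + extra]
        if len(tail) != extra:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in tail:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < smallest:
            raise ValueError(f"over-wide encoding of U+{cp:04X}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate U+{cp:04X}")
        if cp > MAX_CODEPOINT:
            raise ValueError(f"codepoint 0x{cp:X} is beyond U+10FFFF")
        out.append(cp)
        i += 1 + extra
    return out


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two codepoints of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE
```

Under this policy the over-wide form 0xC1 0x81 of U+0041, the encoded
surrogate 0xED 0xA0 0x80, and any five-byte sequence are all rejected.
Noncharacters are only reported by is_noncharacter, not rejected, because
question (4) is still open.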

I assume that the C type forbids invalid surrogate sequences.

I'm also tempted to suggest that the type names should be C,
C, C.

-- 
Aaron Crane ** http://aaroncrane.co.uk/


Re: Array Dimensionality

2009-06-18 Thread yary
I think this proposal goes too far in the dwimmery direction-

On Sat, Jun 13, 2009 at 12:58 PM, John M. Dlugosz<2nb81l...@sneakemail.com>
wrote:
> Daniel Ruoso daniel-at-ruoso.com |Perl 6| wrote:
>>
>> So, how do I deal with a multidim array? Well, TIMTOWTDI...
>>
>>  my @a = 1,[2,[3,4]];
>>  say @a[1][1][1];
>>  say @a[1;1;1]; # I'm not sure this is correct
>>
>>
>
> I think that it should be.  That is, multi-dim subscript is always the
> same as chained subscripts, regardless of whether the morphology is an
> array stored as an element, or a multi-dim container, or any mixture of
> that as you drill through them.
>
> I've not written out a full formalism yet, but I've thought about it.
> The multi-dim subscript would return a sub-array if there were fewer
> parameters than dimensions, an element if exact match, and recursively
> apply the remaining subscripts to the element if too many.
>
>
>> Or.. (I'm using the proposed capture sigil here, which has '@%a' as its
>> expanded form)
>>
>>  my ¢a = 1,(2,(3,4));
>>  say ¢a[1][1][1];
>>  say ¢a[1;1;1];
>>
>> I think that makes the semantics of the API more clear...
>>
>> daniel
>>
>>
>>
>
> The plain Array would work too, in the nested morphology:
>
>   my @a = 1,[2,[3,4]];
>
> @a has 2 elements, the second of which is type Array.
>
>   say @a[1][1][1];
>
> naturally.
>
>   say @a[1;1;1];
>
> means the same thing, intentionally.
>
>   say @a[1][1;1];
>   say @a[1;1][1];
>
> ditto.
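
The rule John describes above (sub-array if too few subscripts, element on
an exact match, leftovers applied to the element) could be sketched roughly
as follows.  This is Python rather than Perl 6, and Grid and drill are
hypothetical names: a Grid stands in for a shaped multi-dim container, and
a plain Python list stands in for an ordinary one-dimensional Array.

```python
class Grid:
    """Hypothetical shaped container: cells keyed by full index tuples."""

    def __init__(self, shape, cells):
        self.shape = tuple(shape)
        self.cells = dict(cells)


def drill(value, subs):
    """Apply a multi-dim subscript list to nested lists and Grids."""
    subs = tuple(subs)
    if not subs:
        return value
    if isinstance(value, Grid):
        rank = len(value.shape)
        if len(subs) < rank:
            # Fewer subscripts than dimensions: return the sub-array.
            n = len(subs)
            rest = {k[n:]: v for k, v in value.cells.items() if k[:n] == subs}
            return Grid(value.shape[n:], rest)
        # Exact match gives the element; leftovers apply to that element.
        return drill(value.cells[subs[:rank]], subs[rank:])
    # A plain list is one-dimensional and consumes one subscript.
    return drill(value[subs[0]], subs[1:])


a = [1, [2, [3, 4]]]
print(drill(a, (1, 1, 1)))              # same element as a[1][1][1]
print(drill(drill(a, (1,)), (1, 1)))    # the @a[1][1;1] spelling
```

Under this rule every way of splitting the subscripts reaches the same
element, which is exactly the property the reply below objects to.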

My thought is that captures, multi-D arrays, and arrays of arrays are all
different data structures, the programmer will pick them or some mix of them
for a reason, and expect consistent access semantics. I agree that the
various types should be transparently converted when necessary, but the
dwimmery proposed on indexing could make it hard to find bugs in code
dealing with complicated data structures.

The problem comes with nested structures. Let's talk about a multi-D array,
where each element is another multi-D array. This is also an example of my
understanding of multi-D list initialization- the specs are silent on that
other than initializing elements one at a time eg. "@md[1;0] = 4;"-
apologies for squeezing two topics into one post.

# Build it piece by piece, first using explicitly dimensioned sub-arrays
# Doesn't matter if the initialization is a list, array, capture of arrays.
# The RHS is in list context which flattens a capture, and the explicit
# dimension will pour them all into a 2x2 array.
my @sub1[2;2]=(99,\('a',[]; 'c'; CC) ; 88, [1,2,3]);
my @sub2[2;2]=77,[], 66,[4,5,6];
my @sub3[2;2]=(55; [], 44, [7,8,9]);
# Use slice context to retain the 2x2 shape
my @@sub4=([], 33; [10,11,12], 22);

# A single column, two high
my @sub5[1;2]=([]; [13,14,15]);

# 3 ragged rows, 1 long then 2 long then 3 long
my @sub6[3;*]=('row1'; ; );

=begin comment
3 ragged columns, first 3 high, then 2 high, then 3 high
c1a c2a c3a
c1b c2b c3b
c1c c3c
=end comment
my @sub7[*;3]=(; ; 'c3a', Nil, 'c3c'); # Perilous?

# Simulate a sparse array, set two elements
my @sub8[*;*]; @sub8[(5;6),(8;0)]=;

# Now build a multi-dimensional array, each element of which is a multi-D
# array
my @a[2;2;2]=\(@@sub1; @@sub2; @@sub3; @@sub4; @@sub5; @@sub6; @@sub7;
@@sub8);

# This also builds an 8-element 3D cube. Not sure about , vs ; below
my @@b=\( \( \(@@sub1; @@sub2); \(@@sub3; @@sub4));
\(\(@@sub5; @@sub6); \(@@sub7; @@sub8)));

# Same as above, but no captures, use slices all the way. Valid?
my @@c=@@( @@( @@(@@sub1; @@sub2); @@(@@sub3; @@sub4));
@@(@@(@@sub5; @@sub6); @@(@@sub7; @@sub8)));

Returning to John's post- In this case all these accessors return different
elements-

>   say @a[1][1][1];
BB
@a[1] is accessing @a as a flat array, so that returns the 2nd element of @a
which is \('a',[]; 'c'; CC), which is then treated as a flat list by
the next [1] subscript. The 2nd element of the 2nd element of that is BB.

>   say @a[1;1;1];
@sub8

>   say @a[1][1;1];
CC
@a[1] is \('a',[]; 'c'; CC) which is now treated as a multi-D array.
[1;1] asks for the lower-right corner of that 2x2 array, which is CC.

>   say @a[1;1][1];
@sub8
S09 states:
You need not specify all the dimensions; if you don't, the unspecified
dimensions are "wildcarded".
So the above becomes
@a[1;1;*][1]
@a[1;1;*] is \(@@sub7;@@sub8), 2nd element of that is @sub8

S09's "Cascaded subscripting of multidimensional arrays" says the above
"will either fail or produce the same results as the equivalent semicolon
subscripts." Following that part of the spec, it should convert to @a[1;1;1]
and still return @sub8.

But what I would really like is a "strict array mode" that would give me an
error when using a subscript dimensioned differently from the array's
dimensions. I think that if an array has explicit dimensions they need to be
obeyed, with 1D access a specific allowed exception. These examples show a
necessity for distinctly different semantics for @a[1][1][1], @a[1][1;1],
and @a[1;1;1], which conflicts with S09's "Cascaded subscripting" section.

Re: Array Dimensionality

2009-06-18 Thread yary
Apologies for the long post with mistakes in it. I'm going to try
again, biting off less.

my @g[2;2];
@g[0;0]='r0c0';
@g[0;1]='r0c1';
@g[1;0]='r1c0';
@g[1;1]='r1c1';

@g[1] is ('r1c0', 'r1c1') due to S09:

Multi-dimensional arrays, on the other hand, know how to handle a
multidimensional slice, with one subslice for each dimension. You need
not specify all the dimensions; if you don't, the unspecified
dimensions are "wildcarded".

@g[1] becomes @g[1;*] which is ('r1c0', 'r1c1')
(@g[1])[1] is then 'r1c1', which is the same result as @g[1;1]

Using that logic, I can't think of a case where @a[1;1;1] means
something different from ((@a[1])[1])[1]. @a[1] will become @a[1;*;*]
producing a 2d slice of the "2nd row" plane, then we get the "2nd to
the right" column of that from the next slice, and finally the "2nd
back" element of that.
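
A small sketch of that wildcarding logic, in Python rather than Perl 6.
It assumes the shaped array is stored as one flat row-major list, which is
purely an illustration choice and not something S09 specifies; the name
wildcard_slice is made up.

```python
def wildcard_slice(flat, shape, leading):
    """Fix the leading subscripts and wildcard the remaining dimensions.

    Returns (sub_flat, sub_shape) for row-major data.
    """
    if len(leading) > len(shape):
        raise IndexError("more subscripts than dimensions")
    rest = tuple(shape[len(leading):])
    # Number of cells covered by the wildcarded dimensions.
    span = 1
    for n in rest:
        span *= n
    # Row-major position of the fixed leading subscripts.
    offset = 0
    for i, n in zip(leading, shape):
        if not 0 <= i < n:
            raise IndexError(f"subscript {i} out of range for size {n}")
        offset = offset * n + i
    start = offset * span
    return flat[start:start + span], rest


g = ["r0c0", "r0c1", "r1c0", "r1c1"]          # @g[2;2] in row-major order
row, row_shape = wildcard_slice(g, (2, 2), (1,))
print(row)                                    # the @g[1;*] slice
print(wildcard_slice(row, row_shape, (1,)))   # (@g[1])[1]
print(wildcard_slice(g, (2, 2), (1, 1)))      # @g[1;1]
```

With this storage model, chaining one subscript at a time and giving all
subscripts at once select the same cell, for the 2x2 case and for a
2x2x2 cube alike.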

In fact I'd suggest that 'unspecified dimensions are "wildcarded"'
means we don't need the "Cascaded subscripting of multidimensional
arrays" section.

I'd still like to have an error or warning on treating a multi-D array
as an array of arrays.
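
As a rough idea of what such a check might look like, here is a Python
sketch.  The function check_rank and its strict flag are hypothetical;
nothing like this appears in S09.

```python
import warnings


def check_rank(shape, index, strict=False):
    """Complain when the subscript count differs from the array's rank.

    strict=True raises an error; otherwise a warning is issued.
    """
    if len(index) == len(shape):
        return
    msg = (f"{len(index)} subscript(s) used on a "
           f"{len(shape)}-dimensional array")
    if strict:
        raise IndexError(msg)
    warnings.warn(msg, stacklevel=2)


check_rank((2, 2), (1, 1))          # exact match: silent
check_rank((2, 2), (1,))            # array-of-arrays style access: warns
```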


Rakudo Perl 6 development release #18 ("Pittsburgh")

2009-06-18 Thread Patrick R. Michaud

On behalf of the Rakudo development team, I'm pleased to announce
the June 2009 development release of Rakudo Perl #18 "Pittsburgh".
Rakudo is an implementation of Perl 6 on the Parrot Virtual Machine [1].
The tarball for the June 2009 release is available from
http://github.com/rakudo/rakudo/downloads .

Due to the continued rapid pace of Rakudo development and the
frequent addition of new Perl 6 features and bugfixes, we continue
to recommend that people wanting to use or work with Rakudo obtain
the latest source directly from the main repository at github.
More details are available at http://rakudo.org/how-to-get-rakudo .

Rakudo Perl follows a monthly release cycle, with each release code named
after a Perl Mongers group.  This release is named "Pittsburgh", which
is the host for YAPC|10 (YAPC::NA 2009) [2] and the Parrot Virtual Machine
Workshop [3].  Pittsburgh.pm has also sponsored hackathons for Rakudo 
Perl as part of the 2008 Pittsburgh Perl Workshop [4].

In this release of Rakudo Perl, we've focused our efforts on refactoring
many of Rakudo's internals; these refactors improve performance, 
bring us closer to the Perl 6 specification, operate more cleanly
with Parrot, and provide a stronger foundation for features to be
implemented in the near future.  Some of the specific major changes
and improvements in this release include:

* Rakudo is now passing 11,536 spectests, an increase of 194
  passing tests since the May 2009 release.  With this release
  Rakudo is now passing 68% of the available spectest suite.

* Method dispatch has been substantially refactored; the new dispatcher
  is significantly faster and follows the Perl 6 specification more
  closely.

* Object initialization via the BUILD and CREATE (sub)methods is
  substantially improved.

* All return values are now type checked (previously only explicit
  'return' statements would perform type checking).

* String handling is significantly improved: fewer Unicode-related
  bugs exist, and parsing speed is greatly improved for some programs 
  containing characters in the Latin-1 set.

* The IO .lines and .get methods now follow the specification more closely.

* User-defined operators now also receive some of their associated 
  meta variants.

* The 'is export' trait has been improved; more builtin functions
  and methods can be written in Perl 6 instead of PIR.

* Many Parrot changes have improved performance and reduced overall
  memory leaks (although there's still much more improvement needed).

The development team thanks all of our contributors and sponsors for
making Rakudo Perl possible.  If you would like to contribute,
see http://rakudo.org/how-to-help , ask on the perl6-compi...@perl.org
mailing list, or ask on IRC #perl6 on freenode.

The next release of Rakudo (#19) is scheduled for July 23, 2009.
A list of the other planned release dates and codenames for 2009 is
available in the "docs/release_guide.pod" file.  In general, Rakudo
development releases are scheduled to occur two days after each
Parrot monthly release.  Parrot releases the third Tuesday of each month.
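
The scheduling rule above is simple date arithmetic.  As an illustration
only (this is a Python sketch, not part of the Rakudo release tooling, and
the function names are made up):

```python
from datetime import date, timedelta


def third_tuesday(year, month):
    """Date of the third Tuesday of the given month."""
    first = date(year, month, 1)
    # weekday() counts Monday as 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 14)


def rakudo_release(year, month):
    """Two days after the Parrot release on the third Tuesday."""
    return third_tuesday(year, month) + timedelta(days=2)


print(rakudo_release(2009, 7))   # 2009-07-23
```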

Have fun!

References:
[1]  Parrot, http://parrot.org/
[2]  YAPC|10 http://yapc10.org/yn2009/
[3]  Parrot Virtual Machine Workshop, http://yapc10.org/yn2009/talk/2045
[4]  Pittsburgh Perl Workshop, http://pghpw.org/ppw2008/


Re: Why pass by reference?

2009-06-18 Thread Martin D Kealey

> Matthew Walton wrote:
> > If a user of your API contrives to make it change while you're
> > running, that's their own foot they've just shot, because they can
> > look at the signature and know the semantics of the parameter
> > passing being used and know that if they change the value externally
> > before you return Bad Things Could Happen.
>
On Tue, 16 Jun 2009, TSa wrote:
> I agree that the caller is responsible for the constness of the value
> he gives to a function. With this we get the best performance.

At the language level this is wrong. Programmers are BAD at this sort of
thing, unless the compiler *always* has enough information to throw a
compile-time error, and even then it's dicey because we may defer compilation.

It seems to me this is pushing something onto the author of the caller
that they shouldn't have to deal with, especially when you consider that
the parameter they're passing into the function may come from somewhere
else, which hasn't been made -- and indeed CAN'T be made -- to promise
not to meddle with the value (note *1).

If the compiler can't spot it, how do you expect a fallible human being
to do so?

If a function requires an invariant parameter then the compiler should
ensure that that guarantee is met, and not rely on the programmer to do
something that is impossibly hard in the general case. A simple way
would be to call $parameter := $parameter.INVARIANT()  (*2) on the
caller's behalf before calling the function.

Conversely, when calling a function where the parameter is declared :rw,
the compiler can call $parameter := $parameter.LVALUE() (*3) on the
caller's behalf first if it needs to convert an immutable object to a
mutable one.  (Or throw up its hands and assert that it's not allowed.)

If we really expect the optimizer to make Perl6 run well on a CPU with
1024 cores (*4), we have to make it easy to write programs that will
allow the optimizer to do its job, and (at least a little bit) harder to
write programs that defeat the optimizer.

To that end I would propose that:
 - parameters should be read-only AND invariant by default, and
 - that invariance should be enforced by passing a deep immutable clone
   (*5) in place of any object that isn't already immutable.
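
To illustrate what "passing a deep immutable clone" might mean in practice,
here is a minimal sketch in Python rather than Perl 6.  The function name
invariant echoes the made-up INVARIANT() above; the set of types treated as
value types is an assumption chosen for the example.

```python
from types import MappingProxyType

VALUE_TYPES = (int, float, str, bytes, type(None))


def invariant(value):
    """Return value itself if it is a value type, else a deep frozen copy."""
    if isinstance(value, VALUE_TYPES):
        return value                      # already immutable: no copy
    if isinstance(value, (list, tuple)):
        return tuple(invariant(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(invariant(v) for v in value)
    if isinstance(value, dict):
        # Read-only view over a private copy that nobody else can reach.
        return MappingProxyType({k: invariant(v) for k, v in value.items()})
    raise TypeError(f"don't know how to freeze {type(value).__name__}")


data = [1, [2, 3], {"k": [4]}]
snapshot = invariant(data)
data[1].append(99)                        # caller meddles afterwards
print(snapshot[1])                        # (2, 3): the callee's view is stable
```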

-Martin

Footnotes:

*1: There are many possible reasons, but for example the caller didn't
declare it :readonly in turn to its callers because it *did* plan to meddle
with it -- but just not by calling this function with its :readonly
parameter.


*2: Yes I made up "INVARIANT". The trick is that the compiler only needs
to insert the call if it can't prove the invariance of $parameter, which it
*can* prove when:
 - it arrived in a :readonly parameter; or
 - it's locally scoped, and hasn't "escaped".

In addition the implementation of INVARIANT() could:
 - return $self for any "value" class; and
 - return the encapsulated immutable object for the case outlined in the
   following footnote.

Otherwise the default implementation of INVARIANT() would be like
deepclone().

(Declaring a "value class" would ideally be shorter than declaring a
"container class", but I'm a bit stuck as to how to achieve that. Ideas are
welcome...)


*3: The LVALUE method produces the sort of proxy object that others have
described, but with the reverse function: it acts as a scalar container
that can only hold immutable objects, and proxies all method calls to
it, but allows assignment to replace the contained object.  Calling
INVARIANT on such a container object simply returns the encapsulated
immutable object.
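
A rough Python sketch of such a container follows.  The class name LValue
and the method names are hypothetical, and the type whitelist is only a
crude stand-in for "immutable" (a tuple can still contain mutable items).

```python
IMMUTABLE_TYPES = (int, float, str, bytes, tuple, frozenset, type(None))


class LValue:
    """Hypothetical mutable box that only ever holds an immutable value."""

    def __init__(self, value):
        self.assign(value)

    def assign(self, value):
        """Replace the contained object; the object itself never changes."""
        if not isinstance(value, IMMUTABLE_TYPES):
            raise TypeError("LValue can only hold immutable objects")
        self._value = value

    def invariant(self):
        """Return the encapsulated immutable object."""
        return self._value

    def __getattr__(self, name):
        # Called only when normal lookup fails: proxy to the held value.
        if name == "_value":
            raise AttributeError(name)
        return getattr(self._value, name)


box = LValue("abc")
print(box.upper())        # proxied method call on the contained string
box.assign("xyz")         # assignment swaps the contained object
print(box.invariant())
```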


*4: As a generalization, the assumptions floating round that "the
compiler will optimize things" just aren't facing reality: programmers
are about the worst people when it comes to learning from the past
mistakes of others, and future generations of Perl6 programmers will
inevitably create evil container classes with no corresponding value
classes, and thus most parallelizing optimizations will be defeated.


*5: At the language level at least, copying is NOT the enemy of
optimization. On the contrary, if you always copy and *never* mutate,
that ensures that the compiler can always determine the provenance and
visibility of any given datum, and thus has *more* opportunities to
avoid *actually* copying anything. And it can parallelize to the full
extent of available hardware because it can guarantee that updates won't
overlap.


Re: Why pass by reference?

2009-06-18 Thread Martin D Kealey
On Fri, 19 Jun 2009, Martin D Kealey wrote:
> To that end I would propose that:
>  - parameters should be read-only AND invariant by default, and
>  - that invariance should be enforced by passing a deep immutable clone
>    (*5) in place of any object that isn't already immutable.

Sorry, typo: that last word should have been "invariant", meaning that it
*won't* change, rather than "immutable", meaning that it *can't*.

Compilers can rely on invariance to perform a range of very powerful
optimizations; immutability is one way to guarantee invariance, but not the
only way.

-Martin