Re: r27106 - docs/Perl6/Spec
pugs-comm...@feather.perl6.nl writes:
> +The C<utf8> type is derived from C<buf8>, with the additional constraint
> +that it may only contain validly encoded UTF-8. Likewise, C<utf16> is
> +derived from C<buf16>, and C<utf32> from C<buf32>.

What does "validly encoded UTF-8" mean in this context? The following questions come to mind:

1. Four-byte UTF-8 sequences are enough to handle any Unicode character. Are the obvious five- and six-byte extensions permitted? If so, how about a seven-byte extension (needed to allow any 32-bit value to be encoded)? Whichever sequence length is chosen, is there an additional constraint on the maximum permitted codepoint? For example, four-byte UTF-8 sequences can easily represent values up to 0x1F_FFFF, but Unicode stops at 0x10_FFFF. Or if seven-byte sequences are permitted, are codepoints limited to 2**32-1?

2. Are over-wide (overlong) encoded sequences (0xC1 0x81 for U+0041, and so on) permitted? (I hope not.)

3. Are encoded codepoints corresponding to UTF-16 surrogates permitted?

4. Are noncharacter codepoints (0xFFFE, 0xFFFF, etc.) permitted?

5. Are unallocated codepoints permitted? If so, that doesn't seem very "valid"; but if not, a program's behaviour might change under a newer version of Unicode. Perhaps programs should be given the opportunity to declare which Unicode version's list of allocated characters they want.

6. Are values that begin with combining characters permitted?

Of those, question (3) applies to UTF-32, and questions (4), (5), and (6) to both UTF-16 and UTF-32. Further, a variant of (1) applies to UTF-32: are code units greater than 0x10_FFFF permitted?

I assume that the C<utf16> type forbids invalid surrogate sequences.

I'm also tempted to suggest that the type names should be C, C, C.

--
Aaron Crane ** http://aaroncrane.co.uk/
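To make questions (1), (3), and (4) concrete, here is a minimal sketch in present-day Raku (not part of the original thread); the `classify-codepoint` helper and its category labels are invented purely for illustration, and the overlong-encoding note in the trailing comment addresses question (2), which is about byte sequences rather than codepoints.

    # Hypothetical helper: which of the questioned categories does a codepoint fall in?
    sub classify-codepoint(Int $cp --> Str) {
        return 'beyond Unicode (> 0x10_FFFF)'   if $cp > 0x10_FFFF;
        return 'UTF-16 surrogate (question 3)'  if 0xD800 <= $cp <= 0xDFFF;
        return 'noncharacter (question 4)'      if 0xFDD0 <= $cp <= 0xFDEF
                                                or ($cp +& 0xFFFE) == 0xFFFE;
        return 'ordinary scalar value';
    }

    say classify-codepoint(0x0041);      # ordinary scalar value
    say classify-codepoint(0xD800);      # UTF-16 surrogate (question 3)
    say classify-codepoint(0xFFFE);      # noncharacter (question 4)
    say classify-codepoint(0x1F_FFFF);   # beyond Unicode (> 0x10_FFFF)

    # Question 2 concerns encodings, not codepoints: the shortest encoding of
    # U+0041 is the single byte 0x41, so the two-byte form 0xC1 0x81 is overlong.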
Re: Array Dimensionality
I think this proposal goes too far in the dwimmery direction.

On Sat, Jun 13, 2009 at 12:58 PM, John M. Dlugosz<2nb81l...@sneakemail.com> wrote:
> Daniel Ruoso daniel-at-ruoso.com |Perl 6| wrote:
>>
>> So, how do I deal with a multidim array? Well, TIMTOWTDI...
>>
>> my @a = 1,[2,[3,4]];
>> say @a[1][1][1];
>> say @a[1;1;1]; # I'm not sure this is correct
>>
>
> I think that it should be. That is, multi-dim subscript is always the same
> as chained subscripts, regardless of whether the morphology is an array
> stored as an element, or a multi-dim container, or any mixture of that as
> you drill through them.
>
> I've not written out a full formalism yet, but I've thought about it.
> The multi-dim subscript would return a sub-array if there were fewer
> parameters than dimensions, an element if exact match, and recursively apply
> the remaining subscripts to the element if too many.
>
>> Or.. (I'm using the proposed capture sigil here, which has '@%a' as its
>> expanded form)
>>
>> my ¢a = 1,(2,(3,4));
>> say ¢a[1][1][1];
>> say ¢a[1;1;1];
>>
>> I think that makes the semantics of the API more clear...
>>
>> daniel
>
> The plain Array would work too, in the nested morphology:
>
>   my @a = 1,[2,[3,4]];
>
> @a has 2 elements, the second of which is type Array.
>
>   say @a[1][1][1];
>
> naturally.
>
>   say @a[1;1;1];
>
> means the same thing, intentionally.
>
>   say @a[1][1;1];
>   say @a[1;1][1];
>
> ditto.

My thought is that captures, multi-D arrays, and arrays of arrays are all different data structures; the programmer will pick them, or some mix of them, for a reason, and expect consistent access semantics. I agree that the various types should be transparently converted when necessary, but the dwimmery proposed for indexing could make it hard to find bugs in code dealing with complicated data structures.

The problem comes with nested structures. Let's talk about a multi-D array where each element is another multi-D array. This is also an example of my understanding of multi-D list initialization -- the specs are silent on that other than initializing elements one at a time, e.g. "@md[1;0] = 4;" -- apologies for squeezing two topics into one post.

# Build it piece by piece, first using explicitly dimensioned sub-arrays.
# It doesn't matter whether the initialization is a list, array, or capture of
# arrays: the RHS is in list context, which flattens a capture, and the
# explicit dimension will pour them all into a 2x2 array.
my @sub1[2;2]=(99,\('a',[]; 'c'; CC) ; 88, [1,2,3]);
my @sub2[2;2]=77,[], 66,[4,5,6];
my @sub3[2;2]=(55; [], 44, [7,8,9]);

# Use slice context to retain the 2x2 shape
my @@sub4=([], 33; [10,11,12], 22);

# A single column, two high
my @sub5[1;2]=([]; [13,14,15]);

# 3 ragged rows, 1 long then 2 long then 3 long
my @sub6[3;*]=('row1'; ; );

=begin comment
3 ragged columns, first 3 high, then 2 high, then 3 high
c1a c2a c3a
c1b c2b c3b
c1c     c3c
=end comment
my @sub7[*;3]=(; ; 'c3a', Nil, 'c3c');   # Perilous?

# Simulate a sparse array, set two elements
my @sub8[*;*];
@sub8[(5;6),(8;0)]=;

# Now build a multi-dimensional array, each element of which is a multi-D array
my @a[2;2;2]=\(@@sub1; @@sub2; @@sub3; @@sub4; @@sub5; @@sub6; @@sub7; @@sub8);

# This also builds an 8-element 3D cube. Not sure about , vs ; below
my @@b=\( \( \(@@sub1; @@sub2); \(@@sub3; @@sub4)); \(\(@@sub5; @@sub6); \(@@sub7; @@sub8)));

# Same as above, but no captures, use slices all the way. Valid?
my @@c=@@( @@( @@(@@sub1; @@sub2); @@(@@sub3; @@sub4)); @@(@@(@@sub5; @@sub6); @@(@@sub7; @@sub8)));

Returning to John's post -- in this case all these accessors return different elements:

> say @a[1][1][1];

BB

@a[1] is accessing @a as a flat array, so that returns the 2nd element of @a, which is \('a',[]; 'c'; CC), which is then treated as a flat list by the next [1] subscript. The 2nd element of the 2nd element of that is BB.

> say @a[1;1;1];

@sub8

> say @a[1][1;1];

CC

@a[1] is \('a',[]; 'c'; CC), which is now treated as a multi-D array. [1;1] asks for the lower-right corner of that 2x2 array, which is CC.

> say @a[1;1][1];

@sub8

S09 states:

    You need not specify all the dimensions; if you don't, the unspecified
    dimensions are "wildcarded".

So the above becomes @a[1;1;*][1]. @a[1;1;*] is \(@@sub7;@@sub8), and the 2nd element of that is @sub8.

S09's "Cascaded subscripting of multidimensional arrays" says the above "will either fail or produce the same results as the equivalent semicolon subscripts." Following that part of the spec, it should convert to @a[1;1;1] and still return @sub8.

But what I would really like is a "strict array mode" that would give me an error when using a subscript dimensioned differently from the array's dimensions. I think that if an array has explicit dimensions they need to be obeyed, with 1D access a specific allowed exception.

These examples show the necessity for distinctly different semantics for @a[1][1][1], @a[1][1;1], and @a[1;1;1], which conflicts with S09's "Cascaded subscripting" section.
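For contrast with the morphologies discussed above, here is a minimal sketch of how present-day Rakudo (which implements a later revision of this design, so details may differ from the 2009 S09 text) distinguishes an array of arrays from a shaped multi-dimensional array; only exact-dimension subscripts are used, since the partial and wildcarded forms are exactly the contentious part of the thread.

    # An array of arrays: the outer array holds Array elements, so subscripts chain.
    my @aoa = [ 'r0c0', 'r0c1' ], [ 'r1c0', 'r1c1' ];
    say @aoa[1][1];        # r1c1

    # A shaped (multi-dimensional) array: one container with fixed dimensions,
    # addressed with semicolon subscripts.
    my @shaped[2;2] = ('r0c0', 'r0c1'), ('r1c0', 'r1c1');
    say @shaped[1;1];      # r1c1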
Re: Array Dimensionality
Apologies for the long post with mistakes in it. I'm going to try again, biting off less.

my @g[2;2];
@g[0;0]='r0c0';
@g[0;1]='r0c1';
@g[1;0]='r1c0';
@g[1;1]='r1c1';

What is @g[1]? Per S09:

    Multi-dimensional arrays, on the other hand, know how to handle a
    multidimensional slice, with one subslice for each dimension. You need
    not specify all the dimensions; if you don't, the unspecified dimensions
    are "wildcarded".

So @g[1] becomes @g[1;*], which is ('r1c0', 'r1c1'). (@g[1])[1] is then 'r1c1', which is the same result as @g[1;1].

Using that logic, I can't think of a case where @a[1;1;1] means something different from ((@a[1])[1])[1]. @a[1] will become @a[1;*;*], producing a 2D slice of the "2nd row" plane; then we get the "2nd to the right" column of that from the next subscript, and finally the "2nd back" element of that.

In fact, I'd suggest that 'unspecified dimensions are "wildcarded"' means we don't need the "Cascaded subscripting of multidimensional arrays" section at all.

I'd still like to have an error or warning on treating a multi-D array as an array of arrays.
Rakudo Perl 6 development release #18 ("Pittsburgh")
On behalf of the Rakudo development team, I'm pleased to announce the June 2009 development release of Rakudo Perl #18 "Pittsburgh". Rakudo is an implementation of Perl 6 on the Parrot Virtual Machine [1]. The tarball for the June 2009 release is available from http://github.com/rakudo/rakudo/downloads .

Due to the continued rapid pace of Rakudo development and the frequent addition of new Perl 6 features and bugfixes, we continue to recommend that people wanting to use or work with Rakudo obtain the latest source directly from the main repository at github. More details are available at http://rakudo.org/how-to-get-rakudo .

Rakudo Perl follows a monthly release cycle, with each release code named after a Perl Mongers group. This release is named "Pittsburgh", which is the host for YAPC|10 (YAPC::NA 2009) [2] and the Parrot Virtual Machine Workshop [3]. Pittsburgh.pm has also sponsored hackathons for Rakudo Perl as part of the 2008 Pittsburgh Perl Workshop [4].

In this release of Rakudo Perl, we've focused our efforts on refactoring many of Rakudo's internals; these refactors improve performance, bring us closer to the Perl 6 specification, operate more cleanly with Parrot, and provide a stronger foundation for features to be implemented in the near future. Some of the specific major changes and improvements in this release include:

* Rakudo is now passing 11,536 spectests, an increase of 194 passing tests since the May 2009 release. With this release Rakudo is now passing 68% of the available spectest suite.

* Method dispatch has been substantially refactored; the new dispatcher is significantly faster and follows the Perl 6 specification more closely.

* Object initialization via the BUILD and CREATE (sub)methods is substantially improved.

* All return values are now type checked (previously only explicit 'return' statements would perform type checking).

* String handling is significantly improved: fewer Unicode-related bugs exist, and parsing speed is greatly improved for some programs containing characters in the Latin-1 set.

* The IO .lines and .get methods now follow the specification more closely.

* User-defined operators now also receive some of their associated meta variants.

* The 'is export' trait has been improved; more builtin functions and methods can be written in Perl 6 instead of PIR.

* Many Parrot changes have improved performance and reduced overall memory leaks (although there's still much more improvement needed).

The development team thanks all of our contributors and sponsors for making Rakudo Perl possible. If you would like to contribute, see http://rakudo.org/how-to-help , ask on the perl6-compi...@perl.org mailing list, or ask on IRC #perl6 on freenode.

The next release of Rakudo (#19) is scheduled for July 23, 2009. A list of the other planned release dates and codenames for 2009 is available in the "docs/release_guide.pod" file. In general, Rakudo development releases are scheduled to occur two days after each Parrot monthly release. Parrot releases the third Tuesday of each month.

Have fun!

References:
[1] Parrot, http://parrot.org/
[2] YAPC|10, http://yapc10.org/yn2009/
[3] Parrot Virtual Machine Workshop, http://yapc10.org/yn2009/talk/2045
[4] Pittsburgh Perl Workshop, http://pghpw.org/ppw2008/
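One of the bullets above, return-value type checking on implicit as well as explicit returns, can be illustrated with a tiny example; this uses present-day Raku syntax, and the exact failure message in Rakudo #18 may have differed.

    # A sub that declares an Int return type but produces a Str.
    sub answer(--> Int) {
        "forty-two";   # implicit return value, no explicit 'return' statement
    }

    answer();   # dies with a return-value type-check failure, because all
                # return paths are checked, not just explicit 'return's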
Re: Why pass by reference?
> Matthew Walton wrote:
> > If a user of your API contrives to make it change while you're
> > running, that's their own foot they've just shot, because they can
> > look at the signature and know the semantics of the parameter
> > passing being used and know that if they change the value externally
> > before you return Bad Things Could Happen.

On Tue, 16 Jun 2009, TSa wrote:
> I agree that the caller is responsible for the constness of the value
> he gives to a function. With this we get the best performance.

At the language level this is wrong. Programmers are BAD at this sort of thing, unless the compiler *always* has enough information to throw a compile-time error, and even then it's dicey because we may defer compilation.

It seems to me this is pushing something onto the author of the caller that they shouldn't have to deal with, especially when you consider that the parameter they're passing into the function may come from somewhere else, which hasn't been made -- and indeed CAN'T be made -- to promise not to meddle with the value (note *1). If the compiler can't spot it, how do you expect a fallible human being to do so?

If a function requires an invariant parameter, then the compiler should ensure that that guarantee is met, and not rely on the programmer to do something that is impossibly hard in the general case. A simple way would be to call

    $parameter := $parameter.INVARIANT()   (*2)

on the caller's behalf before calling the function.

Conversely, when calling a function where the parameter is declared :rw, the compiler can call

    $parameter := $parameter.LVALUE()   (*3)

on the caller's behalf first, if it needs to convert an immutable object to a mutable one. (Or throw up its hands and assert that it's not allowed.)

If we really expect the optimizer to make Perl6 run well on a CPU with 1024 cores (*4), we have to make it easy to write programs that will allow the optimizer to do its job, and (at least a little bit) harder to write programs that defeat the optimizer.

To that end I would propose that:
 - parameters should be read-only AND invariant by default, and
 - that invariance should be enforced by passing a deep immutable clone (*5)
   in place of any object that isn't already immutable.

-Martin

Footnotes:

*1: There are many possible reasons, but for example the caller didn't declare it :readonly in turn to its callers because it *did* plan to meddle with it -- but just not by calling this function with its :readonly parameter.

*2: Yes, I made up "INVARIANT". The trick is that the compiler only needs to insert the call if it can't prove the invariance of $parameter, which it *can* prove when:
 - it arrived in a :readonly parameter; or
 - it's locally scoped, and hasn't "escaped".
In addition, the implementation of INVARIANT() could:
 - return $self for any "value" class; and
 - return the encapsulated immutable object for the case outlined in the following footnote.
Otherwise the default implementation of INVARIANT() would be like deepclone(). (Declaring a "value class" would ideally be shorter than declaring a "container class", but I'm a bit stuck as to how to achieve that. Ideas are welcome...)

*3: The LVALUE method produces the sort of proxy object that others have described, but with the reverse function: it acts as a scalar container that can only hold immutable objects, and proxies all method calls to the contained object, but allows assignment to replace that object. Calling INVARIANT on such a container object simply returns the encapsulated immutable object.

*4: As a generalization, the assumptions floating round that "the compiler will optimize things" just aren't facing reality: programmers are about the worst people when it comes to learning from the past mistakes of others, and future generations of Perl6 programmers will inevitably create evil container classes with no corresponding value classes, and thus most parallelizing optimizations will be defeated.

*5: At the language level at least, copying is NOT the enemy of optimization. On the contrary, if you always copy and *never* mutate, that ensures that the compiler can always determine the provenance and visibility of any given datum, and thus has *more* opportunities to avoid *actually* copying anything. And it can parallelize to the full extent of available hardware because it can guarantee that updates won't overlap.
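As a concrete illustration of why read-only binding alone is weaker than the deep immutable clone proposed here: in present-day Raku (again, not the 2009 design under discussion) parameters are read-only by default, but that only prevents assigning to the parameter itself, not mutating the aggregate it refers to. The sub names below are made up for the example.

    sub cannot-reassign($x) {
        # $x = 42;             # would die: cannot assign to a read-only parameter
    }

    sub can-still-mutate(@data) {
        @data[0] = 'changed';  # allowed: the binding is read-only, the Array is not
    }

    my @caller-data = 'original', 'values';
    can-still-mutate(@caller-data);
    say @caller-data;          # [changed values] -- the caller's data was modified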
Re: Why pass by reference?
On Fri, 19 Jun 2009, Martin D Kealey wrote:
> To that end I would propose that:
> - parameters should be read-only AND invariant by default, and
> - that invariance should be enforced by passing a deep immutable clone
>   (*5) in place of any object that isn't already immutable.

Sorry, typo: that last word should have been "invariant", meaning that it *won't* change, rather than "immutable", meaning that it *can't*. Compilers can rely on invariance to perform a range of very powerful optimizations; immutability is one way to guarantee invariance, but not the only way.

-Martin
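A small sketch of that distinction in present-day Raku terms (my choice of example, not Martin's): an Array is mutable but may happen to be invariant in a given program, whereas a List is immutable by construction.

    my @invariant-in-practice = 1, 2, 3;   # *can* change, but this program never changes it

    my $immutable := (1, 2, 3);            # a List: its elements cannot be replaced
    # $immutable[0] = 99;                  # would die: cannot modify an immutable value

    say @invariant-in-practice;            # [1 2 3]
    say $immutable;                        # (1 2 3)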