RE: on parrot strings
Jarkko Hietaniemi:

About the implementation of character classes: since the Unicode code point range is big, a single big bitmap won't work any more: firstly, it would be big. Secondly, for most cases, it would be wastefully sparse. A balanced binary tree of (begin, end) points of ranges is suggested. That would seem to give the required flexibility and a reasonable compromise between speed and space for implementing the operations required by both the traditional regular expression character classes (complement, case-ignorance) and the new Unicode character class semantics (difference, intersection) (see the Unicode Technical Report #18, I, http://www.unicode.org/unicode/reports/tr18/ ).

Another, possibly simpler way would be to use inversion lists: 1-dimensional arrays where even (starting from zero) indices store the beginnings of ranges belonging to the class, and odd indices store the beginnings of ranges not belonging to the class. Note "array" instead of (a linked) "list": with an array one can do binary search to determine membership (so an inversion list is in effect a flattened binary tree). Yet another way would be to use various two-level table schemes. The choice of the appropriate data structure, as always, depends on the expected operational (read vs modify) mix and the expected data distribution.

###

Since I seem to be the main regex hacker for Parrot, I'll respond to this as best I can.

Currently, we are using bitmaps for character classes. Well, sort of. A Bitmap in Parrot is defined like this:

    typedef struct bitmap_t {
        char* bmp;
        STRING* bigchars;
    } Bitmap;

Characters <256 are stored as a bitmap in bmp; other characters are stored in bigchars and linear-searched. This is a temporary measure, since Parrot isn't yet dealing with many characters outside of ASCII.
Several schemes have been proposed for the final version; I'm currently leaning towards an array of arrays of arrays of bitmaps (one level for each byte of the character): INTVAL ch; return bmp->bmp[FIRST_BYTE(ch)][SECOND_BYTE(ch)][THIRD_BYTE(ch)][FORTH_BYTE(ch) >>3] & (1<<(FORTH_BYTE(ch) & 7)); Ungainly, but it works. It would actually be a bit more complicated--only the arrays that we actually used would be allocated to save space--but you get the idea. (However, I'm quite flexible on the implementation chosen. I'll look at the ideas you propose in more detail; if anyone else has any suggestions, suggest them.) As for character encodings, we're forcing everything to UTF-32 in regular expressions. No exceptions. If you use a string in a regex, it'll be transcoded. I honestly can't think of a better way to guarantee efficient string indexing. --Brent Dax [EMAIL PROTECTED] Parrot Configure pumpking and regex hacker . hawt sysadmin chx0rs This is sad. I know of *a* hawt sysamin chx0r. I know more than a few. obra: There are two? Are you sure it's not the same one?
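A minimal sketch of the nested, lazily allocated table Brent describes above: one level per byte of a 32-bit character, with a 32-byte bitmap at the bottom. Every name here (SparseBitmap, sparse_set, sparse_test, BYTE_OF) is invented for illustration and is not Parrot code. It also assumes that memory from calloc reads back as NULL pointers, which holds on mainstream platforms but is not promised by the C standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types, not Parrot's. A level is allocated only when a
 * character that needs it is added. */
typedef struct { uint8_t bits[32]; }  Leaf;         /* 256 bits, indexed by byte 3 */
typedef struct { Leaf   *leaf[256]; } Level3;       /* indexed by byte 2 */
typedef struct { Level3 *next[256]; } Level2;       /* indexed by byte 1 */
typedef struct { Level2 *next[256]; } SparseBitmap; /* indexed by byte 0 */

/* Byte n of ch, where n = 0 is the most significant byte and n = 3 the least. */
#define BYTE_OF(ch, n) (((ch) >> (8 * (3 - (n)))) & 0xFFu)

/* Add ch to the class. Returns 0 on success, -1 if an allocation failed.
 * Freeing the levels is omitted for brevity. */
int sparse_set(SparseBitmap *bm, uint32_t ch)
{
    Level2 **l2;
    Level3 **l3;
    Leaf   **lf;

    assert(bm != NULL);
    l2 = &bm->next[BYTE_OF(ch, 0)];
    if (*l2 == NULL && (*l2 = calloc(1, sizeof **l2)) == NULL)
        return -1;
    l3 = &(*l2)->next[BYTE_OF(ch, 1)];
    if (*l3 == NULL && (*l3 = calloc(1, sizeof **l3)) == NULL)
        return -1;
    lf = &(*l3)->leaf[BYTE_OF(ch, 2)];
    if (*lf == NULL && (*lf = calloc(1, sizeof **lf)) == NULL)
        return -1;
    (*lf)->bits[BYTE_OF(ch, 3) >> 3] |= (uint8_t)(1u << (BYTE_OF(ch, 3) & 7));
    return 0;
}

/* Returns 1 if ch is in the class, 0 otherwise. A missing level means
 * that no character under it was ever added. */
int sparse_test(const SparseBitmap *bm, uint32_t ch)
{
    const Level2 *l2;
    const Level3 *l3;
    const Leaf   *lf;

    assert(bm != NULL);
    l2 = bm->next[BYTE_OF(ch, 0)];
    l3 = l2 ? l2->next[BYTE_OF(ch, 1)] : NULL;
    lf = l3 ? l3->leaf[BYTE_OF(ch, 2)] : NULL;
    if (lf == NULL)
        return 0;
    return (lf->bits[BYTE_OF(ch, 3) >> 3] >> (BYTE_OF(ch, 3) & 7)) & 1;
}
```

Lookup is three pointer hops and a bit test, but every allocated intermediate level costs 256 pointers whether or not most of them are used.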
Does this mean we get Ruby/CLU-style iterators?
Reading this in Apoc 4

    sub mywhile ($keyword, &condition, &block) {
        my $l = $keyword.label;
        while (&condition()) {
            &block();
            CATCH {
                my $t = $!.tag;
                when X::Control::next { die if $t && $t ne $l; next }
                when X::Control::last { die if $t && $t ne $l; last }
                when X::Control::redo { die if $t && $t ne $l; redo }
            }
        }
    }

Implies to me:

    A &foo prototype means you can have a bare block anywhere in the
    arg list (unlike the perl5 syntax).

    Calling &foo() does *not* affect the callstack, otherwise the
    above would not properly emulate a while loop.

If that's true, can I pull off my custom iterators?
http:[EMAIL PROTECTED]/msg08343.html

Will this:

    class File;
    sub foreach ($file, &block) {
        # yeah, I know. The RFC was all about exceptions and I'm
        # not using them in this example.
        open(FILE, $file) || die $!;

        while(<FILE>) {
            &block();
        }

        close FILE;
    }

allow this:

    File.foreach('/usr/dict/words') { print }

or would the prototype be (&file, &block)?

And would this:

    my $caller = caller;
    File.foreach('/usr/dict/words') {
        print $caller eq caller ? "ok" : "not ok"
    }

be ok or not ok? It has to be ok if mywhile is going to emulate a while loop.

-- 
Michael G. Schwern <[EMAIL PROTECTED]> http://www.pobox.com/~schwern/
Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One
navy ritual: first caulk the boards of the deck, then plug up my ass. -- japhy
Re: on parrot strings
Thanks, Jarkko.

On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> The most important message is that give up on 8-bit bytes, already.
> Time to move on, chop chop.

Do you think/feel/wish/demand that the textual (string) APIs should differ from the binary (byte) APIs? (Both from an internal Parrot perspective and at the language level.)

This may be beyond the scope of the document, but do you have an opinion on whether strings need to be entirely encapsulated within a single structure, or whether "virtual" strings (comprising several disparate substrings) are a viable addition?

    typedef struct {
        UINTVAL size;
        UINTVAL index;
        UINTVAL index_offset;
        UINTVAL last_offset;
        UINTVAL size_valid:1;
        UINTVAL offset_valid:1;
        UINTVAL last_valid:1;
        UINTVAL continued:1;
        PARROT_STRING string;
        PARROT_SIZED_STRING string_continued;
    } PARROT_SIZED_STRING

This was discussed earlier mostly for alleviating some of the headaches associated with variable-width encodings.

-- 
Bryan C. Warnock
[EMAIL PROTECTED]
Re: on parrot strings
On Fri, Jan 18, 2002 at 04:51:07AM -0500, Bryan C. Warnock wrote:
> Thanks, Jarkko.
>
> On Thursday 17 January 2002 23:21, Jarkko Hietaniemi wrote:
> > The most important message is that give up on 8-bit bytes, already.
> > Time to move on, chop chop.
>
> Do you think/feel/wish/demand that the textual (string) APIs should differ
> from the binary (byte) APIs? (Both from an internal Parrot perspective and
> at the language level.)

I tried to address this issue at two points in the document, "Of Bits and Bytes", and one paragraph in "TO DO" talking about encoding conversions and I/O. But I guess the answer is "yes and yes": I think the APIs should be different. It pains my UNIX heart, but thinking in terms of just bytes was a convenient illusion that worked as long as we kept ourselves to 8-bit byte character sets. I think the illusion works no more.

> This may be beyond the scope of the document, but do you have an opinion on
> whether strings need to be entirely encapsulated within a single structure,
> or whether "virtual" strings (comprising several disparate substrings) are a
> viable addition?
>
> typedef struct {
>     UINTVAL size;
>     UINTVAL index;
>     UINTVAL index_offset;
>     UINTVAL last_offset;
>     UINTVAL size_valid:1;
>     UINTVAL offset_valid:1;
>     UINTVAL last_valid:1;
>     UINTVAL continued:1;
>     PARROT_STRING string;
>     PARROT_SIZED_STRING string_continued;
> } PARROT_SIZED_STRING

First off, I think virtual strings (if you define strings as "a linear collection of characters (or bytes)") are a great idea; that's why I suggested them a while ago even in the context of Perl 5 (though I admit I also simply liked the proposed name: VVs...) But I also think they are high-level enough that they probably should not be any of the low-level string structures. For example: one nifty thing you can do with virtual strings is that they can be read-only windows to another string, and I don't think the read-onlyness flag belongs to the low-level strings: it's something coming from above.

Similarly for virtual strings composed of slices of several other strings: how do you manage the book-keeping of these other strings? Too complex: let's keep the low-level, ummm, low-level.

> This was discussed earlier mostly for alleviating some of the headaches
> associated with variable-width encodings.

If we keep the low-level limited to just a handful of encodings (I proposed three), and the variable encodings well-behaved (UTF-8 as opposed to the gnarlier ones), I don't think the burden will be too bad.

> --
> Bryan C. Warnock
> [EMAIL PROTECTED]

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
Re: on parrot strings
> Since I seem to be the main regex hacker for Parrot, I'll respond to > this as best I can. > > Currently, we are using bitmaps for character classes. Well, sort of. > A Bitmap in Parrot is defined like this: > > typedef struct bitmap_t { > char* bmp; > STRING* bigchars; > } Bitmap; > > Characters <256 are stored as a bitmap in bmp; other characters are > stored in bigchars and linear-searched. This is a temporary measure, This is similar to how Perl 5 does them: the low eight bits are in a 32-byte bitmap, the "wide characters" are stored after it (in a funky data structure, I won't go into more detail so that people won't lose their lunch/breakfast/meal) > since Parrot isn't yet dealing with many characters outside of ASCII. > Several schemes have been proposed for the final version; I'm currently > leaning towards an array of arrays of arrays of bitmaps (one level for > each byte of the character): > > INTVAL ch; > return > bmp->bmp[FIRST_BYTE(ch)][SECOND_BYTE(ch)][THIRD_BYTE(ch)][FORTH_BYTE(ch) > >>3] & (1<<(FORTH_BYTE(ch) & 7)); dup + dup * ... oh, you meant FOURTH. > Ungainly, but it works. It would actually be a bit more > complicated--only the arrays that we actually used would be allocated to > save space--but you get the idea. (However, I'm quite flexible on the > implementation chosen. I'll look at the ideas you propose in more > detail; if anyone else has any suggestions, suggest them.) Ungainly, yes. (1) There are 5.125 bytes in Unicode, not four. (2) I think the above would suffer from the same problem as one common suggestion, two-level bitmaps (though I think the above would suffer less, being of finer granularity): the problem is that a lot of space is wasted, since the "usage patterns" of Unicode character classes tend to be rather scattered and irregular. 
Yes, I see that you said: "only the arrays that we actually used would be allocated to save space"-- which reads to me: much complicated logic both in creation and access to make the data structure *look* simple. I'm a firm believer in getting the data structures right, after which the code to access them almost writes itself. I would suggest the inversion lists for the first try. As long as character classes are not very dynamic once they have been created (and at least traditionally that has been the case), inversion lists should work reasonably well. > As for character encodings, we're forcing everything to UTF-32 in > regular expressions. No exceptions. If you use a string in a regex, > it'll be transcoded. I honestly can't think of a better way to > guarantee efficient string indexing. I'm fine with that. The bloat is of course a shame, but as long as that's not a real problem for someone, let's not worry about it too much. > --Brent Dax > [EMAIL PROTECTED] > Parrot Configure pumpking and regex hacker > > . hawt sysadmin chx0rs > This is sad. I know of *a* hawt sysamin chx0r. > I know more than a few. > obra: There are two? Are you sure it's not the same one? -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
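For readers who have not met inversion lists before, here is a minimal C sketch of the membership test suggested above. The type InvList, the function invlist_contains and the example class ALNUM are made up for illustration and are not Parrot API. The sketch assumes the array is sorted and strictly increasing, with even indices (counting from zero) starting a run that is in the class and odd indices starting a run that is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type, not Parrot's: a sorted array of run boundaries. */
typedef struct {
    const uint32_t *bounds;  /* bounds[0] < bounds[1] < ... */
    size_t          len;     /* number of boundaries */
} InvList;

/* Returns 1 if cp is in the class, 0 otherwise. Binary search, O(log len). */
int invlist_contains(const InvList *il, uint32_t cp)
{
    size_t lo = 0, hi;

    assert(il != NULL);
    hi = il->len;

    /* Count how many boundaries are <= cp; the answer ends up in lo. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (il->bounds[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* An odd count means the last boundary passed opened an in-class run. */
    return (int)(lo & 1);
}

/* Example class [0-9A-Za-z]: three runs, six boundaries. */
const uint32_t ALNUM_BOUNDS[] = { '0', '9' + 1, 'A', 'Z' + 1, 'a', 'z' + 1 };
const InvList  ALNUM = { ALNUM_BOUNDS, 6 };
```

The whole lookup is a single search over one flat array, with no per-level allocation logic on either the creation or the access side.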
Re: Does this mean we get Ruby/CLU-style iterators?
Michael G Schwern <[EMAIL PROTECTED]> writes:
> Reading this in Apoc 4
>
>     sub mywhile ($keyword, &condition, &block) {
>         my $l = $keyword.label;
>         while (&condition()) {
>             &block();
>             CATCH {
>                 my $t = $!.tag;
>                 when X::Control::next { die if $t && $t ne $l; next }
>                 when X::Control::last { die if $t && $t ne $l; last }
>                 when X::Control::redo { die if $t && $t ne $l; redo }
>             }
>         }
>     }
>
> Implies to me:
>
>     A &foo prototype means you can have a bare block anywhere in the
>     arg list (unlike the perl5 syntax).
>
>     Calling &foo() does *not* affect the callstack, otherwise the
>     above would not properly emulate a while loop.
>
> If that's true, can I pull off my custom iterators?
> http:[EMAIL PROTECTED]/msg08343.html
>
> Will this:
>
>     class File;
>     sub foreach ($file, &block) {
>         # yeah, I know. The RFC was all about exceptions and I'm
>         # not using them in this example.
>         open(FILE, $file) || die $!;
>
>         while(<FILE>) {
>             &block();
>         }
>
>         close FILE;
>     }

Hmm... making up some syntax on the fly. I sort of like the idea of being able to do

    class File;
    sub foreach ($file, &block) is Control {
        # 'is Control' declares this as a control sub, which, amongst
        # other things 'hides' itself from caller. (We can currently
        # do something like this already using Hooks::LexWrap type
        # tricks.)

        open my $fh, $file or die $!; POST { close $fh }

        while (<$fh>) {
            my @ret = wantarray ?? list &block() :: (scalar &block());
            given $! {
                when c::RETURN { return wantarray ?? @ret :: @ret[0] }
            }
        }
    }

This is, of course, dependent on $! not being set to a RETURN control 'exception' in the case where we just fall off the end of the block. It's also dependent on being able to get continuations from caller (which would be *so* cool)

> allow this:
>
>     File.foreach('/usr/dict/words') { print }

Sounds plausible to me.

> or would the prototype be (&file, &block)?

I prefer the ($file, &block) prototype.

> And would this:
>
>     my $caller = caller;
>     File.foreach('/usr/dict/words') {
>         print $caller eq caller ? "ok" : "not ok"
>     }
>
> be ok or not ok? It has to be ok if mywhile is going to emulate a
> while loop.

In theory there's nothing to stop you writing it so that that is the case. I'd like it to be as simple as adding an attribute to the function declaration (and if it isn't that simple out of the box, it will almost certainly be, if not trivial, at least possible to write something to *make* it that simple...)

-- 
Piers

"It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite."
    -- Jane Austen?
Apoc 4?
Michael G Schwern wrote: > Reading this in Apoc 4 ... I looked on http://dev.perl.org/perl6/apocalypse/: no sign of Apoc4. Where do I find this latest installment? Dave.
Re: Apoc 4?
http://www.perl.com/pub/a/2002/01/15/apo4.html David Whipp wrote: > > Michael G Schwern wrote: > > > Reading this in Apoc 4 ... > > I looked on http://dev.perl.org/perl6/apocalypse/: no sign of Apoc4. Where > do I find this latest installment? > > Dave.
Re: Apoc 4?
>Michael G Schwern wrote: > >> Reading this in Apoc 4 ... > >I looked on http://dev.perl.org/perl6/apocalypse/: no sign of Apoc4. Where >do I find this latest installment? www.perl.com. dev.perl.org must just not have a link yet. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: on parrot strings
> (1) There are 5.125 bytes in Unicode, not four.
> (2) I think the above would suffer from the same problem as one common
> suggestion, two-level bitmaps (though I think the above would suffer
> less, being of finer granularity): the problem is that a lot of
> space is wasted, since the "usage patterns" of Unicode character
> classes tend to be rather scattered and irregular. Yes, I see
> that you said: "only the arrays that we actually used would be
> allocated to save space"-- which reads to me: much complicated
> logic both in creation and access to make the data structure *look*
> simple. I'm a firm believer in getting the data structures right,
> after which the code to access them almost writes itself.
>
> I would suggest the inversion lists for the first try. As long as
> character classes are not very dynamic once they have been created
> (and at least traditionally that has been the case), inversion lists
> should work reasonably well.

My proposal is that we should use a mixed method. The Unicode standard classes, such as \p{IsLu}, can be handled by a standard splitbin table. Please see Java's java.lang.Character or Python's unicodedata_db.h. I measured it: to handle all Unicode categories, simple casing, and decimal digit values, I need a table of about 23KB for Unicode 3.1 (0x0 to 0x10), about 15KB for (0x0 to 0x).

For a simple character class, such as [\p{IsLu}\p{InGreek}], the regex does not need to emit an optimized bitmap. Instead, the regex just generates a union: the first member uses the standard Unicode category lookup, and the second is a simple range. If the user insists on a fast bitmap, and the character class is not extremely complicated, we will probably need only a few KB for each character class.

> > As for character encodings, we're forcing everything to UTF-32 in
> > regular expressions. No exceptions. If you use a string in a regex,
> > it'll be transcoded. I honestly can't think of a better way to
> > guarantee efficient string indexing.

I don't think UTF-32 will save you much. The Unicode case map is variable length; combining characters, canonical equivalence, and many other things will require variable-length mapping. For example, if I only want to parse /[0-9]+/, why would you want to convert everything to UTF-32? Most of the time, regcomp() can find out whether a regexp will need complicated preprocessing. Another example: if I want to search for /resume/e (equivalent matching), the regex engine can normalize the case, fully decompose the input string, strip off any combining characters, and do an 8-bit Boyer-Moore search. I bet it will be simpler and faster than using UTF-32. (BTW, equivalent matching means matching English spelling against French spelling, disregarding diacritics.)

I think we should explore more choices and do some experiments.

Hong
Re: on parrot strings
> I don't think UTF-32 will save you much. The unicode case map is variable
> length, combining character, canonical equivalence, and many other thing
> will require variable length mapping. For example, if I only want to

This is true.

> parse /[0-9]+/, why you want to convert everything to UTF-32. Most of
> time, the regcomp() can find out whether this regexp will need complicated
> preprocessing. Another example, if I want to search for /resume/e,
> (equivalent matching), the regex engine can normalize the case, fully
> decompose input string, strip off any combining character, and do 8-bit

Hmmm. The above sounds complicated, and not quite what I had in mind for equivalence matching: I would have just said "both the pattern and the target need to be normalized, as defined by Unicode". Then the comparison and searching reduce to the trivial cases of byte equivalence and searching (of which B-M is the most popular example).

> Boyer-Moore search. I bet it will be simpler and faster than using UTF-32.
> (BTW, the equivalent matching means match English spelling against French
> spell, disregarding diacritics.)
>
> I think we should explore more choices and do some experiments.

What do you mean by *we*? :-) I am not a p6-internals regular, nor do I intend to be; there are only so many hours in a day. But yes, the sooner we get into exploration/experiment mode, the better. The Unicode mindset *must* be adopted sooner rather than later; "unwriting" 8-bit-byteism out of the code later is hell. Hopefully my little treatise will kick Parrot more or less in the right direction.

> Hong

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
Re: on parrot strings
On Fri, Jan 18, 2002 at 11:44:00AM -0800, Hong Zhang wrote: > > (1) There are 5.125 bytes in Unicode, not four. > > (2) I think the above would suffer from the same problem as one common > > suggestion, two-level bitmaps (though I think the above would suffer > > less, being of finer granularity): the problem is that a lot of > > space is wasted, since the "usage patterns" of Unicode character > > classes tend to be rather scattered and irregular. Yes, I see > > that you said: "only the arrays that we actually used would be > > allocated to save space"-- which reads to me: much complicated > > logic both in creation and access to make the data > > structure *look* > > simple. I'm a firm believer in getting the data structures right, > > after which the code to access them almost writes itself. > > > > I would suggest the inversion lists for the first try. As long as > > character classes are not very dynamic once they have been created > > (and at least traditionally that has been the case), inversion lists > > should work reasonably well. > > My proposal is we should use mix method. The Unicode standard class, > such as \p{IsLu}, can be handled by a standard splitbin table. Please > see Java java.lang.Character or Python unicodedata_db.h. I did > measurement on it, to handle all unicode category, simple casing, > and decimal digit value, I need about 23KB table for Unicode 3.1 > (0x0 to 0x10), about 15KB for (0x0 to 0x). Don't try to compete with inversion lists on the size: their size is measured in bytes. For example "Latin script", which consists of 22 separate ranges sprinkled between U+0041 and U+FF5A, encodes into 44 ints, or 176 bytes. Searching for membership in an inversion list is O(N log N) (binary search). "Encoding the whole range" is a non-issue bordering on a joke: two ints, or 8 bytes. -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
RE: on parrot strings
> > preprocessing. Another example, if I want to search for /resume/e, > > (equivalent matching), the regex engine can normalize the case, fully > > decompose input string, strip off any combining character, and do 8-bit > > Hmmm. The above sounds complicated not quite what I had in mind > for equivalence matching: I would have just said "both the pattern > and the target need to normalized, as defined by Unicode". Then > the comparison and searching reduce to the trivial cases of byte > equivalence and searching (of which B-M is the most popular example). You are right in some sense. But "normalized, as defined by Unicode" may not be simple. I look at unicode regex tr18. It does not specify equivalence of "resume" vs "re`sume`", but user may want or may not want this kind of normalization. Hong
RE: on parrot strings
> > My proposal is we should use mix method. The Unicode standard class, > > such as \p{IsLu}, can be handled by a standard splitbin table. Please > > see Java java.lang.Character or Python unicodedata_db.h. I did > > measurement on it, to handle all unicode category, simple casing, > > and decimal digit value, I need about 23KB table for Unicode 3.1 > > (0x0 to 0x10), about 15KB for (0x0 to 0x). > > Don't try to compete with inversion lists on the size: their size is > measured in bytes. For example "Latin script", which consists of 22 > separate ranges sprinkled between U+0041 and U+FF5A, encodes into 44 > ints, or 176 bytes. Searching for membership in an inversion list is > O(N log N) (binary search). "Encoding the whole range" is a non-issue > bordering on a joke: two ints, or 8 bytes. When I said mixed method, I did intend to include binary search. The binary search is a win for sparse character class. But bitmap is better for large one. Python uses two level bitmap for first 64K character. Hong
Ex4, Apo5, when ?
Did you pass the "Bermuda Triangle"? :")

raptor
Re: Ex4, Apo5, when ?
At 10:16 AM +0200 1/18/02, raptor wrote: >Did u passed "Bermuda Triangle" :") It may be a bit before Ex4 is done. Damian's on a cruise ship at the moment, so even if he's got the time (and I don't think he does) he's likely lacking connectivity. I expect he'll give us word at some point what the schedule is. As for A5, that's up to Larry's schedule. It's the RE apocalypse, though, so should hopefully be a bit less brain-bending. (And thus done sooner) No promises, of course, as I'm not Larry. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: on parrot strings
On Fri, Jan 18, 2002 at 12:20:53PM -0800, Hong Zhang wrote: > > > My proposal is we should use mix method. The Unicode standard class, > > > such as \p{IsLu}, can be handled by a standard splitbin table. Please > > > see Java java.lang.Character or Python unicodedata_db.h. I did > > > measurement on it, to handle all unicode category, simple casing, > > > and decimal digit value, I need about 23KB table for Unicode 3.1 > > > (0x0 to 0x10), about 15KB for (0x0 to 0x). > > > > Don't try to compete with inversion lists on the size: their size is > > measured in bytes. For example "Latin script", which consists of 22 > > separate ranges sprinkled between U+0041 and U+FF5A, encodes into 44 > > ints, or 176 bytes. Searching for membership in an inversion list is > > O(N log N) (binary search). "Encoding the whole range" is a non-issue > > bordering on a joke: two ints, or 8 bytes. > > When I said mixed method, I did intend to include binary search. The binary > search is a win for sparse character class. But bitmap is better for large > one. "Better" in what sense? Smaller? Certainly not. Faster? Maybe, maybe not. Yes, accessing the right bytes and doing the bit arithmetics is about as fast as one can hope doing anything in CPUs. But: the 15KB is quite a lot of stuff to move around for, say, [0-9]. Yes, bitmaps win in pathological cases where you, say, choose every other character of the Unicode. I guess I agree with you that a combination of bitmaps and binary searchable things (inversion lists or trees) is good, but I guess we differ in that my gut feeling is that the latter should be the default, not the bitmaps. 
I also think this low-level detail should be completely hidden from, say, the writers of the regex engine, all they should see is code_point_in_class(cp, cc), and that the low-level "character class engine" should dynamically pick whichever low-level implementation is "best", and naturally that only one of the low-level implementations is being used (for one character class) at a time: hybrids (meaning dual book-keeping) sound to me like a fruitful breeding area for bugs. > Python uses two level bitmap for first 64K character. And their Unicode implementation is doing how well? :-) > Hong -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
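A rough C sketch of the kind of facade argued for above: the regex engine sees only code_point_in_class(cp, cc), and each class carries exactly one low-level representation at a time. Only the function name and argument order come from the message; the CharClass layout, the CCKind tags and the example data are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation tag: one backend per class, never both. */
typedef enum { CC_INVLIST, CC_BITMAP } CCKind;

typedef struct {
    CCKind kind;
    union {
        struct { const uint32_t *bounds; size_t len; } inv;  /* sorted run boundaries */
        struct { const uint8_t *bits; uint32_t nbits; } bmp; /* covers code points 0..nbits-1 */
    } u;
} CharClass;

/* The only entry point a regex engine would need. Returns 1 or 0. */
int code_point_in_class(uint32_t cp, const CharClass *cc)
{
    assert(cc != NULL);

    if (cc->kind == CC_BITMAP) {
        if (cp >= cc->u.bmp.nbits)
            return 0;               /* outside the bitmap: not a member */
        return (cc->u.bmp.bits[cp >> 3] >> (cp & 7)) & 1;
    } else {
        /* Inversion list: count boundaries <= cp; odd count means member. */
        size_t lo = 0, hi = cc->u.inv.len;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (cc->u.inv.bounds[mid] <= cp)
                lo = mid + 1;
            else
                hi = mid;
        }
        return (int)(lo & 1);
    }
}

/* The class [0-9] in both representations, for comparison. */
const uint32_t DIGIT_BOUNDS[]  = { 0x30, 0x3A };
const uint8_t  DIGIT_BITS[32]  = { [6] = 0xFF, [7] = 0x03 };
const CharClass DIGITS_AS_INVLIST = { CC_INVLIST, { .inv = { DIGIT_BOUNDS, 2 } } };
const CharClass DIGITS_AS_BITMAP  = { CC_BITMAP,  { .bmp = { DIGIT_BITS, 256 } } };
```

How the "character class engine" would choose between the two backends when a class is built is left open here, as it is in the discussion.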
Re: Ex4, Apo5, when ?
On Fri, Jan 18, 2002 at 03:35:59PM -0500, Dan Sugalski wrote: > At 10:16 AM +0200 1/18/02, raptor wrote: > >Did u passed "Bermuda Triangle" :") > > It may be a bit before Ex4 is done. Damian's on a cruise ship at the > moment, so even if he's got the time (and I don't think he does) he's > likely lacking connectivity. I expect he'll give us word at some > point what the schedule is. They've got connectivity all right. We've been getting plenty of drunken ramblings on IRC from folks on the cruise. -- Michael G. Schwern <[EMAIL PROTECTED]>http://www.pobox.com/~schwern/ Perl Quality Assurance <[EMAIL PROTECTED]> Kwalitee Is Job One Your average appeasement engineer is about as clued-up on computers as the average computer "hacker" is about B.O. -- BOFH
Re: Ex4, Apo5, when ?
At 4:17 PM -0500 1/18/02, Michael G Schwern wrote: >On Fri, Jan 18, 2002 at 03:35:59PM -0500, Dan Sugalski wrote: >> At 10:16 AM +0200 1/18/02, raptor wrote: >> >Did u passed "Bermuda Triangle" :") >> >> It may be a bit before Ex4 is done. Damian's on a cruise ship at the >> moment, so even if he's got the time (and I don't think he does) he's >> likely lacking connectivity. I expect he'll give us word at some >> point what the schedule is. > >They've got connectivity all right. We've been getting plenty of >drunken ramblings on IRC from folks on the cruise. Well, so much for *that* excuse. :) Bet they're still hard up for free time, though. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Apo4: PRE, POST
Apo4, when introducing POST, mentions that there is a corresponding "PRE" block "for design-by-contract programmers". However, I see the POST block being used as a finalize; and thus allowing (encouraging?) it to have side effects. I can't help feeling that contract/assertion checking should not have side effects. Furthermore, there should be options to turn off PRE/POST processing for higher performance. Perhaps we'll learn more about contracts (inc. invariants, inheritance) in a later apo? Will we still use the Class::Contract module? Dave. -- Dave Whipp, Senior Verification Engineer, Fast-Chip inc., 950 Kifer Rd, Sunnyvale, CA. 94086 tel: 408 523 8071; http://www.fast-chip.com Opinions my own; statements of fact may be in error.
Re: on parrot strings
On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote:
> ints, or 176 bytes. Searching for membership in an inversion list is
> O(N log N) (binary search). "Encoding the whole range" is a non-issue
> bordering on a joke: two ints, or 8 bytes.

[Clarification from a noncombatant] You meant O(log N).

I like the inversion list idea. But its speed is proportional to the toothiness of the character class, and while I have good intuition for what that means in 7-bit US-ASCII, I have no idea how bad it gets for other languages. "Vowels"? "Capital letters"? Would anyone ever want to select all Chinese characters with a particular radical?

That's just lookup. We should also consider other character class operations: union, subtraction, intersection. They're pretty straightforward and fast (O(N)) for inversion lists. (Yes, all these operations can be postponed until lookup time, regardless of the underlying representation, in which case the time of union(C1,C2) is just the time of C1 + time of C2 + time of an 'or'.)
Re: on parrot strings
On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote: > On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote: > > ints, or 176 bytes. Searching for membership in an inversion list is > > O(N log N) (binary search). "Encoding the whole range" is a non-issue > > bordering on a joke: two ints, or 8 bytes. > > [Clarification from a noncombatant] You meant O(log N). Duh, yes. At least someone is awake :-) > I like the inversion list idea. But its speed is proportional to the > toothiness of the character class, and while I have good intuition for Yup. > what that means in 7-bit US-ASCII, I have no idea how bad it gets for > other languages. "Vowels"? "Capital letters"? Would anyone ever want As far as I can see, and guestimate (watch out for waving hands), it would behave pretty well In Real Life. If we are talking about the predefined existing categories like Lu, or Greek script, or Cyrillic block, they are pretty well localized and not scattershot. User-specified characters are likely to be well localized to one or few scripts. > to select all Chinese characters with a particular radical? > > That's just lookup. We should also consider other character class > operations: union, subtraction, intersection. They're pretty > straightforward and fast (O(N)) for inversion lists. (Yes, all these Yes, since they are by definition sorted, merging (or negatively merging) them is pretty simple. > operations can be postponed until lookup time, regardless of the > underlying represention, in which case the time of union(C1,C2) is > just the time of C1 + time of C2 + time of an 'or'.) -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
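To make the "sorted, so merging is simple" point concrete, here is a small C sketch of a linear-time merge of two inversion lists. The function name invlist_merge and its need parameter are invented for this example. The sketch assumes both inputs are strictly increasing arrays in which even indices open an in-class run and odd indices close it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merge two inversion lists in one pass, O(na + nb).
 * need = 1 yields the union, need = 2 the intersection.
 * out must have room for na + nb entries; returns the entries written. */
size_t invlist_merge(const uint32_t *a, size_t na,
                     const uint32_t *b, size_t nb,
                     int need, uint32_t *out)
{
    size_t i = 0, j = 0, n = 0;
    int depth = 0;   /* how many of the two inputs are inside a run right now */

    assert(need == 1 || need == 2);

    while (i < na || j < nb) {
        int before = depth;
        uint32_t pos;

        /* Pick the smaller of the two next boundaries. */
        if (j >= nb || (i < na && a[i] <= b[j]))
            pos = a[i];
        else
            pos = b[j];

        /* Consume that boundary from whichever list(s) have it:
         * an even index opens a run, an odd index closes one. */
        if (i < na && a[i] == pos) { depth += (i & 1) ? -1 : 1; i++; }
        if (j < nb && b[j] == pos) { depth += (j & 1) ? -1 : 1; j++; }

        /* Emit a boundary only where membership in the result flips. */
        if ((before >= need) != (depth >= need))
            out[n++] = pos;
    }
    return n;
}
```

Subtraction is not shown; one way to get it would be to intersect the first list with the complement of the second.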
RE: Apo4: PRE, POST
From: David Whipp [mailto:[EMAIL PROTECTED]]
>
> Apo4, when introducing POST, mentions that there is a
> corresponding "PRE" block "for design-by-contract
> programmers".
>
> However, I see the POST block being used as a finalize;
> and thus allowing (encouraging?) it to have side effects.

It may very well be the case that a procedure's POST block could have side effects. However, if Larry and Damian are on the same frequency... then a _method_'s PRE/POST blocks will not have side effects. At least that is what I (perhaps incorrectly) inferred from one of the previous discussions in which Damian participated on the perl6-language list, about subroutine wrappers, Hook::LexWrapper, or whatever the means to the ends were in that thread.

> I can't help feeling that contract/assertion checking
> should not have side effects. Furthermore, there should
> be options to turn off PRE/POST processing for higher
> performance. Perhaps we'll learn more about contracts
> (inc. invariants, inheritance) in a later apo?

I hope so. I am particularly interested to hear how PRE/POST blocks will work in the context of methods and inheritance.

> Will we still use the Class::Contract module?

Your guess is as good as mine. It looks like there will be fewer reasons for most people to use it, especially if all you need is assertions.

IMO: It's nice just to hear Larry say "design-by-contract" programmers, and know that he's still talking about Perl ;) We'll just have to see how Perl6 DBC support works out with regard to encapsulation, inheritance, and Class::Contract's other odds and ends. But I imagine support for things like a POST block checking an object against its previous state via &old, and things like shortening and flattening, etc. will still require a Class::Contract.
Re: Does this mean we get Ruby/CLU-style iterators?
At 3:37 PM + 1/18/02, Piers Cawley wrote:
>Michael G Schwern <[EMAIL PROTECTED]> writes:
>
>Hmm... making up some syntax on the fly. I sort of like the idea of
>being able to do
>
>  class File;
>  sub foreach ($file, &block) is Control {
>    # 'is Control' declares this as a control sub, which, amongst
>    # other things 'hides' itself from caller. (We can currently
>    # do something like this already using Hooks::LexWrap type
>    # tricks.
>
>    open my $fh, $file or die $!; POST { close $fh }
>
>    while (<$fh>) {
>      my @ret = wantarray ?? list &block() :: (scalar &block());
>      given $! {
>        when c::RETURN { return wantarray ?? @ret :: @ret[0] }
>      }
>    }
>  }
>
>This is, of course, dependent on $! not being set to a RETURN control
>'exception' in the case where we just fall off the end of the block.

I don't think you'll see $! being set to anything other than real errors. Larry may change that, but I'd doubt it. It's more a global status than anything else. Exceptions would go elsewhere, I'd hope.

I personally would like to see subs be taggable as transparent to yielding, so if you call a sub, and it calls a sub, that inner sub could yield out of the caller if the caller was transparent. Not, mind, that the scheme doesn't have issues, but...

>It's also dependent on being able to get continuations from caller
>(which would be *so* cool)

For some brainwarping version of cool. :)

>> allow this:
>>
>>   File.foreach('/usr/dict/words') { print }
>
>Sounds plausible to me.
>
>> or would the prototype be (&file, &block)?
>
>I prefer the ($file, &block) prototype.

I think it'll be ($file, &block), as that makes the most sense.

-- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: on parrot strings
On Fri, Jan 18, 2002 at 01:40:26PM -0800, Steve Fink wrote: > On Fri, Jan 18, 2002 at 10:08:40PM +0200, Jarkko Hietaniemi wrote: > > ints, or 176 bytes. Searching for membership in an inversion list is > > O(N log N) (binary search). "Encoding the whole range" is a non-issue > > bordering on a joke: two ints, or 8 bytes. > > [Clarification from a noncombatant] You meant O(log N). > > I like the inversion list idea. But its speed is proportional to the > toothiness of the character class, and while I have good intuition for > what that means in 7-bit US-ASCII, I have no idea how bad it gets for > other languages. "Vowels"? "Capital letters"? Would anyone ever want > to select all Chinese characters with a particular radical? > > That's just lookup. We should also consider other character class > operations: union, subtraction, intersection. They're pretty Complement of an inversion list is neat: insert 0 at the beginning (and append max+1), unless there already is one, in which case delete the 0 (and shift the list and delete the max+1). Again, O(N). (One could of course have a bit for a 'negative character class', but that would in turn complicate the computations.) -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
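The complement trick described above can be sketched as follows, in Python rather than Parrot's C. Two things are assumptions of the sketch, not something stated in the thread: every range carries an explicit exclusive end, so lists have even length, and the "max+1" sentinel is 0x110000, one past the last Unicode code point. The function name `complement` is made up. The head and the tail of the list are handled independently, which is a slight refinement of the wording in the message.

```python
LIMIT = 0x110000  # assumed "max+1": one past the highest Unicode code point

def complement(inv, limit=LIMIT):
    """Return the complement of inversion list inv over [0, limit).

    Toggling a boundary at 0 flips membership of everything from the
    start; toggling a boundary at limit closes or reopens the tail.
    """
    out = list(inv)          # O(N) copy; the edits themselves touch only the ends
    if out and out[0] == 0:
        out.pop(0)           # class started at 0, so the complement does not
    else:
        out.insert(0, 0)     # class did not start at 0, so the complement does
    if out and out[-1] == limit:
        out.pop()            # the last range ran to the end: drop the sentinel
    else:
        out.append(limit)    # otherwise the complement runs to the end
    return out
```

For example, `complement([0x30, 0x3A])` gives `[0, 0x30, 0x3A, 0x110000]` (everything except the ASCII digits), and complementing that result gives `[0x30, 0x3A]` back.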
Re: on parrot strings
On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote: > Complement of an inversion list is neat: insert 0 at the beginning > (and append max+1), unless there already is one, in which case delete > the 0 (and shift the list and delete the max+1). Again, O(N). > (One could of course have a bit for a 'negative character class', > but that would in turn complicate the computations.) If we have hybrid notation, we'll be stuck with not only a bit for that, but also a complete expression tree for character classes. (Which is necessary if we use a Unicode library that only exposes property test functions, not numeric ranges.) We *do* want to have (with some notation) [[:digit:]\p{FunkyLooking}aeiou except 7], right?
Re: on parrot strings
On Fri, Jan 18, 2002 at 02:22:49PM -0800, Steve Fink wrote: > On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote: > > Complement of an inversion list is neat: insert 0 at the beginning > > (and append max+1), unless there already is one, in which case delete > > the 0 (and shift the list and delete the max+1). Again, O(N). > > (One could of course have a bit for a 'negative character class', > > but that would in turn complicate the computations.) > > If we have hybrid notation, we'll be stuck with not only a bit for > that, but also a complete expression tree for character classes. > (Which is necessary if we use a Unicode library that only exposes > property test functions, not numeric ranges.) > > We *do* want to have (with some notation) > [[:digit:]\p{FunkyLooking}aeiou except 7], right? Of course. But that is all resolvable in regex compile time. No expression tree needed. [[:digit:]\p{FunkyLooking}aeiou$FooBar] is an ickier case, but even there the constant parts can be resolved in regex compile time. (Don't say "locales" or I'll ha've have to hurt you, for your own good. :-) -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
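To make "resolvable at regex compile time" concrete, here is a hedged Python sketch that folds a class like "digits and vowels except 7" into a single inversion list before any matching happens. It is an illustration only, not how Parrot does it. `combine`, `member` and the `keep` callback are hypothetical names, \p{FunkyLooking} is left out because it is a made-up property, and the sketch assumes `keep(False, False)` is false so the result never has to start at 0.

```python
from bisect import bisect_right

def combine(a, b, keep):
    """Merge inversion lists a and b in one linear pass.

    keep(in_a, in_b) decides membership of the result, so the same loop
    gives union (x or y), intersection (x and y) or difference
    (x and not y).
    """
    out = []
    i = j = 0
    in_a = in_b = False
    while i < len(a) or j < len(b):
        # Pick the smallest boundary not yet consumed.
        if j >= len(b) or (i < len(a) and a[i] <= b[j]):
            point = a[i]
        else:
            point = b[j]
        was_inside = keep(in_a, in_b)
        if i < len(a) and a[i] == point:
            in_a = not in_a
            i += 1
        if j < len(b) and b[j] == point:
            in_b = not in_b
            j += 1
        # Emit a boundary only where the combined membership changes.
        if keep(in_a, in_b) != was_inside:
            out.append(point)
    return out

def member(inv, cp):
    """Binary-search membership test against the flattened class."""
    return bisect_right(inv, cp) % 2 == 1

digits = [0x30, 0x3A]                      # 0-9
vowels = [0x61, 0x62, 0x65, 0x66, 0x69,    # a, e, i, o, u
          0x6A, 0x6F, 0x70, 0x75, 0x76]
seven = [0x37, 0x38]                       # just "7"

# "Compile time": fold the whole expression into one flat list.
either = combine(digits, vowels, lambda x, y: x or y)
char_class = combine(either, seven, lambda x, y: x and not y)
```

At match time only `member(char_class, cp)` runs; no expression tree survives. An interpolated part such as `$FooBar` would have to be merged in later, once its value is known.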
Re: on parrot strings
On Sat, Jan 19, 2002 at 12:28:15AM +0200, Jarkko Hietaniemi wrote: > On Fri, Jan 18, 2002 at 02:22:49PM -0800, Steve Fink wrote: > > On Sat, Jan 19, 2002 at 12:11:06AM +0200, Jarkko Hietaniemi wrote: > > > Complement of an inversion list is neat: insert 0 at the beginning > > > (and append max+1), unless there already is one, in which case delete > > > the 0 (and shift the list and delete the max+1). Again, O(N). > > > (One could of course have a bit for a 'negative character class', > > > but that would in turn complicate the computations.) > > > > If we have hybrid notation, we'll be stuck with not only a bit for > > that, but also a complete expression tree for character classes. > > (Which is necessary if we use a Unicode library that only exposes > > property test functions, not numeric ranges.) > > > > We *do* want to have (with some notation) > > [[:digit:]\p{FunkyLooking}aeiou except 7], right? > > Of course. But that is all resolvable in regex compile time. > No expression tree needed. My point was that if inversion lists are insufficient for describing all the character classes we might be interested in, then we'll need the tree. And an example of why inversion lists would be insufficient is if we have a character API that only allows queries of the sort "is this character FunkyLooking or not?", rather than "what ranges of characters are FunkyLooking?" (Unless you want to do "is 0 FunkyLooking? is 1 FunkyLooking? ... is 4294967295 FunkyLooking?" at compile time.) > compile time. (Don't say "locales" or I'll ha've have to hurt you, > for your own good. :-) Was the ' in ha've unintentional, or is that an acute accent mark? :-)
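The brute-force fallback mentioned above ("is 0 FunkyLooking? is 1 FunkyLooking? ...") can be sketched in Python to show what a predicate-only API would cost. This is an illustration under assumptions: `inversion_list_from_predicate` and `is_upper` are made-up names, Python's standard `unicodedata.category` stands in for the property-test function, and the general category Lu stands in for FunkyLooking.

```python
import unicodedata

def inversion_list_from_predicate(pred, limit=0x110000):
    """Scan every code point below limit and record where pred flips.

    This is one predicate call per code point, which is the
    compile-time cost the message is worried about.
    """
    out = []
    inside = False
    for cp in range(limit):
        now = bool(pred(cp))
        if now != inside:
            out.append(cp)   # boundary: membership changes at cp
            inside = now
    if inside:
        out.append(limit)    # close a range that runs to the end
    return out

def is_upper(cp):
    """Property test of the 'is this character X or not?' kind."""
    return unicodedata.category(chr(cp)) == "Lu"
```

Restricted to ASCII, `inversion_list_from_predicate(is_upper, limit=0x80)` yields `[65, 91]`, i.e. A-Z. A full scan makes over a million predicate calls per property, so it is something to do once and cache, or to avoid with an API that hands back ranges directly.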
Re: on parrot strings
> > > We *do* want to have (with some notation) > > > [[:digit:]\p{FunkyLooking}aeiou except 7], right? > > > > Of course. But that is all resolvable in regex compile time. > > No expression tree needed. > > My point was that if inversion lists are insufficient for describing > all the character classes we might be interested in, then we'll need > the tree. And an example of why inversion lists would be insufficient > is if we have a character API that only allows queries of the sort "is > this character FunkyLooking or not?", rather than "what ranges of > characters are FunkyLooking?" (Unless you want to do "is 0 > FunkyLooking? is 1 FunkyLooking? ... is 4294967295 FunkyLooking?" at > compile time.) I think the answer to that dilemma is obvious: we do want an API that tells which ranges FunkyLooking covers and guess what: the answers to such questions can be represented as inversion lists. > > compile time. (Don't say "locales" or I'll ha've have to hurt you, > > for your own good. :-) > > Was the ' in ha've unintentional, or is that an acute accent mark? :-) I was aiming for pirate accent. Arr. Discussing parrots and all. -- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Re: Apo4: PRE, POST
> [concerns over conflation of post-processing and post-assertions] Having read A4 thoroughly, twice, this was my only real concern (which contrasted with an overall sense of "wow, this is so cool"). --me
Re: [PATCH] gcc -ansi -pedantic unrealistically strict [APPLIED]
At 12:51 PM -0500 1/15/02, Andy Dougherty wrote: >I think the optimal fix here is simply to remove -ansi -pedantic. >-ansi may well have some uses, but even the gcc man pages say >"There is no reason to use this option [-pedantic]; it exists only >to satisfy pedants." Applied. thanks. (Though I have to believe there's some reason for pedantic) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [OBNOXIOUS PATCH] docs/running.pod [APPLIED]
At 9:30 AM -0800 1/15/02, Steve Fink wrote: >This patch add docs/running.pod, which lists the various executables >Parrot currently includes, examples of running them, and mentions of >where they fail to work. It's more of a cry for help than a useful >reference. :-) I've been having trouble recently when making changes >in figuring out whether I broke anything, because any non-default way >of running the system seems to be already broken. I can't tell what >brokenness is expected and what isn't. Applied, with some chagrin. Thanks. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Does this mean we get Ruby/CLU-style iterators?
Dan Sugalski <[EMAIL PROTECTED]> writes:
> At 3:37 PM + 1/18/02, Piers Cawley wrote:
>>Michael G Schwern <[EMAIL PROTECTED]> writes:
>>
>>Hmm... making up some syntax on the fly. I sort of like the idea of
>>being able to do
>>
>>  class File;
>>  sub foreach ($file, &block) is Control {
>>    # 'is Control' declares this as a control sub, which, amongst
>>    # other things 'hides' itself from caller. (We can currently
>>    # do something like this already using Hooks::LexWrap type
>>    # tricks.
>>
>>    open my $fh, $file or die $!; POST { close $fh }
>>    while (<$fh>) {
>>      my @ret = wantarray ?? list &block() :: (scalar &block());
>>      given $! {
>>        when c::RETURN { return wantarray ?? @ret :: @ret[0] }
>>      }
>>    }
>>  }
>>
>>This is, of course, dependent on $! not being set to a RETURN control
>>'exception' in the case where we just fall off the end of the block.
>
> I don't think you'll see $! being set to anything other than real
> errors. Larry may change that, but I'd doubt it. It's more a global
> status than anything else. Exceptions would go elsewhere, I'd hope.

Um... I'm not sure that's how I read the Apocalypse. And if it doesn't get set how on earth are we going to be able to tell how a block exited in the case of home rolled looping/iterating constructs where we're going to want to write:

    sub foo {
        ...
        File.foreach($file_path) {
            ...
            return ($someval) if /some_pattern/;
            ...
        }
    }

and have foo return. Maybe we'll have to have something like:

    while (<$fh>) {
        try {
            temp c::RETURN is Error;
            temp c::NEXT is Error;
            temp c::REDO is Error;
            temp c::LAST is Error;
            wantarray ?? list &block() :: (scalar &block());
            DEFAULT { throw };
        }
    }

Then, because the control structures are temporarily Errors within the scope of the try block they get thrown up to the first thing that can handle them. In the case of NEXT/REDO/LAST, that's the while loop, and in the case of the RETURN, that's the enclosing subroutine. But it seems kludgy as hell.

> I personally would like to see subs be taggable as transparent to
> yielding, so if you call a sub, and it calls a sub, that inner sub
> could yield out of the caller if the caller was transparent. Not, mind,
> that the scheme doesn't have issues, but...
>[...]
>>It's also dependent on being able to get continuations from caller
>>(which would be *so* cool)
>
> For some brainwarping version of cool. :)

Hmm... the example I wrote which might possibly have used continuations got wiped 'cos I realised I wasn't exactly clear on how they were going to work. But I still think being able to grab a continuation from up the stack somewhere could be handy, allowing syntax like:

    &block.call_from($continuation);

Which is sort of nice, and sort of really, really evil. The thing is, given continuations and $continuation.want (so I can work out what context the continuation was called in...) I can see how to implement it:

    class BLOCK;
    sub call_from ($continuation) {
        given $continuation.want {
            when LIST { $continuation.return(list .yield) }
            default { $continuation.return(scalar .yield) }
        }
    }

Of course, I could have got *completely* the wrong end of the stick about continuations. And this example doesn't do the 'right thing' for caller, but hey, it's a start.

-- Piers "It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite." -- Jane Austen?
Benchmarking regexps against perl5
A thought occurred to me a few days ago: If I remember correctly, attempts to benchmark parrot's developing regular expressions against perl's regular expressions are proving "disappointing". However, perl5 has the advantage of a regular expression optimiser as I understand it, or at least code to work out the optimal place to start a match, and interesting strategies to discard things that never match. How hard is it to "knobble" a perl5 to disable the regular expression optimiser? Surely that would level the playing field, so that parrot's regexp engine speed would be directly comparable with perl's regexp engine speed? And then later perl5 be allowed its optimiser back once parrot has one. Nicholas Clark -- ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html
Re: on parrot strings
On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote: > > As for character encodings, we're forcing everything to UTF-32 in > > regular expressions. No exceptions. If you use a string in a regex, > > it'll be transcoded. I honestly can't think of a better way to > > guarantee efficient string indexing. > > I'm fine with that. The bloat is of course a shame, but as long as > that's not a real problem for someone, let's not worry about it too > much. Forcing everything to UTF-32 in the API? Or just forcing everything to UTF-32 until perl 6.0 is released, as trying to do UTF-8 (and UTF-16 ...) regexps now is premature optimisation? To me it seems that making UTF-32 do everything correctly which the real world can use while encoding optimised versions are written is better than having a snazzy 4 encoding autoswitcher that is wrong and therefore not releasable to the world. But I don't know about how the internals of all these things work, so I may well be wrong on any technical detail. Nicholas Clark -- ENOCHOCOLATE http://www.ccl4.org/~nick/CV.html
Re: Apo4: PRE, POST
"Me" <[EMAIL PROTECTED]> writes: >> [concerns over conflation of post-processing and post-assertions] > > Having read A4 thoroughly, twice, this was my only real concern > (which contrasted with an overall sense of "wow, this is so cool"). I think that people have sort of got used to the fact that Perl 6 is not going to look quite as much like perl5 as they thought it was going to. Either that or they've all buggered off... Personally I'm loving it. The small changes in the syntax are all coming together to give us something that's going to be far easier to parse (and therefore far easier to mess with syntacticly, which is what excites me; I've long had the mathematician's view that stuff becomes so much easier when you have the right notation. A more mutable perl means that I build myself the right notation and then solve the problem -- I want to invent my own syntactic sugar if that makes sense...) -- Piers "It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite." -- Jane Austen?
A question
Okay boys and girls, what does this print:

    my @aaa = qw/1 2 3/;
    my @bbb = @aaa;

    try {
      print "$_\n";
    }

    for @aaa; @bbb -> my $a; my $b {
      print "$a:$b";
    }

I'm guessing one of:

    1:1
    2:2
    3:3

or a syntax error, complaining about something near C<@bbb -> my $a ; my $b {>

In other words, how does the parser distinguish between postfix for followed by a semicolon, and the new semicolon enhanced 'normal' for?

-- Piers "It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite." -- Jane Austen?
Re: on parrot strings
On Fri, Jan 18, 2002 at 11:40:17PM +, Nicholas Clark wrote:
> On Fri, Jan 18, 2002 at 05:24:00PM +0200, Jarkko Hietaniemi wrote:
> >
> > > As for character encodings, we're forcing everything to UTF-32 in
> > > regular expressions. No exceptions. If you use a string in a regex,
> > > it'll be transcoded. I honestly can't think of a better way to
> > > guarantee efficient string indexing.
> >
> > I'm fine with that. The bloat is of course a shame, but as long as
> > that's not a real problem for someone, let's not worry about it too
> > much.
>
> Forcing everything to UTF-32 in the API?

I think Brent meant UTF-32 internally for the regexen. When you say /a/, Parrot sees 0x00 0x00 0x00 0x61.

> To me it seems that making UTF-32 do everything correctly which the real
> world can use while encoding optimised versions are written is better than
> having a snazzy 4 encoding autoswitcher that is wrong and therefore not
> releasable to the world.

Now, now. But yes, maybe selecting *one* first (and getting its implementation right) would be good, and in that case it's either UTF-16 (which is reasonably compact, but variable length), or UTF-32 (which is a bit wasteful, but fixed length, and therefore easy to think in). So I guess UTF-32 wins.

> But I don't know about how the internals of all these things work, so I
> may well be wrong on any technical detail.

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
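The "fixed length, therefore easy to think in" argument can be shown in a few lines of Python using the standard codecs. This is only an illustration of the encoding trade-off, not of Parrot's string internals: `utf32_char_at` is a made-up helper and big-endian byte order is an assumption. Note that the lowercase letter a is U+0061 (U+0041 is uppercase A), so /a/ transcodes to the bytes 00 00 00 61.

```python
import struct

def utf32_char_at(buf, index):
    """Return the code point at character position index in a UTF-32BE buffer.

    Every character occupies exactly four bytes, so the offset is a
    multiplication rather than a scan from the start of the string.
    """
    (code_point,) = struct.unpack_from(">I", buf, 4 * index)
    return code_point

pattern = "a".encode("utf-32-be")     # the four bytes 00 00 00 61

text = "na\u00efve"                   # "naive" with a diaeresis on the i
as_utf32 = text.encode("utf-32-be")   # 5 characters, 20 bytes
as_utf8 = text.encode("utf-8")        # 5 characters, 6 bytes, variable width
```

Here `utf32_char_at(as_utf32, 2)` returns 0xEF directly. In the UTF-8 form the same character starts at byte 2 but the next one starts at byte 4, so finding character N means walking the string. That is the "bloat" being traded for O(1) indexing.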
Re: A question
That particular example is flawed, because the try expression is turned into a try statement because the } stands alone on its line. But if you eliminate a couple newlines between } and for, then your question makes sense (but the code is not well structured, but hey, maybe you take out all the newlines for a one-liner...). The answer in that case is probably a syntax error, and to avoid it, you put a ; between the } and the for. Piers Cawley wrote: > Okay boys and girls, what does this print: > > my @aaa = qw/1 2 3/; > my @bbb = @aaa; > > try { > print "$_\n"; > } > > for @aaa; @bbb -> my $a; my $b { > print "$a:$b"; > } > > I'm guessing one of: > 1:1 > 2:2 > 3:3 > > or a syntax error, complaining about something near > C<@bbb -> my $a ; my $b {> > > In other words, how does the parser distinguish between postfix for > followed by a semicolon, and the new semicolon enhanced 'normal' for? > > -- > Piers > >"It is a truth universally acknowledged that a language in > possession of a rich syntax must be in need of a rewrite." > -- Jane Austen? -- Glenn = Due to the current economic situation, the light at the end of the tunnel will be turned off until further notice.
Re: Apo4: PRE, POST
Me wrote: > > [concerns over conflation of post-processing and post-assertions] > > Having read A4 thoroughly, twice, this was my only real concern > (which contrasted with an overall sense of "wow, this is so cool"). > > --me Yes, very, very cool. I especially liked how RFC 88 was "accepted with caveats" and RFC 119 was "rejected but assimilated", given my personal involvement in that topic. Seeing as how all the insufficiencies in RFC 88 that RFC 119 was trying to cure have been cured extremely well, I am quite a happy camper. I never cared what the words were as long as they make sense, and Larry picked good words. There are no non-object exceptions, but given the depth of object integration into the core concepts that seems to have been accepted for Perl 6 (but was uncertain at the time of RFC writing), that is not a problem. Also very cool was the resulting switch statement. Its integration with =~ and CATCH is brilliant. That RFC had a much too large table of DWIM cases to understand, and Perl 6 still has quite a few, but all of them seem to DWIM for me, whereas a number of the ones in the RFC seemed quite contrived and obscure to me. The only thing that seems somewhat questionable is the elimination of bare blocks... handy for defining short term variables... a common metaphor for reading a whole file was { local $/; $whole_file = ; } but I guess putting "do" in front isn't too onerous for the reduced ambiguity. -- Glenn = Due to the current economic situation, the light at the end of the tunnel will be turned off until further notice.
Re: Does this mean we get Ruby/CLU-style iterators?
Michael G Schwern writes:
: Reading this in Apoc 4
:
:     sub mywhile ($keyword, &condition, &block) {
:         my $l = $keyword.label;
:         while (&condition()) {
:             &block();
:             CATCH {
:                 my $t = $!.tag;
:                 when X::Control::next { die if $t && $t ne $l; next }
:                 when X::Control::last { die if $t && $t ne $l; last }
:                 when X::Control::redo { die if $t && $t ne $l; redo }
:             }
:         }
:     }
:
: Implies to me:
:
:     A &foo prototype means you can have a bare block anywhere in the
:     arg list (unlike the perl5 syntax).

That is correct.

:     Calling &foo() does *not* affect the callstack, otherwise the
:     above would not properly emulate a while loop.

Maybe it's transparent to caller but not to caller($n). I'm not sure how much of a problem this will be. Inside &block it's a closure, which carries a lot of the context you need already. Continuations may be overkill.

: If that's true, can I pull off my custom iterators?
: http:[EMAIL PROTECTED]/msg08343.html
:
: Will this:
:
:     class File;
:     sub foreach ($file, &block) {
:         # yeah, I know. The RFC was all about exceptions and I'm
:         # not using them in this example.
:         open(FILE, $file) || die $!;

That's

    my $FILE = open $file || die;

and so on.

:         while(<FILE>) {
:             &block();
:         }
:
:         close FILE;
:     }
:
: allow this:
:
:     File.foreach('/usr/dict/words') { print }

    File.foreach('/usr/dict/words', { print })

or even (presuming the prototype is available for parsing):

    File.foreach '/usr/dict/words' { print }

: or would the prototype be (&file, &block)?
:
: And would this:
:
:     my $caller = caller;
:     File.foreach('/usr/dict/words') {
:         print $caller eq caller ? "ok" : "not ok"
:     }
:
: be ok or not ok? It has to be ok if mywhile is going to emulate a
: while loop.

I don't see why the default caller has to be caller(1). In any event, user-defined control code will need to be able to get out of the way of the programmer's expectations. A return certainly needs to return from the surrounding lexical sub block, not from a mere bare block.

Larry
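The idea behind mywhile, a user-defined loop whose block signals next/last by throwing control exceptions that the loop catches, can be sketched in Python. This is an analogy, not Perl 6 semantics. `Next`, `Last`, `mywhile` and `demo` are hypothetical names, labels/tags and redo are left out, and a Python closure cannot make the enclosing function return, so the "return from the surrounding lexical sub" requirement is exactly the part this sketch does not cover.

```python
class Next(Exception):
    """Control exception: skip to the next iteration."""

class Last(Exception):
    """Control exception: leave the loop."""

def mywhile(condition, block):
    """A while loop built from two callables and control exceptions."""
    while condition():
        try:
            block()
        except Next:
            continue   # handled by the loop, invisible to the caller
        except Last:
            break
        # Any other exception is a real error and keeps propagating.

def demo():
    """Collect the even numbers below 8 using the custom loop."""
    seen = []
    numbers = iter(range(10))
    current = {"value": None}

    def condition():
        current["value"] = next(numbers, None)
        return current["value"] is not None

    def block():
        n = current["value"]
        if n % 2:
            raise Next()
        if n > 6:
            raise Last()
        seen.append(n)

    mywhile(condition, block)
    return seen
```

Calling `demo()` returns `[0, 2, 4, 6]`: odd values are skipped via `Next`, and 8 ends the loop via `Last`. The extra stack frame for `block` is visible to a Python traceback, which mirrors the caller-transparency question raised in the thread.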
Re: Does this mean we get Ruby/CLU-style iterators?
Piers Cawley writes: : Hmm... making up some syntax on the fly. I sort of like the idea of : being able to do : : class File; : sub foreach ($file, &block) is Control { : # 'is Control' declares this as a control sub, which, amongst : # other things 'hides' itself from caller. (We can currently : # do something like this already using Hooks::LexWrap type : # tricks. Maybe, but we'll need more explicit parsing control for other things, so this may fall out of that. : open my $fh, $file or die $!; POST { close $fh } More like: my $fh = open $file or die; : while () { : my @ret = wantarray ?? list &block() :: (scalar &block()); : given $! { : when c::RETURN { return wantarray ?? @ret :: @ret[0] } : } : } That "given $!" would have to be a CATCH, or the code would never be executed on a control exception. : This is, of course, dependent on $! not being set to a RETURN control : 'exception' in the case where we just fall off the end of the block. I'd say that's correct. : It's also dependent on being able to get continuations from caller : (which would be *so* cool Hmm, might not need to go that far. : > allow this: : > : > File.foreach('/usr/dict/words') { print } : : Sounds plausible to me. We're not using Ruby syntax here. Any closure is a real argument with a real formal argument name, and is called via ordinary &block(...) syntax, not yield. : > or would the prototype be (&file, &block)? : : I prefer the ($file, &block) prototype. I don't see why it would ever be &file. It's just a string. : > And would this: : > : > my $caller = caller; : > File.foreach('/usr/dict/words') { : > print $caller eq caller ? "ok" : "not ok" : > } : > : > be ok or not ok? It has to be ok if mywhile is going to emulate a : > while loop. : : In theory there's nothing to stop you writing it so that that is the : case. 
I'd like it to be as simple as adding an attribute to the : function declaration (and if it isn't that simple out of the box, it : will almost certainly be, if not trivial, at least possible to write : something to *make* it that simple...) Precisely. Larry
Parrot strings
Anyone have any objection to adding a couple of calls to terminate and/or return null terminated strings from Parrot strings for places where an API expects a standard C string? I'm not sure of the preferred way to handle this. It would be nice to at least try to terminate the current string buffer first if there is room in the buffer, and to allocate or copy only if that fails. Or is it already there and I just don't see it? -Melvin
Re: Does this mean we get Ruby/CLU-style iterators?
Larry Wall <[EMAIL PROTECTED]> writes: > Michael G Schwern writes: > : Reading this in Apoc 4 > : > : sub mywhile ($keyword, &condition, &block) { > : my $l = $keyword.label; > : while (&condition()) { > : &block(); > : CATCH { > : my $t = $!.tag; > : when X::Control::next { die if $t && $t ne $l); next } > : when X::Control::last { die if $t && $t ne $l); last } > : when X::Control::redo { die if $t && $t ne $l); redo } > : } > : } > : } > : > : Implies to me: > : > : A &foo prototype means you can have a bare block anywhere in the > : arg list (unlike the perl5 syntax). > > That is correct. > > : Calling &foo() does *not* effect the callstack, otherwise the > : above would not properly emulate a while loop. > > Maybe it's transparent to caller but not to caller($n). I'm not sure how > much of a problem this will be. Inside &block it's a closure, which > carries a lot of the context you need already. Continuations may be > overkill. I think having the caller($n) stack work so that control structures are transparent no matter where they came from is really, really important. But we can do that right now by pulling Hooks::LexWrap type tricks: temp &CORE::GLOBAL::caller = { ... }; Problem solved. I'd just hoped it was something we'd not have to do ourselves in the general case. > : If that's true, can pull off my custom iterators? > : http:[EMAIL PROTECTED]/msg08343.html > : > : Will this: > : > : class File; > : sub foreach ($file, &block) { > : # yeah, I know. The RFC was all about exceptions and I'm > : # not using them in this example. > : open(FILE, $file) || die $!; > > That's > > my $FILE = open $file || die; > > and so on. > > : while() { > : &block(); > : } > : > : close FILE; > : } > : > : allow this: > : > : File.foreach('/usr/dict/words') { print } > > File.foreach('/usr/dict/words', { print }) > > or even (presuming the prototype is available for parsing): > > File.foreach '/usr/dict/words' { print } Hmm... 
does this mean that control structures are just going to be normal expressions (a la Ruby)? Or are if/for/loop etc going to be special cases? I really like them not being special cases, but I can also see that having:

    foreach foreach @a { ... } { ... }

be legal syntax would be very weird indeed. Hmm... going the whole ruby hog would mean that:

    { ... }.foreach @ary;

would be valid. Hmm...

> : or would the prototype be (&file, &block)?
> :
> : And would this:
> :
> :     my $caller = caller;
> :     File.foreach('/usr/dict/words') {
> :         print $caller eq caller ? "ok" : "not ok"
> :     }
> :
> : be ok or not ok? It has to be ok if mywhile is going to emulate a
> : while loop.
>
> I don't see why the default caller has to be caller(1). In any event,
> user-define control code will need to be able to get out of the way
> of the programmer's expectations. A return certainly needs to return
> from the surrounding lexical sub block, not from a mere bare block.

And caller has to 'lie' about its stack, because otherwise methods that get called from within the loop that do caller($n) will get confused.

-- Piers "It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite." -- Jane Austen?
Re: A question
[reformatting response for readability and giving Glenn a stiff talking to] Glenn Linderman <[EMAIL PROTECTED]> writes: > Piers Cawley wrote: > >> Okay boys and girls, what does this print: >> >> my @aaa = qw/1 2 3/; >> my @bbb = @aaa; >> >> try { >> print "$_\n"; >> } >> >> for @aaa; @bbb -> my $a; my $b { >> print "$a:$b"; >> } >> >> I'm guessing one of: >> 1:1 >> 2:2 >> 3:3 >> >> or a syntax error, complaining about something near >> C<@bbb -> my $a ; my $b {> >> >> In other words, how does the parser distinguish between postfix for >> followed by a semicolon, and the new semicolon enhanced 'normal' for? > > That particular example is flawed, because the try expression is turned > into a try statement because the } stands alone on its line. > > But if you eliminate a couple newlines between } and for, then your > question makes sense (but the code is not well structured, but hey, maybe > you take out all the newlines for a one-liner...). > > The answer in that case is probably a syntax error, and to avoid it, you > put a ; between the } and the for. Yeah, that's sort of where I got to as well. But I just wanted to make sure. I confess I'm somewhat wary of the ';' operator, especially where it's 'unguarded' by brackets, and once I start programming in Perl 6 then for (@aaa ; @bbb -> $a; $b) { ... } will be one of my personal style guidelines. -- Piers "It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite." -- Jane Austen?
Re: on parrot strings
Hong Zhang <[EMAIL PROTECTED]> writes: >> > preprocessing. Another example, if I want to search for /resume/e, >> > (equivalent matching), the regex engine can normalize the case, fully >> > decompose input string, strip off any combining character, and do 8-bit >> >> Hmmm. The above sounds complicated not quite what I had in mind >> for equivalence matching: I would have just said "both the pattern >> and the target need to normalized, as defined by Unicode". Then >> the comparison and searching reduce to the trivial cases of byte >> equivalence and searching (of which B-M is the most popular example). > > You are right in some sense. But "normalized, as defined by Unicode" > may not be simple. I look at unicode regex tr18. It does not specify > equivalence of "resume" vs "re`sume`", but user may want or may not > want this kind of normalization. But e` and e are different letters man. And re`sume` and resume are different words come to that. If the user wants something that'll match 'em both then the pattern should surely be: /r[ee`]sum[ee`]/ Of course, it might be nice to have something that lets us do /r\any_accented(e)sum\any_accented(e)/ (or some such, notation is terrible I know), but my point is that such searches should be explicit. -- Piers "It is a truth universally acknowledged that a language in possession of a rich syntax must be in need of a rewrite." -- Jane Austen?
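The two notions being argued over here can be separated in a short Python sketch using the standard `unicodedata` module. Canonical normalization, "as defined by Unicode", makes a precomposed e-acute equal to e plus a combining acute but keeps both distinct from a plain e. Stripping combining marks is a separate, lossy step that a user would have to ask for explicitly. The helper names `canonically_equal` and `strip_marks` are made up for the illustration.

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def strip_marks(s):
    """Decompose (NFD) and drop combining characters.

    This is the lossy, opt-in "match accented and unaccented alike"
    operation, not plain normalization.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

precomposed = "r\u00e9sum\u00e9"     # e-acute as single code points
combining = "re\u0301sume\u0301"     # e followed by U+0301 COMBINING ACUTE
plain = "resume"
```

With these, `canonically_equal(precomposed, combining)` is true and `canonically_equal(precomposed, plain)` is false, which matches the point that the accented and unaccented spellings are different words. Only `strip_marks(precomposed) == plain` bridges the two, and that only happens when the pattern author requests it.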