Larry Wall <[EMAIL PROTECTED]> writes:
> That all has to be looked at anyway. What does "5" mean when you
> pass it to substr, anyway?
I was just going to ask about substrings, but didn't, because I figured that had been hashed out already and I'd missed it...

> (I've been trying to make it assume some implicit unit based on the
> current lexical scope's Unicode level, but issues remain.) We have
> magical string positions that have different numeric values
> depending on what units you view them as, but at what point does a
> number like "5" get translated to such a magical string position?

It would be possible to have right-associative operators (binding at least more tightly than comma, and possibly very tightly) that convert a number to one of these objects, so that we could do something like this:

    substr($string, 2 bytes, 4 bytes) = $substitute;

Then, if you pass a plain number to substr, it could either assume a unit (possibly with a warning) or spit out an error, depending on some feature of the current lexical scope.

The word "bytes" is clearly much too long, though, let alone "graphemes" or "codepoints". I thought about this:

    substr($string, 2b, 4b) = $substitute;

presumably with g and c for graphemes and codepoints, but I rather suspect that might conflict with some existing syntax (though I can't think of anything in particular). And I can't think of another abbreviation that would be remotely intuitive.

There's also the possibility of bsubstr and so on, but that leads us down the path of C, with a hillion bajillion functions with names like fgets, stoi, and fstrnclost. Having sprintf is quite enough of that, IMO.

> I dunno--it reads pretty well. Maybe these'll be heavily enough
> used that we should Huffmanize them down a bit:
>
>     $str.bytes
>     $str.codes
>     $str.graphs
>     $str.letters

.codes and .graphs are better than .codepoints and .graphemes, at least.
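Since the `2 bytes` / `2b` syntax is only a proposal, here is a rough Python sketch (not Perl, and not any settled design) of the idea: a position carries its unit, and substr translates it into an offset only at the point of use. The `Pos` class and the `graphs`, `lengths`, `to_codes`, and `substr` helpers are hypothetical names. Bytes are assumed to mean UTF-8, and graphemes are approximated as a base code point plus any following combining marks rather than full Unicode segmentation.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Pos:
    """Hypothetical unit-tagged position, standing in for `2 bytes` or `2b`."""
    n: int
    unit: str  # "bytes" (UTF-8 assumed), "codes", or "graphs"


def graphs(s: str) -> list[str]:
    """Approximate graphemes: a base code point plus any following combining
    marks. A simplification, not full Unicode text segmentation."""
    out: list[str] = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the mark to the preceding base
        else:
            out.append(ch)
    return out


def lengths(s: str) -> dict[str, int]:
    """Rough analogues of the proposed .bytes, .codes and .graphs."""
    return {"bytes": len(s.encode("utf-8")), "codes": len(s), "graphs": len(graphs(s))}


def to_codes(s: str, p: Pos) -> int:
    """Translate a unit-tagged position into a code point offset into s."""
    if p.unit == "bytes":
        # Strict decoding raises UnicodeDecodeError if the offset
        # falls inside a multi-byte UTF-8 sequence.
        return len(s.encode("utf-8")[:p.n].decode("utf-8"))
    if p.unit == "codes":
        return min(p.n, len(s))
    if p.unit == "graphs":
        return sum(len(g) for g in graphs(s)[:p.n])
    raise ValueError(f"unknown unit: {p.unit}")


def substr(s: str, start: Pos, length: Pos) -> str:
    """substr with unit-tagged arguments. A plain int is rejected here,
    which is one of the two policies suggested above (the other is to
    assume a unit)."""
    if not isinstance(start, Pos) or not isinstance(length, Pos):
        raise TypeError("plain number passed to substr: unit is ambiguous")
    begin = to_codes(s, start)
    rest = s[begin:]
    return rest[:to_codes(rest, length)]
```

For example, with s = "cafe\u0301 au lait" (café spelled with a combining acute accent), lengths(s) gives 14 bytes, 13 codes and 12 graphs. substr(s, Pos(2, "graphs"), Pos(2, "graphs")) returns "fe\u0301", while a start of Pos(5, "bytes") raises, because byte 5 falls inside the two-byte encoding of the accent.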
> Though "letters" is a bit inadequate to describe language-dependent
> graphemes, since it also divides any non-letters...I suppose we
> could go with .characters if we don't mind forcing a heavily
> overloaded word in one particular direction, culturally speaking.
> Except, I'd kinda like to keep them starting with different letters.
> (And maybe .chars should be reserved to mean whatever the default
> unit is in the current lexical scope, as with substr() above.)

You could coin the abbreviation ligs, for Language Independent Graphemes. Then some ingenious rascal could create a pragma or whatever that allows $str.b, $str.c, $str.g, and $str.l for fans of terseness.

-- 
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"[EMAIL PROTECTED]/ --";$\=$ ;-> ();print$/