Re: The .bytes/.codepoints/.graphemes methods

2004-07-13 Thread David Green
In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] (Larry Wall) wrote: >On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: >: :u0 # use bytes (. is byte) >: :u1 # level 1 support (. is codepoint) >: :u2 # level 1 support (.

Re: The .bytes/.codepoints/.graphemes methods

2004-07-12 Thread Jonadab the Unsightly One
Luke Palmer <[EMAIL PROTECTED]> writes: > Or, god forbid, a word? > > m:base/que mas/ > > We're not mathematicians: we're allowed to use more than one letter > in a row to designate something :-) Well, if it were *me*, *I* would have voted for keeping the core language 100% pure ASCII, untain

RE: The .bytes/.codepoints/.graphemes methods

2004-07-11 Thread Austin Hastings
> -----Original Message----- > From: Jonadab the Unsightly One [mailto:[EMAIL PROTECTED] > Austin Hastings <[EMAIL PROTECTED]> writes: > > > I think this is something that we'll want as a "mode", a la > > case-insensitivity. Think of it as "mark insensitivity." > > Makes sense to me, but... > > >

Re: The .bytes/.codepoints/.graphemes methods

2004-07-10 Thread Luke Palmer
Jonadab the Unsightly One writes: > Austin Hastings <[EMAIL PROTECTED]> writes: > > > I think this is something that we'll want as a "mode", a la > > case-insensitivity. Think of it as "mark insensitivity." > > Makes sense to me, but... > > > Maybe it can just roll into :i? > > It will probably

Re: The .bytes/.codepoints/.graphemes methods

2004-07-10 Thread Jonadab the Unsightly One
Austin Hastings <[EMAIL PROTECTED]> writes: > I think this is something that we'll want as a "mode", a la > case-insensitivity. Think of it as "mark insensitivity." Makes sense to me, but... > Maybe it can just roll into :i? It will probably get used in _conjunction_ with case-insensitivity qui
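
For readers unfamiliar with the "mark insensitivity" idea discussed in this part of the thread, here is a rough Python sketch of what such a mode could mean. It is not how any Perl 6 regex modifier was specified or implemented. The helper names strip_marks and equal_ignoring_marks are hypothetical, and the sketch assumes that "marks" means Unicode combining characters that become visible after NFD decomposition.

```python
import unicodedata

def strip_marks(s):
    """Decompose s (NFD) and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def equal_ignoring_marks(a, b, ignore_case=False):
    """Compare two strings while ignoring marks; optionally ignore case too.

    ignore_case is kept as a separate switch because the thread suggests
    the two modes are related but not the same thing.
    """
    a, b = strip_marks(a), strip_marks(b)
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    return a == b

print(equal_ignoring_marks("qué más", "que mas"))
print(equal_ignoring_marks("Qué", "que"))
print(equal_ignoring_marks("Qué", "que", ignore_case=True))
```

Keeping the two switches independent mirrors the point made above: mark insensitivity will often be used together with case insensitivity, but folding one into the other would remove the option of using it alone.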

Re: The .bytes/.codepoints/.graphemes methods

2004-07-08 Thread Austin Hastings
--- Larry Wall <[EMAIL PROTECTED]> wrote: > On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: > > : Or was that to imply that a literal "a" in the RE would be > : interpreted as a "grapheme a" when :u2 is active? > > I don't know what you mean by "grapheme a" there. If you me

Re: The .bytes/.codepoints/.graphemes methods

2004-07-07 Thread Larry Wall
On Wed, Jul 07, 2004 at 08:09:51PM -0700, Larry Wall wrote: : On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: : : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote: : : > This has no direct bearing on p6l, since performance is a p6i issue. : : > But perhaps in the

Re: The .bytes/.codepoints/.graphemes methods

2004-07-07 Thread Larry Wall
On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote: : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote: : > This has no direct bearing on p6l, since performance is a p6i issue. : > But perhaps in the interests of performance as well as hackery we : > should explicitl

Re: The .bytes/.codepoints/.graphemes methods

2004-07-03 Thread Brent 'Dax' Royal-Gordon
Aaron Sherman wrote: On Tue, 2004-06-29 at 11:34, Austin Hastings wrote: (2) Perl6 should equitably support all its target locales; (3) we should set out to make sure the performance is damn fast no matter what locale we're using. Well, that's a nice theory, but you can prove that low-level encodin

Re: The .bytes/.codepoints/.graphemes methods

2004-07-02 Thread Aaron Sherman
On Tue, 2004-06-29 at 11:34, Austin Hastings wrote: > [...] when you switch to LC_ALL= language>, you just get really slow performance: Apparently the 'C' > locale is such a totally special case that the performance of LC_ALL=C > is one or more orders of magnitude better than LC_ALL=en_US.UTF-8,

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread John Williams
On Thu, 1 Jul 2004, Juerd wrote: > Matt Diephouse skribis 2004-06-30 20:51 (-0400): > > my $string = "Hello, World!"; > > say $string[0..4]; # prints "Hello\n" > > $string[7...] = "Larry!"; > > say $string; # prints "Hello, Larry!\n" > > And that "array" is one of bytes? graphemes? > > In gene

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread Matt Diephouse
Juerd wrote: Matt Diephouse skribis 2004-06-30 20:51 (-0400): my $string = "Hello, World!"; say $string[0..4]; # prints "Hello\n" $string[7...] = "Larry!"; say $string; # prints "Hello, Larry!\n" And that "array" is one of bytes? graphemes? I'm not really up on my unicode, but I think .chars is wh

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread Juerd
Matt Diephouse skribis 2004-06-30 20:51 (-0400): > my $string = "Hello, World!"; > say $string[0..4]; # prints "Hello\n" > $string[7...] = "Larry!"; > say $string; # prints "Hello, Larry!\n" And that "array" is one of bytes? graphemes? In general, I like the idea. In <[EMAIL PROTECTED]>, almo

Re: The .bytes/.codepoints/.graphemes methods

2004-07-01 Thread Matt Diephouse
Larry Wall wrote: On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote: : Issues: : * Limits lvalue substr (doesn't allow it to be a different size) : unless splice is used (or a substr method is also provided). That all has to be looked at anyway. What does "5" mean when

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonathan Scott Duff
On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote: > This has no direct bearing on p6l, since performance is a p6i issue. > But perhaps in the interests of performance as well as hackery we > should explicitly provide some sort of variant regex behavior: > > /a./ :bytes > /a.

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonadab the Unsightly One
Juerd <[EMAIL PROTECTED]> writes: > substr($string, 2 but graphemes, 4 but bytes); > > I think "but" even makes sense, if substr defaults to something. That could be combined with a smart substr that only needs the units once (err, only needs a position object for one of the args) and knows how t

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Austin Hastings
--- Jonadab the Unsightly One <[EMAIL PROTECTED]> wrote: > > Have the implications of the bytes/codepoints/graphemes/woohickies > distinction for the regular expression engine been discussed already? Not enough. One of my current clients just rolled on to redhat 9, and what a steaming pile of di

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonadab the Unsightly One
Austin Hastings <[EMAIL PROTECTED]> writes: > A couple of alternatives: > > substr.bytes($string, 2, 4) = $substitute; Well, that's arguably better than bsubstr. > substr($string.bytes, 2, 4) = $substitute; I could live with that, although it doesn't allow mixing units. (Someone will pop in

Re: The .bytes/.codepoints/.graphemes methods

2004-06-29 Thread Jonadab the Unsightly One
Dan Sugalski <[EMAIL PROTECTED]> writes: >> Hmm. Suppose that I have a system that is friendly to 80 byte >> records. I want to output "meaningful" strings, so I want to >> partition a buffer into 80-ish byte substrings, but preserve any >> graphemes (i.e., store the data in a legible format). >>
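
The 80-byte-record scenario quoted above can be sketched in Python as follows. This is an illustration only, under stated assumptions: the function names graphemes and pack_records are hypothetical, the encoding is assumed to be UTF-8, and a "grapheme" is approximated as a base character plus any following combining marks, which is simpler than full Unicode text segmentation.

```python
import unicodedata

def graphemes(s):
    """Approximate grapheme split: base character + following combining marks."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def pack_records(s, limit=80, encoding="utf-8"):
    """Split s into records of at most `limit` encoded bytes each,
    never cutting through a grapheme."""
    records = []
    current, current_size = "", 0
    for cluster in graphemes(s):
        size = len(cluster.encode(encoding))
        if size > limit:
            raise ValueError("a single grapheme exceeds the record size")
        if current_size + size > limit:
            # The next grapheme would overflow: close the current record.
            records.append(current)
            current, current_size = "", 0
        current += cluster
        current_size += size
    if current:
        records.append(current)
    return records

text = "e\u0301" * 50          # 50 graphemes, 3 bytes each in UTF-8
records = pack_records(text)
print([len(r.encode("utf-8")) for r in records])
```

The loop measures in bytes but only ever moves in whole graphemes, which is the mixing of units the thread is concerned with.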

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Austin Hastings
--- Jonadab the Unsightly One <[EMAIL PROTECTED]> wrote: > Larry Wall <[EMAIL PROTECTED]> writes: > > > (I've been trying to make it assume some implicit unit based on the > > current lexical scope's Unicode level, but issues remain.) We have > > magical string positions that have different numer

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dan Sugalski
On Mon, 28 Jun 2004, Austin Hastings wrote: > --- Dan Sugalski <[EMAIL PROTECTED]> wrote: > > On Mon, 28 Jun 2004, Juerd wrote: > > > > > Dave Whipp skribis 2004-06-28 9:55 (-0700): > > > > > substr($string, 2 bytes, 4 bytes) = $substitute; > > > > substr($string, 2, 4 :bytes) > > > > > > substr(

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Austin Hastings
--- Dan Sugalski <[EMAIL PROTECTED]> wrote: > On Mon, 28 Jun 2004, Juerd wrote: > > > Dave Whipp skribis 2004-06-28 9:55 (-0700): > > > > substr($string, 2 bytes, 4 bytes) = $substitute; > > > substr($string, 2, 4 :bytes) > > > > substr($string, 2 but graphemes, 4 but bytes); > > > > I think "but

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dan Sugalski
On Mon, 28 Jun 2004, Juerd wrote: > Dave Whipp skribis 2004-06-28 9:55 (-0700): > > > substr($string, 2 bytes, 4 bytes) = $substitute; > > substr($string, 2, 4 :bytes) > > substr($string, 2 but graphemes, 4 but bytes); > > I think "but" even makes sense, if substr defaults to something. I think

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Juerd
Dave Whipp skribis 2004-06-28 9:55 (-0700):
> > substr($string, 2 bytes, 4 bytes) = $substitute;
> substr($string, 2, 4 :bytes)

substr($string, 2 but graphemes, 4 but bytes);

I think "but" even makes sense, if substr defaults to something.

Juerd
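
To make the mixed-unit substr idea concrete, here is a small Python sketch. It does not reproduce any of the Perl 6 syntaxes proposed in the thread. The functions advance and substr, and the (count, unit) pair convention, are hypothetical. The sketch assumes UTF-8 for byte counts, approximates graphemes as base character plus combining marks, and rounds a byte count down to the last whole codepoint that fits.

```python
import unicodedata

def graphemes(s):
    """Approximate grapheme split: base character + following combining marks."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def advance(s, start, count, unit):
    """Return the codepoint index reached by moving `count` units
    forward from codepoint index `start`."""
    if unit == "codepoints":
        return min(start + count, len(s))
    if unit == "graphemes":
        pos = start
        for cluster in graphemes(s[start:])[:count]:
            pos += len(cluster)
        return pos
    if unit == "bytes":
        pos, used = start, 0
        while pos < len(s):
            size = len(s[pos].encode("utf-8"))
            if used + size > count:
                break  # the next codepoint would not fit in the byte budget
            used += size
            pos += 1
        return pos
    raise ValueError("unknown unit: " + unit)

def substr(s, offset, length):
    """offset and length are (count, unit) pairs, so units can be mixed."""
    begin = advance(s, 0, *offset)
    end = advance(s, begin, *length)
    return s[begin:end]

s = "e\u0301e\u0301abcd"
print(substr(s, (2, "graphemes"), (4, "bytes")))
```

Attaching the unit to each argument, rather than to the call as a whole, is what allows an offset in graphemes to be combined with a length in bytes.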

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dan Sugalski
On Mon, 28 Jun 2004, Larry Wall wrote: > On Mon, Jun 28, 2004 at 11:26:32AM -0400, Jonadab the Unsightly One wrote: > : You could coin the abbreviation ligs, for Language Independent > : Graphemes. Then some ingenious rascal can create a pragma or whatever > : that allows $str.b, $str.c, $str.g,

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Dave Whipp
"Jonadab The Unsightly One" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > It would be possible to have right-associative operators (that bind at > least more tightly than comma and possibly very tightly) and convert a > number to one of these objects, so that we can do stuff like th

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Larry Wall
On Mon, Jun 28, 2004 at 11:26:32AM -0400, Jonadab the Unsightly One wrote: : You could coin the abbreviation ligs, for Language Independent : Graphemes. Then some ingenious rascal can create a pragma or whatever : that allows $str.b, $str.c, $str.g, and $str.l for fans of terseness. Except they'd

Re: The .bytes/.codepoints/.graphemes methods

2004-06-28 Thread Jonadab the Unsightly One
Larry Wall <[EMAIL PROTECTED]> writes: > That all has to be looked at anyway. What does "5" mean when you > pass it to substr, anyway? I was just going to ask about substrings, and then didn't because I figured that had been hashed out already and I'd missed it... > (I've been trying to make

Re: The .bytes/.codepoints/.graphemes methods

2004-06-26 Thread Larry Wall
On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote: : As currently designed, the String::bytes, String::codepoints, and : String::graphemes methods return the number of bytes, codepoints, : and graphemes, respectively, in the string they were called on. I : would like to s

The .bytes/.codepoints/.graphemes methods

2004-06-26 Thread Brent 'Dax' Royal-Gordon
As currently designed, the String::bytes, String::codepoints, and String::graphemes methods return the number of bytes, codepoints, and graphemes, respectively, in the string they were called on. I would like to suggest that, when called in list context, these methods return an array of string
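
As a rough illustration of the three units this thread is about, the following Python sketch shows how the same string yields different counts and different lists depending on whether it is viewed as bytes, codepoints, or graphemes. It is not Perl 6 and says nothing about how the String methods were actually specified. The graphemes helper is hypothetical and approximates a grapheme as a base character plus any following combining marks, and UTF-8 is an assumed encoding for the byte view.

```python
import unicodedata

def graphemes(s):
    """Split s into approximate graphemes: a base character plus any
    combining marks that follow it. This is a simplification, not full
    Unicode text segmentation."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

s = "cafe\u0301"  # "café" written with a separate combining acute accent

byte_list = list(s.encode("utf-8"))   # list of integers, one per byte
codepoint_list = list(s)              # list of one-codepoint strings
grapheme_list = graphemes(s)          # list of strings, one per grapheme

# The same string has three different lengths, one per unit.
print(len(byte_list), len(codepoint_list), len(grapheme_list))
```

In this sketch the count is simply the length of the corresponding list, which parallels the proposal that one method could give a number in one context and a list in another.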