In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (Larry Wall) wrote:
>On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
>: :u0 # use bytes (. is byte)
>: :u1 # level 1 support (. is codepoint)
>: :u2 # level 2 support (.
Luke Palmer <[EMAIL PROTECTED]> writes:
> Or, god forbid, a word?
>
> m:base/que mas/
>
> We're not mathematicians: we're allowed to use more than one letter
> in a row to designate something :-)
Well, if it were *me*, *I* would have voted for keeping the core
language 100% pure ASCII, untain
> -Original Message-
> From: Jonadab the Unsightly One [mailto:[EMAIL PROTECTED]]
> Austin Hastings <[EMAIL PROTECTED]> writes:
>
> > I think this is something that we'll want as a "mode", a la
> > case-insensitivity. Think of it as "mark insensitivity."
>
> Makes sense to me, but...
>
> >
Jonadab the Unsightly One writes:
> Austin Hastings <[EMAIL PROTECTED]> writes:
>
> > I think this is something that we'll want as a "mode", a la
> > case-insensitivity. Think of it as "mark insensitivity."
>
> Makes sense to me, but...
>
> > Maybe it can just roll into :i?
>
> It will probably
Austin Hastings <[EMAIL PROTECTED]> writes:
> I think this is something that we'll want as a "mode", a la
> case-insensitivity. Think of it as "mark insensitivity."
Makes sense to me, but...
> Maybe it can just roll into :i?
It will probably get used in _conjunction_ with case-insensitivity
qui
--- Larry Wall <[EMAIL PROTECTED]> wrote:
> On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
>
> : Or was that to imply that a literal "a" in the RE would be
> : interpreted as a "grapheme a" when :u2 is active?
>
> I don't know what you mean by "grapheme a" there. If you me
On Wed, Jul 07, 2004 at 08:09:51PM -0700, Larry Wall wrote:
: On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: : On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
: : > This has no direct bearing on p6l, since performance is a p6i issue.
: : > But perhaps in the
On Tue, Jun 29, 2004 at 10:52:34AM -0500, Jonathan Scott Duff wrote:
: On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
: > This has no direct bearing on p6l, since performance is a p6i issue.
: > But perhaps in the interests of performance as well as hackery we
: > should explicitl
Aaron Sherman wrote:
> On Tue, 2004-06-29 at 11:34, Austin Hastings wrote:
> > (2) Perl6 should equitably support all its target
> > locales; (3) we should set out to make sure the performance is damn
> > fast no matter what locale we're using.
> Well, that's a nice theory, but you can prove that low-level encodin
On Tue, 2004-06-29 at 11:34, Austin Hastings wrote:
> [...] when you switch to LC_ALL= language>, you just get really slow performance: Apparently the 'C'
> locale is such a totally special case that the performance of LC_ALL=C
> is one or more orders of magnitude better than LC_ALL=en_US.UTF-8,
On Thu, 1 Jul 2004, Juerd wrote:
> Matt Diephouse skribis 2004-06-30 20:51 (-0400):
> > my $string = "Hello, World!";
> > say $string[0..4]; # prints "Hello\n"
> > $string[7...] = "Larry!";
> > say $string; # prints "Hello, Larry!\n"
>
> And that "array" is one of bytes? graphemes?
>
> In gene
Juerd wrote:
> Matt Diephouse skribis 2004-06-30 20:51 (-0400):
> > my $string = "Hello, World!";
> > say $string[0..4]; # prints "Hello\n"
> > $string[7...] = "Larry!";
> > say $string; # prints "Hello, Larry!\n"
> And that "array" is one of bytes? graphemes?
I'm not really up on my unicode, but I think .chars is wh
Matt Diephouse skribis 2004-06-30 20:51 (-0400):
> my $string = "Hello, World!";
> say $string[0..4]; # prints "Hello\n"
> $string[7...] = "Larry!";
> say $string; # prints "Hello, Larry!\n"
And that "array" is one of bytes? graphemes?
In general, I like the idea. In <[EMAIL PROTECTED]>, almo
Larry Wall wrote:
> On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote:
> : Issues:
> : * Limits lvalue substr (doesn't allow it to be a different size)
> : unless splice is used (or a substr method is also provided).
> That all has to be looked at anyway. What does "5" mean when
On Tue, Jun 29, 2004 at 08:34:16AM -0700, Austin Hastings wrote:
> This has no direct bearing on p6l, since performance is a p6i issue.
> But perhaps in the interests of performance as well as hackery we
> should explicitly provide some sort of variant regex behavior:
>
> /a./ :bytes
> /a.
Juerd <[EMAIL PROTECTED]> writes:
> substr($string, 2 but graphemes, 4 but bytes);
>
> I think "but" even makes sense, if substr defaults to something.
That could be combined with a smart substr that only needs the units
once (err, only needs a position object for one of the args) and knows
how t
--- Jonadab the Unsightly One <[EMAIL PROTECTED]> wrote:
>
> Have the implications of the bytes/codepoints/graphemes/woohickies
> distinction for the regular expression engine been discussed already?
Not enough.
One of my current clients just rolled on to redhat 9, and what a
steaming pile of di
Austin Hastings <[EMAIL PROTECTED]> writes:
> A couple of alternatives:
>
> substr.bytes($string, 2, 4) = $substitute;
Well, that's arguably better than bsubstr.
> substr($string.bytes, 2, 4) = $substitute;
I could live with that, although it doesn't allow mixing units.
(Someone will pop in
Dan Sugalski <[EMAIL PROTECTED]> writes:
>> Hmm. Suppose that I have a system that is friendly to 80 byte
>> records. I want to output "meaningful" strings, so I want to
>> partition a buffer into 80-ish byte substrings, but preserve any
>> graphemes (i.e., store the data in a legible format).
>>
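The 80-byte record case quoted above can be sketched roughly as follows. This is Python rather than Perl 6, and only an illustration: `clusters` and `split_records` are hypothetical names, UTF-8 is assumed as the storage encoding, and a grapheme is approximated as one codepoint plus any combining marks that follow it, which is simpler than full Unicode grapheme cluster segmentation.

```python
import unicodedata

def clusters(text):
    """Approximate graphemes: a codepoint plus its trailing combining marks."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch          # attach the mark to the previous cluster
        else:
            out.append(ch)         # start a new cluster
    return out

def split_records(text, limit=80):
    """Greedily pack whole clusters into records of at most `limit` UTF-8 bytes."""
    records, current, size = [], "", 0
    for cluster in clusters(text):
        n = len(cluster.encode("utf-8"))
        if n > limit:
            raise ValueError("a single cluster exceeds the record size")
        if size + n > limit:       # cluster would overflow: close this record
            records.append(current)
            current, size = "", 0
        current += cluster
        size += n
    if current:
        records.append(current)
    return records

# Thirty "e + combining acute" clusters, 3 bytes each, 90 bytes in total.
text = "e\u0301" * 30
records = split_records(text, limit=80)
```

The limit is counted in bytes while the cut points are chosen in graphemes, so records come out "80-ish" rather than exactly 80 bytes long.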
--- Jonadab the Unsightly One <[EMAIL PROTECTED]> wrote:
> Larry Wall <[EMAIL PROTECTED]> writes:
>
> > (I've been trying to make it assume some implicit unit based on the
> > current lexical scope's Unicode level, but issues remain.) We have
> > magical string positions that have different numer
On Mon, 28 Jun 2004, Austin Hastings wrote:
> --- Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > On Mon, 28 Jun 2004, Juerd wrote:
> >
> > > Dave Whipp skribis 2004-06-28 9:55 (-0700):
> > > > > substr($string, 2 bytes, 4 bytes) = $substitute;
> > > > substr($string, 2, 4 :bytes)
> > >
> > > substr(
--- Dan Sugalski <[EMAIL PROTECTED]> wrote:
> On Mon, 28 Jun 2004, Juerd wrote:
>
> > Dave Whipp skribis 2004-06-28 9:55 (-0700):
> > > > substr($string, 2 bytes, 4 bytes) = $substitute;
> > > substr($string, 2, 4 :bytes)
> >
> > substr($string, 2 but graphemes, 4 but bytes);
> >
> > I think "but
On Mon, 28 Jun 2004, Juerd wrote:
> Dave Whipp skribis 2004-06-28 9:55 (-0700):
> > > substr($string, 2 bytes, 4 bytes) = $substitute;
> > substr($string, 2, 4 :bytes)
>
> substr($string, 2 but graphemes, 4 but bytes);
>
> I think "but" even makes sense, if substr defaults to something.
I think
Dave Whipp skribis 2004-06-28 9:55 (-0700):
> > substr($string, 2 bytes, 4 bytes) = $substitute;
> substr($string, 2, 4 :bytes)
substr($string, 2 but graphemes, 4 but bytes);
I think "but" even makes sense, if substr defaults to something.
Juerd
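To make the unit question in this exchange concrete, here is a rough Python sketch (not Perl 6, and not any of the proposed syntaxes). The names `graphemes` and `substr_units` are hypothetical, UTF-8 is assumed for the byte view, and graphemes are approximated as a codepoint plus any following combining marks rather than by full Unicode segmentation rules.

```python
import unicodedata

def graphemes(text):
    """Approximate graphemes: a codepoint plus its trailing combining marks."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def substr_units(text, start, length, unit="graphemes"):
    """Hypothetical substr whose offsets are counted in the given unit."""
    if unit == "bytes":
        # Returns raw bytes; the slice may cut a multi-byte character in half.
        return text.encode("utf-8")[start:start + length]
    if unit == "codepoints":
        return text[start:start + length]
    if unit == "graphemes":
        return "".join(graphemes(text)[start:start + length])
    raise ValueError(f"unknown unit: {unit}")

s = "re\u0301sume\u0301"   # "resume" with two combining acute accents

by_grapheme = substr_units(s, 0, 2, "graphemes")    # keeps the accent with its "e"
by_codepoint = substr_units(s, 0, 2, "codepoints")  # drops the accent
by_byte = substr_units(s, 0, 3, "bytes")            # ends inside the accent's encoding
```

The same numeric offsets select different text in each unit, which is why a bare number passed to substr is ambiguous without some default or an explicit unit.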
On Mon, 28 Jun 2004, Larry Wall wrote:
> On Mon, Jun 28, 2004 at 11:26:32AM -0400, Jonadab the Unsightly One wrote:
> : You could coin the abbreviation ligs, for Language Independent
> : Graphemes. Then some ingenious rascal can create a pragma or whatever
> : that allows $str.b, $str.c, $str.g,
"Jonadab The Unsightly One" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> It would be possible to have right-associative operators (that bind at
> least more tightly than comma and possibly very tightly) and convert a
> number to one of these objects, so that we can do stuff like th
On Mon, Jun 28, 2004 at 11:26:32AM -0400, Jonadab the Unsightly One wrote:
: You could coin the abbreviation ligs, for Language Independent
: Graphemes. Then some ingenious rascal can create a pragma or whatever
: that allows $str.b, $str.c, $str.g, and $str.l for fans of terseness.
Except they'd
Larry Wall <[EMAIL PROTECTED]> writes:
> That all has to be looked at anyway. What does "5" mean when you
> pass it to substr, anyway?
I was just going to ask about substrings, and then didn't because I
figured that had been hashed out already and I'd missed it...
> (I've been trying to make
On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote:
: As currently designed, the String::bytes, String::codepoints, and
: String::graphemes methods return the number of bytes, codepoints,
: and graphemes, respectively, in the string they were called on. I
: would like to s
As currently designed, the String::bytes, String::codepoints, and
String::graphemes methods return the number of bytes, codepoints, and
graphemes, respectively, in the string they were called on. I would
like to suggest that, when called in list context, these methods return
an array of string
30 matches