Re: [perl #36794] [BUG] substr opcode segfault

Nicholas Clark Wed, 10 Aug 2005 03:24:57 -0700

On Wed, Aug 03, 2005 at 10:44:47PM +0200, Leopold Toetsch wrote:
> 
> On Aug 3, 2005, at 20:58, Will Coleda (via RT) wrote:
> 
> >
> >causes a segfault in the substr opcode (from tcl's lib/tclconst.pir),
> >and forces a few tcl-unicode escape tests into TODOs.
> >
> >A short PIR test that is equivalent:
> >
> >.sub main @MAIN
> >   $S0 = "\\u666"
> >   $I0 = 0x666
> >   $S1 = chr $I0         # works, but substr doesn't like this string.
> >   substr $S0, 0, 5, $S1
> >.end
> 
> >#1  0x0002d04c in string_replace (interpreter=0xd00180, src=0xe5b1c0,
> >offset=0, length=5, rep=0xe5a630, d=0x0) at src/string.c:1238
> 
> string_replace has still the old code relying on fixed-width encodings 
> with 1, 2, or 4 bytes per char, which is of course not true for utf8. 
> This needs fixing.


I thought that one thing Jarkko learned from Perl 5's Unicode model was that
the code and pain needed to support a variable-length encoding outweigh the
space that the encoding saves.
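
To make the cost concrete, here is a rough sketch in Python (not the code in
src/string.c; both function names are made up for illustration). With a fixed
width, the byte offset of character N is a multiplication. With UTF-8 you have
to walk the buffer, and the fixed-width shortcut lands inside a character.
The sample uses U+0666, the code point from Will's test case, and assumes the
buffer is well-formed UTF-8.

```python
def fixed_width_offset(char_index, bytes_per_char):
    """Byte offset of a character when every character has the same width."""
    return char_index * bytes_per_char


def utf8_offset(buf, char_index):
    """Byte offset of a character in well-formed UTF-8, found by scanning."""
    offset = 0
    for _ in range(char_index):
        lead = buf[offset]
        if lead < 0x80:            # 0xxxxxxx: 1-byte sequence
            width = 1
        elif lead >> 5 == 0b110:   # 110xxxxx: 2-byte sequence
            width = 2
        elif lead >> 4 == 0b1110:  # 1110xxxx: 3-byte sequence
            width = 3
        else:                      # 11110xxx: 4-byte sequence
            width = 4
        offset += width
    return offset


buf = "a\u0666b".encode("utf-8")   # 'a' = 1 byte, U+0666 = 2 bytes, 'b' = 1 byte

wrong = fixed_width_offset(2, 1)   # assumes 1 byte per char
right = utf8_offset(buf, 2)        # actual start of 'b'

# buf[wrong] is a continuation byte (10xxxxxx), i.e. the middle of U+0666
is_continuation = (buf[wrong] & 0xC0) == 0x80
```

Every offset and length calculation in a replace needs the scanning version
(or a cache of it) once the buffer can be UTF-8.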

In turn Dan had decided that Parrot should internally unpack to some form
of fixed-width encoding. So all Unicode would be stored internally in the
narrowest of ISO-8859-1, UCS-2 and UCS-4 that encompassed all the code
points used.
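
As a sketch of that scheme as I remember it (Python for brevity; the function
name and the returned labels are mine, not anything in Parrot), the choice is
driven by the highest code point in the string: 1 byte per character if
everything fits in ISO-8859-1, 2 bytes if everything fits in the 16-bit range,
otherwise 4 bytes.

```python
def narrowest_fixed_width(text):
    """Pick the narrowest fixed-width form that can hold every code point.

    Returns a (label, bytes_per_char) pair. The labels are illustrative only.
    Surrogate code points are not treated specially in this sketch.
    """
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return ("ISO-8859-1", 1)
    if highest <= 0xFFFF:
        return ("UCS-2", 2)
    return ("UCS-4", 4)


ascii_choice = narrowest_fixed_width("\\u666")    # the 5-character source string
arabic_choice = narrowest_fixed_width("\u0666")   # chr 0x666 from the test case
astral_choice = narrowest_fixed_width("\U0001F600")
```

Under this scheme, replacing part of a 1-byte string with a string containing
U+0666 would first widen the target to 2 bytes per character, after which all
offsets are again simple multiplications.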

1: My memory may be wrong on this
2: It may not have been explicit
3: I may have missed an explicit change

But having dealt with the fun of variable length encodings, my gut feeling
is with Jarkko, that it's probably better to stay fixed width internally.

Nicholas Clark
