Re: [pugs] regexp "bug"?

Larry Wall Fri, 15 Apr 2005 11:56:09 -0700

On Fri, Apr 15, 2005 at 05:12:54PM +0000, [EMAIL PROTECTED] wrote:

: Isn't that what the difference between byte-level and codepoint-level
: access to strings is all about?  If you want to work with values that
: are illegal codepoints then you should be working at the byte-level
: not the codepoint-level, at least by default.


Sure, but there's no guarantee you have access to a lower level,
depending on the interface presented by the object in question, and
you probably shouldn't have to know that anyway, if there's a useful
abstraction level at which "illegal character" means something as
a unit to the higher level.  The fact is that U+FFFF is an illegal
character regardless of the encoding, and I'd like to be able to
talk about it as a character, without having to know whether it's
an illegal UTF-8 byte sequence, or an illegal UTF-16 byte sequence,
or a 256-bit integer stored somewhere that you just aren't allowed
to think about certain values of.
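
A minimal Python sketch of that point, under stated assumptions: the
helper name is_noncharacter is hypothetical, and "illegal" is read here
as the 66 code points that Unicode designates as noncharacters.  The
test is phrased on the code point, so it gives the same answer whichever
byte encoding the string happens to be stored in.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 code points Unicode designates as noncharacters.

    Hypothetical helper for illustration: covers U+FDD0..U+FDEF plus the
    last two code points of each of the 17 planes (U+xxFFFE, U+xxFFFF).
    """
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


ch = "\uffff"

# Two different storage forms of the same character.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-le")

# The byte sequences differ, but both decode to the same code point,
# and the question "is this a noncharacter?" is asked of the code point.
for raw, codec in ((utf8, "utf-8"), (utf16, "utf-16-le")):
    decoded = raw.decode(codec)
    print(codec, raw.hex(), is_noncharacter(ord(decoded)))
```

The bytes are only an implementation detail here; the character-level
predicate never needs to look at them.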

In short, "legal" Unicode strings should probably be viewed as a
constrained subtype of strings, not as a storage type.  I know you've
known Ada from its infancy. :-)  Perl 6 makes the same distinction, and
you can presumably get at the unconstrained type for any constrained type.
So if you hand me a Unicode string with arbitrary value restrictions,
there had better be a way to view that string without the arbitrary
restrictions.  You need to be able to determine somehow that types
Even or Odd have a storage class of type Int.
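
A rough sketch of the constrained-subtype idea in Python, not in Perl 6
itself.  The Subtype class and its accepts and unconstrained methods are
hypothetical names invented for illustration: a subtype is modelled as
a storage type plus a predicate, and the storage type stays
recoverable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Subtype:
    """Hypothetical constrained subtype: a storage type plus a predicate."""
    name: str
    base: type
    constraint: Callable[[object], bool]

    def accepts(self, value) -> bool:
        # A value belongs to the subtype if it is of the storage type
        # and also satisfies the extra constraint.
        return isinstance(value, self.base) and self.constraint(value)

    def unconstrained(self) -> type:
        # The way back to the type without the restriction.
        return self.base


Even = Subtype("Even", int, lambda n: n % 2 == 0)
Odd = Subtype("Odd", int, lambda n: n % 2 == 1)

# A "legal Unicode" string as a constraint on str, not a storage type.
# Assumption: only U+FFFF is excluded here, to keep the sketch short.
LegalText = Subtype("LegalText", str, lambda s: "\uffff" not in s)

for t in (Even, Odd, LegalText):
    print(t.name, "is stored as", t.unconstrained().__name__)
```

A string that fails LegalText.accepts is still a perfectly good str, so
it can be inspected through the unconstrained type.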

Larry
