(Here follow various comments and opinions on the PDD28 draft, written
while reading it)

As has been pointed out, the expression «A grapheme is our concept» is
not really clear. I would suggest «The term "grapheme" in this document
defines a concept local to Parrot», or some such.

I'm not sure that UTF-16 can be called a "fixed-width" encoding (what
with surrogate pairs and all that...)
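
To make the point concrete, here is a small Python sketch (Python only
because it is handy; the helper name utf16_units is mine, not anything
from the PDD) that counts 16-bit code units per character:

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode one character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2

bmp = "\u0438"         # CYRILLIC SMALL LETTER I, inside the BMP
astral = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

print(utf16_units(bmp))     # 1
print(utf16_units(astral))  # 2 (a surrogate pair)
```

Anything outside the BMP takes two code units, so the width is not
fixed.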

«we don’t standardize on Unicode internally»: the intent is clear, but
the expression feels ambiguous to me. Do you mean "we don't fixate on
a UTF-*", "we don't use Unicode-specified semantics and tables", or
what? (I think the text is simply referring to encodings for internal
representations)

«Parrot_Rune»: whoever came up with this short-form for "grapheme" can
collect a beer from me at the next YAPC::Europe. Brilliant!

«out-of-band» usually does not mean "using special values in the same
stream as normal values"... again, the intent is clear enough, but the
terminology is misleading.

«"0x00000438 0x000000030F"» is not a byte-stream, it's an int-stream.
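
A quick illustration of the difference, as a sketch only: the same two
integers become different byte sequences depending on the byte order
chosen, which is exactly the information an int-stream leaves out. The
variable names are mine, and I am assuming 32-bit unsigned values.

```python
import struct

# The two values from the example, taken as plain integers.
codepoints = [0x0438, 0x030F]

# Serialise them as 32-bit unsigned integers in both byte orders.
big = struct.pack(">2I", *codepoints)
little = struct.pack("<2I", *codepoints)

print(big.hex())     # 000004380000030f
print(little.hex())  # 380400000f030000
```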

«need to take the overload of peeking» s/overload/overhead/ ?

Stupid serialization of Parrot_Rune arrays is not portable between
Parrot runs, right? That is, Parrot_Rune(-1) can refer to different
graphemes from one run to the next. Better bang it into the heads of
everyone from the earliest possible moment...
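
Here is a toy model of the problem, in Python. It is NOT how Parrot
implements anything: RuneTable and rune_for are names I made up, and I
am assuming that multi-codepoint graphemes get negative ids handed out
in order of first use within a run.

```python
class RuneTable:
    """Hypothetical per-run grapheme table (an assumption, not Parrot's API)."""

    def __init__(self):
        self._ids = {}

    def rune_for(self, grapheme: tuple) -> int:
        # A single codepoint stands for itself.
        if len(grapheme) == 1:
            return grapheme[0]
        # Composed graphemes get the next free negative id, in order of first use.
        if grapheme not in self._ids:
            self._ids[grapheme] = -(len(self._ids) + 1)
        return self._ids[grapheme]

g1 = (0x0438, 0x030F)  # the sequence from the PDD example
g2 = (0x0065, 0x0301)  # e + COMBINING ACUTE ACCENT

run_a, run_b = RuneTable(), RuneTable()
a = [run_a.rune_for(g1), run_a.rune_for(g2)]  # run A meets g1 first
b = [run_b.rune_for(g2), run_b.rune_for(g1)]  # run B meets g2 first

print(a, b)  # [-1, -2] [-1, -2]
```

Both runs produce the same array of integers, yet -1 means g1 in run A
and g2 in run B. Dumping the raw array without the table loses the
meaning.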

I've always defined an "encoding" as a function from streams of
characters to strings of bytes (and back, for "decoding"). Why not
include a similar definition at the beginning of the "IMPLEMENTATION"
section?
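
In code, the definition I have in mind looks roughly like this (a
Python sketch; encode and decode are thin wrappers I wrote around the
built-in codecs, purely to show the shape of the two functions):

```python
def encode(chars: str, encoding: str) -> bytes:
    """Map a stream of characters to a string of bytes."""
    return chars.encode(encoding)

def decode(data: bytes, encoding: str) -> str:
    """Inverse mapping: a string of bytes back to a stream of characters."""
    return data.decode(encoding)

text = "\u0438\u030f"  # the two characters from the PDD example

sizes = {}
for enc in ("utf-8", "utf-16-be", "utf-32-be"):
    data = encode(text, enc)
    assert decode(data, enc) == text  # decoding undoes encoding
    sizes[enc] = len(data)

print(sizes)  # {'utf-8': 4, 'utf-16-be': 4, 'utf-32-be': 8}
```

Same characters, different bytes: the encoding is the function, not
the text.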

«encoding_get_codepoint» may return something which is not, strictly
speaking, what Unicode calls a "codepoint". Ok, calling it "runepoint"
might be seen as a pun, but confusion is (sadly) the norm when dealing
with text nowadays, and overloading such a badly-understood term may
not help clear up the issue...

Warnings to add to the checklist:

- arithmetical comparison of string data elements is a red flag
- string sorting is ill-defined generally, but it's well-defined
  inside a locale (that is, it's dependent on the language of the
  user, which may or may not have any relation with the language of
  the data, which in turn may or may not have any relation with the
  script of a character)
- tr/// or similar simple-minded table-based transformations are a red
  flag
- the Parrot_Rune value-space is not connected (that is, given that $a
  and $b are valid Parrot_Rune values, there may be a value $c ($a <
  $c < $b) that is not a valid Parrot_Rune), so don't use Parrot_Rune
  in for-loops
- string element count ("length") and string display width are quite
  unrelated (Han characters are wider than Latin characters almost
  always, for example)
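
Two of the warnings above are easy to demonstrate. The Python sketch
below is illustrative only: display_width is a rough approximation I
wrote for this mail (East Asian Wide and Fullwidth characters count as
two columns, combining marks as zero), not a real terminal-width
algorithm.

```python
import unicodedata

# 1. Arithmetical comparison: sorting by code point puts "Z" before "a"
#    and the accented word after "z", which no dictionary does.
words = ["zebra", "Zulu", "apple", "\u00e9cole"]
by_codepoint = sorted(words)
print(by_codepoint)  # ['Zulu', 'apple', 'zebra', 'école']

# 2. Length versus display width.
def display_width(s: str) -> int:
    """Approximate column count: Wide/Fullwidth = 2, combining marks = 0."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

latin = "ab"
han = "\u6f22\u5b57"  # two Han characters
print(len(latin), display_width(latin))  # 2 2
print(len(han), display_width(han))      # 2 4
```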

Hope this helps, and is not too jumbled (I tend to brain-dump)

-- 
        Dakkar - <Mobilis in mobile>
        GPG public key fingerprint = A071 E618 DD2C 5901 9574
                                     6FE2 40EA 9883 7519 3F88
                            key id = 0x75193F88

To save a single life is better than to build a seven story pagoda.
