(What follows are various comments and opinions on the PDD28 draft, written while reading it.)
As has been pointed out, the expression «A grapheme is our concept» is not really clear. I'd suggest something like «The term "grapheme" in this document defines a concept local to Parrot».

I'm not sure that UTF-16 can be called a "fixed-width" encoding (what with surrogate pairs and all that...).

«we don’t standardize on Unicode internally»: the intent is clear, but the expression feels ambiguous to me. Do you mean "we don't fixate on a UTF-*", "we don't use Unicode-specified semantics and tables", or what? (I think the text is simply referring to encodings for internal representations.)

«Parrot_Rune»: whoever came up with this short form for "grapheme" can collect a beer from me at the next YAPC::Europe. Brilliant!

«out-of-band» usually does not mean "using special values in the same stream as normal values"... again, the intent is clear enough, but the terminology is misleading.

«"0x00000438 0x000000030F"» is not a byte-stream, it's an int-stream.

«need to take the overload of peeking»: s/overload/overhead/ ?

Naive serialization of Parrot_Rune arrays is not portable between Parrot runs, right? That is, Parrot_Rune(-1) can refer to different graphemes from one run to the next. Better drive that into everyone's head as early as possible...

I've always defined an "encoding" as a function from streams of characters to strings of bytes (and back, for "decoding"). Why not include a similar definition at the beginning of the "IMPLEMENTATION" section?

«encoding_get_codepoint» may return something that is not, strictly speaking, what Unicode calls a "codepoint". OK, calling it "runepoint" might be seen as a pun, but confusion is (sadly) the norm when dealing with text nowadays, and overloading such a poorly understood term may not help clear things up...
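To make the "encoding is a function from characters to bytes" definition concrete, here is a minimal C sketch. The function names (utf8_encode, utf16_encode, utf8_encode_stream) are hypothetical illustrations, not part of Parrot's API. It also shows why UTF-16 is not fixed-width: any code point above U+FFFF needs a surrogate pair, i.e. two 16-bit code units.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8.
   Returns the number of bytes written (1..4), or 0 if cp is a surrogate
   or lies above U+10FFFF (neither is a valid character). */
size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x80) { out[0] = (uint8_t)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode one scalar value as UTF-16 code units.
   Returns 1 for the BMP, 2 for a surrogate pair, 0 if cp is invalid. */
size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
    if (cp < 0x10000) { out[0] = (uint16_t)cp; return 1; }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* The encoding as a function over a whole stream of code points.
   Returns the total number of bytes written to buf,
   or (size_t)-1 on invalid input or if buf is too small. */
size_t utf8_encode_stream(const uint32_t *cps, size_t n,
                          uint8_t *buf, size_t cap) {
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t tmp[4];
        size_t k = utf8_encode(cps[i], tmp);
        if (k == 0 || used + k > cap) return (size_t)-1;
        for (size_t j = 0; j < k; j++) buf[used++] = tmp[j];
    }
    return used;
}
```

Fed the int-stream U+0438 U+030F, utf8_encode_stream yields the four bytes D0 B8 CC 8F, which is an actual byte-stream. By contrast, utf16_encode(0x1F600, ...) returns 2 (units 0xD83D 0xDE00), so a UTF-16 code unit count is not a character count.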
Warnings to add to the checklist:

- arithmetical comparison of string data elements is a red flag
- string sorting is ill-defined in general, but it's well-defined inside a locale (that is, it depends on the language of the user, which may or may not have any relation to the language of the data, which in turn may or may not have any relation to the script of a character)
- tr/// or similar simple-minded table-based transformations are a red flag
- the Parrot_Rune value-space is not contiguous (that is, given that $a and $b are valid Parrot_Rune values, there may be a value $c ($a < $c < $b) that is not a valid Parrot_Rune), so don't use Parrot_Rune in for-loops
- string element count ("length") and string display width are quite unrelated (for example, Han characters are almost always wider than Latin characters)

Hope this helps, and is not too jumbled (I tend to brain-dump)

--
Dakkar - <Mobilis in mobile>
GPG public key fingerprint = A071 E618 DD2C 5901 9574 6FE2 40EA 9883 7519 3F88
key id = 0x75193F88

To save a single life is better than to build a seven story pagoda.