As currently designed, the String::bytes, String::codepoints, and String::graphemes methods return the number of bytes, codepoints, and graphemes, respectively, in the string they were called on. I would like to suggest that, when called in list context, these methods return an array of strings split by bytes, codepoints, and graphemes, respectively.

This would make it unambiguous whether certain string operations referred to bytes, codepoints, or graphemes:

    $str.bytes[0].ord
    $str.codepoints[0..4].join  #substr

It would also allow some operations that are currently much more difficult:

    $str.bytes[3].ord
    $str.graphemes[144].lc
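
To make the three views concrete, here is a rough sketch in Python rather than Perl 6, since the proposed list-context methods do not exist yet. The helper names as_bytes, as_codepoints, and as_graphemes are hypothetical stand-ins for the proposed methods. The grapheme splitter is only an approximation: it attaches combining marks to the preceding base character and does not implement the full Unicode segmentation rules.

```python
import unicodedata

def as_bytes(s, encoding="utf-8"):
    """Hypothetical stand-in for $str.bytes: one single-byte object per encoded byte."""
    return [bytes([b]) for b in s.encode(encoding)]

def as_codepoints(s):
    """Hypothetical stand-in for $str.codepoints: one string per code point."""
    return list(s)

def as_graphemes(s):
    """Hypothetical stand-in for $str.graphemes (approximation only):
    each base character together with the combining marks that follow it."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# "etude" with a combining acute accent after the first letter
s = "e\u0301tude"

# The same string has a different length in each view.
print(len(as_bytes(s)), len(as_codepoints(s)), len(as_graphemes(s)))

print(as_bytes(s)[0][0])                # like $str.bytes[0].ord
print("".join(as_codepoints(s)[0:5]))   # like $str.codepoints[0..4].join (Perl ranges are inclusive)
print(as_graphemes(s)[0].upper())       # case-map a single grapheme
```

Each view is an ordinary list, so indexing and slicing mean the same thing in all three; only the unit changes.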

Issues:
  * Limits lvalue substr (doesn't allow it to be a different size)
    unless splice is used (or a substr method is also provided).
  * Memory consumption.
  * A bit odd-looking.
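
The first issue can be illustrated with a small Python sketch, where slice assignment plays the role that splice would play in the proposal. This is an analogy under that assumption, not a statement about how Perl 6 would implement it. Assigning to a single element can never change the length of the list, while replacing a slice can.

```python
# Code-point view of a string, as a plain list.
chars = list("hello world")

# Assigning to one element keeps the length fixed.
chars[0] = "H"
same_length = "".join(chars)

# Replacing a five-element slice with seven elements grows the list,
# which is what a size-changing lvalue substr needs.
chars[0:5] = list("Goodbye")
result = "".join(chars)

print(same_length)
print(result)
```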

Benefits:
  * Removes ambiguity in an area that needs said ambiguity removed.
  * Allows us to reuse constructs (e.g. slicing).
  * Opens up a few previously-difficult constructs (like getting the
    ord() of an arbitrary character).

--
Brent "Dax" Royal-Gordon <[EMAIL PROTECTED]>
Perl and Parrot hacker

Oceania has always been at war with Eastasia.
