On Sat, Jun 26, 2004 at 12:27:38PM -0700, Brent 'Dax' Royal-Gordon wrote:
: As currently designed, the String::bytes, String::codepoints, and
: String::graphemes methods return the number of bytes, codepoints,
: and graphemes, respectively, in the string they were called on.  I
: would like to suggest that, when called in list context, these
: methods return an array of strings split by bytes, codepoints, and
: graphemes, respectively.
:
: This would make it unambiguous whether certain string operations
: referred to bytes, codepoints, or graphemes:
:
:     $str.bytes[0].ord
:     $str.codepoints[0..4].join    #substr
:
: As well as allowing some operations that are currently much more
: difficult:
:
:     $str.bytes[3].ord
:     $str.graphemes[144].lc
:
: Issues:
:     * Limits lvalue substr (doesn't allow it to be a different size)
:       unless splice is used (or a substr method is also provided).

That all has to be looked at anyway.  What does "5" mean when you pass
it to substr, anyway?  (I've been trying to make it assume some implicit
unit based on the current lexical scope's Unicode level, but issues
remain.)  We have magical string positions that have different numeric
values depending on what units you view them as, but at what point does
a number like "5" get translated to such a magical string position?

:     * Memory consumption.

Not necessarily, if the method merely returns a "view" of the string
without actually doing the split.

:     * A bit odd-looking.

I dunno--it reads pretty well.  Maybe these'll be heavily enough used
that we should Huffmanize them down a bit:

    $str.bytes
    $str.codes
    $str.graphs
    $str.letters

Though "letters" is a bit inadequate to describe language-dependent
graphemes, since it also divides any non-letters...I suppose we could
go with .characters if we don't mind forcing a heavily overloaded word
in one particular direction, culturally speaking.  Except, I'd kinda
like to keep them starting with different letters.  (And maybe .chars
should be reserved to mean whatever the default unit is in the current
lexical scope, as with substr() above.)

: Benefits:
:     * Removes ambiguity in an area that needs said ambiguity removed.
:     * Allows us to reuse constructs (e.g. slicing).
:     * Opens up a few previously-difficult constructs (like getting the
:       ord() of an arbitrary character).

I'd also point out that the scalar definitions fall out of it naturally.

One other downside is that you might have to insert + in various places
to get the numeric interpretation.  But that could be construed as
self-dedocumentation.

Larry
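
For readers who want something concrete, the proposal above can be roughly sketched in Python rather than Perl 6. Everything here is an assumption made for illustration: the function names `to_bytes`, `to_codepoints`, and `to_graphemes` are hypothetical, UTF-8 is assumed as the byte encoding (the thread does not fix one), and the grapheme split is a crude approximation that only attaches combining marks to the preceding base character, not full Unicode text segmentation. It also builds real lists instead of the lazy "view" suggested in the reply.

```python
import unicodedata


def to_bytes(text):
    """Split into byte values. UTF-8 is an assumed encoding."""
    return list(text.encode("utf-8"))


def to_codepoints(text):
    """Split into one-codepoint strings (Python's native str unit)."""
    return list(text)


def to_graphemes(text):
    """Approximate grapheme split: glue combining marks onto the
    preceding base character. Not full Unicode segmentation."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# "e" + COMBINING ACUTE ACCENT + "a": one string, three unit levels.
s = "e\u0301a"

first_byte = to_bytes(s)[0]                # like $str.bytes[0].ord
prefix = "".join(to_codepoints(s)[0:2])    # like $str.codepoints[0..1].join
first_graph = to_graphemes(s)[0]           # like $str.graphemes[0]

# The scalar (count) meaning falls out as the length of each list.
counts = (len(to_bytes(s)), len(to_codepoints(s)), len(to_graphemes(s)))
```

With this input the three counts differ (4 bytes, 3 codepoints, 2 graphemes), which is the ambiguity the list-returning methods are meant to make explicit: an index like 5 only has a meaning once the unit is chosen.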