Brian,

Thanks for redirecting me back to my original question, and thanks again for your well-researched comment. I intend to dig deeper into strings... and hopefully get to know a bit of the internals.
Emeka

On Tue, Aug 2, 2011 at 3:39 AM, Brian Fraser <frase...@gmail.com> wrote:

> On Mon, Aug 1, 2011 at 9:14 AM, Emeka <emekami...@gmail.com> wrote:
>
>> In some languages string type is just array/list of characters. What is it
>> in Perl?
>
> There's no string type in Perl, internals notwithstanding. There are scalars,
> and a scalar can hold a string. If you care to dig deeper, that string is
> stored as a series of octets, which may or may not be encoded in UTF-8.
>
> The problem with thinking of strings as arrays of characters is pretty
> simple: What's a character? õ, "\x{F5}" (LATIN SMALL LETTER O WITH TILDE) is
> a character, but what about õ, "o\x{303}" (LATIN SMALL LETTER O, COMBINING
> TILDE)? What should string[0] return there? Can you get everyone to agree on
> that? : ) (For example, I had the displeasure of working with a fairly
> ancient version of Ruby where string[0] returned an octet, and there was no
> simple way of changing this. It was absolute hell. Newer versions appear to
> be quite an improvement over that, though admittedly I haven't used those
> much.)
>
> Also, this is a bit of a nitpick, but every solution shown so far is fairly
> worthless with Unicode data:
> Consider ȭ, "o\x{303}\x{304}" (LATIN SMALL LETTER O, COMBINING TILDE,
> COMBINING MACRON). Blindly doing /(.)/g on that will return a list of those
> three code points, which is mostly worthless.
> What you generally want is /(\X)/g, which will return a single-element
> list, that element being a string with all three components of the actual
> character (i.e. an extended grapheme cluster).
>
> Tom Christiansen explains this and much more in his Unicode Essentials
> talk, which you can read here:
> http://training.perl.com/OSCON2011/index.html
> It is probably the newest tour de force for almost any Perl programmer.

--
*Satajanus Nig. Ltd*
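
A minimal sketch of the /(.)/g versus /(\X)/g point from the quoted message, assuming a reasonably modern Perl. The string is the one Brian gives; the variable names are illustrative and not from the original thread.

```perl
use strict;
use warnings;

# "o" followed by COMBINING TILDE (U+0303) and COMBINING MACRON (U+0304):
# three code points that display as one user-perceived character.
my $str = "o\x{303}\x{304}";

# /(.)/g captures one code point per match.
my @code_points = $str =~ /(.)/g;

# /(\X)/g captures one grapheme cluster per match.
my @graphemes = $str =~ /(\X)/g;

# Only counts are printed, so no output encoding layer is needed.
printf "matches with /(.)/g:  %d\n", scalar @code_points;  # 3
printf "matches with /(\\X)/g: %d\n", scalar @graphemes;    # 1
```

The single element returned by /(\X)/g is the whole three-code-point string, which is why it is the safer choice when splitting text into what a reader would call characters.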