On Mon, Aug 1, 2011 at 9:14 AM, Emeka <emekami...@gmail.com> wrote:
> In some languages string type is just array/list of characters. What is it
> in Perl?

There's no string type in Perl, internals notwithstanding. There are scalars, and a scalar can hold a string. If you care to dig deeper, that string is stored as a series of octets, which may or may not be encoded in UTF-8.
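To make the scalar/octet distinction a bit more concrete, here is a minimal sketch using only the core Encode module and the always-available utf8:: functions. The variable names are mine and purely illustrative; it shows that the character length of a string and the number of octets in its UTF-8 encoding are separate things, and that changing the internal storage does not change the string.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: LATIN SMALL LETTER O WITH TILDE
my $chars = "\x{F5}";

# The same character, explicitly encoded to UTF-8 octets
my $bytes = encode('UTF-8', $chars);

print "characters: ", length($chars), "\n";   # 1
print "octets:     ", length($bytes), "\n";   # 2

# Force the internal representation of a copy to UTF-8.
# The string itself is unchanged; only its storage differs.
my $upgraded = $chars;
utf8::upgrade($upgraded);

print "same string after upgrade: ",
    ($upgraded eq $chars ? "yes" : "no"), "\n";
print "stored as UTF-8 internally: ",
    (utf8::is_utf8($upgraded) ? "yes" : "no"), "\n";
```

In everyday code you should not need to care about the internal flag at all; it is shown here only to illustrate the "may or may not be encoded in UTF-8" part.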
The problem with thinking of strings as arrays of characters is pretty simple: What's a character? õ, "\x{F5}" (LATIN SMALL LETTER O WITH TILDE), is a character, but what about õ, "o\x{303}" (LATIN SMALL LETTER O, COMBINING TILDE)? What should string[0] return there? Can you get everyone to agree on that? :)

(For example, I had the displeasure of working with a fairly ancient version of Ruby where string[0] returned an octet, and there was no simple way of changing this. It was absolute hell. Newer versions appear to be quite an improvement over that, though admittedly I haven't used those much.)

Also, this is a bit of a nitpick, but every solution shown so far is fairly worthless with Unicode data: Consider ȭ, "o\x{303}\x{304}" (LATIN SMALL LETTER O, COMBINING TILDE, COMBINING MACRON). Blindly doing /(.)/g on that will return a list of three separate elements, one per code point, and be mostly worthless. What you generally want is /(\X)/g, which will return a single-element list, that element being a string with all three components of the actual character (i.e. an extended grapheme cluster).

Tom Christiansen explains this and much more in his Unicode Essentials talk, which you can read here: http://training.perl.com/OSCON2011/index.html and which is probably the newest tour de force for almost any Perl programmer.
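For what it's worth, here is a small sketch of the /(.)/g versus /(\X)/g difference on the example string above. The variable names are made up for illustration, and it assumes a reasonably recent Perl (5.12 or later, where \X matches an extended grapheme cluster).

```perl
use strict;
use warnings;

# "o" followed by COMBINING TILDE and COMBINING MACRON
my $str = "o\x{303}\x{304}";

# Splitting on "any character" yields one element per code point
my @code_points = $str =~ /(.)/g;

# Splitting on \X yields one element per extended grapheme cluster
my @graphemes = $str =~ /(\X)/g;

print '/(.)/g  gave ', scalar(@code_points), " element(s)\n";   # 3
print '/(\X)/g gave ', scalar(@graphemes),   " element(s)\n";   # 1

# One possible answer to "what should string[0] return?":
# the first grapheme cluster rather than the first code point
my ($first) = $str =~ /\A(\X)/;
print "first grapheme is ", length($first), " code points long\n";  # 3
```

Compare that with substr($str, 0, 1), which would hand you a bare "o" with the combining marks left behind.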