On Mon, Aug 1, 2011 at 9:14 AM, Emeka <emekami...@gmail.com> wrote:

>
> In some languages string type is just array/list of characters. What is it
> in Perl?
>
>
There's no string type in Perl, internals notwithstanding. There are scalars,
and a scalar can hold a string. If you care to dig deeper, that string is
stored as a series of octets, which may or may not be encoded in UTF-8.
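
To make the "may or may not be encoded in UTF-8" part concrete, here is a minimal sketch using the core utf8:: helper functions. The variable names are just for illustration; the point is only that the internal representation can change while the string, as Perl code sees it, stays the same.

```perl
use strict;
use warnings;

my $plain    = "\x{F5}";    # one character: LATIN SMALL LETTER O WITH TILDE
my $upgraded = $plain;
utf8::upgrade($upgraded);   # force UTF-8 internal storage; the logical string is unchanged

# Same string as far as Perl code is concerned, whatever the internals say.
my $same    = ($plain eq $upgraded)    ? 1 : 0;
my $flagged = utf8::is_utf8($upgraded) ? 1 : 0;

# Explicitly turning the characters into UTF-8 octets gives a different, longer string.
my $octets = $plain;
utf8::encode($octets);

printf "equal: %d, UTF-8 internally: %d\n", $same, $flagged;
printf "characters: %d, UTF-8 octets: %d\n", length($plain), length($octets);
```

This should report the two scalars as equal, with one character that takes two octets once encoded as UTF-8.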

The problem with thinking of strings as arrays of characters is pretty
simple: What's a character? õ, "\x{F5}" (LATIN SMALL LETTER O WITH TILDE), is
a character, but what about õ, "o\x{303}" (LATIN SMALL LETTER O, COMBINING
TILDE)? What should string[0] return there? Can you get everyone to agree on
that? : ) (For example, I had the displeasure of working with a fairly
ancient version of Ruby where string[0] returned an octet, and there was no
simple way of changing this. It was absolute hell. Newer versions appear to
be quite an improvement over that, though admittedly I haven't used those
much.)
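
A small sketch of what Perl itself answers for those two spellings, assuming you treat substr($s, 0, 1) as the closest thing to string[0] (the variable names are made up for the example):

```perl
use strict;
use warnings;

my $precomposed = "\x{F5}";      # LATIN SMALL LETTER O WITH TILDE
my $decomposed  = "o\x{303}";    # LATIN SMALL LETTER O + COMBINING TILDE

# Both render as the same glyph, but they are different strings to Perl.
my $len_pre  = length($precomposed);       # counts code points
my $len_dec  = length($decomposed);
my $first    = substr($decomposed, 0, 1);  # the bare "o", tilde left behind
my $are_same = ($precomposed eq $decomposed) ? 1 : 0;

printf "lengths: %d vs %d\n", $len_pre, $len_dec;
printf "first 'character' of the decomposed form is a plain o: %s\n",
    ($first eq 'o' ? 'yes' : 'no');
printf "eq considers them equal: %s\n", ($are_same ? 'yes' : 'no');
```

So the "first character" of the second spelling is an o with its tilde chopped off, which is rarely what anyone asking for string[0] had in mind.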

Also, this is a bit of a nitpick, but every solution shown so far is fairly
worthless with Unicode data:
Consider ȭ, "o\x{303}\x{304}" (LATIN SMALL LETTER O, COMBINING TILDE,
COMBINING MACRON). Blindly doing /(.)/g on that will return a list of those
three code points, which is mostly worthless.
What you generally want is /(\X)/g, which will return a single-element
list, that element being a string with all three components of the actual
character (i.e. an extended grapheme cluster).
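
Side by side, as a minimal sketch (array names are only illustrative):

```perl
use strict;
use warnings;

# LATIN SMALL LETTER O + COMBINING TILDE + COMBINING MACRON
my $str = "o\x{303}\x{304}";

my @code_points = $str =~ /(.)/g;    # one element per code point
my @graphemes   = $str =~ /(\X)/g;   # one element per extended grapheme cluster

printf "/(.)/g  gives %d element(s)\n", scalar @code_points;   # 3
printf "/(\\X)/g gives %d element(s)\n", scalar @graphemes;     # 1
```

The single element from the \X version is still a three-code-point string, so length() on it returns 3; it is just no longer split in the middle of what a reader would call one character.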

Tom Christiansen explains this and much more in his Unicode Essentials talk,
which you can read here: http://training.perl.com/OSCON2011/index.html and
is probably the newest tour de force for almost any Perl programmer.
