Brian,

Thanks for redirecting me back to my original question, and thanks again for
your well-researched comment. I intend to dig deeper into strings, and
hopefully learn a bit about the internals.

Emeka

On Tue, Aug 2, 2011 at 3:39 AM, Brian Fraser <frase...@gmail.com> wrote:

> On Mon, Aug 1, 2011 at 9:14 AM, Emeka <emekami...@gmail.com> wrote:
>
>>
>> In some languages string type is just array/list of characters. What is it
>> in Perl?
>>
>>
> There's no string type in Perl, internals notwithstanding. There are scalars,
> and a scalar can hold a string. If you care to dig deeper, that string is
> stored as a series of octets, which may or may not be encoded in UTF-8.
>
> The problem with thinking of strings as arrays of characters is pretty
> simple: What's a character? õ, "\x{F5}" (LATIN SMALL LETTER O WITH TILDE) is
> a character, but what about õ, "o\x{303}", (LATIN SMALL LETTER O, COMBINING
> TILDE)? What should string[0] return there? Can you get everyone to agree on
> that? : ) (For example, I had the displeasure of working with a fairly
> ancient version of Ruby where string[0] returned an octet, and there was no
> simple way of changing this. It was absolute hell. Newer versions appear to
> be quite an improvement over that, though admittedly I haven't used those
> much.)
>
> Also, this is a bit of a nitpick, but every solution shown so far is fairly
> worthless with Unicode data:
> Consider ȭ, "o\x{303}\x{304}", LATIN SMALL LETTER O, COMBINING TILDE,
> COMBINING MACRON. Blindly doing /(.)/g on that will return a list of three
> separate code points, which is mostly useless.
> What you generally want is /(\X)/g, which will return a single-element
> list, that element being a string with all three components of the actual
> character (i.e. an extended grapheme cluster).
>
> Tom Christiansen explains this and much more in his Unicode Essentials
> talk, which you can read here:
> http://training.perl.com/OSCON2011/index.html
> It is probably the newest tour de force for almost any Perl programmer.
>
>
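For reference, the difference Brian describes between /(.)/g and /(\X)/g can be sketched roughly as follows. This is only a minimal illustration, assuming Perl 5.12 or later (where \X matches an extended grapheme cluster); the variable names are mine and do not come from the thread.

```perl
use strict;
use warnings;

# "o" followed by COMBINING TILDE (U+0303) and COMBINING MACRON (U+0304):
# three code points that a reader sees as one character.
my $str = "o\x{303}\x{304}";

# /(.)/g splits on code points, so the base letter is separated from its marks.
my @code_points = $str =~ /(.)/g;

# /(\X)/g splits on extended grapheme clusters, keeping the marks attached.
my @graphemes = $str =~ /(\X)/g;

# substr, like length, also counts code points, so this is just the bare "o".
my $first = substr($str, 0, 1);

printf "length:      %d\n", length $str;          # 3
printf "code points: %d\n", scalar @code_points;  # 3
printf "graphemes:   %d\n", scalar @graphemes;    # 1
```

So anything that indexes the string by position (substr, length, or /(.)/g) sees three pieces, and only \X treats them as one user-visible character.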


-- 
*Satajanus Nig. Ltd*
