On 25 Sep 2014, at 01:04, Alain Rastoul <alf.mmm....@gmail.com> wrote:

> On 25/09/2014 00:06, Sven Van Caekenberghe wrote:
>> Alain,
> 
>> The character encoding situation in Pharo is actually pretty good. The only
>> problem is that some old-school code is left that encodes strings into
>> strings (instead of into bytes), but today you can easily write much better
>> and conceptually correct code.
>> 
>> You could have a look at this draft chapter of the upcoming 'Enterprise 
>> Pharo' book that I am currently writing:
>> 
>>   http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/
>> 
>> Concerning file system paths, FilePathEncoder and FilePluginPrimitives 
>> already do the right thing.
>> 
>> Now, your idea of using UTF-8 to represent internal Strings is something
>> that has been discussed before, and in many other languages as well. The
>> short answer is that because UTF-8 is variable length, the inefficiency is
>> (probably) just too high. Even simple indexed access becomes a problem,
>> since finding the n-th character means scanning from the start, let alone
>> more complex string manipulations. I am not saying that it cannot be done;
>> I just think it is not worth the trouble. The current solution in Pharo
>> with ByteString and WideString is quite nice (check the chapter I mentioned
>> before).
>> 
>> Sven
>> 
> Very interesting!
> It seems that most of what I was saying is already here :)
> I was not saying that Pharo should use UTF-8 (I mentioned UTF-8 because it is
> a standard, but I find variable-length encoding very weird); I was rather
> talking about using WideString in UTF-16 or UTF-32, and that's done.
> I saw asWideString but didn't know about the automatic conversion, the
> codepoint selector, or the internal wide string support.
> Does this mean that Greek Pharo users (for example) use WideString for Strings
> without having to specify it or make explicit conversions (except, of course,
> when dealing with bytes, if they want to)?
> If yes, very good, the job is almost done :)
> (Personally, I would also deprecate ByteString and get rid of it, just my
> opinion.)
> Thanks for the link, another good chapter.
> 
> Regards,
> 
> Alain
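To make the indexing argument in the quoted thread concrete, here is a minimal sketch in Python (chosen only for illustration, not Pharo code; both function names are hypothetical). With UTF-8, each character takes 1 to 4 bytes, so finding the n-th character means walking the bytes from the start. With a fixed-width encoding such as UTF-32, the n-th character always starts at byte offset 4 * n.

```python
def utf8_char_at(buf: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of UTF-8 data by scanning from the start."""
    count = 0
    i = 0
    while i < len(buf):
        # The lead byte tells us how many bytes this character occupies.
        # We always jump over the continuation bytes (10xxxxxx), so in valid
        # input we never land on one.
        lead = buf[i]
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        if count == n:
            return buf[i:i + width].decode("utf-8")
        count += 1
        i += width
    raise IndexError(n)


def utf32_char_at(buf: bytes, n: int) -> str:
    """Fixed width: the n-th code point always starts at byte offset 4 * n."""
    if not 0 <= n < len(buf) // 4:
        raise IndexError(n)
    return buf[4 * n:4 * n + 4].decode("utf-32-le")


# Usage: both lookups agree, but only the UTF-32 one is constant time.
text = "Καλημέρα, Pharo"
assert utf8_char_at(text.encode("utf-8"), 3) == text[3]
assert utf32_char_at(text.encode("utf-32-le"), 3) == text[3]
```

Both functions return the same character. The difference is that the UTF-8 version does work proportional to n for every lookup, which is the cost being weighed against UTF-8's compactness.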

Yes, Greek users won't notice a difference; it is all transparent.
ByteString is important because it is an optimization for the most common
case: strings whose characters all fit in a single byte. As a normal user you
should only think of abstract Strings and never use #asByteString (use proper
encoding instead).
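A rough Python sketch of the two points above, under stated assumptions: the helper names are hypothetical, and the storage sizes assume one byte per character for ByteString and 32 bits per character for WideString. It picks the compact representation only when every code point is below 256, and it turns text into bytes through an explicit encoder (UTF-8 here) rather than by stuffing each code point directly into a byte, which breaks as soon as a character falls outside that range.

```python
def pick_representation(s: str) -> tuple[str, int]:
    """Hypothetical helper mimicking the ByteString/WideString choice.

    Returns the class name and the storage size in bytes, assuming 1 byte per
    character for the byte variant and 4 bytes per character for the wide one.
    """
    if all(ord(c) < 256 for c in s):
        return "ByteString", len(s)
    return "WideString", 4 * len(s)


def to_wire(s: str) -> bytes:
    """Proper encoding: an explicit encoder turns abstract text into bytes."""
    return s.encode("utf-8")


def squeeze_to_bytes(s: str) -> bytes:
    """Hypothetical shortcut: store each code point directly in one byte.

    Only works when every code point is below 256; bytes() raises ValueError
    for anything larger, e.g. Greek letters.
    """
    return bytes(ord(c) for c in s)


# Usage
print(pick_representation("hello"))          # ('ByteString', 5)
print(pick_representation("\u039a\u03b1"))   # ('WideString', 8)
print(to_wire("\u039a\u03b1"))               # b'\xce\x9a\xce\xb1'
```

CPython's own str type makes a comparable per-string choice (1, 2 or 4 bytes per character, see PEP 393), which is one more example of this kind of optimization staying invisible to users.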

Feedback on the chapter is always welcome.

Sven
