On 25/09/2014 00:06, Sven Van Caekenberghe wrote:
Alain,
The character encoding situation in Pharo is actually pretty good. The only
problem is that some old-school code remains that encodes strings into
strings, but today you can easily write much better and conceptually correct
code.
You could have a look at this draft chapter of the upcoming 'Enterprise Pharo'
book that I am currently writing:
http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/
Concerning file system paths, FilePathEncoder and FilePluginPrimitives already
do the right thing.
Now, your idea about using UTF-8 to represent internal Strings is something
that has been discussed before and in many other languages as well. The short
answer is that due to it being variable length, the inefficiency is (probably)
just too high. Simple indexed access becomes a problem, let alone more complex
string manipulations. I am not saying that it cannot be done, I think it is
just not worth the trouble. The current solution in Pharo with ByteString and
WideString is quite nice (check the chapter I mentioned before).
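
To make the indexed-access cost concrete, here is a minimal sketch in Python
(not Pharo code). The helper names utf8_char_at and utf32_char_at are made up
for illustration, and the input is assumed to be valid UTF-8. Finding the n-th
character in UTF-8 bytes requires scanning from the start, whereas a
fixed-width representation only needs a multiplication.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the character at a 0-based character index.

    Assumes `data` is valid UTF-8. The scan is O(n) because the width
    of each character is only known after reading its lead byte.
    """
    if index < 0:
        raise IndexError(index)
    pos = 0   # byte offset
    seen = 0  # characters skipped so far
    while pos < len(data):
        lead = data[pos]
        if lead < 0x80:            # 0xxxxxxx: 1 byte
            width = 1
        elif lead >> 5 == 0b110:   # 110xxxxx: 2 bytes
            width = 2
        elif lead >> 4 == 0b1110:  # 1110xxxx: 3 bytes
            width = 3
        else:                      # 11110xxx: 4 bytes
            width = 4
        if seen == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError(index)


def utf32_char_at(data: bytes, index: int) -> str:
    """Return the character at a 0-based index in little-endian UTF-32.

    Every character occupies exactly 4 bytes, so the offset is computed
    directly: O(1).
    """
    return data[4 * index:4 * index + 4].decode("utf-32-le")


text = "aλ€b"                      # 1-, 2-, 3- and 1-byte characters in UTF-8
utf8_bytes = text.encode("utf-8")
utf32_bytes = text.encode("utf-32-le")

print(len(utf8_bytes), len(utf32_bytes))
print(utf8_char_at(utf8_bytes, 2), utf32_char_at(utf32_bytes, 2))
```

The fixed-width form uses more memory for mostly-ASCII text, which is the
trade-off that keeping both a byte-per-character and a wide representation is
meant to soften.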
Sven
Very interesting!
It seems that most of what I was saying is already here :)
I was not saying that Pharo should use UTF-8 (I mentioned UTF-8 because
it is a standard, but I find the variable-length encoding very weird); I
was rather talking about using WideString in UTF-16 or UTF-32, and that's done.
I saw asWideString but didn't know about the automatic conversion, the
codepoint selector, or the internal wide string support.
Does it mean that Greek Pharo users (for example) get WideString for their
Strings without having to specify it or make explicit conversions
(except, of course, when dealing with bytes, if they want to)?
If yes, very good, the job is almost done :)
(Personally, I would also deprecate ByteString and get rid of it; just
my opinion.)
Thanks for the link, another good chapter.
Regards,
Alain