On 25 Sep 2014, at 01:04, Alain Rastoul <alf.mmm....@gmail.com> wrote:
> On 25/09/2014 00:06, Sven Van Caekenberghe wrote:
>> Alain,
>>
>> The character encoding situation in Pharo is pretty good actually. The only
>> problem is that there is some old school code left that encodes strings into
>> strings, but today you can easily write much better and conceptually correct
>> code.
>>
>> You could have a look at this draft chapter of the upcoming 'Enterprise
>> Pharo' book that I am currently writing:
>>
>> http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/
>>
>> Concerning file system paths, FilePathEncoder and FilePluginPrimitives
>> already do the right thing.
>>
>> Now, your idea about using UTF-8 to represent internal Strings is something
>> that has been discussed before and in many other languages as well. The
>> short answer is that due to it being variable length, the inefficiency is
>> (probably) just too high. Simple indexed access becomes a problem, let alone
>> more complex string manipulations. I am not saying that it cannot be done, I
>> think it is just not worth the trouble. The current solution in Pharo with
>> ByteString and WideString is quite nice (check the chapter I mentioned
>> before).
>>
>> Sven
>>
> Very interesting!
> It seems that most of what I was saying is already here :)
> I was not saying that Pharo should use UTF-8 (I mentioned UTF-8 because it is
> a standard, but I find the variable length encoding very weird), I was rather
> talking of using WideString in UTF-16 or UTF-32, and that's done.
> I saw asWideString but didn't know about automatic conversion or the codepoint
> selector and internal wide string support.
> Does it mean that Pharo Greek users (for example) use WideString for Strings
> without having to specify it or make explicit conversions (except of course
> when dealing with bytes if they want to)?
> If yes, very good, job is almost done :)
> (Personally I would also deprecate ByteString, and get rid of it, just my
> opinion).
> Thanks for the link, another good chapter.
>
> Regards,
>
> Alain

Yes, the Greek users won't notice a difference; it is all transparent.

ByteString is important because it is an optimization of the most common case. As a normal user you should only think of abstract Strings and never use #asByteString (but use proper encoding).

Feedback on the chapter is always welcome.

Sven
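
To make the indexing argument concrete, here is a rough sketch in Python rather than Pharo. It is only a model of the idea, not how Pharo implements ByteString or WideString. The function names are made up for illustration, and the assumption that the narrow form covers code points below 256 while the wide form uses 4 bytes per character is a simplification of the ByteString/WideString split discussed above.

```python
def nth_char_utf8(data: bytes, index: int) -> str:
    """Return the code point at `index` from valid UTF-8 bytes.

    Every character before `index` must be skipped one by one,
    so the cost grows with `index`.
    """
    if index < 0:
        raise IndexError(index)
    pos = 0
    seen = 0
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes this character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if seen == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError(index)


def nth_char_utf32(data: bytes, index: int) -> str:
    """Return the code point at `index` from little-endian UTF-32 bytes.

    Fixed width means the byte offset is simply 4 * index.
    """
    if index < 0 or 4 * index >= len(data):
        raise IndexError(index)
    return data[4 * index:4 * index + 4].decode("utf-32-le")


def bytes_per_char(text: str) -> int:
    """Width a two-tier byte/wide scheme would pick for `text` (assumed model).

    1 byte per character if every code point is below 256, else 4.
    """
    return 1 if all(ord(c) < 256 for c in text) else 4


sample = "a\u03bb\u20ac"  # 'a', Greek lambda, euro sign: 1 + 2 + 3 bytes in UTF-8
print(nth_char_utf8(sample.encode("utf-8"), 2))
print(nth_char_utf32(sample.encode("utf-32-le"), 2))
print(bytes_per_char("abc"), bytes_per_char(sample))
```

Both lookups return the same character, but only the fixed-width one can jump straight to it. The last function shows why keeping a compact byte form is attractive: purely Latin-1 text needs a quarter of the space, and text with Greek characters would simply take the wide form without the user asking for it.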