Re: [Pharo-users] UTF-8 encoding

Sven Van Caekenberghe Mon, 16 Apr 2018 08:13:53 -0700


> On 16 Apr 2018, at 17:06, Benoit St-Jean via Pharo-users 
> <pharo-users@lists.pharo.org> wrote:
> 
> 
> From: Benoit St-Jean <bstj...@yahoo.com>
> Subject: UTF-8 encoding
> Date: 16 April 2018 at 17:06:28 GMT+2
> To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
> Reply-To: Benoit St-Jean <bstj...@yahoo.com>
> 
> 
> Regarding the problems I have with my first name and file paths and UTF-8 
> encoding, I found something weird in the UTF-8 encoding.  In fact, to be more 
> precise, I found something strange when converting a String to a ByteArray 
> (which UTF-8 encoders convert from)
> 
> If I look at the example in the comment of ByteArray>>utf8Decoded, 'Les 
> élèves français' is encoded as: 
> 
> #[76 101 115 32 195 169 108 195 168 118 101 115 32 102 114 97 110 195 167 97 
> 105 115]
> 
> NOW, if I take that very same string, 'Les élèves français', and convert it 
> to a ByteArray, I get:
> 'Les élèves français' asByteArray printString.


You cannot do that: asByteArray uses a null encoding (each character's code 
point is simply written out as one byte), which is almost always wrong.

Please read (the first part) of

https://ci.inria.fr/pharo-contribution/job/EnterprisePharoBook/lastSuccessfulBuild/artifact/book-result/Zinc-Encoding-Meta/Zinc-Encoding-Meta.html

carefully.

> #[76 101 115 32 233 108 232 118 101 115 32 102 114 97 110 231 97 105 115]
> 
> The 2 don't match!
> 
> By the way, this problem exists on Pharo 5.1, 6.1 and 7.1 (on Windows 10)
> 
> Can anyone confirm or refute this on another platform to see if it is 
> Windows-specific?

This behaviour is not Windows-specific; it is the same on all platforms, and 
it is correct ;-)
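
To make the difference concrete, here is a minimal sketch for a playground. It 
assumes the image provides String>>utf8Encoded as the counterpart of the 
ByteArray>>utf8Decoded mentioned above; the variable names are only for 
illustration. The two byte arrays are the ones already quoted in this thread.

```smalltalk
| text nullBytes utf8Bytes |
text := 'Les élèves français'.

"No real encoding: each character's code point is copied out as one byte.
 This only works at all because every code point here is below 256."
nullBytes := text asByteArray.
"#[76 101 115 32 233 108 232 118 101 115 32 102 114 97 110 231 97 105 115]"

"Proper UTF-8: é, è and ç each become two bytes (195 169, 195 168, 195 167)."
utf8Bytes := text utf8Encoded.
"#[76 101 115 32 195 169 108 195 168 118 101 115 32 102 114 97 110 195 167 97 105 115]"

"Decoding the UTF-8 bytes gives back the original string."
utf8Bytes utf8Decoded = text.
```

So the two results are expected to differ: the first is 19 bytes of raw code 
points, the second is 22 bytes of UTF-8. Only the second should be handed to 
anything that expects UTF-8.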

> ----------------- 
> Benoît St-Jean 
> Yahoo! Messenger: bstjean 
> Twitter: @BenLeChialeux 
> Pinterest: benoitstjean 
> Instagram: Chef_Benito
> IRC: lamneth 
> Blogue: endormitoire.wordpress.com 
> "A standpoint is an intellectual horizon of radius zero".  (A. Einstein)
> 
>
