Hi Craig,

> On 3 Mar 2022, at 23:15, craig <cr...@hivemind.net> wrote:
> 
> Hi Guys,
>
> I'm reading a text file which is supposed to be ASCII encoded. This file
> contains a list of file paths and was created by a Python program.
>
> Well, it turns out that file names on Windows can contain bytes that are
> illegal in UTF-8. This causes ZnUTF8Encoder to signal 'Illegal leading byte
> for utf-8 encoding' and crash the program.
>
> I would like to handle this situation more elegantly. Is there a more
> appropriate code page to use for the Windows filesystem?
>
> Craig

We support more than 80 different character encodings. Of course, you first
have to know which encoding was used to write the file; after that, it is easy
to read it with the right encoder. Consider:

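"Read the whole file, decoding it as UTF-8, Windows-1252 or Latin-1 respectively"
'/tmp/foo.txt' asFileReference readStreamEncoded: #utf8 do: [ :in | in upToEnd ].

'/tmp/foo.txt' asFileReference readStreamEncoded: #windows1252 do: [ :in | in upToEnd ].

'/tmp/foo.txt' asFileReference readStreamEncoded: #latin1 do: [ :in | in upToEnd ].

"List the identifiers of all supported encodings"
ZnCharacterEncoder knownEncodingIdentifiers.

"Turn an identifier into an encoder instance"
#windows1252 asZnCharacterEncoder.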
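To see why UTF-8 decoding fails, here is a small made-up example (these bytes are an assumption for illustration, not taken from your file). A name like Craig’s written in Windows-1252 stores the curly apostrophe as the single byte 146 (16r92). In UTF-8 such a byte can only appear as a continuation byte, never at the start of a character, which is what the 'Illegal leading byte' error is about. Decoding the same bytes as Windows-1252 should work:

```smalltalk
"Made-up example bytes: 'Craig', byte 146 (16r92, the Windows-1252
 RIGHT SINGLE QUOTATION MARK), 's'"
| bytes |
bytes := #[67 114 97 105 103 146 115].

"Should signal ZnCharacterEncodingError, since 16r92 cannot start a UTF-8 sequence;
 here the handler just answers the error message"
[ bytes utf8Decoded ] on: ZnCharacterEncodingError do: [ :error | error messageText ].

"Decodes without error: 'Craig', U+2019, 's'"
bytes decodeWith: #windows1252.
```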

If you could post a small example of your file, I could try to help. The
encoding will probably be #windows1252 or #latin1.
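If some of your files are UTF-8 and others are not, one possible approach is to try UTF-8 first and re-read the file as Windows-1252 when decoding fails. This is only a sketch, assuming Windows-1252 is the right fallback for your files (it is a guess, and the result is only correct if the file really was written that way). The path is the same example path used above:

```smalltalk
"Sketch: read as UTF-8, re-read as Windows-1252 if the bytes are not valid UTF-8"
| file contents paths |
file := '/tmp/foo.txt' asFileReference.
contents := [ file readStreamEncoded: #utf8 do: [ :in | in upToEnd ] ]
	on: ZnCharacterEncodingError
	do: [ :error |
		"The value of the handler block becomes the value of #on:do:,
		 so the file is simply read again from the start"
		file readStreamEncoded: #windows1252 do: [ :in | in upToEnd ] ].
"One file path per line"
paths := contents lines.
```

Re-reading the whole file keeps the logic simple, because a decoding error may happen halfway through the stream.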

HTH,

Sven

