I'm not sure, but maybe useful...

* http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/
* https://github.com/chardet/chardet/tree/master/tests

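
For what it's worth, here is a minimal sketch of the "strict UTF-8 first, Latin-1 as fallback" idea, applied to the `completeText` method quoted below. It assumes the Zinc encoding classes (`ZnUTF8Encoder`, `ZnCharacterEncoder`, `ZnCharacterEncodingError`) are loaded in the image and that `ZnUTF8Encoder` rejects malformed input by default. Both are assumptions to check against your Pharo version, and this is not a real encoding detector.

```smalltalk
completeText
	"Sketch only: read the raw bytes once, try strict UTF-8 decoding,
	 and fall back to Latin-1 if the bytes are not valid UTF-8.
	 Assumes the Zinc encoders are available in the image."
	| bytes |
	self fileReference exists ifFalse: [ ^ '' ].
	bytes := self fileReference binaryReadStreamDo: [ :stream | stream upToEnd ].
	^ [ ZnUTF8Encoder new decodeBytes: bytes ]
		on: ZnCharacterEncodingError
		do: [ :error |
			"Not valid UTF-8, so reinterpret the same bytes as Latin-1."
			(ZnCharacterEncoder newForEncoding: 'latin1') decodeBytes: bytes ]
```

This stays a heuristic: a Latin-1 file whose bytes happen to form valid UTF-8 sequences would still be decoded as UTF-8, and the whole file is read into memory before decoding.
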
cheers -ben

On Wed, May 3, 2017 at 6:18 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:

> Hi Cyril,
>
> I want to try to write such a detector. I'll get back to you.
>
> Any chance you could give me (part of) a file that causes you trouble (one
> that is legal latin1, yet does not fail as utf-8 while giving the wrong
> result in utf-8)?
>
> Sven
>
> > On 3 May 2017, at 11:40, Cyril Ferlicot D. <cyril.ferli...@gmail.com>
> > wrote:
> >
> > Hello,
> >
> > We have a problem using Moose because we have files whose encoding we
> > don't know. Currently we have this implementation to get the content
> > of a file:
> >
> > completeText
> >     self fileReference exists ifFalse: [ ^ '' ].
> >     ^ self fileReference readStreamDo: [ :s |
> >         [ s contents ]
> >             on: Error
> >             do: [ [ s converter: Latin1TextConverter new; contents ]
> >                 on: Error
> >                 do: [ '' ] ] ]
> >
> > But we have a problem, because some of our files at Synectique are
> > currently in ISO-8859-1. #contents is able to read some of these files
> > without throwing an error, but the content is not right because it was
> > not decoded with the right encoding.
> >
> > Thus I wonder if it is possible to get the encoding of a FileReference
> > in Pharo, to be able to read the file with the right encoding? Something
> > like the bash command `file -I myFile.txt`.
> >
> > --
> > Cyril Ferlicot
> > https://ferlicot.fr
> >
> > http://www.synectique.eu
> > 2 rue Jacques Prévert 01,
> > 59650 Villeneuve d'ascq France