I'm not sure, but these may be useful:
* http://source.icu-project.org/repos/icu/data/trunk/charset/data/ucm/
* https://github.com/chardet/chardet/tree/master/tests

cheers -ben

On Wed, May 3, 2017 at 6:18 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:

> Hi Cyril,
>
> I want to try to write such a detector. I'll get back to you.
>
> Any chance you could send me (part of) a file that causes you trouble (one
> that is legal Latin-1, yet decodes as UTF-8 without failing but with the
> wrong result)?
>
> Sven
>
> > On 3 May 2017, at 11:40, Cyril Ferlicot D. <cyril.ferli...@gmail.com> wrote:
> >
> > Hello,
> >
> > We have a problem using Moose because we have files whose encoding we
> > don't know. Currently we use this implementation to get the contents of
> > a file:
> >
> > completeText
> >  self fileReference exists ifFalse: [ ^ '' ].
> >  ^ self fileReference readStreamDo: [ :s |
> >    [ s contents ]
> >      on: Error
> >      do: [ [ s converter: Latin1TextConverter new; contents ]
> >        on: Error
> >        do: [ '' ] ] ]
> >
> > But we have a problem: some of our files at Synectique are in
> > ISO-8859-1. #contents is able to read some of these files without
> > signaling an error, but the result is wrong because they are decoded
> > with the wrong encoding.
> >
> > So I wonder: is it possible to get the encoding of a FileReference in
> > Pharo, so that the file can be read with the right encoding? Something
> > like the Unix command `file -I myFile.txt`.
> >
> > --
> > Cyril Ferlicot
> > https://ferlicot.fr
> >
> > http://www.synectique.eu
> > 2 rue Jacques Prévert 01,
> > 59650 Villeneuve d'ascq France
> >
>
>
>
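
A common heuristic for the problem described above, sketched here in Pharo Smalltalk as an assumption rather than anything proposed in this thread: Latin-1 text containing accented characters is usually not well-formed UTF-8 (for example 'é' is the single byte 233, which UTF-8 reads as the start of a three-byte sequence). So one can validate the raw bytes strictly and fall back to Latin-1 only when validation fails. `isWellFormedUtf8` and `decode` are hypothetical helper blocks, not Pharo API; `ByteArray>>utf8Decoded` and `ByteArray>>asString` are assumed to be available in your image. The check is deliberately simple (it does not reject overlong forms or surrogates), and a Latin-1 file that happens to be valid UTF-8 will still be misread, so this is a guess, not a guarantee.

```smalltalk
"Sketch for a Pharo Playground. isWellFormedUtf8 and decode are hypothetical
 helper blocks. The validator only checks UTF-8 lead/continuation byte
 structure; it does not reject overlong encodings or surrogates."
| isWellFormedUtf8 decode |
isWellFormedUtf8 := [ :bytes |
	| i n ok |
	i := 1.
	n := bytes size.
	ok := true.
	[ ok and: [ i <= n ] ] whileTrue: [
		| b count |
		b := bytes at: i.
		"Number of continuation bytes implied by the lead byte (-1 = invalid lead byte)"
		count := b < 16r80
			ifTrue: [ 0 ]
			ifFalse: [ (b bitAnd: 16rE0) = 16rC0
				ifTrue: [ 1 ]
				ifFalse: [ (b bitAnd: 16rF0) = 16rE0
					ifTrue: [ 2 ]
					ifFalse: [ (b bitAnd: 16rF8) = 16rF0
						ifTrue: [ 3 ]
						ifFalse: [ -1 ] ] ] ].
		(count < 0 or: [ i + count > n ])
			ifTrue: [ ok := false ]
			ifFalse: [
				"Every continuation byte must look like 10xxxxxx"
				1 to: count do: [ :k |
					((bytes at: i + k) bitAnd: 16rC0) = 16r80 ifFalse: [ ok := false ] ].
				i := i + count + 1 ] ].
	ok ].

"Decode as UTF-8 when well formed, otherwise fall back to Latin-1
 (ByteArray>>asString maps each byte to the character with that code)."
decode := [ :bytes |
	(isWellFormedUtf8 value: bytes)
		ifTrue: [ bytes utf8Decoded ]
		ifFalse: [ bytes asString ] ].

"'café' in Latin-1: the byte 233 is not valid UTF-8, so the Latin-1 fallback is used."
decode value: #[99 97 102 233].
"'café' in UTF-8: 'é' is encoded as 195 169, so UTF-8 decoding is used."
decode value: #[99 97 102 195 169].
```

To apply it to a file, read the raw bytes first, for example with `aFileReference binaryReadStreamDo: [ :s | s upToEnd ]` (assuming that method exists in your Pharo version), then pass them to `decode`. Reading bytes rather than characters matters here: if the stream already decodes with a lenient UTF-8 converter, invalid bytes may be mapped silently instead of signaling an error, which would match the symptom Cyril describes.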
