Hi Sven,

This is cool. We are always losing time on CRLF/LF/CR issues. In SRT2VTT,
I lost most of my time on that part.
Will you add a short paragraph to the Zinc chapter?

Stef

On Wed, May 3, 2017 at 8:18 PM, Norbert Hartl <norb...@hartl.name> wrote:

>
>
> > On 03.05.2017 at 18:10, Cyril Ferlicot D. <cyril.ferli...@gmail.com> wrote:
> >
> >> On 03/05/2017 at 16:41, Sven Van Caekenberghe wrote:
> >>
> >>> On 3 May 2017, at 12:18, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> >>>
> >>> Hi Cyril,
> >>>
> >>> I want to try to write such a detector. I'll get back to you.
> >>
> >> I added the following (Zn #bleedingEdge):
> >>
> >> ===
> >> Name: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.49
> >> Author: SvenVanCaekenberghe
> >> Time: 3 May 2017, 4:30:44.081888 pm
> >> UUID: fe8b083d-010b-0d00-9df5-fde304bccfdc
> >> Ancestors: Zinc-Character-Encoding-Core-SvenVanCaekenberghe.48
> >>
> >> Add ZnCharacterEncoder class>>#detectEncoding: to try to heuristically
> >> and unreliably guess the encoding used by a collection of bytes
> >>
> >> Add ZnCharacterEncoderTests>>#testDetectEncoding
> >>
> >> Add #= and #hash to ZnSimplifiedByteEncoder and
> >> ZnEndianSensitiveUTFEncoder
> >>
> >> Always use canonical name in ZnSimplifiedByteEncoder
> >> class>>#newForEncoding:
> >> ===
> >> Name: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.31
> >> Author: SvenVanCaekenberghe
> >> Time: 3 May 2017, 4:31:09.469852 pm
> >> UUID: 30ef8b3e-010b-0d00-9df6-4a9304bccfdc
> >> Ancestors: Zinc-Character-Encoding-Tests-SvenVanCaekenberghe.30
> >>
> >> Add ZnCharacterEncoder class>>#detectEncoding: to try to heuristically
> >> and unreliably guess the encoding used by a collection of bytes
> >>
> >> Add ZnCharacterEncoderTests>>#testDetectEncoding
> >>
> >> Add #= and #hash to ZnSimplifiedByteEncoder and
> >> ZnEndianSensitiveUTFEncoder
> >>
> >> Always use canonical name in ZnSimplifiedByteEncoder
> >> class>>#newForEncoding:
> >> ===
> >>
> >>
> >> Now you can do the following:
> >>
> >> ZnCharacterEncoder detectEncoding: ((FileLocator desktop / 'some.data')
> >>    binaryReadStreamDo: [ :in | in upToEnd ]).
> >>
> >> (FileLocator desktop / 'some.data') binaryReadStreamDo: [ :in |
> >>    | bytes encoder |
> >>    bytes := in upToEnd.
> >>    encoder := ZnCharacterEncoder detectEncoding: bytes.
> >>    encoder decodeBytes: bytes ].
> >>
> >> It works on the test file you gave me, but this process is just a
> >> guess, a heuristic that is unreliable and often wrong (especially for very
> >> similar byte encodings); see
> >> https://en.wikipedia.org/wiki/Charset_detection.
> >>
> >> You can give the whole contents to the detector, or just a header.
> >>
> >> I was a bit too optimistic, though: this is basically an unsolvable
> >> problem. It is MUCH better to know up front which encoding is used,
> >> or to know something usable about the contents (like the header of
> >> HTML or XML).
> >>
> >> Sven
> >>
> >
> > Thank you! I'll try this tomorrow. If it works well, I wonder if we can
> > still include it in Pharo 6. Since it's only a small feature that is unused
> > in Pharo, it should not break anything, and it would be a cool addition for
> > Moose.
> >
> > But since we are in feature freeze, if people do not want it, I will not
> > push it for Pharo 6 :)
> >
> It shouldn't be included. There is no such thing as a side-effect-free change.
> Moose can load a newer version of Zinc. That is how it is supposed to be.
>
> Norbert
> > --
> > Cyril Ferlicot
> > https://ferlicot.fr
> >
> > http://www.synectique.eu
> > 2 rue Jacques Prévert 01,
> > 59650 Villeneuve d'ascq France
> >
>
>
>
