Re: [Pharo-users] file encodings handling

Sven Van Caekenberghe Thu, 18 Aug 2016 12:35:22 -0700

This is easy enough (IIUC your problem): when using #nextLine while reading 
from a stream, all three EOL conventions (CR, LF and CRLF) are handled 
transparently, so you just get each line's contents back until you are done. 
Then you write the lines back out with your preferred EOL convention.
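
As a minimal sketch of that idea (not a definitive implementation), the 
snippet below normalizes a string with mixed line endings to LF. The sample 
input and the variable names are made up for illustration; it assumes an 
in-memory string, so for a real file you would read the lines from the file's 
read stream instead.

```smalltalk
| input output |
"Hypothetical sample mixing the three EOL conventions: CR, CRLF and LF."
input := 'one', String cr, 'two', String crlf, 'three', String lf.

output := String streamContents: [ :out |
	| in |
	in := input readStream.
	"#nextLine answers each line without its terminator, whatever it was."
	[ in atEnd ] whileFalse: [
		out
			nextPutAll: in nextLine;
			nextPut: Character lf ] ].
```

Here every line is written back with a single LF, so a CRLF pair should not 
turn into two line breaks. Swapping `Character lf` for another terminator 
gives a different output convention.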


> On 18 Aug 2016, at 20:41, stepharo <steph...@free.fr> wrote:
> 
> Hi
> 
> For the MOOC I'm working on an SRT to VTT converter. It should turn
> 
> 1
> 00:00:07,040 --> 00:00:10,440
> Hello. This week,
> we'll get to the heart of the matter,
> 
> 2
> 00:00:10,600 --> 00:00:12,160
> about syntax especially.
> 
> into
> 
> WEBVTT
> 
> 00:00:07.040 --> 00:00:10.440 align:middle
> Hello. This week,
> we'll get to the heart of the matter,
> 
> 00:00:10.600 --> 00:00:12.160 align:middle
> about syntax especially.
> 
> 
> It works more or less. Now I face the problem that the files people provided 
> me have different encodings (I guess), because when I do not treat the input 
> (for example with withLinuxLineEndings) I get some CRs after the conversion, 
> even though I only copy some file content and all the line endings I output 
> are lf (or can be customized).
> 
> I cannot apply garbage in, garbage out because the files should work.
> 
> So I thought that I should just convert first the string I read using 
> withLinuxLineEndings so that all cr and crlf are converted into lf. But since 
> files have different encodings, I sometimes end up issuing too many lf.
> 
> Does any of you have an idea how to handle this?
> 
> I did not find a way to know the encoding of a file (not the bom) just the 
> file ending.
> 
> Stef
> 
>
