Hi

For the MOOC, I'm working on an SRT-to-VTT converter. It converts

1
00:00:07,040 --> 00:00:10,440
Hello. This week,
we'll get to the heart of the matter,

2
00:00:10,600 --> 00:00:12,160
about syntax especially.

into

WEBVTT

00:00:07.040 --> 00:00:10.440 align:middle
Hello. This week,
we'll get to the heart of the matter,

00:00:10.600 --> 00:00:12.160 align:middle
about syntax especially.

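For reference, here is a minimal sketch of that conversion in Python (not the Pharo code itself), assuming the input has already been decoded and normalized to LF line endings. The `align:middle` setting is hard-coded as in the output above, and the timing regex assumes every timestamp carries hours. The function name `srt_to_vtt` is hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:07,040 --> 00:00:10,440".
# Assumption: hours are always present, as in the sample above.
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$"
)

def srt_to_vtt(srt_text: str, settings: str = "align:middle") -> str:
    """Convert SRT text (already LF-normalized) to WebVTT text."""
    out = ["WEBVTT", ""]
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.split("\n")
        # Drop the numeric cue index if present.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = TIMING.match(lines[0].strip())
        if match is None:
            # A malformed timestamp (e.g. a missing comma) is reported.
            raise ValueError(f"bad timing line: {lines[0]!r}")
        start = f"{match.group(1)}.{match.group(2)}"
        end = f"{match.group(3)}.{match.group(4)}"
        out.append(f"{start} --> {end} {settings}".rstrip())
        out.extend(lines[1:])  # cue text, copied unchanged
        out.append("")  # a blank line ends the cue
    return "\n".join(out) + "\n"
```

Dropping the cue numbers is a choice, not a requirement: WebVTT allows cue identifiers, so they could also be kept.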

It works, more or less. Now I face the problem that the files people gave me have different encodings (I guess), because when I do not preprocess the input (for example with withLinuxLineEndings), I get some CRs after the conversion, even though I only copy file content and all the line endings I output are LF (or customizable).

I cannot just apply "garbage in, garbage out", because the files have to work.

So I thought I should first convert the string I read using withLinuxLineEndings, so that every CR and CRLF becomes LF. But since the files have different encodings, I end up with another issue: too many LFs.
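One possible cause of the extra LFs, sketched in Python under the assumption that some files are UTF-16: in UTF-16 every CR and LF is stored next to a NUL byte, so if such a file is read as a single-byte encoding, CR and LF are no longer adjacent and each one becomes its own line break. The order of replacements also matters: CRLF has to be handled before lone CR. The name `normalize_newlines` is hypothetical.

```python
def normalize_newlines(text: str) -> str:
    """Turn CRLF and lone CR into LF.

    CRLF is replaced first; replacing lone CR first would turn
    every CRLF into two LFs.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Two lines of UTF-16-LE text: b'a\x00\r\x00\n\x00b\x00\r\x00\n\x00'
utf16_bytes = "a\r\nb\r\n".encode("utf-16-le")

# Wrong codec: NUL characters separate CR from LF, so both become LF.
wrong = normalize_newlines(utf16_bytes.decode("latin-1"))
# Right codec: each CRLF becomes exactly one LF.
right = normalize_newlines(utf16_bytes.decode("utf-16-le"))
```

So normalizing line endings only helps once the bytes have been decoded with the right encoding.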

Does anyone have an idea how to handle this?

I did not find a way to detect the encoding of a file (other than via the BOM), only the file ending.
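A best-effort sketch in Python of guessing the encoding before normalizing line endings: a BOM is reliable when present; without one, the code falls back to heuristics (many NUL bytes suggest UTF-16, then strict UTF-8 decoding, then cp1252). The 30% NUL threshold and the cp1252 fallback are assumptions, not guarantees, and `guess_encoding` / `read_subtitle_text` are hypothetical names.

```python
import codecs

# Checked in this order: the UTF-32-LE BOM (FF FE 00 00) starts with
# the UTF-16-LE BOM (FF FE), so the longer one must come first.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(data: bytes) -> str:
    """Best-effort guess; only the BOM case is reliable."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Heuristic: mostly-ASCII UTF-16 without a BOM has a NUL byte in
    # every other position (odd positions for LE, even for BE).
    half = len(data) // 2
    if half:
        if data[1::2].count(0) > 0.3 * half:
            return "utf-16-le"
        if data[0::2].count(0) > 0.3 * half:
            return "utf-16-be"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: legacy files come from Windows Western European tools.
        return "cp1252"

def read_subtitle_text(data: bytes) -> str:
    """Decode bytes with the guessed encoding and drop a leading BOM."""
    text = data.decode(guess_encoding(data), errors="replace")
    return text[1:] if text.startswith("\ufeff") else text
```

Decoding first and normalizing line endings second keeps the two problems separate.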

Stef

