On Sat, 24 Mar 2018 17:50:53 +0100 Carl Eugen Hoyos <ceffm...@gmail.com> wrote:
> 2018-03-24 17:42 GMT+01:00, wm4 <nfx...@googlemail.com>: > > On Sat, 24 Mar 2018 17:05:41 +0100 > > Carl Eugen Hoyos <ceffm...@gmail.com> wrote: > > > >> 2018-03-24 15:48 GMT+01:00, wm4 <nfx...@googlemail.com>: > >> > Subtitles which contained styled UTF-8 subtitles (i.e. not just 7 bit > >> > ASCII characters) were not handled correctly. The spec mandates that > >> > styling start/end ranges are in "characters". It's not quite clear what > >> > a "character" is supposed to be, but maybe they mean unicode codepoints. > >> > > >> > FFmpeg's decoder treated the style ranges as byte idexes, which could > >> > lead to UTF-8 sequences being broken, and the common code dropping the > >> > whole subtitle line. > >> > > >> > Change this and count the codepoint instead. This also means that even > >> > if this is somehow wrong, the decoder won't break UTF-8 sequences > >> > anymore. The sample which led me to investigate this now appears to work > >> > correctly. > >> > >> Could you confirm that this is also what QT does? > > > > I can't test with QT. VLC seems to behave like with this patch applied. > > > >> Or is it impossible that the patch breaks something? > > > > Could probably break movtext subtitles generated by ffmpeg (I didn't > > fix the movtext encoder, and it seems to have the same bug). But these > > will most likely be broken on other players too. Tough the worst case > > is just that the styles get shifted. > > Thank you. > > >> [...] > >> > >> > @@ -388,17 +405,24 @@ static int text_to_ass(AVBPrint *buf, const char > >> > *text, const char *text_end, > >> > } > >> > } > >> > > >> > - switch (*text) { > >> > - case '\r': > >> > - break; > >> > - case '\n': > >> > - av_bprintf(buf, "\\N"); > >> > - break; > >> > - default: > >> > - av_bprint_chars(buf, *text, 1); > >> > - break; > >> > + len = get_utf8_length_at(text, text_end); > >> > + if (len < 1) { > >> > + av_log(avctx, AV_LOG_ERROR, "invalid UTF-8 byte in > >> > subtitle\n"); > >> > + len = 1; > >> > + } > >> > + for (i = 0; i < len; i++) { > >> > >> > + switch (*text) { > >> > + case '\r': > >> > + break; > >> > + case '\n': > >> > + av_bprintf(buf, "\\N"); > >> > + break; > >> > + default: > >> > + av_bprint_chars(buf, *text, 1); > >> > + break; > >> > >> Imo, the reindentation is not ok but this isn't my code. > > > > Why not? > > Because the patch is much easier to read without it. git repo viewers can show commits without whitespaces, so I don't think it matters anymore for this patch. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel