2018-03-24 15:48 GMT+01:00, wm4 <nfx...@googlemail.com>: > Subtitles which contained styled UTF-8 subtitles (i.e. not just 7 bit > ASCII characters) were not handled correctly. The spec mandates that > styling start/end ranges are in "characters". It's not quite clear what > a "character" is supposed to be, but maybe they mean unicode codepoints. > > FFmpeg's decoder treated the style ranges as byte idexes, which could > lead to UTF-8 sequences being broken, and the common code dropping the > whole subtitle line. > > Change this and count the codepoint instead. This also means that even > if this is somehow wrong, the decoder won't break UTF-8 sequences > anymore. The sample which led me to investigate this now appears to work > correctly.
Could you confirm that this is also what QT does? Or is it impossible that the patch breaks something? [...] > @@ -388,17 +405,24 @@ static int text_to_ass(AVBPrint *buf, const char > *text, const char *text_end, > } > } > > - switch (*text) { > - case '\r': > - break; > - case '\n': > - av_bprintf(buf, "\\N"); > - break; > - default: > - av_bprint_chars(buf, *text, 1); > - break; > + len = get_utf8_length_at(text, text_end); > + if (len < 1) { > + av_log(avctx, AV_LOG_ERROR, "invalid UTF-8 byte in > subtitle\n"); > + len = 1; > + } > + for (i = 0; i < len; i++) { > + switch (*text) { > + case '\r': > + break; > + case '\n': > + av_bprintf(buf, "\\N"); > + break; > + default: > + av_bprint_chars(buf, *text, 1); > + break; Imo, the reindentation is not ok but this isn't my code. Carl Eugen _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel