On Sat, Mar 24, 2018 at 4:48 PM, wm4 <nfx...@googlemail.com> wrote:
> Subtitles which contained styled UTF-8 subtitles (i.e. not just 7 bit
> ASCII characters) were not handled correctly. The spec mandates that
> styling start/end ranges are in "characters". It's not quite clear what
> a "character" is supposed to be, but maybe they mean Unicode codepoints.
>
> FFmpeg's decoder treated the style ranges as byte indexes, which could
> lead to UTF-8 sequences being broken, and the common code dropping the
> whole subtitle line.
>
> Change this and count codepoints instead. This also means that even
> if this is somehow wrong, the decoder won't break UTF-8 sequences
> anymore. The sample which led me to investigate this now appears to work
> correctly.
> ---
> https://github.com/mpv-player/mpv/issues/5675
For reference, the relevant specification for MOV/3GPP Timed Text seems to be ETSI TS 126 245, which is currently at version 14 (2017-04), available at http://www.etsi.org/deliver/etsi_ts/126200_126299/126245/14.00.00_60/ts_126245v140000p.pdf . Section 5.2 is indeed rather ambiguous about what a "character" means in the context of UTF-8 or UTF-16.

Best regards,

Jan

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel