On Fri, Feb 04, 2022 at 23:24:58 +0000, Soft Works wrote:
> You want to "pollute" gazillions of subtitle streams in the
> world from multiple subtitle formats with invisible
> characters in order to solve an escaping problem in ffmpeg?

I do not consider using characters that Unicode explicitly recommends
using to be “polluting”. Further consider that, as mentioned, invisible
characters are already not uncommon in ASS, and conversions from ASS to
something else are rare because they are generally lossy. Lossy with
regard to typesetting, that is; removing breaking hints in the form of
plain Unicode characters would be a new form of lossiness.

> [From the other mail:]
> I'm not into changing ffmpeg's ass output, it's all
> about the internally used ass format and the escaping is
> a central problem there.

I’m not interested in reworking ffmpeg’s internal subtitle handling.
The proposed patch is a clear improvement over the status quo, which is
plainly incorrect. Adjustments to the patch can be made if they take
reasonable effort and come with sound arguments; reworking ffmpeg
internals is imo not “reasonable” effort to correct an uncontestedly
wrong escape.

You have two options: Either finally tell me what I asked about: where
(as in which file and function) removing wordjoiners should even happen,
and where possible lingering “\\ → \” conversions presumably are. If
it’s simple enough, I can add a removal accompanied by a comment
pointing out that this can go wrong.
Or go ahead and create your own patch.

~~~~~~

> > > I'm not sure whether all ffmpeg text-sub encoders can handle
> > > those chars - which could be verified of course.
> >
> > Since it's in the BMP and ffmpeg already seems happy to assume some
> > UTF-8 support by converting everything to it, I'm not worried about
> > this until proven wrong.
>
> Proven wrong: https://github.com/libass/libass/issues/507

This issue is not at all wordjoiner specific, despite the name. As far
as I recall, it never led to wrong rendering. With HarfBuzz, the only
fully featured shaping backend of libass, control characters were and
are handled by HarfBuzz. And even with FriBiDi, U+2060 has been ignored
since 2012, long before the linked issue was opened.

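To make the escape under discussion concrete, here is a minimal Python sketch; it is not ffmpeg code. The function names are hypothetical, and placing the word joiner directly after the backslash is an assumption about how the escape is applied. It also shows why stripping is lossy: a word joiner that was in the text on purpose is removed along with the ones added by escaping.

```python
import unicodedata

# U+2060 WORD JOINER: zero-width, general category Cf (format character).
WORD_JOINER = "\u2060"


def escape_backslashes(text: str) -> str:
    # Assumption: the escape puts a word joiner right after each backslash,
    # so the backslash is no longer directly followed by a letter such as N.
    return text.replace("\\", "\\" + WORD_JOINER)


def strip_word_joiners(text: str) -> str:
    # Hypothetical clean-up for non-ASS output. It cannot distinguish
    # escape-inserted joiners from joiners the author typed deliberately.
    return text.replace(WORD_JOINER, "")


if __name__ == "__main__":
    escaped = escape_backslashes("C:\\New")
    print([hex(ord(c)) for c in escaped])

    # Properties referred to in the thread: the code point lies in the BMP
    # and takes three bytes in UTF-8.
    print(unicodedata.name(WORD_JOINER), unicodedata.category(WORD_JOINER))
    print(ord(WORD_JOINER) <= 0xFFFF, WORD_JOINER.encode("utf-8"))

    # Round trip works for escaped text, but a deliberate joiner is lost.
    print(strip_word_joiners(escaped) == "C:\\New")
    print(strip_word_joiners("no\u2060break") == "nobreak")
```

The last line is the case the comment in a real removal would have to warn about: stripping changes text that never went through the escape.
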
What that issue really is about is a combination of two more general
issues. First, libass currently does not cache failed glyph lookups,
which leads to multiple messages and, at worst, a performance
degradation if no font in the font pool contains a glyph for a
particular character. Second, there is the realisation that libass’
font-fallback strategy is not ideal for prefix-type control characters,
characters which visibly affect both neighbours, and a few others.
The word joiner is only highlighted here because, due to its usage as a
backslash escape, it is commonly passed to libass, and a high enough
percentage of fonts don’t contain it to create reports about it.

For further reference: U+2060 was added in Unicode 3.2, released in
2002. If you want to strip it because it might not render correctly,
you should also strip most emoji, the uppercase eszett ẞ and several
actively used writing systems in their entirety.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".