> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Oneric
> Sent: Saturday, February 5, 2022 2:20 AM
> To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
>
> On Fri, Feb 04, 2022 at 23:24:58 +0000, Soft Works wrote:
> > You want to "pollute" gazillions of subtitle streams in the
> > world from multiple subtitle formats with invisible
> > characters in order to solve an escaping problem in ffmpeg?
>
> I do not consider using characters that are explicitly recommended to
> be used by Unicode to be “polluting”. Further consider that, as
> mentioned, invisible characters in ASS are already not uncommon, and
> conversions from ASS to something else are rare due to being generally
> lossy. Lossy with regard to typesetting, that is; removing breaking
> hints in the form of plain Unicode characters would be a new form of
> lossiness.
>
> > [From the other mail:]
> > I'm not into changing ffmpeg's ass output, it's all
> > about the internally used ass format and the escaping is
> > a central problem there.
>
> I’m not interested in reworking ffmpeg’s internal subtitle handling.
> The proposed patch is a clear improvement over the status quo, which
> is plain incorrect. Within reasonable effort and given sound arguments,
> adjustments to the patch can be made; reworking ffmpeg internals is
> imo not “reasonable” effort to correct an uncontestedly wrong escape.
>
> You have two options:
> Either finally tell me what I asked about:
> where (as in which file and function) removing wordjoiners should
> even happen and where possible lingering “\\ → \” conversions
> presumably are, and if it’s simple enough I can add a removal
> accompanied by a comment pointing out that this can go wrong.
> Or go ahead and create your own patch.
>
> ~~~~~~
>
> > > > I'm not sure whether all ffmpeg text-sub encoders can handle
> > > > those chars - which could be verified of course.
> > >
> > > Since it's in the BMP and ffmpeg already seems happy to assume some
> > > UTF-8 support by converting everything to it, I'm not worried about
> > > this until proven wrong.
> >
> > Proven wrong: https://github.com/libass/libass/issues/507
>
> This issue is not at all wordjoiner specific despite the name.
> As far as I recall this never led to wrong rendering.
> With HarfBuzz, the only fully featured shaping backend of libass,
> control characters were and are handled by HarfBuzz.
> And even with FriBiDi, U+2060 was ignored since long before (2012)
> the linked issue was opened.
>
> What that issue really is about is a combination of two more general
> issues. libass is currently not caching failure to look up a glyph,
> leading to multiple messages and at worst a perf degradation if no
> font in the font pool contained a glyph for a particular character.
> And the realisation that libass’ font-fallback strategy is not ideal
> for prefix-type control characters, characters which visibly affect
> both neighbours and a few others.
> The word-joiner is only highlighted here because, due to its usage as
> a backslash escape, it’s commonly passed to libass and a high enough
> percentage of fonts doesn’t contain it to create reports about it.
>
>
> For further reference: U+2060 was added in Unicode 3.2, released in 2002.
> If you want to strip it because it might not render correctly you
> should also strip most emoji, the uppercase eszett ẞ and several
> actively used writing systems in their entirety.
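
For reference while reading the thread, here is a minimal sketch of the escaping scheme under discussion. It assumes the escape works by placing U+2060 WORD JOINER directly after each backslash; that placement is an assumption made for illustration and is not taken from the patch. The helper names `escape_backslashes` and `unescape_backslashes` are hypothetical and are not FFmpeg or libass functions. The second helper illustrates the kind of removal asked about above.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+2060 WORD JOINER (3 bytes plus terminating NUL). */
static const char WORD_JOINER[] = "\xE2\x81\xA0";

/*
 * Hypothetical helper, not FFmpeg API: copy src into dst and append a
 * word joiner after every backslash, so a renderer no longer sees
 * sequences such as "\N" as escapes (assumed placement, see lead-in).
 * Returns the number of bytes required, excluding the terminating NUL.
 * Output is truncated bytewise if dst_size is too small.
 */
static size_t escape_backslashes(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;

    for (; *src; src++) {
        if (n + 1 < dst_size)
            dst[n] = *src;
        n++;
        if (*src == '\\') {
            for (size_t i = 0; i < 3; i++) {
                if (n + 1 < dst_size)
                    dst[n] = WORD_JOINER[i];
                n++;
            }
        }
    }
    if (dst_size)
        dst[n < dst_size ? n : dst_size - 1] = '\0';
    return n;
}

/*
 * Hypothetical inverse: drop a word joiner that directly follows a
 * backslash. It cannot tell an inserted joiner from one the subtitle
 * author wrote in that position, so it may remove too much.
 */
static size_t unescape_backslashes(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;

    while (*src) {
        char c = *src++;

        if (n + 1 < dst_size)
            dst[n] = c;
        n++;
        /* strncmp stops at the NUL, so short tails are safe to compare. */
        if (c == '\\' && strncmp(src, WORD_JOINER, 3) == 0)
            src += 3;
    }
    if (dst_size)
        dst[n < dst_size ? n : dst_size - 1] = '\0';
    return n;
}
```

With these helpers, the four-byte text `a\Nb` becomes seven bytes after escaping and round-trips back to the original. The round trip is only lossless when the input never had a word joiner directly after a backslash, which is the "this can go wrong" case mentioned in the quoted mail.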
Let's try to approach this from a different side.

Which case is your [1/2] commit actually supposed to fix?
How did you test your patch?
Can we please go over an example?

Thanks,
sw