> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Soft Works
> Sent: Friday, February 4, 2022 6:34 AM
> To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
>
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Soft Works
> > Sent: Friday, February 4, 2022 2:58 AM
> > To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Oneric
> > > Sent: Friday, February 4, 2022 2:01 AM
> > > To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> > > Subject: Re: [FFmpeg-devel] [PATCH 1/2] avcodec/{ass, webvttdec}: fix handling of backslashes
> > >
> > > On Thu, Feb 03, 2022 at 20:51:16 +0000, Soft Works wrote:
> > > > I think when you inject that word-joiner as a workaround for ASS
> > > > parsing, you'll also need to make sure that it gets removed
> > > > when encoding to other formats.
> > >
> > > There's no way of knowing whether the word-joiner comes from
> > > a conversion performed by ffmpeg in the past or already existed
> > > in the original source.
> >
> > That might be true, but I think it's valid to say that such characters
> > are very unusual in "original" subtitle sources, and that's why I don't
> > think it's a good idea for ffmpeg to start injecting them.
> >
> > Subtitle implementations are often rather minimal, especially in
> > hardware devices, and might not always cover the full range of
> > UTF-8 specifics.
> >
> > > However, the word-joiner does not alter the visual appearance and
> > > is unlikely to change line-breaking properties; that's why I chose
> > > a word-joiner. Therefore I don't think removing (only) the inserted
> > > word-joiners is possible,
> >
> > Why not? As it seems to be required for ASS encoding only, all other
> > output formats should remain unaffected.
> >
> > > but also not necessary.
> >
> > I'm not sure whether all ffmpeg text-sub encoders can handle
> > those chars, which could be verified, of course.
> >
> > But what remains is the question about the effect on end devices
> > which are consuming that output.
> >
> > Finally, those chars are a pest. I'm using them myself for a
> > specific use case, but when you don't know they are there, they can
> > drive you totally mad, eventually even making you think your system
> > or software is faulty.
> >
> > Example:
> >
> > Open your patch file [2/2] and search for the string
> > "123456\NAscending". You can see the string in two lines, but the
> > search will only find one of them.
> >
> > Or just look at the two lines directly. They are preceded by + and -
> > even though both appear identical.
> >
> > So, this also needs consideration of the consequences, like how
> > many developers (inside and outside of ffmpeg) this would drive
> > nuts over the years, making them start to hate ffmpeg for doing so
> > once they've found out.
>
> As I really hate how many devs on this ML keep saying 'no' to submitted
> code without having a better suggestion, assuming that this is all that
> it takes, I don't want to follow suit in this regard.
>
> Hence I want to propose the following solution:
>
> First of all, the existing code in ff_ass_bprint_text_event() is totally
> wrong already. Not only with regard to the backslash escaping (as you
> have already pointed out), but also the curly brace escaping is invalid.
> There is no curly-brace escaping in ASS either.
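To make the search example quoted above concrete: a word joiner (U+2060) is encoded in UTF-8 as the three bytes E2 81 A0, so a line containing one no longer matches a byte-wise search for the visible text. The C sketch below assumes, hypothetically, that the joiner sits between the backslash and the 'N'; strip_word_joiners() is an illustrative helper, not an FFmpeg function, and it removes every joiner because it cannot tell injected ones from ones that were already in the source.

```c
#include <string.h>

/* U+2060 WORD JOINER in UTF-8: renders as nothing, but adds three bytes. */
#define WORD_JOINER "\xE2\x81\xA0"

/* Hypothetical helper: remove every U+2060 from s, in place.
 * It cannot tell a joiner injected during conversion from one that
 * was already present in the original subtitle source. */
void strip_word_joiners(char *s)
{
    const size_t wj_len = sizeof(WORD_JOINER) - 1; /* 3 bytes */
    const char *src = s;
    char *dst = s;

    while (*src) {
        if (strncmp(src, WORD_JOINER, wj_len) == 0) {
            src += wj_len; /* skip the invisible joiner */
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

With line set to "123456\\" WORD_JOINER "NAscending", strstr(line, "123456\\NAscending") returns NULL even though both strings look identical when displayed; after strip_word_joiners(line) the search succeeds.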
>
> In fact, it is impossible with ASS to display an opening curly brace
> followed by a closing curly brace at a subsequent position (each one
> alone may work, depending on the implementation).
>
> If it were about ASS alone, we might just drop those braces, so we could
> at least prevent the text in-between from being hidden (when outputting
> ASS), but ASS is also the internal ("uncompressed/raw") subtitle format
> in ffmpeg that is used for conversion (and subtitle filtering).
> So it would be hard to sell if curly braces got lost when converting
> from one text-sub format to another, with neither of them even being ASS.
>
> What we need is to stop creating invalid ASS and at the same time
> ensure proper conversion of curly braces. How? We substitute them!
>
> And again, UTF-8 can come to the rescue. There are two suitable
> pairs of candidates for that:
>
> SMALL LEFT CURLY BRACKET (U+FE5B, Ps): ﹛
> SMALL RIGHT CURLY BRACKET (U+FE5C, Pe): ﹜
> FULLWIDTH LEFT CURLY BRACKET (U+FF5B, Ps): ｛
> FULLWIDTH RIGHT CURLY BRACKET (U+FF5D, Pe): ｝
>
> Substituting the curly braces with one of those pairs will prevent ASS
> from treating any subtitle content as an override code.
>
> What remains to be handled now is the backslash case. Now that we can be
> sure that we are never inside a sequence that ASS would consider an
> override code, only 3 cases remain where the backslash has a meaning in
> ASS dialog text: '\n', '\N' and '\h'.
>
> We can simply escape those sequences by inserting a (no-op) override code
> between the backslash and the char. Suitable for this is: {\r}
> This code resets inline styles, but since we are coming from plain text
> subs in ff_ass_bprint_text_event(), we know that we don't have any inline
> styles, so resetting the style is a no-op.
>
> Needless to say, on the encoding side we will change the substituted
> curly braces back to the regular ones for all formats except ASS.
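A minimal C sketch of the escaping proposed above, applied to plain text before it is embedded in an ASS Dialogue line. The name ass_escape_plain_text() is hypothetical (it is neither ff_ass_bprint_text_event() nor any other FFmpeg API); it assumes NUL-terminated UTF-8 input, uses the SMALL curly bracket pair (the FULLWIDTH pair would work the same way), and does not handle real line breaks or anything else ff_ass_bprint_text_event() does.

```c
#include <stdlib.h>
#include <string.h>

/* UTF-8 encodings of the substitute braces (SMALL variants). */
#define ALT_LBRACE "\xEF\xB9\x9B" /* U+FE5B SMALL LEFT CURLY BRACKET  */
#define ALT_RBRACE "\xEF\xB9\x9C" /* U+FE5C SMALL RIGHT CURLY BRACKET */

/* Hypothetical helper: escape NUL-terminated plain text for an ASS
 * Dialogue line. Returns a malloc'ed string (caller frees) or NULL. */
char *ass_escape_plain_text(const char *src)
{
    /* Worst case is 3x growth: '{' -> 3 bytes, "\N" -> "\{\r}N" (6 bytes). */
    char *out = malloc(3 * strlen(src) + 1);
    char *d = out;

    if (!out)
        return NULL;

    for (const char *s = src; *s; s++) {
        if (*s == '{') {
            memcpy(d, ALT_LBRACE, 3);
            d += 3;
        } else if (*s == '}') {
            memcpy(d, ALT_RBRACE, 3);
            d += 3;
        } else if (*s == '\\' && (s[1] == 'n' || s[1] == 'N' || s[1] == 'h')) {
            /* Break up \n, \N and \h with the no-op override {\r}. */
            memcpy(d, "\\{\\r}", 5);
            d += 5;
            *d++ = *++s; /* copy the n, N or h and skip past it */
        } else {
            *d++ = *s;
        }
    }
    *d = '\0';
    return out;
}
```

For example, "a{b}c" becomes "a﹛b﹜c" and "123456\NAscending" becomes "123456\{\r}NAscending", while a backslash followed by any other character is copied unchanged. The braces of the inserted {\r} are written directly and never re-scanned, so they are not substituted.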
>
> That leaves the question of what to do when encoding to ASS: we can
> either keep the alternate brace characters or just remove them (or maybe
> replace them with square brackets).
>
> I'm not sure about that last point, but overall, this would be a clean
> solution without injecting any weird chars into the subtitle output, and
> it would fix multiple incorrect behaviors in the current implementation.
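The encoding side described above could be sketched in C as follows: one mode restores regular braces for non-ASS output, and the other three correspond to the options listed for ASS output. All names are hypothetical, and the sketch assumes the SMALL curly bracket pair was used for the substitution. No mode makes the string longer, so it works in place.

```c
#include <string.h>

#define ALT_LBRACE "\xEF\xB9\x9B" /* U+FE5B SMALL LEFT CURLY BRACKET  */
#define ALT_RBRACE "\xEF\xB9\x9C" /* U+FE5C SMALL RIGHT CURLY BRACKET */

/* Hypothetical modes for the encoding side. */
enum alt_brace_mode {
    ALT_BRACE_RESTORE, /* non-ASS output: back to '{' and '}'      */
    ALT_BRACE_KEEP,    /* ASS output: keep the substitute brackets */
    ALT_BRACE_DROP,    /* ASS output: remove them                  */
    ALT_BRACE_SQUARE,  /* ASS output: replace with '[' and ']'     */
};

/* Rewrite substitute braces in s in place; no mode makes s longer. */
void unescape_alt_braces(char *s, enum alt_brace_mode mode)
{
    const char *src = s;
    char *dst = s;

    if (mode == ALT_BRACE_KEEP)
        return;

    while (*src) {
        int left  = strncmp(src, ALT_LBRACE, 3) == 0;
        int right = strncmp(src, ALT_RBRACE, 3) == 0;

        if (!left && !right) {
            *dst++ = *src++;
            continue;
        }
        if (mode == ALT_BRACE_RESTORE)
            *dst++ = left ? '{' : '}';
        else if (mode == ALT_BRACE_SQUARE)
            *dst++ = left ? '[' : ']';
        /* ALT_BRACE_DROP: emit nothing */
        src += 3;
    }
    *dst = '\0';
}
```

Just like with the word-joiner, this cannot distinguish substituted braces from U+FE5B/U+FE5C characters that were already present in the source.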
I've found out where the \{ and \} escaping came from: libass.
At some point they decided to introduce this kind of escaping, which is
libass-specific and actually incompatible with normal ASS syntax:

https://github.com/libass/libass/issues/194#issuecomment-352213210

This doesn't mean, though, that the ffmpeg-internal ASS format needs to
follow the libass route in this regard. It only matters for the libass
output encoder, because \{\r}N is broken by that libass decision, so for
that case we'll need a different approach.

sw

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".