> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Daniel
> Cantarín
> Sent: Saturday, December 11, 2021 4:18 PM
> To: ffmpeg-devel@ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add
> new graphicsub2text filter (OCR)
>
> Hi there softworkz.
>
> Having worked with OCR filter output before, I'd suggest a modification
> for your new filter. It's not something that should delay the patch,
> just a nice addendum. It could be done in another patch, or I could even
> do it myself in the future, but I'll leave the comment here anyway for
> you to consider.
>
> If you take a look at vf_ocr, you'll see that it sets the
> "lavfi.ocr.confidence" metadata field. Downstream filters can check that
> field in order to only accept results above a certain confidence
> threshold, discarding the rest. This is very useful when doing OCR with
> non-ASCII chars, as I do with Spanish-language content.
>
> So I propose an option like this:
>
>     { "confidence", "Sets the confidence threshold for valid OCR. Default 80.",
>       OFFSET(confidence), AV_OPT_TYPE_INT, {.i64=80}, 0, 100, FLAGS },
>
> Then you take the average of all confidences reported by Tesseract after
> OCR but before converting to a text subtitle frame, and compare that
> average against the option value. Something like this:
>
>     int average = sum_of_all_confidences / number_of_confidence_items;
>     if (average >= s->confidence) {
>         do_your_thing();
>     } else {
>         av_log(ctx, AV_LOG_DEBUG, "Confidence average %d under threshold. "
>                "Text detected: '%s'\n", average, text);
>     }
>
> Also, I would like to do some tests with Spanish OCR, as I had to
> explicitly allowlist our non-ASCII chars when using the OCR filter, and
> I don't know how yours will behave in that situation. Maybe having the
> chars allowlist option here too is a good idea. But, again: none of this
> should delay the patch, as your work is much more important than these
> nice-to-have features, which could easily be implemented later by anyone.
>
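[The gating logic proposed in the quote above could be sketched as a small self-contained helper. The struct and field names here are illustrative placeholders, not the filter's actual context:]

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the proposed confidence gate: average the per-word
 * confidence values Tesseract reports (0-100) and only accept the OCR
 * result when the average reaches a configurable threshold. */
typedef struct OcrGateContext {
    int confidence;   /* threshold option, default 80 */
} OcrGateContext;

static int confidence_average(const int *conf, size_t n)
{
    long sum = 0;
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        sum += conf[i];
    return (int)(sum / (long)n);
}

static int passes_threshold(const OcrGateContext *s, const int *conf, size_t n)
{
    return confidence_average(conf, n) >= s->confidence;
}
```

[Note the integer division: with the default threshold of 80, an average of 79.9 truncates to 79 and is rejected, which errs on the strict side.]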
Hi Daniel,

I don't think any of that will be necessary. For the generic ocr filter
this might make sense, because it is meant to work in many different
situations: different text sizes, different (not necessarily uniform)
backgrounds, static or moving, a wide spectrum of colours, no
quantization in the time dimension, etc.

But for subtitle OCR, we have a fixed and static background, we have
only 4 to 32 palette colours, we know when the text starts and that it
doesn't change until the next event, and we have a pixel density
relative to the text height that is a multiple of what you get when you
scan a letter, for example.

Basically, this is a pre-school situation for an OCR engine. If it can't
recognize that reliably and you would end up needing to dissect results
by confidence level, then the OCR wouldn't be worth a penny and this
filter would be kind of pointless ;-)

IIUC, you haven't tried graphicsub2text yet. I suggest you look at
filters.texi for instructions on setting up the model data. There's an
example with a test stream that you can run right away. With that
example, I haven't been able to spot a single incorrectly recognized
character.

Somebody who tried my filter contacted me last week because he was
getting rather bad recognition results. It turned out that the text in
his case had strong outlines and the inner text was black. After
removing the outlines and inverting the text, the recognition result was
close to perfect.

The crucial part is the preparation of the image before doing OCR. When
this is not done right, you can't remedy it later with confidence-level
evaluation.

What's working fine already is bright text without outlines. Left for me
to do is automatic detection of outline colours and removing those
before running recognition. The second part is detection of the text
(fill) colour and, depending on that, replacing the transparency with
either a light or a dark background colour (and inverting in the latter
case).
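[The fill-colour detection and transparency replacement described above could look roughly like this. This is a hypothetical sketch, not the filter's actual code; it assumes parallel 8-bit gray and alpha planes and normalizes everything to dark text on a light background:]

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Estimate whether the text fill is dark by averaging the opaque
 * pixels, then flatten the transparency: dark text is composited onto
 * a light background directly, while bright text is composited onto a
 * dark background and then inverted, so the OCR engine always sees
 * dark text on a light background. */
static void prepare_for_ocr(uint8_t *gray, const uint8_t *alpha, size_t n)
{
    long sum = 0;
    size_t opaque = 0;
    int fill_is_dark;

    for (size_t i = 0; i < n; i++) {
        if (alpha[i] > 128) {
            sum += gray[i];
            opaque++;
        }
    }
    fill_is_dark = opaque && (sum / (long)opaque) < 128;

    for (size_t i = 0; i < n; i++) {
        int bg = fill_is_dark ? 255 : 0;
        int v  = (gray[i] * alpha[i] + bg * (255 - alpha[i])) / 255;
        gray[i] = (uint8_t)(fill_is_dark ? v : 255 - v);
    }
}
```

[Outline removal is deliberately left out here, since detecting the outline colours automatically is the part still to be done.]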
When you get a chance to try it, please let me know about your results.

PS: When positive, post here - otherwise contact me privately... LOL
Just joking.. whatever you prefer.

Kind regards,
softworkz
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".