Hi Daniel,
I don't think that any of that will be necessary. For the generic OCR
filter, this might make sense, because it is meant to work in
many different situations: different text sizes, different (not
necessarily uniform) backgrounds, static or moving content, a wide
spectrum of colours, no quantization in the time dimension, etc.
But for subtitle-ocr, we have a fixed and static background, we have
palette colours from like 4 to 32 only, we know when it starts and
that it doesn’t change until the next event and we have a pixel
density relative to the text height that is a multiple of what
you get when you scan a letter for example.
I see. That's a good point: this isn't generic OCR, but something
pretty specific. I hadn't considered that before.
Basically, this is like a pre-school situation for an OCR. If it
can't recognize that in a reliable way and you would end up needing
to dissect results by confidence level, then the OCR wouldn't be
worth a penny and this filter kind of pointless ;-)
Well... I respectfully disagree, because reality is pretty effective
at messing with common sense, which makes that paragraph simply too
optimistic. I'm sure we'll find some subtitle provider with awful
fonts and/or subtitling practices sooner rather than later, and on
that day those words will turn sour.
Yet, I get your point. Please just ignore my previous comments
about the new filter. I'll test it properly eventually, and give you
some feedback. If any change is needed, I'll try to apply it myself,
so you don't have to do extra work. But just forget about it in the
meantime, as your point stands so far.
IIUC, you haven't tried graphicsub2text yet. I suggest you look
at filters.texi for instructions on setting up the model data.
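In case it helps other testers: a minimal sketch of fetching Tesseract
model data, based on stock Tesseract conventions rather than on the
filter itself; the repository name and environment variable here are
assumptions, and the steps in filters.texi take precedence:

```shell
# Assumed/typical Tesseract setup, NOT the filter's documented procedure:
# fetch the "fast" traineddata models and point Tesseract at them.
git clone --depth 1 https://github.com/tesseract-ocr/tessdata_fast
export TESSDATA_PREFIX="$PWD/tessdata_fast"
```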
(...)
The crucial part is the preparation of the image before doing
OCR. When this is not done right, you can't remedy later with
confidence level evaluation.
I'm aware, thanks. No expert, but have some experience with the stuff.
I'm actually using vf_ocr, taking dvbsubs and doing some alchemy with
lavfi: using the fps filter for the sparseness (and OCR CPU usage),
colour tuning, creating a proper background for the OCR process, and so on.
I got OK results with image prep, and lots of noise without it, so
I kinda know the deal. Insights are cool anyway, and your code gives
me some good ideas too.
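For the record, the kind of lavfi alchemy I mean looks roughly like
this (a sketch only; the input name, stream index, size, rate and
background colour are placeholders for whatever the real input needs):

```shell
# Render the dvbsub stream onto a solid background, thin out the frame
# rate to save OCR CPU, then run vf_ocr and print the recognized text.
ffmpeg -i input.ts -filter_complex \
  "color=c=white:s=720x576[bg]; \
   [bg][0:s:0]overlay=shortest=1,fps=2,ocr,metadata=mode=print:key=lavfi.ocr.text" \
  -f null -
```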
What's working fine already is bright text without outlines.
Left for me to do is automatic detection of outline colours
and removing those before running recognition. The second part is
detection of the text (fill) colour and, depending on that, replacing
the transparency with either a light or a dark background colour
(and inverting in the latter case).
Bright (white) background over dark (black) characters had the best
results for me so far.
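That fill-colour decision reduces to a luma threshold; here's a tiny,
hypothetical sketch of it (the function names and the 127 cutoff are
my own illustration, not taken from the filter code):

```shell
# Rec. 709 luma of an r g b triple in 0..255 (illustrative helper).
luma() {
  awk -v r="$1" -v g="$2" -v b="$3" \
    'BEGIN { printf "%d", 0.2126*r + 0.7152*g + 0.0722*b }'
}

# Bright fill -> dark background, then invert so OCR always sees
# dark text on a light background; dark fill -> white background as-is.
pick_bg() {
  if [ "$(luma "$1" "$2" "$3")" -gt 127 ]; then
    echo "black invert"
  else
    echo "white keep"
  fi
}
```

So white subtitles (`pick_bg 255 255 255`) land on a black background
and get inverted, while near-black text (`pick_bg 16 16 16`) goes
straight onto white.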
When you get a chance to try, please let me know about your
results.
Most likely I'll take a look at it next week. It's easier now that
you've put a public fork online (in another thread). I'm still getting
used to the patch and mailing list dynamics.
PS: When positive, post here - otherwise contact me privately...LOL
Just joking..whatever you prefer.
Kind regards,
softworkz
I try not to be rude, because I know it feels awful on the other side,
and I value feelings. I also tend to be chatty, in order to try to
understand and be understood. However, I fear replying a lot may be
seen as spamming the mailing list, so I'll keep my interactions to a
minimum. Please know there are people like me reading your work, even
when we may keep silent for different reasons.
Thanks,
Daniel.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".