Oh, and before I forget: in your case, where the text is at KNOWN POSITIONS,
I've seen others get very good results by cutting the 'page' up into
sections, one for each 'text' on that 'page', then feeding tesseract
these sections one at a time as individual images, and recompositing the
OCRed 'page' afterwards by merging the tesseract outputs.
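
A rough sketch of that cut / OCR / merge loop, in Python rather than
tess4J just to keep it short. Everything named here is an assumption for
illustration: the REGIONS table (made-up field names and fractions of the
screen, so it survives different resolutions), and ocr_region, which is a
stand-in for whatever crops the box and calls tesseract on it.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]                 # left, top, right, bottom (pixels)
FracBox = Tuple[float, float, float, float]     # same, as fractions of the screen

# Hypothetical layout: one entry per 'text' on the 'page'.
REGIONS: Dict[str, FracBox] = {
    "player_name": (0.05, 0.02, 0.40, 0.08),
    "score": (0.70, 0.02, 0.95, 0.10),
    "stats": (0.05, 0.80, 0.60, 0.95),
}


def to_pixel_box(frac: FracBox, width: int, height: int) -> Box:
    """Convert a fractional region into pixel coordinates for this screenshot."""
    left, top, right, bottom = frac
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


def ocr_page(regions: Dict[str, FracBox], width: int, height: int,
             ocr_region: Callable[[Box], str]) -> Dict[str, str]:
    """Run OCR on each section separately and collect the text per section."""
    results = {}
    for name, frac in regions.items():
        box = to_pixel_box(frac, width, height)
        # ocr_region is assumed to crop the box and feed it to tesseract.
        results[name] = ocr_region(box).strip()
    return results


def merge_in_reading_order(regions: Dict[str, FracBox],
                           results: Dict[str, str]) -> str:
    """Recomposite the 'page': sort sections top-to-bottom, then left-to-right."""
    ordered = sorted(regions, key=lambda n: (regions[n][1], regions[n][0]))
    return "\n".join(f"{name}: {results[name]}" for name in ordered)
```

Keeping the regions as fractions is just one way to cope with varying
screenshot resolutions; fixed pixel boxes per resolution would do as well.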

This, in my mind, is part of preprocess step 2 ('local tweaks') but anyway.

That way, you can of course easily *scale* one or more of those sections'
images as needed to make it look like all of them are '20px text lines on a
page'. You get the drift. ;-)
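
The arithmetic for that scaling is tiny; here is a sketch. The measured
text height per section is something you would have to determine yourself
(by eye or by measuring once per region), and the function names are made
up for this example.

```python
from typing import Tuple

TARGET_TEXT_HEIGHT_PX = 20  # the '20px text lines' target


def scale_factor(measured_text_height_px: float,
                 target: float = TARGET_TEXT_HEIGHT_PX) -> float:
    """Factor to resize a section by so its text ends up about `target` px tall."""
    if measured_text_height_px <= 0:
        raise ValueError("text height must be positive")
    return target / measured_text_height_px


def scaled_size(width: int, height: int, factor: float) -> Tuple[int, int]:
    """New (width, height) of a section image after applying the factor."""
    return (max(1, round(width * factor)), max(1, round(height * factor)))
```

So a section with 40px-tall text gets a factor of 0.5, and a 300x60 crop
of it would be resized to 150x30 before going to tesseract.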



(Mind the cropping remark at the end of the previous message when you do
this: always leave (or add) a white border in each extracted image, or you
get worse results once again.)
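
For illustration, adding the border can look like the sketch below. It
works on a plain list-of-rows grayscale image so it does not depend on any
imaging library; the 10px default border and the assumption that 255 means
white are mine, not magic numbers from tesseract.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0 = black, 255 = white


def add_white_border(pixels: Image, border: int = 10, white: int = 255) -> Image:
    """Return a copy of the image surrounded by `border` white pixels on all sides."""
    if not pixels or not pixels[0]:
        raise ValueError("image must have at least one pixel")
    width = len(pixels[0])
    blank_row = [white] * (width + 2 * border)
    padded = [list(blank_row) for _ in range(border)]        # top margin
    for row in pixels:
        padded.append([white] * border + list(row) + [white] * border)
    padded.extend(list(blank_row) for _ in range(border))    # bottom margin
    return padded
```

In practice your image library will have an equivalent call; the point is
only that every extracted section gets a margin before OCR.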


Alternatively, one could MASK the areas of the image where text will
never appear, but that would work best if all your texts are about the same
x-height. So in your case I'd go with cutting up the screen 'page' into
sections and then preprocessing each as necessary.
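
For completeness, a sketch of what the masking alternative would amount to:
paint everything outside the known text boxes white and OCR the whole page
in one go. Same list-of-rows grayscale representation as a stand-in for a
real image; painting with white (255) assumes dark text on a light
background after preprocessing.

```python
from typing import List, Tuple

Image = List[List[int]]            # rows of grayscale pixel values
Box = Tuple[int, int, int, int]    # left, top, right, bottom (pixels)


def mask_outside(pixels: Image, keep_boxes: List[Box], white: int = 255) -> Image:
    """Return a copy where only the pixels inside keep_boxes survive."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    masked = [[white] * width for _ in range(height)]
    for left, top, right, bottom in keep_boxes:
        x0, x1 = max(0, left), min(width, right)
        for y in range(max(0, top), min(height, bottom)):
            # Copy the original pixels back for the part inside the box.
            masked[y][x0:x1] = pixels[y][x0:x1]
    return masked
```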


On Tue, Nov 10, 2020, 10:40 player1 <vorpal.hellf...@gmail.com> wrote:

> Hi Folks
>
> I'm new to Tesseract and need some pointers on how to improve the output
> from a game screen dump.
>
> It has some game stats with different types of fonts, at different sizes
> and one font is skewed to the side.
>
> The screendump has background graphics, but they are toned down so as not
> to disturb a human reading the page.
>
> The screendump might have different resolutions, but the positions of the
> texts are fixed to particular regions.
>
> So far I have tried reading the page (with tess4J) at 120 DPI and only the
> simplest text which looks to be about 20pt in size is read out correctly,
> bigger fonts are completely lost.
>
> What options do I have to improve the output from Tesseract?
>

