Hello, I am having trouble getting numbers recognized.
I am using Tess4J from http://tess4j.sourceforge.net/, 
which, if I am not mistaken, uses Tesseract 3.05 under the hood.

I followed the instructions outlined here:
http://tess4j.sourceforge.net/tutorial/
(using the command-line version, no Eclipse, Maven or other tooling)

I can modify the TesseractExample.java file without any issue, and by 
running the two commands given on that page I can do a Tesseract OCR scan 
on any PNG or JPG I want.

What I ultimately want to do is use OCR to make my program "read" the 
balance of an online casino. Once that balance is available as a string 
variable, I will perform all kinds of actions based on it, so reading the 
numbers correctly is important.
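Before acting on the balance, the OCR string has to be turned into a number. A minimal sketch of that step follows, assuming the balance is shown in European style ("1.234,56", with "." grouping thousands and "," as the decimal separator; this is a guess based on the "4,6" in the screenshot). The class name `BalanceParser` is hypothetical. It also strips the pound sign, because Tesseract reads the euro sign as a pound sign on these screenshots.

```java
import java.math.BigDecimal;

/**
 * Hypothetical helper: turns an OCR'd balance string into a BigDecimal.
 * Assumes European formatting: '.' groups thousands, ',' separates decimals.
 */
public class BalanceParser {

    // Accepts "1.234,56", "1234,56" or "4,65": digits, optional grouping, optional decimals.
    private static final String BALANCE_PATTERN = "\\d{1,3}(\\.\\d{3})*(,\\d+)?|\\d+(,\\d+)?";

    public static BigDecimal parse(String ocrText) {
        if (ocrText == null) {
            throw new IllegalArgumentException("OCR text is null");
        }
        // The euro sign may come back as a pound sign, so drop both currency
        // symbols (U+20AC and U+00A3) plus any whitespace the OCR inserted.
        String cleaned = ocrText.replace("\u20AC", "")
                                .replace("\u00A3", "")
                                .replaceAll("\\s+", "");
        if (!cleaned.matches(BALANCE_PATTERN)) {
            throw new IllegalArgumentException("Unexpected balance text: '" + ocrText + "'");
        }
        // Remove thousands separators, then turn the decimal comma into a dot.
        String normalized = cleaned.replace(".", "").replace(',', '.');
        return new BigDecimal(normalized);
    }

    public static void main(String[] args) {
        System.out.println(parse("\u20AC1.234,56")); // prints 1234.56
        System.out.println(parse("\u00A34,65"));     // prints 4.65
    }
}
```

A check like this rejects garbled output, but it cannot catch the 4/6 to 5 misreading: "5,5" is still a perfectly valid number, so the recognition itself has to be fixed.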

For test purposes I took two screenshots that together include all the 
digits that can appear, i.e. 0-9.

When I run the normal OCR as described on that page (which, as far as I 
know, uses the pre-trained standard eng.traineddata file), both the digits 
4 and 6 in the image are unfortunately read as 5.
The euro sign € is also read as the pound sign £ instead, but that is of 
minor importance to me.
The OCR not being able to tell 4 and 6 apart from 5 is a real problem.

These are the images I used:
https://ibb.co/ZTRFqVg
https://ibb.co/p23w7nj

As I said, they are basically screenshots of the casino site, so I can't 
influence the font, size or anything else.

As mentioned, the OCR reads the "4,6" part as "5,5", which is bad.
So I thought: why not use the two images to train Tesseract? Having seen 
all the possible digits, Tesseract should obviously reach 100% accuracy, 
right?
Well, I got jTessBoxEditor and Serak Tesseract Trainer, did a ton of work, 
created a traineddata file from the screenshots, and made the OCR program 
use it to scan the image again.
I added a line to my code that prints the result string with 
System.out.print and also prints its length.
The output in the command-line window is an empty line (where the result 
string should be), yet the string length is reported as 6 (it should be 11, 
counting all the digits and the ",").
So I don't know what the OCR is doing; it performs far worse than with the 
standard eng language.
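A length of 6 with nothing visible on screen suggests (this is only a guess) that the result consists of whitespace or control characters such as spaces and newlines. A small debugging sketch like the one below prints every character as its Unicode code point, which makes such characters visible. `OcrDebug` and the sample string in `main` are hypothetical stand-ins for the real OCR result.

```java
import java.util.Locale;

/**
 * Debugging sketch: shows every character of an OCR result as a Unicode code
 * point, so "invisible" output such as spaces and newlines becomes visible.
 */
public class OcrDebug {

    public static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Fixed locale so the hex formatting does not depend on system settings.
            sb.append(String.format(Locale.ROOT, "U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the string returned by the OCR call.
        String result = " \n \n";
        System.out.println("length = " + result.length());
        System.out.println("trimmed length = " + result.trim().length());
        System.out.println(describe(result));
    }
}
```

If the real result turns out to be only U+0020 and U+000A entries, the trained data recognized no characters at all, rather than recognizing the wrong ones.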

So I did a bit of googling: apparently the font "Alte DIN 1451 
Mittelschrift" is very similar to my numbers, and the casino (at least for 
the balance display) uses this font or a very similar one.
So while I now know a font worth training with (I have also already 
downloaded its TTF file), I don't have the slightest idea how to train with 
a font.
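Training from a font generally needs far more sample text than two screenshots provide. One possible first step, sketched below under assumptions, is to generate a text file with many balance-like strings so that every digit occurs hundreds of times. Such a file could then be rendered in the downloaded font with Tesseract's `text2image` training tool; its exact options should be checked against the Tesseract 3.05 training documentation. `TrainingTextGenerator`, the value range and the European number format are all hypothetical choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical helper that writes a training text full of balance-like strings,
 * intended to be rendered in the target font by a separate training tool.
 */
public class TrainingTextGenerator {

    // Formats a non-negative amount in cents as "1.234,56" (European style, assumed).
    public static String formatCents(long cents) {
        long whole = cents / 100;
        long fraction = cents % 100;
        String digits = Long.toString(whole);
        StringBuilder grouped = new StringBuilder();
        for (int i = 0; i < digits.length(); i++) {
            // Insert a '.' before every group of three digits counted from the right.
            if (i > 0 && (digits.length() - i) % 3 == 0) {
                grouped.append('.');
            }
            grouped.append(digits.charAt(i));
        }
        return String.format(Locale.ROOT, "%s,%02d", grouped, fraction);
    }

    // Produces `count` random balances prefixed with the euro sign (U+20AC).
    public static List<String> generateLines(int count, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Amounts between 0,00 and 99.999,99 (the range is an arbitrary assumption).
            long cents = (long) (random.nextDouble() * 10_000_000L);
            lines.add("\u20AC" + formatCents(cents));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "balances.txt");
        Files.write(out, generateLines(1000, 42L), StandardCharsets.UTF_8);
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Run it as `java TrainingTextGenerator balances.txt`; the fixed seed makes the generated file reproducible between runs.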

Can someone please help me and explain why the OCR result can be that 
bad after training with the actual image I want to OCR?
(It was a pain to fit the rectangles perfectly to the digits!)
Or explain how to train Tesseract for use with Tess4J on the given font?
Google even tells me about a one-click service for this, but sadly it 
apparently no longer exists.

Can someone help me, please? :-)



-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/d40524b4-602c-4450-b4dd-924667afb9dfn%40googlegroups.com.
