Hello, I am having trouble getting digits recognized correctly. I am using Tess4J from http://tess4j.sourceforge.net/ which, if I am not mistaken, uses Tesseract 3.05 in the background.
I followed the instructions outlined here: http://tess4j.sourceforge.net/tutorial/ (using the command line version, no Eclipse, Maven or anything else). I can modify the TesseractExample.java file without an issue, and by running the 2 command line commands mentioned on that page I can do a Tesseract OCR scan on any PNG or JPG I want.

What I want to do in the end is use OCR to make my program "read" the balance of an online casino. With that balance available as a string variable, I will then do all kinds of actions based on it, so reading the numbers properly is important.

For test purposes I took 2 screenshots that together include all the digits that can appear, so 0-9. When I do the normal OCR as instructed on the page above (as far as I know, it then uses the pre-trained standard eng.traineddata file), both the digits 4 and 6 in the image are read as 5. The euro sign € is also read as the pound sign instead, but that is of minor importance to me. The OCR not being able to distinguish 4 and 6 from 5 is the real problem.

The pictures used are these:
https://ibb.co/ZTRFqVg
https://ibb.co/p23w7nj

They are screenshots of the casino site, so I can't influence the font, the size or anything else. The OCR reads the "4,6" part as "5,5".

So I thought, why not use the 2 images to train Tesseract? Having seen all the possible digits, it should reach 100% accuracy, right? I got jTessBoxEditor and Serak Tesseract Trainer, went through a lot of steps, created a traineddata file from the image, and made my OCR program use it to OCR the same image again. In my code I print the result string with System.out.print and also print its length. The output in the command line window is an empty line (where the result string should be), and the string length is reported as 6 (it should be 11, counting all the digits and the ",").
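To find out what those 6 characters actually are, the result could be dumped code point by code point, which makes whitespace and control characters visible. This is only a minimal sketch, assuming the OCR output is already held in a String. The class name OcrResultDump, the method describe, and the sample value in main are made up for illustration and are not part of Tess4J.

```java
class OcrResultDump {
    // Hypothetical helper: lists every code point of a string, one per line,
    // so that characters which print as "nothing" still show up.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        sb.append("length=").append(text.length()).append('\n');
        text.codePoints().forEach(cp -> {
            // Whitespace and control characters are labelled instead of printed.
            String shown = (Character.isWhitespace(cp) || Character.isISOControl(cp))
                    ? "(invisible)"
                    : new String(Character.toChars(cp));
            sb.append(String.format("U+%04X %s\n", cp, shown));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by the OCR call; this value is invented.
        String result = " \n \n\n\n";
        System.out.print(describe(result));
    }
}
```

If the dump shows only spaces and line feeds, the custom traineddata is producing no recognized characters at all rather than wrong ones.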
So I don't know what the OCR is doing; the result is far worse than with the standard eng language.

I did a bit of googling, and apparently the font "Alte DIN 1451 Mittelschrift" is VERY similar to my digits. The casino (for the balance display at least) uses this font or a very similar one. So I know of a font worth training with (I have already downloaded its TTF file), but I haven't the slightest idea how to train with a font.

Can someone please help me and explain why the OCR result can be that bad after training with the actual image to be OCRed? (It was a pain to fit the rectangles perfectly to the digits!) Or how to train Tesseract for use with Tess4J using the given font? Google even tells me about a one-click service for this, but sadly it is apparently gone by now.

Can someone help me please? :-)