In case it helps someone: yes, there is a way to change the 'minimum number of characters' behaviour. I struggled with the same problem for a while as well.
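For context, the limit is a hard-coded constant named kMinCharactersToTry. Here is a rough sketch of the kind of check it controls. It is an illustration only, not the actual Tesseract source: the helper ShouldSkipPage and its parameters are made up, and I am assuming the check is a plain comparison against the constant. Only the constant's name, its value of 50, and the message text come from Tesseract.

```cpp
#include <cstdio>

// Same name and default value as the constant in osdetect.cpp.
// Lowering it (e.g. to 5) lets pages with very little text through.
const int kMinCharactersToTry = 50;

// Hypothetical helper, NOT a Tesseract function: returns true when a page
// has fewer detected characters than the threshold and should be skipped.
bool ShouldSkipPage(int detected_characters,
                    int min_characters = kMinCharactersToTry) {
  if (detected_characters < min_characters) {
    std::printf("Too few characters. Skipping this page\n");
    return true;
  }
  return false;
}
```

With the default of 50, a page with 10 detected characters is skipped. With the threshold lowered to 5, the same page goes through.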
In this file, https://github.com/tesseract-ocr/tesseract/blob/master/ccmain/osdetect.cpp, change the value of the following constant to something like 5, then recompile and you are done:

const int kMinCharactersToTry = 50;

I have asked the developers to expose that internal constant as a command-line setting. If or when they will do it, I don't know.

Enjoy,
Hakan

On Saturday, April 19, 2014 at 8:13:25 PM UTC+3, Chris Nevin wrote:
> Hello,
>
> I am having some trouble getting Tesseract to recognize individual characters. Whenever I think I have overcome actual errors, I get the line "Too few characters. Skipping this page"
>
> Because I am using Tess4J I have been struggling to find out exactly what all of the different options you can set for Tesseract actually are. Would anyone be able to tell me if there is a way to set it to not limit the minimum number of characters on a page?
>
> Also, I am trying to get Tesseract to recognise characters from chemical elements (example attached). Will Tesseract be able to ignore the structure and just pick up on the characters?
>
> Basically any advice as to what would be a good way to go about this would be helpful! Even if I should look at training Tesseract or creating a word list with the chemical elements or something?
>
> Thanks a lot!
>
> Chris