I also tried different sizes and I have been able to make it work with any. Regarding doing OCR with OpenCV, I won't have enough time to do that. Moreover, as I already use Tesseract for other fonts, I'd like to use it for this one too (and the guys who did the tutorial said in the comments that Tesseract is more powerful :/ )
On Tuesday, 7 July 2015 at 21:11:21 UTC+2, Art Rhyno wrote:
>
> When tesseract can’t find a matching blob, it gets trickier, but at least
> it is working with something. I am guessing some of the gaps between
> segments are passing a threshold for belonging to a single character. I
> tried a few different sizes, but I couldn’t get the “B” recognized, and I
> wonder if opencv might be a better route if the source of the characters is
> fairly static. There’s an example here of using opencv with handwritten
> numbers [1].
>
> art
> ---
> 1. http://blog.damiles.com/2008/11/basic-ocr-in-opencv/
>
> *From:* tesser...@googlegroups.com [mailto:tesser...@googlegroups.com]
> *On Behalf Of* Pierre-Henri DAUVERGNE
> *Sent:* Tuesday, July 07, 2015 8:41 AM
> *To:* tesser...@googlegroups.com
> *Subject:* Re: [tesseract-ocr] Train tesseract for 14-segment display
>
> I actually can't show you all the characters but I can give you a sample.
> I have the 10 digits and all letters. I tried to decrease the size of the
> characters but it still didn't work. Tesseract didn't say "Empty page!!"
> but "Failure ! Couldn't find a matching blob" for all letters; the digits
> worked fine.
>
> Here is a small sample: http://i.imgur.com/NeYBKrj.png (the letters are V
> X B C D).
>
> Thank you for your help :)
>
> On Tuesday, 7 July 2015 at 13:40:24 UTC+2, Art Rhyno wrote:
>
> Could you attach the “my_font_exp0.png” and “my_font_exp0.box” that are
> producing the “Empty page!!” message?
>
> art
>
> *From:* tesser...@googlegroups.com [mailto:tesser...@googlegroups.com]
> *On Behalf Of* Pierre-Henri DAUVERGNE
> *Sent:* Tuesday, July 07, 2015 3:26 AM
> *To:* tesser...@googlegroups.com
> *Subject:* Re: [tesseract-ocr] Train tesseract for 14-segment display
>
> Actually I followed this guide
> <http://blog.ayoungprogrammer.com/2013/01/equation-ocr-part-2-training-characters.html>
> which is essentially the same as the one you gave me. I am doing all that.
> I use qt-box-editor to manually set the boxes over the characters, then I
> use the command "tesseract my_font_exp0.png my_font_exp0 nobatch box.train"
> but it says "Empty page!!" and nothing else. It creates an empty .txt file.
> Whenever I try to train with linked segments, it works.
> That's why I'm looking for an image-processing way of linking all the
> segments as they should be, or a tesseract way of training it with unlinked
> segments.
>
> On Monday, 6 July 2015 at 14:41:22 UTC+2, Art Rhyno wrote:
>
> Hi,
>
> I am guessing my attachment didn’t make it to the list, but the character I
> used is about 17x25 pixels. I resaved the sample as a PNG (instead of a
> TIFF) and am trying again. Remember that you can (and often have to) edit
> the box files for training. Tesseract may split your character into more
> than one blob, but you can override this. By default, the “makebox”
> produced:
>
> l 45 254 53 279 0
> ’ 55 267 62 277 0
>
> But I modified this to be:
>
> V 45 254 62 279 0
>
> I found this blog post really helpful for training [1]. You can contact me
> off-list if you want the entire training set I used, but I only did the one
> character.
>
> art
> ---
> 1. http://michaeljaylissner.com/blog/adding-new-fonts-to-tesseract-3-ocr-engine
>
> *From:* tesser...@googlegroups.com [mailto:tesser...@googlegroups.com]
> *On Behalf Of* Pierre-Henri DAUVERGNE
> *Sent:* Monday, July 06, 2015 4:29 AM
> *To:* tesser...@googlegroups.com
> *Subject:* Re: [tesseract-ocr] Train tesseract for 14-segment display
>
> Ok so I just tried after resizing my image by 2 and by 4 and it still
> doesn't work: tesseract says "Empty page!!".
> However, if I manually link the segments (with the brush tool in Gimp, see
> here: http://i.imgur.com/akVmAgh.png ), it works, but it doesn't feel
> like it's a good training for tesseract.
> Any advice?
>
> Thank you
>
> On Monday, 6 July 2015 at 09:18:44 UTC+2, Pierre-Henri DAUVERGNE wrote:
>
> Hi, thank you for your answer :)
>
> Each character is about 100x160 pixels; is that too small? I'll try with
> bigger ones and I'll post the results here.
>
> On Saturday, 4 July 2015 at 04:10:18 UTC+2, Art Rhyno wrote:
>
> Hi,
>
> I wonder if it has something to do with the sizing of the characters in
> the image that you are using for font training. I swapped out the character
> without the linked segments for a character in a set I am using and it
> seemed to work ok. The set is too big for the list but I have attached the
> image I used.
>
> art
>
> *From:* tesser...@googlegroups.com [mailto:tesser...@googlegroups.com]
> *On Behalf Of* Pierre-Henri DAUVERGNE
> *Sent:* Friday, July 03, 2015 10:20 AM
> *To:* tesser...@googlegroups.com
> *Subject:* [tesseract-ocr] Train tesseract for 14-segment display
>
> Hello everyone.
>
> I've posted on stackoverflow already but haven't had an answer yet (
> http://stackoverflow.com/questions/31131796/14-segment-display-and-tesseract-ocr-with-opencv
> ).
>
> I'm looking for a way to accurately OCR a 14-segment display. As you can see
> in my SO thread, I trained tesseract with dilated characters, which links all
> of their segments together. My issue is that when I read a character from my
> webcam, I have to erode it first to remove noise. After that, I dilate it.
> However, I can't do it enough to link all the segments together without
> having issues with letters like 'B' and 'D', and the letter 'V' is not
> recognized at all (I believe it is because the space between the
> diagonals is too large).
>
> · What I trained tesseract with (that's the "V" letter):
> http://i.imgur.com/NbmVqkb.png (segments are all linked)
>
> · What I feed tesseract with: http://i.imgur.com/0E4iXXk.png
> (some segments are linked, some aren't)
>
> I tried to train tesseract with characters where the segments aren't all
> linked but it says "Empty page !!". When I manually link the segments, the
> training works fine (it feels weird that tesseract can't be trained with
> blank space inside characters, since some of the existing languages, e.g.
> Arabic or Chinese, already have some).
>
> To bypass this issue, I've been trying different kinds of image processing
> algorithms (like thinning, in order to dilate "in height" but not in
> "width") but none gave more accurate results.
>
> Thank you for your help!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-oc...@googlegroups.com.
> To post to this group, send email to tesser...@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/451dbd65-20b7-437a-8b5b-a0a726bdad06%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
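For readers following the idea in the original message of dilating "in height" but not in "width": a minimal sketch of a binary dilation with a rectangular structuring element that is taller than it is wide. This is only an illustration in plain Python, not the poster's actual pipeline; the function name `dilate`, its parameters, and the toy `glyph` grid are all made up for the example, and the image is assumed to be a non-empty rectangular grid of 0/1 values. In a real OpenCV pipeline the library's own morphology routines would presumably be used instead.

```python
from typing import List

Image = List[List[int]]  # 1 = lit segment pixel, 0 = background


def dilate(image: Image, half_height: int, half_width: int) -> Image:
    """Binary dilation with a (2*half_height+1) x (2*half_width+1) rectangle.

    Hypothetical helper for illustration; assumes a non-empty rectangular grid.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            # Spread each lit pixel over the rectangle centred on it,
            # clipped to the image borders.
            for rr in range(max(0, r - half_height), min(rows, r + half_height + 1)):
                for cc in range(max(0, c - half_width), min(cols, c + half_width + 1)):
                    out[rr][cc] = 1
    return out


# Toy example: two columns of segments, each with a 2-pixel vertical gap.
glyph = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]

# Dilate vertically only: the vertical gaps close, the columns stay separate.
tall = dilate(glyph, half_height=1, half_width=0)
for row in tall:
    print("".join("#" if px else "." for px in row))
```

With `half_width=0` the two columns never merge, which is the property that matters for letters like 'B' and 'D'. It does not by itself address the diagonal gaps of the 'V'.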
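The box-file edit Art describes in the thread (replacing the two fragments "l" and "’" with a single "V" box) amounts to taking the union of the fragment rectangles. Here is a small sketch of that step, assuming the usual box-file layout of `glyph left bottom right top page` per line; the helper names `parse_box_line`, `merge_boxes` and `format_box` are hypothetical and not part of Tesseract.

```python
from typing import List, Tuple

# One box-file entry: (glyph, left, bottom, right, top, page)
Box = Tuple[str, int, int, int, int, int]


def parse_box_line(line: str) -> Box:
    """Parse one 'glyph left bottom right top page' line (assumed layout)."""
    glyph, left, bottom, right, top, page = line.rsplit(" ", 5)
    return glyph, int(left), int(bottom), int(right), int(top), int(page)


def merge_boxes(boxes: List[Box], glyph: str) -> Box:
    """Return one box labelled `glyph` that covers every input box."""
    if not boxes:
        raise ValueError("need at least one box")
    pages = {box[5] for box in boxes}
    if len(pages) != 1:
        raise ValueError("boxes are on different pages")
    return (
        glyph,
        min(box[1] for box in boxes),  # leftmost edge
        min(box[2] for box in boxes),  # lowest edge
        max(box[3] for box in boxes),  # rightmost edge
        max(box[4] for box in boxes),  # highest edge
        pages.pop(),
    )


def format_box(box: Box) -> str:
    """Serialise a box back into box-file syntax."""
    return " ".join(str(field) for field in box)


# The two fragments quoted in the thread, merged into a single "V".
fragments = [
    parse_box_line("l 45 254 53 279 0"),
    parse_box_line("’ 55 267 62 277 0"),
]
merged_line = format_box(merge_boxes(fragments, "V"))
print(merged_line)  # V 45 254 62 279 0
```

The result matches the line Art wrote by hand, so for a whole character set the same union could be applied to each group of fragments before running the training step.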