Re: [tesseract-ocr] Line level training

2018-11-12 Thread Lorenzo Bolzani
Tesseract 4.x uses lines, not chars. Bye, Lorenzo On Mon, 12 Nov 2018 at 05:42, wrote: > Dear All, > > Currently, Tesseract training is based on pairs of files (TIFF and box). > It's not easy to make box files (character level) if we try to train on some scanned > document images not ge

Re: [tesseract-ocr] Line level training

2018-11-12 Thread favpdf
Does that mean we can label existing images with text-line boxes instead of individual character boxes in the current Tesseract 4.0? I checked the box files generated by the training process and found that the character boxes were still there. Thanks, Jun On Monday, 12 November 2018 at 5:26:48 PM UTC+8, Lorenzo Blz wrote: > > Tes

Re: [tesseract-ocr] Line level training

2018-11-12 Thread Lorenzo Bolzani
On Mon, 12 Nov 2018 at 11:53, wrote: > Does that mean we can label existing images with text-line boxes instead > of individual character boxes in the current Tesseract 4.0? I checked the box files > generated by the training process and found that the character boxes were still > there. > Yes it
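For LSTM training in Tesseract 4, line-level ground truth is commonly written as a box file in which every character of a line carries the bounding box of the whole line, followed by a tab entry that marks the end of the line. As far as I understand, this is the layout that the tesstrain project's `generate_line_box.py` produces. The sketch below builds such a file for a single-line image, assuming the line spans the whole image (`0 0 width height`) and sits on page 0; the function name `line_box_file` is hypothetical, and these conventions should be checked against the training tools you actually use. Box entries follow the usual `<symbol> <left> <bottom> <right> <top> <page>` order.

```python
import unicodedata


def line_box_file(text, width, height, page=0):
    """Hypothetical helper: build line-level box-file text for a single-line image.

    Every symbol gets the bounding box of the whole line (left=0, bottom=0,
    right=width, top=height); a final tab entry marks the end of the line
    (assumed LSTM convention).
    """
    symbols = []
    for ch in text:
        if symbols and unicodedata.combining(ch):
            # Keep combining marks (e.g. U+0301 acute accent) attached to their base character.
            symbols[-1] += ch
        else:
            symbols.append(ch)
    coords = f"0 0 {width} {height} {page}"
    lines = [f"{sym} {coords}" for sym in symbols]
    lines.append(f"\t {coords}")  # end-of-line marker
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A 600x48 pixel line image whose transcription is "Hi".
    print(line_box_file("Hi", 600, 48), end="")
```

Because each character shares the line's box, only the transcription has to be correct; no per-character segmentation is needed, which is what makes labeling scanned lines practical.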

[tesseract-ocr] Images with text in white color

2018-11-12 Thread raghunath rs
Hi, I recently found that Tesseract 4 does not recognize text in images with white text on a colored background. Is there any specific preprocessing? Thanks, Raghu

Re: [tesseract-ocr] Images with text in white color

2018-11-12 Thread Zdenko Podobny
Can you please provide images for testing? Zdenko On Mon, 12 Nov 2018 at 12:38, raghunath rs wrote: > Hi, > > I recently found that Tesseract 4 does not recognize text in images with > white text on a colored background. > > Is there any specific preprocessing? > > Thanks, > Raghu

Re: [tesseract-ocr] -c textord_min_linesize 3.25 in tesseract 4 gives error message

2018-11-12 Thread Zdenko Podobny
What kind of error message do you get? Please also share your image for testing. Zdenko On Sun, 11 Nov 2018 at 15:39, Martin Jenniges wrote: > Hello, > > > I found the following tip for Tesseract, but when I give this parameter > with -c *textord_min_linesize 3.25 in tesseract 4, I receive a err
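The tesseract command line documents parameter overrides in the form `-c VAR=VALUE`, that is, name and value joined by `=` in a single argument, whereas the quoted command separates them with a space. The thread does not establish whether that is what triggers the error. The sketch below only shows how to build the argument list in the documented form; the helper name `tesseract_cmd` and the file names are hypothetical, and it assumes `textord_min_linesize` still exists in 4.x (`tesseract --print-parameters` lists the available parameters).

```python
def tesseract_cmd(image, outbase, params):
    """Hypothetical helper: build a tesseract argv, passing each parameter
    as a single `-c NAME=VALUE` argument."""
    cmd = ["tesseract", image, outbase]
    for name, value in params.items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


if __name__ == "__main__":
    # Print the argument list; pass it to subprocess.run(..., check=True) to execute it.
    print(tesseract_cmd("page.png", "out", {"textord_min_linesize": 3.25}))
```

On a shell, the equivalent is `tesseract page.png out -c textord_min_linesize=3.25`.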

Re: [tesseract-ocr] Images with text in white color

2018-11-12 Thread Seokbong Choi
Use Otsu inverse thresholding from OpenCV. https://www.meccanismocomplesso.org/en/opencv-python-otsu-binarization-thresholding/ On Mon, Nov 12, 2018 at 6:38 AM raghunath rs wrote: > Hi, > > I recently found that Tesseract 4 does not recognize text in images with > white text on a colored background. > > Is
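In OpenCV-Python the suggestion amounts to converting the image to grayscale and calling `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)`. Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram, and the inverse flag maps pixels above the threshold (the light text) to black and the rest (the darker background) to white, giving the dark-on-light text Tesseract handles best. The dependency-free sketch below reimplements that computation for illustration only; the function names are hypothetical, and it assumes the text is lighter than the background after grayscale conversion.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 0-255 gray levels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = 0        # number of pixels at or below t (background class)
    sum_b = 0      # sum of gray levels at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b  # pixels above t (foreground class)
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        # Between-class variance (up to a constant factor).
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize_inverse(image):
    """Otsu + inverse binarization of a 2-D list of gray levels:
    pixels above the threshold become 0 (black), the rest 255 (white)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[0 if p > t else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Dark background (50) with one light "text" pixel (220).
    print(binarize_inverse([[50, 50], [220, 50]]))
```

If the text is darker than its background, the non-inverted variant (`cv2.THRESH_BINARY + cv2.THRESH_OTSU`) is the one to use instead.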

Re: [tesseract-ocr] Line level training

2018-11-12 Thread favpdf
It's clear now. Thanks for the information. Jun On Monday, 12 November 2018 at 7:38:19 PM UTC+8, Lorenzo Blz wrote: > On Mon, 12 Nov 2018 at 11:53, > > wrote: > >> Does that mean we can label existing images with text-line boxes instead >> of individual character boxes in the current Tesseract 4.0? I check

Re: [tesseract-ocr] Running text2image failed: does text2image not support fonts with Chinese names?

2018-11-12 Thread bruce
Hi zdenop, My original output of chcp is "936". As you said, I think it is a problem with the console encoding, but I don't know how to solve this encoding problem. In the end, I solved the problem another way: I used software named "fontcreator" to modify the names of the fonts and changed the name t

[tesseract-ocr] Reducing output image quality to make PDF smaller

2018-11-12 Thread xtron16tx
I hadn't used Tesseract in many years until today, and I'm very impressed with what I see now. I need to process a 300 DPI black-and-white PNG image and have it create an indexed PDF file. I ran a command line to do this and was very happy with the quality of the result, but I would like to be able to feed