Re: [tesseract-ocr] Not getting results with numbers and currency simbols in tables

2019-03-23 Thread kotomi . niu
Hi, I am confused about why upscaling works. Tesseract itself already has a step that prescales the image to a height of 36 pixels. On Monday, July 30, 2018 at 11:19:23 PM UTC+8, Emiliano Isaza Villamizar wrote: > > Lorenzo, Thank you so much for your help. I did everything step by step > and got a very good resu
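The arithmetic behind the question above can be sketched in a few lines. This is only an illustration, assuming (as the poster states) that text is normalized to a fixed height of 36 pixels; the function name `normalization_scale` and its parameters are hypothetical and are not part of Tesseract's API.

```python
def normalization_scale(line_height_px, target_height_px=36):
    """Return the resize factor implied by scaling a text line to a fixed height.

    target_height_px=36 is taken from the poster's statement, not from
    Tesseract's source. A factor above 1 means the line would be enlarged,
    a factor below 1 means it would be shrunk.
    """
    if line_height_px <= 0:
        raise ValueError("line_height_px must be positive")
    return target_height_px / line_height_px


# A 12 px line would be enlarged 3x; a 72 px line would be halved.
print(normalization_scale(12))
print(normalization_scale(72))
```

Under this assumption, a small line is enlarged by the normalization step either way, so any benefit from upscaling beforehand would have to come from the processing that runs before that step. The snippet does not confirm where the benefit comes from.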

Re: [tesseract-ocr] Re: Dot Matrix Fonts and Tesseract's Connected Component Analysis

2019-03-23 Thread Shree Devi Kumar
> > That's interesting that you tried replacing the top layer. I haven't > tried that yet. How many iterations did you use? > >> In this case the unicharset was limited to UPPERCASE letters, the digits 0-9, ":" and "/". I used a training_text which followed the pattern of the image - lines starting wit

Re: [tesseract-ocr] General strategies for dealing with problem images

2019-03-23 Thread Shree Devi Kumar
https://github.com/tesseract-ocr/tesseract/pull/2294 by @bertsky adds the whitelist/blacklist functionality for Tesseract 4. It has not been merged yet. On Sat, Mar 23, 2019 at 2:58 PM Lorenzo Bolzani wrote: > On Tue, 19 Mar 2019 at 06:03, Jonathan Muller < > jmul...@pukogames.com>

Re: [tesseract-ocr] General strategies for dealing with problem images

2019-03-23 Thread Lorenzo Bolzani
On Tue, 19 Mar 2019 at 06:03, Jonathan Muller < jmul...@pukogames.com> wrote: > 5 - Create a whitelist based on the zone of probable characters (this one > improves accuracy a lot !) > How do you do whitelisting with Tesseract 4.x? As far as I know, it is not yet supported. I do the
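One generic workaround when the engine itself does not support a whitelist is to filter the recognized text afterwards. The sketch below is not necessarily what either poster does (the message above is cut off before the workaround is described). The character set and the function name `filter_to_whitelist` are hypothetical, chosen only for illustration.

```python
import string

# Hypothetical whitelist: uppercase letters, digits, ':' and '/'.
ALLOWED = set(string.ascii_uppercase + string.digits + ":/")


def filter_to_whitelist(text, allowed=ALLOWED, keep_whitespace=True):
    """Drop every character of `text` that is not in `allowed`.

    Whitespace is kept by default so that word and line boundaries survive.
    """
    return "".join(
        ch for ch in text
        if ch in allowed or (keep_whitespace and ch.isspace())
    )


# Example: pretend this string came back from the OCR engine.
print(repr(filter_to_whitelist("DATE: 23/03/2019 (ok)")))
```

Filtering after recognition only discards unwanted characters. Unlike a whitelist applied inside the engine, it cannot steer the recognizer toward an allowed character, so a misread "O" for "0" stays wrong.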

[tesseract-ocr] Re: Dot Matrix Fonts and Tesseract's Connected Component Analysis

2019-03-23 Thread ameerahrahrah
Hi Shree, Thanks for the files! That's interesting that you tried replacing the top layer. I haven't tried that yet. How many iterations did you use? I was thinking today that it is difficult to create a single strong learner with tesseract because training from scratch requires so much data