[tesseract-ocr] hocr bbox set to 0,0,xmax,ymax

2018-04-25 Sreenath BH
Hi, we are using tesseract 4.0 on Debian x64:
    tesseract 4.00.00alpha
    leptonica-1.74.4
    libpng 1.5.4 : zlib 1.2.11
We have observed that several words have their bbox set to 0,0,3400,4400. These are the dimensions of the legal paper size. The words are correctly extracted, and the next words in the same line have
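
A minimal sketch for spotting the affected words in the hOCR output, assuming Tesseract's usual hOCR layout: the page is an element with class `ocr_page` and each word is a span with class `ocrx_word`, both carrying `bbox x0 y0 x1 y1` in their `title` attribute. It only flags words whose box equals the page box; it does not explain or fix the cause. The function and class names are illustrative, not part of Tesseract.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    match = BBOX_RE.search(title or "")
    return tuple(int(v) for v in match.groups()) if match else None


class FullPageWordFinder(HTMLParser):
    """Collect ocrx_word elements whose bbox equals the enclosing page bbox."""

    def __init__(self):
        super().__init__()
        self.page_bbox = None
        self.suspects = []   # (word id, word text) pairs
        self._word = None    # word currently being read
        self._depth = 0      # tag nesting depth inside that word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._word is not None:
            self._depth += 1  # nested markup inside a word, e.g. <strong>
        elif "ocr_page" in classes:
            self.page_bbox = parse_bbox(attrs.get("title"))
        elif "ocrx_word" in classes:
            self._word = {"id": attrs.get("id"),
                          "bbox": parse_bbox(attrs.get("title")),
                          "text": []}
            self._depth = 1

    def handle_endtag(self, tag):
        if self._word is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the word's own closing tag
            word = self._word
            if word["bbox"] is not None and word["bbox"] == self.page_bbox:
                self.suspects.append((word["id"], "".join(word["text"])))
            self._word = None

    def handle_data(self, data):
        if self._word is not None:
            self._word["text"].append(data)


def find_full_page_words(hocr_text):
    """Return (id, text) for every word whose bbox spans the whole page."""
    finder = FullPageWordFinder()
    finder.feed(hocr_text)
    finder.close()
    return finder.suspects
```

To check a file, pass its text, e.g. `find_full_page_words(open("out.hocr", encoding="utf-8").read())`; the file name here is hypothetical.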

Re: [tesseract-ocr] Problem facing with tesseract training 4 with arabic

2018-04-25 ShreeDevi Kumar
You are trying to train only digits, but you are then using a unicharset that contains only those digits to compress the wordlist (which uses the Arabic alphabet) into a 'dawg'. The command you used only creates the starter traineddata for LSTM training. Please follow the instructions given in the wiki
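
To see the mismatch described above, one quick check is to list the characters of the wordlist that the unicharset does not contain. A minimal sketch, assuming the plain-text unicharset layout (entry count on the first line, then one entry per line starting with the unichar); it compares single code points, so multi-code-point unichars are only approximated. The function names are illustrative, not part of Tesseract's training tools.

```python
def unicharset_chars(unicharset_text):
    """Collect the unichars (first field of each entry line) from a unicharset file."""
    lines = unicharset_text.splitlines()[1:]  # line 0 holds the entry count
    return {line.split()[0] for line in lines if line.strip()}


def uncovered_chars(words, charset):
    """Sorted characters used by `words` that are missing from `charset`."""
    return sorted({ch for word in words for ch in word if ch not in charset})
```

With a digits-only unicharset, every Arabic letter in the wordlist shows up as uncovered, which is the situation ShreeDevi points out.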

Re: [tesseract-ocr] just installed, get error messages

2018-04-25 Zdenko Podobny
Why are you building the project from source if you have no clue what you are doing? Based on your other post, you decided to build Leptonica without support for common image formats. On Thu, 26 Apr 2018, 7:01, Rolf Schumacher wrote: > I just installed from git repository > > tesseract --version shows:
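
Zdenko's diagnosis can be checked from the version banner, which lists the image libraries Leptonica was linked against. A minimal sketch, assuming the banner format seen in the first thread (`libpng 1.5.4 : zlib 1.2.11`, i.e. `libname version` entries); the default list of required libraries is an illustrative choice, not something tesseract defines.

```python
import re
import subprocess

# Illustrative default: adjust to the image formats you actually need.
DEFAULT_REQUIRED = ("libjpeg", "libpng", "libtiff")


def missing_image_libs(version_text, required=DEFAULT_REQUIRED):
    """Return the required image libraries not mentioned in the version banner."""
    found = set(re.findall(r"\blib[a-z0-9]+", version_text))
    return sorted(lib for lib in required if lib not in found)


def tesseract_version_text():
    """Run `tesseract --version`; some builds print to stderr, so keep both streams."""
    proc = subprocess.run(["tesseract", "--version"], capture_output=True, text=True)
    return proc.stdout + proc.stderr
```

A result such as `['libtiff']` from `missing_image_libs(tesseract_version_text())` would be consistent with the `pixReadMemTiff: function not present` error in Rolf's report.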

[tesseract-ocr] just installed, get error messages

2018-04-25 Rolf Schumacher
I just installed from the git repository. tesseract --version shows:
    sc@rolf29 ~ $ tesseract /home/rsc/log/2018-04-26/in.png $LOGDIR
    Error in pixReadMemTiff: function not present
    Error in pixReadMem: tiff: no pix returned
    Error in pixaGenerateFontFromString: pix not made
    Error in bmfCreate: font pixa

[tesseract-ocr] Problem facing with tesseract training 4 with arabic

2018-04-25 Amir Raouf
First, Arabic text is read by tesseract with good accuracy, but NO DIGITS are read, so I decided to train only the digits, in the specific font I need. This is the question: https://stackoverflow.com/questions/50029477/issue-with-training-tesseract-4-0 Any advice? -- You received this message because you are

Re: [tesseract-ocr] error: required directory

2018-04-25 Marius Amado-Alves
Zdenko, your latest fix to the makefile has solved this problem :-) Thanks a lot.

Re: [tesseract-ocr] error: required directory

2018-04-25 Zdenko Podobny
We are reorganizing tesseract. Using the latest code is not recommended at all, especially if you do not follow the developers' communications. Zdenko 2018-04-25 19:59 GMT+02:00 Marius Amado-Alves : > Trying to install on a Mac, cannot pass the autogen.sh step. Any tips > highly apprecia

[tesseract-ocr] error: required directory

2018-04-25 Marius Amado-Alves
Trying to install on a Mac; I cannot get past the autogen.sh step. Any tips highly appreciated. Current directory is /tesseract
    bash-3.2# ./autogen.sh
    Running aclocal
    Running /opt/local/bin/glibtoolize
    glibtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'config'.
    glibtoolize: copying file '

Re: [tesseract-ocr] tesseract performs wrong auto-correction sometimes : how to disable it?

2018-04-25 ShreeDevi Kumar
Which version of tesseract are you using? ShreeDevi On Wed, Apr 25, 2018 at 8:29 PM, Youcef wrote: > Hi, > > > Tesseract seems to post process its prediction. > > Here after, what I

[tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-25 Александр Поздняков
For CentOS:
    yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
    yum update
    yum install tesseract
For example:
    yum install tesseract-langpack-deu
On Wednesday, April 25, 2018 at 16:30:01 UTC+3, Eugene Huang wrote:

[tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-25 shree
Thanks for the rpm package, Alex. I have added the info to https://github.com/tesseract-ocr/tesseract/wiki
On Tuesday, April 24, 2018 at 10:04:55 PM UTC+5:30, Александр Поздняков wrote: > > Hi. I compiled an rpm package with tesseract-ocr for CentOS, Fedora, > ScientificLinux, OpenSuse. It mus

[tesseract-ocr] tesseract performs wrong auto-correction sometimes : how to disable it?

2018-04-25 Youcef
Hi,
Tesseract seems to post-process its predictions. Below is what I get after running OCR on images (same font, same-size images generated with text2image):
- an image containing "12345678I" => `123456781`
- an image containing "GLOTHUVFI" => `GLOTHUVFI`
- an image containing "12345678H" => `1234
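
To compare results like these across many samples, a small helper can list where the OCR output departs from the text rendered with text2image. A sketch using Python's standard `difflib`; the function name is illustrative, and it only reports the differences. It says nothing about whether they come from dictionary post-processing or from the recognizer itself.

```python
from difflib import SequenceMatcher


def char_differences(expected, recognized):
    """Return (opcode, expected_part, recognized_part) for every non-matching span."""
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    return [
        (tag, expected[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For the first case above, `char_differences("12345678I", "123456781")` reports a single `replace` of `"I"` by `"1"`.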

Re: [tesseract-ocr] Trained font - always one letter wrong

2018-04-25 Zdenko Podobny
Well, you should contact the creator of the traineddata. We have no clue what they did. Zdenko 2018-04-25 14:55 GMT+02:00 : > Hello there, > > i don't know what to do anymore... > I want to use tesseract-ocr 3.05 for scanning documents, using the font > "Perfect DOS VGA 437 Win". > Got a traineddata f

[tesseract-ocr] Trained font - always one letter wrong

2018-04-25 dave.hardy
Hello there, I don't know what to do anymore... I want to use tesseract-ocr 3.05 for scanning documents, using the font "Perfect DOS VGA 437 Win". I got a traineddata file for my font from trainyourtesseract.com. Actually it works really nicely, but in every case the letter "d" isn't identified but "a"

[tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-25 Eugene Huang
Hello Александр! I took a look at your stuff; it is very extensive. If all the installations work, this should be on the front page! I have never used openSUSE. Could you point me to some resources on how to use your installation packages? @shree Thanks for the info. I definitely notice that