[tesseract-ocr] Help

2019-04-08 Thread samer
Hi, in Arabic, small bracketed numbers like (1) and (2) are read as symbols. How can this problem be solved?

Re: [tesseract-ocr] Tesseract different output on windows then linux

2019-04-08 Thread stjoweil via tesseract-ocr
On Monday, 8 April 2019 00:03:25 UTC+2, Chirs Masselli wrote: > > SOLVED > I solved it by downloading the 32-bit setup; it also fixed the bad > recognition on Windows vs. Linux without switching the train data. > The 32-bit link I used is from the wiki: > https://digi.bib.uni-mannheim.de/tesseract/tesse

[tesseract-ocr] Minimum hardware requirement for openCV and pytesseract

2019-04-08 Thread Saurabh Jain
Dear Developers, I want to develop an OCR application using OpenCV and pytesseract to extract text from photographed pictures of different types of printed documents (e.g. newspapers, magazines, colored certificates). Can you please tell me what the minimum hardware requirement f

[tesseract-ocr] tesseract 4 training document is not uptodate

2019-04-08 Thread yoganand
Can someone update the Tesseract training document with the latest details, so that beginners can follow the instructions in the documentation? The problems I see are: 1) the documentation should be for Python 3 only 2) the Tesseract URL given in the makefile is not working 3) a few steps are quite hard to underst

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-08 Thread Lorenzo Bolzani
Hi Shree, I'd love to but it is a commercial project I'm working on so I cannot share the current solution. I will try to find the old scripts I used for the first attempts. Basically it was something like this: - normalize lightness - make illumination uniform (CLAHE on HSV "V" channel) - denois

Re: [tesseract-ocr] confuse whether Otsu Thresholding affects lstm training

2019-04-08 Thread Lorenzo Bolzani
Hi, yes, at the very least you can use some adaptive threshold method, like OTSU, to find the best parameters. But OTSU has its own parameters so you need to fine tune those too (a little). What worked best for me was first to do a rough normalization of the images (lightness, contrast) and then d
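
For readers unfamiliar with the Otsu step mentioned in this reply, here is a minimal pure-Python sketch of the classic Otsu method, which picks the gray level that maximizes the between-class variance of the histogram. It assumes the image has already been flattened to a list of 8-bit gray levels; the names otsu_threshold and binarize are hypothetical and do not come from the thread. In practice one would more likely call a library routine such as OpenCV's cv2.threshold with the cv2.THRESH_OTSU flag.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    # Build the 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels at or below the candidate threshold
    sum_low = 0  # sum of the gray levels of those pixels
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break     # every pixel is in the lower class; nothing left to split
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map pixels above the threshold to 255 and all others to 0."""
    return [255 if p > t else 0 for p in pixels]


# Illustrative input: a dark cluster and a bright cluster of gray levels.
sample = [10] * 50 + [200] * 50
threshold = otsu_threshold(sample)
binary = binarize(sample, threshold)
```

The rough normalization of lightness and contrast described in the reply would run before this step, so that the histogram has two reasonably separated peaks.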

[tesseract-ocr] Accessing Tesseract Library

2019-04-08 Thread Groxen Xypher
I am currently trying to find the file pagesegmain.cpp, which I think should have come with Tesseract when I downloaded it. However, when I searched my entire C: drive for it, I was unable to find it. How should I go about accessing this file? Also, how does one edit how

Re: [tesseract-ocr] Making custom traineddata

2019-04-08 Thread Jankees Korstanje
Hi Shree, We have tried your traineddata file for MRZ and noticed that it does not detect the character X. Looking at https://github.com/Shreeshrii/tessdata_ocrb/blob/master/eng.MRZ.training_text We see that there are no X in there. In addition it might be good to add a couple of lines that a
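
A quick way to spot gaps like the missing X is to compare a training text against the expected character set. The sketch below assumes the MRZ alphabet is A-Z, 0-9 and the filler character "<"; the function name missing_characters and the sample string are illustrative and are not taken from the repository.

```python
import string

# Assumed MRZ character set: uppercase Latin letters, digits and the filler '<'.
MRZ_ALPHABET = set(string.ascii_uppercase + string.digits + "<")


def missing_characters(training_text, alphabet=MRZ_ALPHABET):
    """Return the sorted list of alphabet characters absent from training_text."""
    return sorted(alphabet - set(training_text))


# Illustrative sample line; a real check would read the training text file instead.
sample_text = "P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<"
gaps = missing_characters(sample_text)
print("Characters never seen in the sample:", "".join(gaps))
```

Running the same check on the full training text before training would show which characters the model never gets to see.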

Re: [tesseract-ocr] Making custom traineddata

2019-04-08 Thread Shree Devi Kumar
If you can provide another 40-50 lines of training data (text file), I will rerun the training. On Mon, 8 Apr 2019, 22:11 Jankees Korstanje, wrote: > Hi Shree, > > We have tried your traineddata file for MRZ and noticed that it does not > detect the character X. > > Looking at > https://github.co

Re: [tesseract-ocr] How to train tesseract with new script?

2019-04-08 Thread Moni
Thanks a lot for your response. I went through your page, but for Brahmi scripts it displays an error when showing the raw data. Kindly help me with this. Thank you for your consideration. On Mon 8 Apr, 2019, 10:29 AM Shree Devi Kumar wrote: Tesseract 4 LSTM training is done using tesseract, not tensorflow

Re: [tesseract-ocr] How to train tesseract with new script?

2019-04-08 Thread Moni
Hi, good morning. I am currently a PhD scholar doing research on ancient Tamil inscriptions. I have seen your trained data for the Brahmi script and am working with it, but I am getting the error "Failed to load the language". If possible, kindly share your language data. Thanks for your cooperation. On Mon,

[tesseract-ocr] Using lang files from 3.04 with T4 legacy mode

2019-04-08 Thread estel4ever
Hi everyone! I have been using T3.04 for a while and have created several language files to improve OCR quality for specific PDFs. After moving to T4, overall quality increased with the default eng language file, but there is still one PDF type where I get a lot of digits wrong (5 is treated as

Re: [tesseract-ocr] How to train tesseract with ancient Greek character

2019-04-08 Thread 易鑫
I have tried, but it still cannot recognize "Φ". 易鑫 wrote on Mon, 8 Apr 2019 at 9:44 AM: > Thanks a lot. I will try. > > Shree Devi Kumar wrote on Thu, 4 Apr 2019 at 10:05 PM: > >> You don't need to add *"GFS Artemisia" as it may not have the Chinese >> characters.* >> >> Just add Greek character "Φ" to your training tex

Re: [tesseract-ocr] Questions about recognize Chinese characters

2019-04-08 Thread 易鑫
Does someone know the reason? Thanks. 易鑫 wrote on Mon, 8 Apr 2019 at 10:42 AM: > Hello, everyone: > > Good day! I have trained a chi_sim model to recognize the Chinese > characters. You can find the sample image in the attached file. > > I find that the two Chinese characters are a little connected and the