[tesseract-ocr] simplified arabic .trained data

2019-07-24 Thread Mhd Fahri Sujarwadi
Can someone give me a ".traineddata" file for Simplified Arabic? Thanks. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@

[tesseract-ocr] Support for New Reiwa Era Character ㋿ in Japanese

2019-07-24 Thread Prateek Mehta
There's a new character introduced ㋿ (U+32FF). Support for this character is required.
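For reference when preparing training text, the character can be identified by its code point. A minimal Python sketch, assuming plain-text ground truth; the helper name `contains_reiwa` is hypothetical and not part of Tesseract:

```python
# The square era-name character for Reiwa, referenced above as U+32FF.
REIWA = "\u32ff"


def contains_reiwa(text: str) -> bool:
    """Return True if the text contains the character U+32FF."""
    return REIWA in text


# Code point and UTF-8 encoding of the character.
print(f"U+{ord(REIWA):04X}")   # U+32FF
print(REIWA.encode("utf-8"))   # b'\xe3\x8b\xbf'
```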

[tesseract-ocr] Tesseract 4.0 LSTM comparing with other OCR engines

2019-07-24 Thread prasad nemmikanti
I recently started comparing the OCR results of Tesseract 4.0 LSTM with other OCR engines such as Kadmos, Google Vision and Amazon's Textract. I took two types of scanned production images: normal quality and poor quality. I ignored good-quality images, as all OCR engines give almost the same resul

Re: [tesseract-ocr] Tesseract 4.0 LSTM comparing with other OCR engines

2019-07-24 Thread Lorenzo Bolzani
Hi Prasad, please post a few samples of normal and poor quality images and details of any preprocessing you did on these images before calling the OCR, if any. Bye Lorenzo On Wed, 24 Jul 2019 at 13:09, prasad nemmikanti <prasadn...@gmail.com> wrote: > Recently I have started

[tesseract-ocr] Trained data for OCRB font

2019-07-24 Thread prasad nemmikanti
Hi, does anyone have the latest trained data for the OCRB font, which is used to extract MRZ data from passport and national ID card images? Thanks, Prasad

Re: [tesseract-ocr] Tesseract 4.0 LSTM comparing with other OCR engines

2019-07-24 Thread prasad nemmikanti
Hi Lorenzo, Sorry! I can't share sample images as these are client images. No preprocessing is done; the original image is given as input and the raw text is taken as output. Regards Prasad On Wednesday, July 24, 2019 at 4:45:41 PM UTC+5:30, Lorenzo Blz wrote: > > Hi Prasad, > please post a few samples

Re: [tesseract-ocr] Tesseract 4.0 LSTM comparing with other OCR engines

2019-07-24 Thread Lorenzo Bolzani
If you cannot share even a few words, lines or fragments, I think there is not much to tell other than this: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality Google Vision does a lot of complex pre-processing; Tesseract does none, so it has to be done by the user. I suggest to man

Re: [tesseract-ocr] Tesseract 4.0 LSTM comparing with other OCR engines

2019-07-24 Thread Zdenko Podobny
Read the forum. Don't be lazy - you are doing it for money - we don't. All topics were discussed. Zdenko On Wed, 24 Jul 2019 at 14:08, prasad nemmikanti wrote: > Hi Lorenzo, > > Thanks for your response. > Attached 3 sample fragments. > I have also analyzed few failure cases and main reasons a

Re: [tesseract-ocr] Trained data for OCRB font

2019-07-24 Thread 'Bossiel' via tesseract-ocr
https://github.com/DoubangoTelecom/tesseractMRZ > On Jul 24, 2019, at 13:17, prasad nemmikanti wrote: > > Hi, > > Anyone has latest trained data for OCRB font which is used to extract MRZ > data from passport and national ID cards images. > > > Thanks > Prasad

[tesseract-ocr] Re: Support for New Reiwa Era Character ㋿ in Japanese

2019-07-24 Thread ElGato ElMago
You can train it. https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract On Wednesday, July 24, 2019 at 19:49:13 UTC+9, Prateek Mehta wrote: > There's a new character introduced ㋿ (U+32FF). Support for this character > is required.

[tesseract-ocr] Tesseract with GPU

2019-07-24 Thread Purushotham Rao Eravalli
Does the response time get reduced if we run Tesseract on GPUs? If so, can you share the best process available as of now?

[tesseract-ocr] Re: GPU for Tesseract

2019-07-24 Thread Purushotham Rao Eravalli
Does the response time of Tesseract decrease if we run it on a GPU? On Friday, June 28, 2019 at 11:24:07 AM UTC+5:30, Pooja Kamra wrote: > > On Tesseract site, it is mentioned that no GPU is needed (No support). > What does this statement means? > If i have a machine with GPU, does it improve training

[tesseract-ocr] two similar picture,one get correct result,the other gets only one char,why?

2019-07-24 Thread Chen Yufu
I use the Tesseract command line to get the OCR result, like this: tesseract "d:\1.jpg" "d:\out" -l chi_sim+eng --psm 7. 1.jpg gets the correct result "第 26-61 147151706 期", but 2.jpg gets only "|". They look similar to me, so why are the results so different? What steps can I take to get the correct result?
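To rule out differences in how the two images are invoked, the same command can be built programmatically for both. A minimal Python sketch: the flags `-l chi_sim+eng` and `--psm 7` come from the message, while the helper names `build_tesseract_command` and `run_ocr` are illustrative assumptions and not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image, output_base, languages, psm):
    """Build the argument list for the tesseract CLI call quoted above."""
    return [
        "tesseract",
        image,
        output_base,
        "-l", "+".join(languages),  # e.g. chi_sim+eng
        "--psm", str(psm),          # 7 = treat the image as a single text line
    ]


def run_ocr(image, output_base, languages, psm):
    """Run tesseract if it is on PATH; return its exit code, or None if absent."""
    if shutil.which("tesseract") is None:
        return None  # tesseract is not installed on this machine
    cmd = build_tesseract_command(image, output_base, languages, psm)
    return subprocess.run(cmd, check=False).returncode


# Same arguments as in the message; only the image path differs between runs.
for image in (r"d:\1.jpg", r"d:\2.jpg"):
    print(build_tesseract_command(image, r"d:\out", ["chi_sim", "eng"], 7))
```

Building both calls from one function confirms that only the input image differs, which points the investigation at the images themselves rather than at the options.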

[tesseract-ocr] OCR directory pages with different indentation levels accurately

2019-07-24 Thread Dilcia Mercedes
Hi everyone, I am currently an intern for a news outlet and we are trying to OCR pages of a directory. I am reaching out because we've hit a couple of issues along the way. We have a few questions to identify if tesseract is the tool we should be using, and if so, you could help us understand

Re: [tesseract-ocr] Trained data for OCRB font

2019-07-24 Thread prasad nemmikanti
Thank you! On Wednesday, July 24, 2019 at 6:34:32 PM UTC+5:30, Mamadou wrote: > > https://github.com/DoubangoTelecom/tesseractMRZ > > Sent from my iPhone > > On Jul 24, 2019, at 13:17, prasad nemmikanti > wrote: > > Hi, > > Anyone has latest trained data for OCRB font which is used to extract MRZ

Re: [tesseract-ocr] Re: Support for New Reiwa Era Character ㋿ in Japanese

2019-07-24 Thread Prateek Mehta
Everywhere I look I can see examples and the process for training it on new fonts, but what about new characters? Characters that even the current Tesseract models don't know exist. On Thu, Jul 25, 2019 at 6:01 AM ElGato ElMago wrote: > You can train it. > > https://github.com/tesseract-ocr/tess

Re: [tesseract-ocr] GPU for Tesseract

2019-07-24 Thread Zdenko Podobny
Well, this is not really true: - in the age of Tesseract version 3.x, AMD sent some patches for OpenCL support[1]. They are still present, but not maintained (search the issue tracker for known problems) - AFAIR they affect only tif opening and some tif preprocessing, without effect on OCR pro

Re: [tesseract-ocr] Re: Support for New Reiwa Era Character ㋿ in Japanese

2019-07-24 Thread Zdenko Podobny
Really??? https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters Zdenko On Thu, 25 Jul 2019 at 8:03, Prateek Mehta wrote: > So everywhere I can see examples and process to train it on new fonts, but > what about the new characters? The character

[tesseract-ocr] Re: GPU for Tesseract

2019-07-24 Thread Pooja Kamra
If I do Tesseract training on a system without a GPU and training on another machine with a GPU, will it make any difference? On Friday, July 19, 2019 at 7:50:57 PM UTC+5:30, Arno Loo wrote: > > If I understand correctly, you can use Tesseract OCR on GPU for speeding > up the process but not Te

Re: [tesseract-ocr] Training stops before specified iterations

2019-07-24 Thread Pooja Kamra
In the file there is a triplet, 4643/15/15012. What does it mean? On Friday, July 19, 2019 at 7:47:57 PM UTC+5:30, Arno Loo wrote: > > I was confused about the triple iteration number too... > https://groups.google.com/d/msg/tesseract-ocr/hni4owhU3vs/ankF3gSrAwAJ > > > > On Friday, July 19, 2019 1

[tesseract-ocr] Use Tesseract dll with c project

2019-07-24 Thread Pooja Kamra
Hi, I want to use libtesseract.dll in a C project. In the Tesseract source there is a header file, capi.h. How can I use these functions in a C executable project? Please suggest.