[tesseract-ocr] how to use user words python

2021-05-10 Thread Lounis Khiat
Hi, I am using Tesseract in my Python application, and I would like to use only my own dictionary. I tried to follow the bazaar explanation, but I didn't understand how to do it. I didn't find the Python commands
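
For readers with the same question, here is a minimal sketch of one way to do this from Python. It assumes the pytesseract wrapper and Pillow are installed and that the custom words live in a plain-text file with one word per line. The function names, `my_words.txt`, and `page.png` are hypothetical. The `--user-words` option and the `load_system_dawg` / `load_freq_dawg` variables are Tesseract command-line settings; how much effect a user word list has depends on the Tesseract version and engine in use, so treat this as a starting point, not a confirmed fix.

```python
import shlex


def build_user_words_config(words_file, use_builtin_dictionaries=False):
    """Build a Tesseract option string that points at a custom word list."""
    # Quote the path so that spaces survive when the string is split again.
    options = ["--user-words", shlex.quote(str(words_file))]
    if not use_builtin_dictionaries:
        # Ask Tesseract not to load the dictionaries bundled in the traineddata.
        options += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return " ".join(options)


def ocr_with_user_words(image_path, words_file, lang="eng"):
    """Run OCR through pytesseract using the options built above."""
    # Imported here so the helper above works without these packages installed.
    import pytesseract
    from PIL import Image

    config = build_user_words_config(words_file)
    return pytesseract.image_to_string(Image.open(image_path), lang=lang, config=config)
```

A call would then look like `ocr_with_user_words("page.png", "my_words.txt")`.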

Re: [tesseract-ocr] Diagnosing and fixing poor precision on mixed Greek-English text

2021-05-10 Thread Ben Crowell
Here is a version of the text that I typeset using xelatex, with the font DejaVu Serif. It has all the accents, which should make it a good typographical match to the data that tesseract was trained on to make the grc file. [image: tex_output.png] Here is the result: Ἔννεπε declare pot to me, M

Re: [tesseract-ocr] Diagnosing and fixing poor precision on mixed Greek-English text

2021-05-10 Thread Ben Crowell
I compiled tesseract from source, which gave me version 5.0.0-alpha-20210401-102-g4374, and used the latest grc.traineddata file. To get a measure of what's going on, I decided to count the number of Greek words rendered as Greek in the first 7 lines of this text, which contain 22 actual Greek
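
The measure described above (how many words come out in Greek script) can be approximated with a short script. This is only a sketch of the idea, not the poster's actual method: it treats a token as Greek when every letter in it has a Unicode name starting with "GREEK", and the function names are hypothetical. It does not check that the Greek word was recognized correctly, only that it was rendered in Greek script, not as Latin lookalikes.

```python
import unicodedata


def is_greek_word(token):
    """True if the token has letters and every letter is a Greek character."""
    # Punctuation, digits, and combining marks are ignored; only letters count.
    letters = [ch for ch in token if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("GREEK") for ch in letters
    )


def count_greek_words(text):
    """Count whitespace-separated tokens that are written entirely in Greek."""
    return sum(1 for token in text.split() if is_greek_word(token))
```

Running `count_greek_words` on the OCR output for the first few lines and comparing the result with the known number of Greek words gives a rough score for each language setting.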

Re: [tesseract-ocr] Diagnosing and fixing poor precision on mixed Greek-English text

2021-05-10 Thread Merlijn B.W. Wajer
Hi Ben, On 10/05/2021 15:09, Ben Crowell wrote: > Hi Merlijn, > > Thanks very much for your reply. It's encouraging that you were able to get > somewhat better results. However, I'm not able to reproduce them. When I > use -l eng+ell, the results are still very poor: > > 1. Evverre declare wot

Re: [tesseract-ocr] Diagnosing and fixing poor precision on mixed Greek-English text

2021-05-10 Thread Ben Crowell
I tried replacing the grc.traineddata file with the newer version, and the software still ran, but the results were identical. From the comments on git, it looks like the newer version is just optimized for speed. On Monday, May 10, 2021 at 6:09:02 AM UTC-7 Ben Crowell wrote: > Hi Merlijn, > >

[tesseract-ocr] Re: LSTM Training

2021-05-10 Thread piyus...@gmail.com
Use this link: https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html For ref: https://groups.google.com/g/tesseract-ocr/c/7q5pmgJDu_o/m/HysFXnYlAQAJ On Monday, 10 May 2021 at 18:28:27 UTC+5:30 chris.r...@gmail.com wrote: > Hello, > i trained Tesseract for a new font with >

Re: [tesseract-ocr] Diagnosing and fixing poor precision on mixed Greek-English text

2021-05-10 Thread Ben Crowell
Hi Merlijn, Thanks very much for your reply. It's encouraging that you were able to get somewhat better results. However, I'm not able to reproduce them. When I use -l eng+ell, the results are still very poor: 1. Evverre declare wot to me, Movca Muse, avopa the man voAvtpotrov of many fortunes,

[tesseract-ocr] LSTM Training

2021-05-10 Thread Christoph Ruffing
Hello, I trained Tesseract for a new font with makebox, box.train, mftraining, and cntraining. How can I train with LSTM? I cannot find tesstrain.sh

Re: [tesseract-ocr] Diagnosing and fixing poor precision on mixed Greek-English text

2021-05-10 Thread Merlijn B.W. Wajer
Hi Ben, On 09/05/2021 21:33, Ben Crowell wrote: > I'm trying to OCR a book that is written in interspersed Greek and English: > > https://archive.org/details/odysseyofhomerco01gile/page/n5/mode/2up > > Here is a sample of text from the first page: > > [image: a.jpg] > > I'm running tesseract 4