[tesseract-ocr] Re: Training help

2019-06-12 Thread ElGato ElMago
I guess you ran tesstrain.sh and had a problem. I had a problem there, too, but it seems to be a different one. Anyway, I got around it using the work of someone on this board. It does the same thing as the tutorial but without the error: https://github.com/Shreeshrii/tess4training

[tesseract-ocr] what are script models, e.g. Cyrillic.traineddata, exactly?

2019-06-12 Thread Nikolai Krot
Hi guys, what are script models, e.g. Cyrillic.traineddata (https://github.com/tesseract-ocr/tessdata_best/blob/master/script/Cyrillic.traineddata)? As per the file https://github.com/tesseract-ocr/langdata_lstm/blob/master/script/Cyrillic.langs.txt, it is a collection of several languages, na

[tesseract-ocr] Re: Retrain Tesseract 4.0.0 beta to recognise handwritten digits

2019-06-12 Thread hrishikesh kaulwar
Can we use jTessBoxEditor with Tesseract 4.0? I have read that it cannot be used with Tesseract 4.0. On Tuesday, July 17, 2018 at 8:43:29 PM UTC+5:30, Ramakant Kushwaha wrote: > > *Hi,* > > *Recently I have been trying to retrain Tesseract 4.0 to recognise handwritten > digits. I am following o

[tesseract-ocr] Re: I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2019-06-12 Thread Justin Minnaar
Hi Vipin, did you ever get Tesseract 4 working under C#? I've been using version 3 but would like to move to version 4. Justin On Tuesday, September 25, 2018 at 7:03:19 PM UTC+2, Vipin Tom Varghese wrote: > > Hi James, my apologies for hitting you up so randomly, but I had no other > options le

Re: [tesseract-ocr] Trained data for E13B font

2019-06-12 Thread Shree Devi Kumar
You will get A B C D as output for the MICR symbols. If it works well otherwise, I will update it to generate the Unicode text for the symbols. It was trained using the font "MICR Encoding". On Wed, Jun 12, 2019 at 9:53 PM Shree Devi Kumar wrote: > Please test the attached file. It is trained in legacy for

[tesseract-ocr] Extract VGSL net_spec from traindata

2019-06-12 Thread Chae Clark
Is there a way to extract the model string (e.g. [1,0,0,3 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]) from a .traineddata file? Thanks, --Chae -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop recei
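For readers unfamiliar with the model string quoted in the question, here is a minimal sketch of how it breaks down into layer tokens. It does not answer the question of how to extract the string from a model file; it only splits a spec that is already in hand. The function name `split_vgsl_spec` is hypothetical, and the sketch assumes a flat spec like the one quoted, with no nested brackets or parenthesised parallel groups.

```python
def split_vgsl_spec(spec):
    """Split a bracketed, flat VGSL spec string into its layer tokens.

    Assumption: the spec is a single [...] series whose layers are
    separated by whitespace, as in the example quoted in the thread.
    Nested [...] or (...) groups are not handled by this sketch.
    """
    text = spec.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a spec wrapped in [ ]")
    # Drop the outer brackets, then split on runs of whitespace.
    return text[1:-1].split()


if __name__ == "__main__":
    example = "[1,0,0,3 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]"
    for token in split_vgsl_spec(example):
        print(token)
```

The first token describes the input shape and the last the output layer; the tokens in between are the convolution, pooling, and LSTM layers in order.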

[tesseract-ocr] Tesseract source directory: ./configure

2019-06-12 Thread Sivan Langer
Where exactly is it on macOS and on Ubuntu? I find the documentation unclear on this issue and can't find where to compile the training program.

[tesseract-ocr] OCR Field level extraction

2019-06-12 Thread satyanand Gatla
Hi guys, I need help with extracting data from PDF/image files using Tesseract OCR or any other package. I need to perform auto-indexing so that field-level extraction happens automatically, followed by user review and confirmation. I have seen dispatcher patterns like zonal ocr in CAPTI

[tesseract-ocr] Re: dealing with image with text of separate columns

2019-06-12 Thread Jingjing Lin
A further question: will training help for images like this? On Tuesday, June 11, 2019 at 12:09:28 PM UTC-4, Jingjing Lin wrote: > > I'm wondering, what are the parameters to tune to get better results for > images with text in several columns, example as attached. > > Basically I would like to have separate columns sep

[tesseract-ocr] how to make box file for tesseract

2019-06-12 Thread Jingjing Lin
I'm very confused about how to prepare text for further training of Tesseract. I don't think the page below gives any useful information about this: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#making-box-files Shouldn't there be a process where we input the correct text? Or

[tesseract-ocr] Re: Tesseract source directory: ./configure

2019-06-12 Thread Sivan Langer
So the source had to be downloaded from git. I did not see any mention of that. It works on Ubuntu. On Wednesday, June 12, 2019 at 9:05:19 PM UTC+3, Sivan Langer wrote: > > where is it exactly in the mac os and also in ubuntu I find the > documentation unclear on this issue and can't fi

[tesseract-ocr] Re: how to make box file for tesseract

2019-06-12 Thread Mox Betex
You don't have to create .box files manually. Use OCR-D for training: https://github.com/OCR-D/ocrd-train Put your tif/gt.txt files in the data/ground-truth folder; when you run make training, it will generate the box files. For every tif image you write the correct text in a gt.txt file, nothing else. Look a
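As a rough illustration of the layout described above, the following sketch checks that every image in the ground-truth folder has a matching transcription before training is started. It assumes the pairing convention `name.tif` / `name.gt.txt`; the helper name `find_unpaired` is hypothetical and not part of ocrd-train.

```python
from pathlib import Path


def find_unpaired(ground_truth_dir):
    """Return (images_without_text, texts_without_image) as sorted base names.

    Assumption: each line image is called <name>.tif and its transcription
    <name>.gt.txt, both directly inside ground_truth_dir.
    """
    root = Path(ground_truth_dir)
    # Strip the full suffix so "line1.gt.txt" and "line1.tif" both map to "line1".
    tif_names = {p.name[: -len(".tif")] for p in root.glob("*.tif")}
    gt_names = {p.name[: -len(".gt.txt")] for p in root.glob("*.gt.txt")}
    return sorted(tif_names - gt_names), sorted(gt_names - tif_names)


if __name__ == "__main__":
    missing_text, missing_image = find_unpaired("data/ground-truth")
    for name in missing_text:
        print(f"{name}.tif has no {name}.gt.txt")
    for name in missing_image:
        print(f"{name}.gt.txt has no {name}.tif")
```

Running it from the directory that contains data/ground-truth prints one line per unpaired file and nothing when every image has its text.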

[tesseract-ocr] OCR-D training sample

2019-06-12 Thread Mox Betex
I tried training with OCR-D using samples of German text from the OCR-D repository. I put only 20 tif/txt files in the ground-truth folder and ran make training with 1000 iterations. I used one tif from the ground-truth folder to test Tesseract with the trained data, and it didn't recognize any characte

Re: [tesseract-ocr] Tesseract source directory: ./configure

2019-06-12 Thread Zdenko Podobny
Which documentation about building Tesseract from source did you read that is not clear? Zdenko On Wed, 12 Jun 2019 at 20:05, Sivan Langer wrote: > where is it exactly in the mac os and also in ubuntu I find the > documentation unclear on this issue and can't find where to compile the > training p