[tesseract-ocr] Re: i got Failed to continue from: data/eng/eng_num_vert.lstm

2024-08-30 Thread Menelik Berhan
" is an integer (fast) model": use the traineddata from the https://github.com/tesseract-ocr/tessdata_best repo.
On Friday, March 8, 2024 at 5:12:52 PM UTC+3 sct.pyt...@gmail.com wrote:
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Error, data/eng/eng_num_vert.lstm is an integer (fast) m
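The advice in this reply can be sketched roughly as follows. This is a minimal Python sketch, not the poster's own procedure: it assumes the best model is fetched as a raw file from the `main` branch of the tessdata_best repository and that `combine_tessdata -e` is then used to extract the LSTM component to continue training from. The helper names and the file names `eng.traineddata` / `eng.lstm` are hypothetical, and the functions only build the URL and the command; they do not download or run anything.

```python
from pathlib import Path

# Repository named in the reply; the "raw/main" URL layout is an assumption.
TESSDATA_BEST = "https://github.com/tesseract-ocr/tessdata_best"


def best_model_url(lang: str) -> str:
    """Return the assumed raw download URL of <lang>.traineddata."""
    return f"{TESSDATA_BEST}/raw/main/{lang}.traineddata"


def extract_lstm_command(traineddata: Path, out_lstm: Path) -> list[str]:
    """Build a combine_tessdata call that extracts the .lstm component."""
    return ["combine_tessdata", "-e", str(traineddata), str(out_lstm)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(best_model_url("eng"))
    print(" ".join(extract_lstm_command(Path("eng.traineddata"), Path("eng.lstm"))))
```

The extracted file would then stand in for the integer (fast) model as the model to continue from.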

[tesseract-ocr] Re: Training a new font on windows. Help with exact command.

2024-08-30 Thread Menelik Berhan
You can give MODEL_NAME any value. To specify the path to the data directory, use:
DATA_DIR    Data directory for output files, proto model, start model, etc. Default: data
For example, if MODEL_NAME=abc and DATA_DIR=data, you need to put the ground truth files (box, gt.txt & tif) in 'data/abc-groun
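Before starting a training run, it can help to confirm that the ground truth directory described in this reply is complete. The following is a minimal Python sketch, assuming each `.tif` image shares its base name with a `.box` and a `.gt.txt` file; the function name is hypothetical, and the directory is passed in by the caller rather than derived from MODEL_NAME and DATA_DIR.

```python
from pathlib import Path

# Companion extensions taken from the reply: box and gt.txt (next to each tif).
REQUIRED_SUFFIXES = (".box", ".gt.txt")


def missing_companions(ground_truth_dir: Path) -> dict[str, list[str]]:
    """Map each .tif base name to the companion suffixes that are absent."""
    missing: dict[str, list[str]] = {}
    for tif in sorted(ground_truth_dir.glob("*.tif")):
        base = tif.name[: -len(".tif")]
        absent = [
            suffix
            for suffix in REQUIRED_SUFFIXES
            if not (ground_truth_dir / (base + suffix)).exists()
        ]
        if absent:
            missing[base] = absent
    return missing
```

An empty result means every image found has both companion files; the check says nothing about whether their contents are valid.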

Re: [tesseract-ocr] Re: Tesseract training ground truth: I'm confused about the box files

2024-09-06 Thread Menelik Berhan
This might be helpful: https://tesseract-ocr.github.io/tessdoc/tess4/Make-Box-Files.html And also some details in: https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#making-box-files On Thursday, September 5, 2024 at 6:41:50 PM UTC+3 Danny wrote: > Hi Zdenko, > Thanks f

[tesseract-ocr] Advice on training for Old Amharic texts

2024-01-13 Thread Menelik Berhan
*Background*
I'm trying to use tesseract 5.3.3 on scanned old books written in Amharic (which uses Ethiopic script).
*Major Shortcomings of amh.traineddata from tesseract*
*Difference in type of Ethiopic script:* there are Ethiopic script characters in old Amharic texts that are not used in the

Re: [tesseract-ocr] Advice on training for Old Amharic texts

2024-01-13 Thread Menelik Berhan
use in the third > countries ( electric blackouts) > > Dear Menilik, we might need to put out hands together on this. > > On Sat, Jan 13, 2024, 11:21 AM Menelik Berhan wrote: > >> *Background* >> I'm trying to use tesseract 5.3.3 on scanned old books written in Amhari

Re: [tesseract-ocr] Advice on training for Old Amharic texts

2024-01-14 Thread Menelik Berhan
> perform better. Are u using linux? >> >> On Sat, Jan 13, 2024, 4:08 PM Menelik Berhan >> wrote: >> >>> Thanks for your swift reply. It would be my pleasure to collaborate with >>> you. >>> >>> I've noticed that there is are exten

Re: [tesseract-ocr] Advice on training for Old Amharic texts

2024-01-14 Thread Menelik Berhan
And yes, I'm using Ubuntu 20.04 on Windows with WSL. On Sun, Jan 14, 2024 at 4:06 PM Menelik Berhan wrote: > Yes, I'm in Addis. > My PC is not that powerful either. But I could find a couple of good > desktop PCs for the training. > > It would be my pleasure to meet in pe