[tesseract-ocr] Bounding box

2019-06-09 Thread Jennil Thiyam
ই 110 4657 137 4701 0 ম্ফা 131 4660 191 4693 0 ল 185 4660 217 4689 0 , 217 4654 226 4667 0 226 4650 240 4689 0 জু 240 4650 277 4689 0 ন 269 4660 298 4689 0 298 4660 316 4689 0 ১ 316 4660 332 4689 0 ৩ঃ 334 4661 376 4688 0 376 4655 394 4701 0 হৌ 394 4655 441 4701 0 জি 436 4660 482 4701 0 ক 477
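
For readers unfamiliar with the data quoted above, here is a minimal sketch of reading such entries. It assumes the usual Tesseract box-file layout of one entry per line in the order symbol, left, bottom, right, top, page, with the origin at the bottom-left of the image. The names `Box` and `parse_box_line` are hypothetical helpers, not part of Tesseract.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One box-file entry (assumed order: symbol left bottom right top page)."""
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so that a symbol which is itself a space survives.
    glyph, *numbers = line.rstrip("\n").rsplit(" ", 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return Box(glyph, left, bottom, right, top, page)


# Entries taken from the message above; the last one is a space symbol.
sample = [
    "ই 110 4657 137 4701 0",
    "ল 185 4660 217 4689 0",
    "  226 4650 240 4689 0",
]
boxes = [parse_box_line(entry) for entry in sample]
for box in boxes:
    print(repr(box.glyph), box.right - box.left, box.top - box.bottom)
```

Splitting from the right rather than the left is a design choice for this sketch: the five numeric fields are always last, whereas the symbol may contain or consist of whitespace.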

Re: [tesseract-ocr] Bounding box

2019-06-09 Thread Lorenzo Bolzani
I think you are talking about preparing the training data. With Tesseract 4.x you do not need to define a box for each character, just one big box for the whole line. Bye Lorenzo On Sun, 9 Jun 2019 at 10:50, Jennil Thiyam < thiyamjen...@gmail.com> wrote: > ই 110 4657 137 47

Re: [tesseract-ocr] Bounding box

2019-06-09 Thread Jennil Thiyam
After running tesstrain.sh to create the starter training data we get a .box file, right? That file contains the coordinates of each unit (which is exactly the bounding box of each unit). Can you please elaborate on that file? Can you please send me the link about the "no need" for bounding boxes of every

Re: [tesseract-ocr] Bounding box

2019-06-09 Thread Lorenzo Bolzani
I do not use tesstrain.sh for training, but I assume it does the right thing, so if there is a little overlap it is unlikely to be a problem. Reading many messages on this mailing list, I've never seen this as an issue. I use ocrd-train and it generates boxes for the whole line, not for individua
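
To illustrate the difference between per-character boxes and a single box for the whole line, here is a small sketch that merges character boxes into their enclosing rectangle. It is not how ocrd-train or tesstrain.sh actually produce their boxes; `line_box` is a hypothetical helper, and the coordinates are the first three entries from the first message of this thread, read as (left, bottom, right, top).

```python
def line_box(char_boxes):
    """Return the (left, bottom, right, top) rectangle enclosing all boxes."""
    lefts, bottoms, rights, tops = zip(*char_boxes)
    return min(lefts), min(bottoms), max(rights), max(tops)


# The first two boxes overlap horizontally (131 < 137); the union ignores that.
chars = [
    (110, 4657, 137, 4701),
    (131, 4660, 191, 4693),
    (185, 4660, 217, 4689),
]
print(line_box(chars))  # (110, 4657, 217, 4701)
```

Because only the outer extremes matter for a line-level box, small overlaps between neighbouring characters have no effect on the result.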

[tesseract-ocr] Failed to load script unicharset

2019-06-09 Thread Mox Betex
I am trying to build training data for Tesseract 4.00. When I execute this command: combine_lang_model --input_unicharset data/unicharset --script_dir data/tessdata --output_dir data/output --pass_through_recoder --lang MyModel I get the error "Failed to load script unicharset from:data/ tessdata/

Re: [tesseract-ocr] Trained data for E13B font

2019-06-09 Thread ElGato ElMago
It would be nice if there were traineddata out there, but I didn't find any. I see free fonts and commercial OCR software but no traineddata. The tessdata repository obviously doesn't have one, either. On Saturday, 8 June 2019 at 1:52:10 UTC+9, shree wrote: > > Please also search for existing MICR traineddata files. > > O