[tesseract-ocr] MRZ/MRP (Machine-readable zone/passport) dataset for tesseract v4

2019-05-26 Thread Mamadou
Hello, We have open-sourced (BSD license) an MRZ/MRP (machine-readable zone/passport) dataset and models for Tesseract v4. The dataset contains more than 7 thousand images (.tif) with ground truth (.gt.txt), collected from Google Images and augmented with some synthetic data. It's ready to be used to train with T
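
A minimal sketch of how one might sanity-check such a dataset before training, assuming every image `name.tif` is expected to have a matching `name.gt.txt` in the same folder. The flat folder layout and the function name `find_unpaired` are assumptions for illustration; they are not taken from the announced repository.

```python
from pathlib import Path
import tempfile


def find_unpaired(dataset_dir):
    """Return (images_without_gt, gt_without_image) as sorted lists of base names.

    Assumes a flat folder where 'name.tif' should be paired with 'name.gt.txt'.
    """
    root = Path(dataset_dir)
    # Strip the suffixes by hand because '.gt.txt' is a double extension.
    images = {p.name[: -len(".tif")] for p in root.glob("*.tif")}
    truths = {p.name[: -len(".gt.txt")] for p in root.glob("*.gt.txt")}
    return sorted(images - truths), sorted(truths - images)


if __name__ == "__main__":
    # Small self-contained demo on a temporary folder (hypothetical file names).
    demo = Path(tempfile.mkdtemp())
    (demo / "sample1.tif").write_bytes(b"")
    (demo / "sample1.gt.txt").write_text("P<UTOEXAMPLE<<JANE", encoding="utf-8")
    (demo / "sample2.tif").write_bytes(b"")  # image with no ground truth
    missing_gt, missing_img = find_unpaired(demo)
    print("images without ground truth:", missing_gt)
    print("ground truth without image:", missing_img)
```

Running a check like this first makes it easier to spot incomplete pairs before a long training run.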

Re: [tesseract-ocr] MRZ/MRP (Machine-readable zone/passport) dataset for tesseract v4

2019-05-29 Thread Mamadou
n(detector, recognizer). These 1376 images can't be used directly with Tesseract and require a detector and a preprocessor. On Wednesday, May 29, 2019 at 10:08:53 AM UTC+2, Lorenzo Blz wrote: > > Hi Mamadou, > this sounds very interesting. How did you do the training and accuracy > measur

[tesseract-ocr] Open source (BSD) MICR dataset for Tesseract v4 + evaluation app

2019-09-16 Thread Mamadou
Hello, We've open-sourced (BSD 3-Clause License) our MICR dataset and *.traineddata for Tesseract v4. This was developed as an internal R&D project and never went to production, as we ended up using TensorFlow. Even as a PoC it's already more accurate than many commercial products. The repo conta

Re: [tesseract-ocr] Trained data for E13B font

2019-09-16 Thread Mamadou
I don't know if it's appropriate or not. Please tell me if > it's not. > > On Friday, August 9, 2019 at 4:17:41 PM UTC+9, Mamadou wrote: >> >> >> >> On Friday, August 9, 2019 at 7:31:03 AM UTC+2, ElGato ElMago wrote: >>> >>> Here's my sharing on Git

Re: [tesseract-ocr] Trained data for E13B font

2019-08-06 Thread 'Mamadou' via tesseract-ocr
Hello, Are you planning to release the dataset or models? I'm working on the same subject and planning to share both under BSD terms. On Tuesday, August 6, 2019 at 10:11:40 AM UTC+2, ElGato ElMago wrote: > > Hi, > > FWIW, I got to the point where I can feel happy with the accuracy. As the > images

Re: [tesseract-ocr] Trained data for E13B font

2019-08-07 Thread 'Mamadou' via tesseract-ocr
f Shree's text and mine. The > instructions and tools I used already exist. > If you have a GitHub account, just create a repo and publish the data and instructions. > > ElMagoElGato > > On Wednesday, August 7, 2019 at 8:20:02 AM UTC+9, Mamadou wrote: > >> Hello, >> Are you planning

Re: [tesseract-ocr] Problems with training tesseract

2019-08-07 Thread 'Mamadou' via tesseract-ocr
On Wednesday, August 7, 2019 at 4:10:44 PM UTC+2, Cristobal Jesus Muñoz Solano wrote: > > Hello, I have already tried mrz.traineddata. It is quite good, but it is not > accurate enough. How can I improve it? Is it possible to use box/png > files to improve its accuracy? > mrz.traineddata

Re: [tesseract-ocr] Trained data for E13B font

2019-08-09 Thread 'Mamadou' via tesseract-ocr
e bit. >> Will be out there soon. >> >> On Wednesday, August 7, 2019 at 9:11:01 PM UTC+9, Mamadou wrote: >>> >>> >>> >>> On Wednesday, August 7, 2019 at 2:36:52 AM UTC+2, ElGato ElMago wrote: >>>> >>>> Hi, >>>> >>>> I'

Re: [tesseract-ocr] Trained data for E13B font

2019-08-09 Thread 'Mamadou' via tesseract-ocr
share our dataset (real-life samples) in the coming days. > > On Friday, August 9, 2019 at 4:17:41 PM UTC+9, Mamadou wrote: >> >> >> >> On Friday, August 9, 2019 at 7:31:03 AM UTC+2, ElGato ElMago wrote: >>> >>> Here's my sharing on GitHub. Hope it's of any use

[tesseract-ocr] Re: traindata for cmc7 font

2020-03-12 Thread 'Mamadou' via tesseract-ocr
You can open a ticket on our issue tracker (https://github.com/DoubangoTelecom/tesseractMICR/issues) and we will add it to the roadmap for the coming days. On Thursday, March 12, 2020 at 10:16:54 AM UTC+1, haytham Arori wrote: > > Hi to all > > I want to know if anyone has the .traineddata file for CMC

[tesseract-ocr] Re: cmc7.traineddata

2020-04-03 Thread 'Mamadou' via tesseract-ocr
The easiest way to train the MICR CMC-7 font for Tesseract would be to use ocrd-train from OCR-D (https://github.com/OCR-D/ocrd-train). This is what we used in our R&D project (https://github.com/DoubangoTelecom/tesseractMICR). We open-sourced the MICR E-13B traineddata but not the CMC-7. We're not using these mo

Re: [tesseract-ocr] Re: cmc7.traineddata

2020-04-04 Thread 'Mamadou' via tesseract-ocr
les you're attaching won't help. You need thousands of samples for training. In our case we have 17k samples to train the TensorFlow model. Try web scraping to collect real-life samples instead of using synthetic data. On Friday, April 3, 2020 at 7:11:01 PM UTC+2, Ghada Aruri wrote: > > hi

[tesseract-ocr] Re: cmc7.traineddata

2020-04-04 Thread 'Mamadou' via tesseract-ocr
an online webapp to check the accuracy at https://www.doubango.org/webapps/micr/. On Saturday, April 4, 2020 at 11:59:34 AM UTC+2, Essam Zaky wrote: > > Hi @mamadou > > How did you collect the 17000 images? Are they real images? > Also, which type of TensorFlow models do you use
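
For readers who want to measure accuracy offline on their own samples, here is a minimal sketch of a character error rate (CER) computation based on Levenshtein edit distance. This is a common OCR metric, but it is an assumption here: the threads above do not say which metric the authors or the web app use, and the function names are illustrative only.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute = 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """CER over (ground_truth, prediction) pairs: total edits / total truth chars."""
    total_edits = sum(levenshtein(truth, pred) for truth, pred in pairs)
    total_chars = sum(len(truth) for truth, _ in pairs)
    if total_chars == 0:
        raise ValueError("ground truth is empty; CER is undefined")
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical strings, not real MICR or MRZ output.
    samples = [("ABCD", "ABCD"), ("ABCD", "ABXD")]
    print(f"CER = {character_error_rate(samples):.3f}")
```

The ground-truth strings would come from the `.gt.txt` files and the predictions from whichever recognizer is being evaluated.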