Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-10-27 Thread Sorosh Shiwa
Hello, thanks a lot for the information, but how can I use it in Flutter? Please reply to my question. Sorosh Shiwa On Tue, Oct 27, 2020 at 2:36 PM write2...@gmail.com wrote: > not able to extract this. can anyone able to extract this? > > On Thursday, August 13, 2020 at 3:31:19 PM UTC+3 Mahmoud Mabrouk w

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-10-27 Thread write2...@gmail.com
I am not able to extract this. Is anyone able to extract this? On Thursday, August 13, 2020 at 3:31:19 PM UTC+3 Mahmoud Mabrouk wrote: > for numbers i used this and works fine with AEN numbers > https://github.com/ahmed-tea/tessdata_Arabic_Numbers > > > On Thursday, 13 August 2020 13:41:12 UTC+2, An

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-08-19 Thread Anuradha B
Thanks, Mahmoud. Do we just have to copy the ara_number.traineddata file from https://github.com/ahmed-tea/tessdata_Arabic_Numbers to the tessdata folder on the local system? I am using Google Colab Jup
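
The thread snippet does not show an answer to this question. As a minimal sketch, assuming that placing the file in Tesseract's tessdata directory is all that is required, the copy step could look like the following. The helper name `install_traineddata` is hypothetical, and the use of `TESSDATA_PREFIX` as the tessdata directory itself is an assumption that holds for Tesseract 4 and later (older releases expected the parent directory).

```python
import os
import shutil
from pathlib import Path


def install_traineddata(src, tessdata_dir=None):
    """Copy a .traineddata file into a tessdata directory and return the new path.

    Hypothetical helper: if tessdata_dir is not given, fall back to the
    TESSDATA_PREFIX environment variable (assumed to name the tessdata
    directory itself, as in Tesseract 4+).
    """
    if tessdata_dir is None:
        tessdata_dir = os.environ.get("TESSDATA_PREFIX")
        if not tessdata_dir:
            raise ValueError("pass tessdata_dir or set TESSDATA_PREFIX")

    src = Path(src)
    if src.suffix != ".traineddata":
        raise ValueError(f"not a traineddata file: {src}")

    dest_dir = Path(tessdata_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # keeps the file name, which is what -l refers to
    return dest
```

After the copy, the model is selected by its file stem, for example `-l ara_number` on the tesseract command line.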

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-08-13 Thread Mahmoud Mabrouk
For numbers I used this, and it works fine with AEN numbers: https://github.com/ahmed-tea/tessdata_Arabic_Numbers On Thursday, 13 August 2020 13:41:12 UTC+2, Anuradha B wrote: > > I am trying to extract the arabic dates and numbers from the national ID > card.I am using the following code in Anaconda

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-14 Thread Shree Devi Kumar
@Eliyaz I do not know Arabic or any other RTL language. I suggest you try running training with the latest code and tesstrain. You may have to experiment to get the best result. I will try to do a test run with the data you provided. Does it include numbers and dates? On Tue, Jul 14, 2020, 13:18 Eliyaz L

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-14 Thread Eliyaz L
Hi, sorry to bother you, just a follow-up. I tried the latest Tesseract; it works fine with Arabic text and numbers, and the only issue is with Arabic dates. If the issue is still open, can I prepare a dataset and train a separate custom model for only numbers and dates? If possible, then please hel

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-13 Thread Eliyaz L
Thanks for the support; it saves a lot of time and effort. I tried the latest Tesseract; it works fine with Arabic text and numbers, and the only issue is with Arabic dates. If the issue is still open, can I prepare a dataset and train a separate custom model for only numbers and dates? If p

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
If I recall correctly, ara_number.traineddata was trained for the legacy engine. You cannot use two traineddata files that each use a different engine. Regarding training of Arabic numbers and punctuation, it is currently an open issue. If you use the latest code from the tesstrain repo, it should automa
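
To make the engine point concrete, here is a minimal sketch that only builds tesseract command lines with an explicit engine mode, so that each traineddata file is run with the engine it was trained for. It assumes, as recalled above, that ara_number.traineddata is a legacy-engine model; the helper name `tesseract_command` and the image file name are hypothetical. The flags `-l`, `--oem`, `--psm` and `--tessdata-dir` are standard tesseract options, with `--oem 0` selecting the legacy engine and `--oem 1` the LSTM engine.

```python
def tesseract_command(image, lang, oem, psm=6, tessdata_dir=None):
    """Build (but do not run) a tesseract command line as a list of arguments.

    oem: 0 = legacy engine only, 1 = LSTM only, 2 = legacy + LSTM, 3 = default.
    """
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0, 1, 2 or 3")

    cmd = ["tesseract", image, "stdout"]
    if tessdata_dir:
        cmd += ["--tessdata-dir", tessdata_dir]
    cmd += ["-l", lang, "--oem", str(oem), "--psm", str(psm)]
    return cmd


# Two separate passes instead of one combined "-l ara+ara_number" run:
# the LSTM model for running text, the (assumed) legacy model for digits.
text_cmd = tesseract_command("id_card.png", "ara", oem=1)
digits_cmd = tesseract_command("id_card.png", "ara_number", oem=0)

print(" ".join(text_cmd))
print(" ".join(digits_cmd))
```

Either list can be passed to `subprocess.run` on a machine where tesseract and the corresponding traineddata files are installed.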

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Eliyaz L
Hi Shree, I was using the version below. I guess you are right, it's a 2016 file. Let me test with the latest traineddata. https://tesseract-ocr.github.io/tessdoc/Data-Files https://github.com/tesseract-ocr/tessdata/raw/4.00/ara.traineddata Meanwhile, can you please help me with Arabic numbers? I tried ara_

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
See https://github.com/tesseract-ocr/tesseract/issues/758 and other similar issues On Sun, Jul 12, 2020 at 6:52 PM Shree Devi Kumar wrote: > @Eliyaz What version of tesseract are you using? Which traineddata? > > >Always the letter "لا" is predicted as "ال" . > > I think this was fixed by Ray Sm

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
@Eliyaz What version of tesseract are you using? Which traineddata? >Always the letter "لا" is predicted as "ال" . I think this was fixed by Ray Smith in 2017 and should be ok in the traineddata files in tessdata_fast and tessdata_best repos. On Sun, Jul 12, 2020 at 6:45 PM Rainer Verteidiger <

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Rainer Verteidiger
Always the letter "لا" is predicted as "ال" . Not sure how much relevance that bears in the context of training models, but لا is not a letter! It's a ligature ("Arabic Ligature Lam with Alef") formed by combining ل ("Arabic Letter Lam") with ا ("Arabic Letter Alef") whereas ال is ا followed by
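
The distinction between the two strings can be checked with Python's standard `unicodedata` module. This is a small sketch, assuming the text stores the ligature as the two letters Lam then Alef (the usual case) and not as the single presentation-form code point U+FEFB; the helper name `describe` is illustrative.

```python
import unicodedata


def describe(text):
    """Return 'U+XXXX NAME' for each code point, in logical (storage) order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


LAM = "\u0644"                 # ل
ALEF = "\u0627"                # ا
LAM_ALEF_LIGATURE = "\uFEFB"   # single presentation-form code point

lam_alef = LAM + ALEF   # Lam first, then Alef: rendered as the ligature
alef_lam = ALEF + LAM   # Alef first, then Lam: the definite article

for label, s in [("lam+alef", lam_alef),
                 ("alef+lam", alef_lam),
                 ("ligature", LAM_ALEF_LIGATURE)]:
    print(label, describe(s))

# NFKC normalisation folds the presentation-form ligature into the two
# ordinary letters, which helps when comparing OCR output to ground truth.
normalized = unicodedata.normalize("NFKC", LAM_ALEF_LIGATURE)
print(normalized == lam_alef)
```

The two strings contain the same two letters in opposite order, so a model that outputs one for the other is making an ordering error, not a character-recognition error.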

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Eliyaz L
Always the letter "لا" is predicted as "ال" . My training data here My prediction document will be in Traditional Arabic font here . Below shell command u

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
What character are you trying to add? Please share the training data to try and replicate the issue. On Sun, Jul 12, 2020, 15:35 Eliyaz L wrote: > Hi, > > > My use case is on Arabic document, the pre retrained ara.traineddata are > good but not perfect. so i wish to fine tune ara.traineddata, i

[tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Eliyaz L
Hi, My use case is Arabic documents. The pretrained ara.traineddata is good but not perfect, so I wish to fine-tune ara.traineddata; if the results are not satisfactory, I will have to train on my own custom data. Please advise on the following: 1. For my use case in Arabic text, proble