[tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-14 Thread reza
Hi, I tested Tesseract 4 beta on Persian and the results were good, but I think the model needs more training on more fonts and texts. How can we train additional fonts and texts on the existing Tesseract 4 beta model for Persian? And a last question: how could we apply a dictionary to correct that wor

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-14 Thread reza
or-impact
> ShreeDevi
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
> On Mon, May 14, 2018 at 1:52 PM, reza wrote:
>> hi
>> i tested tesseract 4 beta on persian lang , the results was g

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-15 Thread reza
I used the attached finetune.sh file, but it raised an error. Could you help me? Thanks.
> ## MAKING TRAINING DATA ##
> === Starting training for language 'eng'
> [Tue, May 15, 2018 11:42:36 AM] /c/Program Files (x86)/Tesseract-OCR/text2image --fonts_dir=C:WindowsFonts --fo

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-15 Thread reza
dable
> which version of icu library?
> ShreeDevi
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
> On Tue, May 15, 2018 at 1:00 PM, reza wrote:
>> i used this attached

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-15 Thread reza
Thanks for the reply. Tesseract 4 beta, Windows 10.
On Tuesday, May 15, 2018 at 1:12:20 PM UTC+4:30, shree wrote:
> What o/s are you running it on?
> Which version of tesseract?
> ICU ERROR: U_FILE_ACCESS_ERROR
> ERROR: /tmp/tmp.6m4B2TUln1/eng/eng.unicharset does not exist or is not readable

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-15 Thread reza
I tested it on Ubuntu, and it raised an error too. Could you help me and send me a new bash file for fine-tuning with new fonts? I put the "eng.traineddata" file in the tessdata_best folder, and "eng.training_text" and "eng.traineddata" in langdata\eng. Is that correct and sufficient, or are more files needed? Thanks.

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-15 Thread reza
> 'Arial Unicode MS' \
> 'B Nazanin' \
> 'B Nazanin Bold' \
> 'Calibri' \
> 'Courier New' \
> 'Microsoft Sans Serif' \
> 'Scheherazade' \
> 'Tahoma' \
> 'Times New Roman,' \

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-18 Thread reza
Hi ShreeDevi, thanks. I tested the two models that you provided. The accuracy on samples without noise was about 98%, but on scanned samples or captured images it was about 80%. It still didn't work on different fonts. Could you send all the files needed for training the models? I want fine tun

Re: [tesseract-ocr] train more fonts on trained model fas in tesseract

2018-05-19 Thread reza
Thanks for your reply. I will test these as soon as possible. One of Tesseract's weaknesses shows up when we want to OCR multiple languages. For example, if an image contains both Persian and English text, Tesseract can't recognize them as well as it does a single language. Do you have any soluti

Re: [tesseract-ocr] missing a line in OCR persian

2018-05-21 Thread reza
g when psm = 3, 6, 11
> ShreeDevi
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
> 2018-05-22 10:45 GMT+05:30 reza:
>> i used tesseract 4 beta for OCR. but the results had some missing words

[tesseract-ocr] Tesseract 4 beta (persian) didn't recognize punctuations

2018-05-21 Thread reza
Hi shree, I tested Tesseract 4 beta on Persian. I think this version doesn't support punctuation in images; is this true? Also, when an English word appears between Persian words, Tesseract can't recognize it. Why? (I use the option -l fas+eng.) Thanks. -- You received this message becau

[tesseract-ocr] Re: Tesseract 4 beta (persian) didn't recognize punctuations

2018-05-21 Thread reza
I attached a sample and its results. > نویسه‌خوان نوری!" که با سرواژه‌ی */:)0) شناخته می‌شود. عبارت است از تشخیص >> (۳6600۲۱]0070) خودکار متون

making persian traineddata doesn't support rtl

2012-10-18 Thread Reza M
ke my data like Arabic with cube mode, or like Hebrew, which works correctly for RTL? There are many RTL languages. Would you please tell us how you made the Arabic file? Yours, Reza

Re: making persian traineddata doesn't support rtl

2012-10-29 Thread Reza M
running this code you should have Python installed on your PC. Yours, Reza
On Thursday, October 18, 2012 11:38:39 PM UTC+2, Reza M wrote:
> Hi,
> I made a simple traineddata for Persian it recognized characters but it changes words directions for example instead of رضا it writes اضر

Making huge box in fast way with BoxMaker (New tool)

2012-10-31 Thread Reza M
add your font *That's it!* You can find this tool here <https://github.com/reza1615/PersianOcr/blob/master/BoxMaker-en.zip>. P.S. *To Admins*: is it possible to add this tool to the wiki's training section? or add in part? Yours, Reza

Re: Making huge box in fast way with BoxMaker (New tool)

2012-11-01 Thread Reza M
com/reza1615/PersianOcr/blob/master/BoxMaker-en.zip>
On Wednesday, October 31, 2012 11:29:46 PM UTC+1, zdenop wrote:
> On Wed, Oct 31, 2012 at 10:42 AM, Reza M wrote:
>> Hi,
>> In PersianOcr project <https://github.com/reza1615/PersianOcr> we devel

Re: Making huge box in fast way with BoxMaker (New tool)

2012-11-01 Thread Reza M
I uploaded it to the downloads section [1].
[1] https://github.com/reza1615/PersianOcr/downloads
On Thursday, November 1, 2012 9:56:31 AM UTC+1, zdenop wrote:
> On Thu, Nov 1, 2012 at 1:18 AM, Reza M wrote:
>> Excuse me I changed some part of project and I forgot to re-u

BoxMaker Local Server Version

2012-11-03 Thread Reza M
and pair image in few minutes with this version that is around 90K words!
[1] http://nodejs.org/
[2] https://github.com/downloads/reza1615/reza1615.github.com/BoxMaker-Local%20Server.zip
Yours, Reza

can we use different training text for fonts

2012-11-03 Thread Reza M
effect on Arial's results? Yours, Reza

have problem with unicharambigs for connected characters

2012-11-03 Thread Reza M
Hi, First, my apologies: because of the text editor's problems with mixed RTL and LTR text, I had to use a screenshot. I want to make a per.unicharambigs file, but I am confused about which solution is correct. In the attached image, *1-case1:* what should I do? Should I define connected characters as one unit, or counting th

[tesseract-ocr] use two language in tesseract

2017-09-02 Thread Reza Naddaf
Hello. I use two languages in Tesseract. If one or two words of the text are in English among the Persian words, Tesseract cannot convert them. How to improve this
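
For readers following the mixed Persian/English questions in these threads, here is a minimal sketch of scripting the multi-language invocation that one of the posts mentions (`-l fas+eng`). It only shows how the command is assembled and is not a fix for the recognition problem itself. It assumes the `tesseract` executable is on PATH with the `fas` and `eng` traineddata installed; the file names `page.png` and `out` and both function names are hypothetical. The order of the language codes and the page segmentation mode may affect results, so both are left as parameters.

```python
import shutil
import subprocess
from typing import List


def build_tesseract_command(image_path: str, output_base: str,
                            languages: List[str], psm: int = 6) -> List[str]:
    """Assemble a tesseract CLI call for one or more languages.

    Language codes are joined with '+', e.g. ['fas', 'eng'] -> 'fas+eng'.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base,
            "-l", "+".join(languages),
            "--psm", str(psm)]


def run_ocr(image_path: str, output_base: str,
            languages: List[str], psm: int = 6) -> str:
    """Run tesseract and return the path of the text file it writes."""
    cmd = build_tesseract_command(image_path, output_base, languages, psm)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    subprocess.run(cmd, check=True)
    # tesseract appends '.txt' to the output base for plain-text output
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names; prints the command without running it.
    print(" ".join(build_tesseract_command("page.png", "out", ["fas", "eng"])))
```

Swapping the list to `["eng", "fas"]` or trying another `psm` value (the threads mention 3, 6 and 11) is a cheap way to compare outputs on the same image.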