> First of all, in some words in the tiff files the characters are not joined.

Make sure to include ZWNJ (U+200C) and ZWJ (U+200D) in your unicharset.

> box file generated is from left to right but it should be RTL

According to Ray, that is intentional.

> is using lstmtraining.exe the next and final step

Yes. The tesstrain.sh process only creates a 'starter traineddata' (unlike for tesseract3).

On Mon, Jul 16, 2018 at 2:12 PM Hosein Khoshdel <hoskhosh...@gmail.com> wrote:

> Hi. Before asking my question, I want to thank Shree, whose comments are
> very helpful both here and in the GitHub repo of tesseract.
>
> I want to fine-tune fas.traineddata to support some new fonts. The first
> problem arises when I use the following command:
>
> tesstrain.sh --fonts_dir /c/folder/fonts/ --lang fas \
>   --noextract_font_properties --linedata_only --exposures "0" \
>   --langdata_dir ../langdata --tessdata_dir ../tessdata \
>   --fontlist "b nazanin" --output_dir ../../tessdata/fas/
>
> I put fas.traineddata, which I downloaded from the tessdata_best repo, in
> the ../tessdata folder, but it gives an error saying that it cannot find
> eng.traineddata. This problem is resolved when I put eng.traineddata in
> ../tessdata, but why should it want eng when I specify that lang is fas?
>
> Anyway, for now I pasted eng.traineddata and moved on. The second problem
> is with the tiff/box pair generated with the above command. First of all,
> in some words in the tiff files the characters are not joined. For example,
> there is:
>
> <https://lh3.googleusercontent.com/-DlJrj5VB7tA/W0xEc9rxSzI/AAAAAAAAA-o/FOnTQLsVsFEpyjH4A9Kj7x_Chg87rMV7gCLcBGAs/s1600/nonjoined.PNG>
>
> but it should be:
>
> <https://lh3.googleusercontent.com/-kAn4-R5qPK4/W0xFXtQjb9I/AAAAAAAAA-4/8JQOGnHea5AsEHGBauE8Q90N1G9BikcIwCLcBGAs/s1600/joined.PNG>
>
> Another problem is that the box file generated is from left to right, but
> it should be RTL. This problem is addressed here
> <https://github.com/tesseract-ocr/tesseract/issues/648>, but I did not
> understand whether there is a solution for it or not.
>
> Lastly, I am confused about the fine-tuning process.
> Is tesstrain.sh only for generating tiff/box pairs? What are the next
> steps? Is using lstmtraining.exe the next and final step?
>
> btw, I'm using:
>
> tesseract 4.0.0-beta.3
>  leptonica-1.76.0 (Jul 10 2018, 21:36:38) [MSC v.1900 LIB Debug x64]
>   libgif 5.1.4 : libjpeg 9b : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>  Found AVX
>  Found SSE
>
> which I built with VS2015. Also, I'm using Windows 8.1.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/6f28256d-f2d4-4d13-a439-751465ec97dd%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
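
On the advice above about ZWNJ and ZWJ: one quick way to see whether a unicharset already contains the two joiners is to scan its entries for U+200C and U+200D. The following is only a minimal sketch, not part of Tesseract. It assumes the usual text layout of a unicharset file (the first line is the entry count, and each later line starts with the unichar followed by a space and its properties). The function name `joiners_in_unicharset` and the sample data are made up for illustration, and the property columns in the sample are simplified placeholders.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def joiners_in_unicharset(text):
    """Report whether ZWNJ / ZWJ occur in any unichar entry.

    Assumption: line 1 is the entry count; every following line begins
    with the unichar, then a space, then its properties.
    """
    found = {"ZWNJ": False, "ZWJ": False}
    for line in text.split("\n")[1:]:      # skip the count line
        unichar = line.split(" ", 1)[0]    # text before the first space
        if ZWNJ in unichar:
            found["ZWNJ"] = True
        if ZWJ in unichar:
            found["ZWJ"] = True
    return found


# Simplified, made-up sample: alef, ZWNJ and beh as separate entries.
sample = "4\nNULL 0\n\u0627 1\n\u200c 0\n\u0628 1\n"
print(joiners_in_unicharset(sample))
```

For a real check, read the unicharset text with `open(path, encoding="utf-8").read()` and pass it to the function. The path is whatever your setup produces, for instance a component unpacked from the traineddata with `combine_tessdata -u`. In the sample above, ZWNJ is reported as present and ZWJ as missing.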