> first of all in some words in tiff files the characters are not joined.
Make sure to include ZWNJ and ZWJ in your unicharset.
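
A quick way to check this is to look for the two code points in the training text or in the generated unicharset. The snippet below is only a sketch: the helper name `missing_joiners` and the file name in the usage comment are made up for illustration, and it simply tests whether U+200C (ZWNJ) and U+200D (ZWJ) occur anywhere in the text you pass in.

```python
# Sketch only: report which joiner characters are absent from a piece of text.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def missing_joiners(text):
    """Return the names of the joiner characters that do not occur in text."""
    wanted = {"ZWNJ": ZWNJ, "ZWJ": ZWJ}
    return [name for name, char in wanted.items() if char not in text]


# Hypothetical usage (the file name is an assumption, not from this thread):
# with open("fas.unicharset", encoding="utf-8") as f:
#     print(missing_joiners(f.read()))
```

An empty list means both characters are present; anything else names the ones that still need to be added.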

>  box file generated is from left to right but it should be RTL

According to Ray, that is intentional.

>  is using lstmtraining.exe the next and final step

Yes. The tesstrain.sh process only creates the training data and a 'starter
traineddata' (unlike for Tesseract 3); lstmtraining does the actual training.
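
As a rough sketch of the remaining steps, the helper below just assembles the command lines described in the Tesseract 4 training documentation for fine-tuning: extract the LSTM model from the existing traineddata, continue training from it, then convert the checkpoint into a final traineddata. The function name and every path are hypothetical, and it assumes the character set is unchanged, so the original tessdata_best file is passed as --traineddata.

```python
import shlex


def fine_tune_commands(lang, best_traineddata, train_listfile, out_dir,
                       max_iterations=400):
    """Build (but do not run) the three fine-tuning command lines."""
    lstm = f"{out_dir}/{lang}.lstm"      # model extracted from traineddata
    prefix = f"{out_dir}/{lang}"         # checkpoint files use this prefix
    return [
        # 1. Extract the LSTM model from the existing traineddata.
        ["combine_tessdata", "-e", best_traineddata, lstm],
        # 2. Continue training from that model on the new line data.
        ["lstmtraining",
         "--model_output", prefix,
         "--continue_from", lstm,
         "--traineddata", best_traineddata,
         "--train_listfile", train_listfile,
         "--max_iterations", str(max_iterations)],
        # 3. Convert the checkpoint into a usable traineddata file.
        ["lstmtraining", "--stop_training",
         "--continue_from", f"{prefix}_checkpoint",
         "--traineddata", best_traineddata,
         "--model_output", f"{out_dir}/{lang}.traineddata"],
    ]


# All paths below are placeholders; adjust them to your own layout.
for cmd in fine_tune_commands("fas", "tessdata_best/fas.traineddata",
                              "train/fas.training_files.txt", "output"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The train list file here stands for the list of .lstmf files that tesstrain.sh writes with --linedata_only. On Windows the executables carry an .exe suffix.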

On Mon, Jul 16, 2018 at 2:12 PM Hosein Khoshdel <hoskhosh...@gmail.com>
wrote:

> Hi. Before asking my question, I want to thank Shree, whose comments are very
> helpful both here and in the GitHub repo of Tesseract.
>
> I want to fine-tune fas.traineddata to support some new fonts. The first
> problem arises when I use the following command:
>
> tesstrain.sh --fonts_dir /c/folder/fonts/ --lang fas
> --noextract_font_properties --linedata_only --exposures "0" --langdata_dir
> ../langdata --tessdata_dir ../tessdata --fontlist "b nazanin" --output_dir
> ../../tessdata/fas/
>
>
> I put fas.traineddata, which I downloaded from the tessdata_best repo, in the
> ../tessdata folder, but it gives an error saying that it cannot find
> eng.traineddata. This problem is resolved when I put eng.traineddata in
> ../tessdata, but why should it want eng when I specify that lang is fas?
>
> Anyway, for now I pasted eng.traineddata and moved on. The second problem
> is with the tiff/box pair generated with the above command. First of all, in
> some words in the tiff files the characters are not joined. For example, there is:
>
> <https://lh3.googleusercontent.com/-DlJrj5VB7tA/W0xEc9rxSzI/AAAAAAAAA-o/FOnTQLsVsFEpyjH4A9Kj7x_Chg87rMV7gCLcBGAs/s1600/nonjoined.PNG>
>
> but it should be
>
> <https://lh3.googleusercontent.com/-kAn4-R5qPK4/W0xFXtQjb9I/AAAAAAAAA-4/8JQOGnHea5AsEHGBauE8Q90N1G9BikcIwCLcBGAs/s1600/joined.PNG>
>
> Another problem is that the generated box file is left to right, but
> it should be RTL. This problem is addressed here
> <https://github.com/tesseract-ocr/tesseract/issues/648>, but I did not
> understand whether there is a solution for it or not.
>
> Lastly, I am confused about the fine-tuning process. Is tesstrain.sh only
> for generating tiff/box pairs? What are the next steps? Is using
> lstmtraining.exe the next and final step?
>
> BTW, I'm using:
>
> tesseract 4.0.0-beta.3
>  leptonica-1.76.0 (Jul 10 2018, 21:36:38) [MSC v.1900 LIB Debug x64]
>   libgif 5.1.4 : libjpeg 9b : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
> : libwebp 0.6.1 : libopenjp2 2.3.0
>  Found AVX
>  Found SSE
>
> which I built with VS2015. I'm also using Windows 8.1.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/6f28256d-f2d4-4d13-a439-751465ec97dd%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/6f28256d-f2d4-4d13-a439-751465ec97dd%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>


-- 

____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
