You can use text2image's --find_fonts option with your training_text to locate fonts that can render it.

Modify the following command to match your directory setup and try it:

echo "###### FIND FONTS ######"
# Find fonts that can render your training_text. Run `fc-cache -vf` first to
# refresh the font cache.
# You can change the minimum coverage (--min_coverage) as needed.
# This process can take a while if you have many installed fonts.
# Review the generated font list and modify it if needed.
# If too many fonts are found (e.g. 2000), use a smaller set of fonts.
# Requires bash: `|&` pipes both stdout and stderr into grep.

# Set these to match your setup (example values only):
fonts_dir=/usr/share/fonts
langdata_dir=$HOME/langdata
Lang=fas

nice text2image --find_fonts \
  --fonts_dir "$fonts_dir" \
  --text "$langdata_dir/$Lang/$Lang.training_text" \
  --min_coverage 0.999 \
  --render_per_font=false \
  --outputbase "$langdata_dir/$Lang/$Lang" \
  |& grep raw \
  | sed -e 's/ :.*/@ \\/g' \
  | sed -e "s/^/ '/" \
  | sed -e "s/@/'/g" > "$langdata_dir/$Lang/$Lang.fontslist.txt"
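If --find_fonts is slow or finds far too many fonts, one option is to point --fonts_dir at a smaller directory that holds only the candidate fonts. This is only a sketch. count_fonts and make_font_subset are hypothetical helper names and are not part of tesseract. The sketch also assumes your fonts are .ttf/.otf files and that you choose the file-name patterns yourself.

```shell
# Hypothetical helpers (not part of tesseract) for trimming a large font set
# before running text2image --find_fonts.

# Count TrueType/OpenType font files (case-insensitive extension) under a directory.
count_fonts() {
  find "$1" -type f \( -iname '*.ttf' -o -iname '*.otf' \) | wc -l
}

# Copy font files whose names match any of the given shell patterns
# (matched case-insensitively) from src_dir into dest_dir.
make_font_subset() {
  src_dir=$1
  dest_dir=$2
  shift 2
  mkdir -p "$dest_dir"
  for pattern in "$@"; do
    find "$src_dir" -type f -iname "$pattern" -exec cp {} "$dest_dir"/ \;
  done
}
```

For example, run `count_fonts "$fonts_dir"` to see how many fonts would be scanned. Then run `make_font_subset "$fonts_dir" "$HOME/fonts-subset" '*Naskh*'` (the path and pattern are illustrative). Finally, rerun the command above with --fonts_dir pointing at the new directory.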

On Mon, Jul 2, 2018 at 12:06 PM ran go <irrang...@gmail.com> wrote:

> In my opinion the error depends on the font: for some fonts there is no
> error, but for other fonts there is.
>
> On Mon, Jul 2, 2018 at 9:15 AM, john <irrang...@gmail.com> wrote:
>
>> I use tesseract 4.0.0-beta.1, downloaded from this link (UB Mannheim):
>> <https://github.com/UB-Mannheim/tesseract/tree/v4.0.0-beta.1.20180414>
>>
>> On Saturday, June 30, 2018 at 7:13:30 PM UTC+4:30, shree wrote:
>>>
>>> Also check that there is no tab or other unprintable character in your
>>> training text.
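One way to check a training text for tabs and other unprintable characters is a sketch like the one below. find_control_chars and strip_control_chars are hypothetical helper names. The sketch assumes that "unprintable" means the ASCII control bytes 0x00-0x1F and 0x7F. With LC_ALL=C, UTF-8 Persian letters and the zero-width non-joiner (U+200C) are not flagged, because all of their bytes are 0x80 or above.

```shell
# Print (with line numbers) every line containing an ASCII control character,
# e.g. a tab or a Windows carriage return. Exit status is non-zero if none found.
find_control_chars() {
  LC_ALL=C grep -n '[[:cntrl:]]' "$1"
}

# Write a cleaned copy: tabs become spaces, and the remaining control bytes
# (except newline, octal 012) are deleted.
strip_control_chars() {
  LC_ALL=C tr '\t' ' ' < "$1" | LC_ALL=C tr -d '\000-\010\013-\037\177' > "$2"
}
```

Run find_control_chars on your .training_text first and review the reported lines before deciding whether to replace the file with the cleaned copy.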
>>>
>>> Which version of tesseract are you using? Show the output of
>>>
>>> tesseract -v
>>>
>>>
>>> On Sat, Jun 30, 2018 at 8:04 PM Shree Devi Kumar <shree...@gmail.com>
>>> wrote:
>>>
>>>> Then there must be a mismatch between the unicharset you are using and
>>>> the training text. For example, check whether the copyright symbol (©) is
>>>> in your unicharset.
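A quick way to check whether specific characters such as © have an entry in a unicharset is a sketch like the following. missing_from_unicharset is a hypothetical helper name. It assumes the usual unicharset text layout: the first line holds the entry count, and every later line starts with the unichar followed by a space. The comparison is an exact byte match, so it will not catch entries that combine several code points.

```shell
# Print each given character that has no entry in the unicharset file.
# Usage: missing_from_unicharset FILE CHAR...
missing_from_unicharset() {
  unicharset=$1
  shift
  for ch in "$@"; do
    # Skip the count on line 1; compare the first field of every other line.
    if ! awk -v c="$ch" 'NR > 1 && $1 == c { found = 1; exit } END { exit !found }' "$unicharset"; then
      printf '%s\n' "$ch"
    fi
  done
}
```

For example, `missing_from_unicharset path/to/your.unicharset ©` prints © if the symbol is absent. The path is a placeholder. For fine-tuning, the unicharset might be one unpacked from the starting traineddata, e.g. with combine_tessdata -u. If © is reported, remove it from the training text or make sure it ends up in the unicharset you train with.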
>>>>
>>>> On Sat, Jun 30, 2018 at 4:48 PM john <irra...@gmail.com> wrote:
>>>>
>>>>> I saw that link. This error occurred many times; how can I prevent it?
>>>>>
>>>>> On Saturday, June 30, 2018 at 3:17:26 PM UTC+4:30, shree wrote:
>>>>>>
>>>>>> see
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#error-messages-from-training
>>>>>>
>>>>>> On Sat, Jun 30, 2018 at 3:23 PM john <irra...@gmail.com> wrote:
>>>>>>
>>>>>>> Encoding of string failed! Failure bytes: ffffffc2 ffffffa9 20
>>>>>>> ffffffd8 ffffffa8 ffffffd8 ffffffa7 ffffffd8 ffffffae ffffffd8 ffffffaa
>>>>>>> ffffffd9 ffffff86 ffffffd8 ffffffa7 20 ffffffd9 ffffff84
>>>>>>> ffffffd8 ffffffa7 ffffffd8 ffffffa4 ffffffd8 ffffffb3 20
>>>>>>> ffffffdb ffffff8c ffffffd9 ffffff86 ffffffd8 ffffffa7 ffffffd8 ffffffb1
>>>>>>> ffffffdb ffffff8c ffffffd8 ffffffa7 20 ffffffd8 ffffffa7
>>>>>>> ffffffd8 ffffffa8 20 ffffffd8 ffffffaa ffffffd8 ffffffa8
>>>>>>> ffffffd8 ffffffab ffffffd9 ffffff87 20 ffffffd8 ffffffaf
>>>>>>> ffffffd8 ffffffa7 ffffffd9 ffffff81 ffffffd8 ffffffaa ffffffd8 ffffffb3
>>>>>>> ffffffd8 ffffffa7 20 ffffffd9 ffffff86 ffffffdb ffffff8c
>>>>>>> ffffffd9 ffffff86 ffffffda ffffff86 ffffffd9 ffffff85 ffffffd9 ffffff87
>>>>>>> 20 ffffffd9 ffffff82 ffffffd9 ffffff84 ffffffd8 ffffffb7
>>>>>>> ffffffd9 ffffff85
>>>>>>> Can't encode transcription: '۱۹ 2006© باختنا لاؤس یناریا اب تبثه
>>>>>>> دافتسا نینچمه قلطم' in language ''
>>>>>>> ^C
>>>>>>>
>>>>>>> When I fine-tune the network for the fas (Persian) language, I see the
>>>>>>> error above. What is wrong with my training?
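To see which characters the "Failure bytes" stand for, you can turn them back into text. The bytes appear to be printed as sign-extended 32-bit values (ffffffc2 is the byte 0xc2). The sketch below drops the leading ffffff and emits each byte through a printf octal escape. decode_failure_bytes is a hypothetical helper name.

```shell
# Decode tokens like "ffffffc2 ffffffa9 20" back into the raw UTF-8 bytes.
decode_failure_bytes() {
  for tok in "$@"; do
    hex=${tok#ffffff}                        # strip the sign-extension prefix
    printf "\\$(printf '%03o' "0x$hex")"     # emit the byte as an octal escape
  done
}
```

Running `decode_failure_bytes ffffffc2 ffffffa9 20` on the first three tokens prints "© " (the copyright sign and a space). So the first character the encoder rejected is ©, which matches the suggestion to check the unicharset for it.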
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/11d5277e-2ef1-4ae9-8cb3-3f38290c1dfc%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/11d5277e-2ef1-4ae9-8cb3-3f38290c1dfc%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> ____________________________________________________________
>>>>>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>
>

