You can use text2image's --find_fonts option with your training_text to locate the fonts that can render it. Modify the following command to match your directory setup and try it:
    echo "###### FIND FONTS ######"
    # Find fonts which can render your training_text. Run `fc-cache -vf` first to refresh the font cache.
    # --min_coverage is a fraction (0.999 = 99.9% of the characters); change it as needed.
    # This can take a while with many installed fonts (e.g. 2000); use a smaller set if needed.
    # Review the generated fontslist and modify it if needed.
    # Note: |& (pipe both stdout and stderr) requires bash.
    nice text2image --find_fonts \
      --fonts_dir "$fonts_dir" \
      --text "$langdata_dir/$Lang/$Lang.training_text" \
      --min_coverage 0.999 \
      --render_per_font=false \
      --outputbase "$langdata_dir/$Lang/$Lang" \
      |& grep raw \
      | sed -e 's/ :.*/@ \\/g' \
      | sed -e "s/^/ '/" \
      | sed -e "s/@/'/g" > "$langdata_dir/$Lang/$Lang.fontslist.txt"

On Mon, Jul 2, 2018 at 12:06 PM ran go <irrang...@gmail.com> wrote:
> In my opinion, the error depends on the font: for some fonts there is no
> error, but for other fonts there is.
>
> On Mon, Jul 2, 2018 at 9:15 AM, john <irrang...@gmail.com> wrote:
>
>> I use Tesseract 4.0.0-beta.1, downloaded from this link (UB Mannheim):
>> <https://github.com/UB-Mannheim/tesseract/tree/v4.0.0-beta.1.20180414>
>>
>> On Saturday, June 30, 2018 at 7:13:30 PM UTC+4:30, shree wrote:
>>>
>>> Also check that there is no tab or other unprintable character in your
>>> training text.
>>>
>>> Which version of tesseract are you using? Show the output of
>>>
>>> tesseract -v
>>>
>>>
>>> On Sat, Jun 30, 2018 at 8:04 PM Shree Devi Kumar <shree...@gmail.com>
>>> wrote:
>>>
>>>> Then there must be a mismatch between the unicharset you are using and
>>>> the training text, e.g. check whether the copyright symbol (©) is in
>>>> your unicharset.
>>>>
>>>> On Sat, Jun 30, 2018 at 4:48 PM john <irra...@gmail.com> wrote:
>>>>
>>>>> I saw that link. This error occurred many times; how can I prevent it?
>>>>>
>>>>> On Saturday, June 30, 2018 at 3:17:26 PM UTC+4:30, shree wrote:
>>>>>>
>>>>>> See
>>>>>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#error-messages-from-training
>>>>>>
>>>>>> On Sat, Jun 30, 2018 at 3:23 PM john <irra...@gmail.com> wrote:
>>>>>>
>>>>>>> Encoding of string failed! Failure bytes: ffffffc2 ffffffa9 20
>>>>>>> ffffffd8 ffffffa8 ffffffd8 ffffffa7 ffffffd8 ffffffae ffffffd8 ffffffaa
>>>>>>> ffffffd9 ffffff86 ffffffd8 ffffffa7 20 ffffffd9 ffffff84 ffffffd8
>>>>>>> ffffffa7 ffffffd8 ffffffa4 ffffffd8 ffffffb3 20 ffffffdb ffffff8c
>>>>>>> ffffffd9 ffffff86 ffffffd8 ffffffa7 ffffffd8 ffffffb1 ffffffdb ffffff8c
>>>>>>> ffffffd8 ffffffa7 20 ffffffd8 ffffffa7 ffffffd8 ffffffa8 20
>>>>>>> ffffffd8 ffffffaa ffffffd8 ffffffa8 ffffffd8 ffffffab ffffffd9 ffffff87
>>>>>>> 20 ffffffd8 ffffffaf ffffffd8 ffffffa7 ffffffd9 ffffff81 ffffffd8
>>>>>>> ffffffaa ffffffd8 ffffffb3 ffffffd8 ffffffa7 20 ffffffd9 ffffff86
>>>>>>> ffffffdb ffffff8c ffffffd9 ffffff86 ffffffda ffffff86 ffffffd9 ffffff85
>>>>>>> ffffffd9 ffffff87 20 ffffffd9 ffffff82 ffffffd9 ffffff84 ffffffd8
>>>>>>> ffffffb7 ffffffd9 ffffff85
>>>>>>> Can't encode transcription: '۱۹ 2006© باختنا لاؤس یناریا اب تبثه دافتسا نینچمه قلطم' in language ''
>>>>>>> ^C
>>>>>>>
>>>>>>> When I fine-tune the network for the fas language, I see the error
>>>>>>> above. What is wrong with my training?
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/11d5277e-2ef1-4ae9-8cb3-3f38290c1dfc%40googlegroups.com
>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/11d5277e-2ef1-4ae9-8cb3-3f38290c1dfc%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> ____________________________________________________________
>>>>>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
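A note on reading the error quoted above: the "Failure bytes" appear to be UTF-8 bytes printed as sign-extended values, so ffffffc2 stands for the byte c2 and 20 is a space. The small POSIX sh sketch below turns such tokens back into text; decode_failure_bytes is a hypothetical helper, not a Tesseract tool. Decoding the first tokens, ffffffc2 ffffffa9 20, gives "© ", which suggests encoding stopped at the copyright sign and fits the advice to check whether © is in the unicharset.

```shell
#!/bin/sh
# decode_failure_bytes is a hypothetical helper, not part of Tesseract.
# Each argument is one token from "Failure bytes:", e.g. ffffffc2 or 20.
decode_failure_bytes() {
  for tok in "$@"; do
    byte=${tok#ffffff}   # drop the sign-extension prefix, if present
    # Turn the hex byte into a 3-digit octal escape (\302 for c2) and let
    # printf emit the raw byte; the format string is built on purpose.
    printf "\\$(printf '%03o' "0x$byte")"
  done
  printf '\n'
}

# First failing bytes from the thread: prints "©" followed by a space.
decode_failure_bytes ffffffc2 ffffffa9 20
```

Passing all of the tokens reproduces the rest of the transcription from the © onward, which is a quick way to see which part of a training line Tesseract could not encode.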
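Both checks suggested in this thread (no tabs or unprintable characters in the training text, and no characters missing from the unicharset) can be scripted before a training run. This is a rough POSIX sh sketch with two hypothetical helpers, not Tesseract tools: find_control_chars lists lines that contain a tab, carriage return or other ASCII control character, and in_unicharset tests whether one character has its own entry in a unicharset file. It assumes the usual unicharset layout, where line 1 holds the entry count and each later line starts with the unichar followed by a space; it does not detect non-ASCII invisible characters.

```shell
#!/bin/sh
# Hypothetical pre-training checks; neither function is part of Tesseract.

# Print "lineno:text" for each line of file $1 that contains an ASCII
# control character (tab, carriage return, ...). Exit status 1 = none found.
find_control_chars() {
  LC_ALL=C grep -n '[[:cntrl:]]' "$1"
}

# Succeed if unicharset file $2 has an entry whose unichar is exactly $1.
# Assumes line 1 is the entry count and the unichar is the first field.
# Appending "" forces a string (not numeric) comparison in awk.
in_unicharset() {
  c="$1" awk 'NR > 1 && ($1 "") == (ENVIRON["c"] "") { found = 1; exit }
              END { exit !found }' "$2"
}

# Example usage (file names are placeholders):
#   find_control_chars fas.training_text
#   in_unicharset '©' fas.unicharset || echo "© has no unicharset entry"
```

Which unicharset file to check depends on how the training run is set up, so treat the file names in the example as placeholders.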