1. Tesseract 4 is outdated.
2. tesstrain.sh is deprecated.
Zdenko

On Wed, 20 Dec 2023 at 11:18, Uvindu Bimsara <bimsarauvi...@gmail.com> wrote:

> When I started training Tesseract 4.0 using tesstrain.sh for a Sinhala
> Unicode font, I got this error:
>
> === Starting training for language 'sin' [Wed Dec 20 09:44:58 AM UTC 2023]
> /usr/bin/text2image --fonts_dir=fonts --ptsize 12 --font=SS-SuLakna --outputbase=/tmp/font_tmp.MoFCLmddzb/sample_text.txt --text=/tmp/font_tmp.MoFCLmddzb/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.MoFCLmddzb
> Could not find font named 'SS-SuLakna'. Pango suggested font 'Bhashitha Bold'. Please correct --font arg.
> ERROR: Program text2image failed. Abort.
>
> Here is my code:
>
> !rm -rf train/*
> ! /content/drive/MyDrive/nic_project/HNR/tesseract/src/training/tesstrain.sh --fonts_dir fonts \
>   --fontlist "SS-SuLakna" \
>   --lang sin \
>   --linedata_only \
>   --langdata_dir /content/drive/MyDrive/nic_project/HNR/langdata_lstm \
>   --tessdata_dir /content/drive/MyDrive/nic_project/HNR/tesseract/tessdata \
>   --save_box_tiff \
>   --maxpages 10 \
>   --output_dir train
>
> --
> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/019bf94a-c3bd-438a-b4e5-aca28de536c7n%40googlegroups.com.
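
The error quoted above says that text2image could not match the name 'SS-SuLakna' against the fonts it found in the fonts directory. One way to see which names it does recognise is to ask text2image for its font list and compare. The following is only a rough sketch for a Colab/Python session, not part of the original thread: the helper names are made up for illustration, and the parser assumes each font is listed on its own line as "<index>: <name>", which may differ between Tesseract versions.

```python
import re
import subprocess


def parse_font_listing(listing: str) -> list[str]:
    """Extract font names from `text2image --list_available_fonts` output.

    Assumption: each font appears on its own line as "<index>: <name>".
    Lines that do not match this pattern are ignored.
    """
    fonts = []
    for line in listing.splitlines():
        match = re.match(r"\s*\d+:\s*(.+?)\s*$", line)
        if match:
            fonts.append(match.group(1))
    return fonts


def list_fonts(fonts_dir: str = "fonts") -> list[str]:
    """Run text2image and return the font names it can see in fonts_dir."""
    result = subprocess.run(
        ["text2image", "--list_available_fonts", f"--fonts_dir={fonts_dir}"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Parse both streams, since the listing may not be written to stdout.
    return parse_font_listing(result.stdout + "\n" + result.stderr)


def font_is_available(fonts: list[str], wanted: str) -> bool:
    """Return True if `wanted` exactly matches one of the listed names."""
    return wanted in fonts
```

With the setup from the quoted message, `font_is_available(list_fonts("fonts"), "SS-SuLakna")` would presumably return False, given the error. The value passed to --fontlist would then need to be whichever listed name corresponds to the intended font file.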