[tesseract-ocr] Can't encode transcription error with Sinhala language

Sumedhe Dissanayake Sat, 13 Jan 2018 23:02:03 -0800

I tried lstmtraining with the Sinhala language, but I always get this error.

Command:


lstmtraining --traineddata ~/tesstutorial/sintrain/sin/sin.traineddata \
   --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c155]' \
   --debug_interval 0 --max_iterations 500000 --max_image_MB 60000 \
   --learning_rate 20e-4 \
   --model_output ~/tesstutorial/sinoutput/base \
    -U ~/tesstutorial/sintrain/sin/sin.unicharset \
   --traineddata ~/tesstutorial/sintrain/sin/sin.traineddata \
   --train_listfile ~/tesstutorial/sintrain/sin.training_files.txt 


Error:
Can't encode transcription: 'වැනි නිර්භීත දැන් පියඹා මෙන්ම හා' in language ''
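
For anyone hitting the same message, here is a rough diagnostic sketch in Python. It rests on an assumption that is not confirmed anywhere in this thread: that the error means the transcription contains text the unicharset cannot represent. It also assumes the usual text layout of a unicharset file (a count on the first line, then one entry per line whose first space-separated field is the unichar, with the literal entry NULL standing for a space). The helper names are hypothetical and are not part of Tesseract. The check ignores any text normalization Tesseract applies before encoding, so treat its result as a hint only.

```python
def load_unichars(lines):
    """Collect the unichar strings from the lines of a unicharset file.

    Assumed layout: lines[0] is the entry count; every later line starts
    with the unichar, followed by a space and its properties.
    """
    unichars = set()
    for line in lines[1:]:
        line = line.rstrip("\n")
        if not line:
            continue
        field = line.split(" ", 1)[0]
        # Assumption: the entry written as NULL represents the space character.
        unichars.add(" " if field == "NULL" else field)
    return unichars


def can_segment(text, unichars):
    """Return True if text can be split entirely into unicharset entries.

    Entries may span several code points (common for Indic scripts), so
    this uses dynamic programming over prefixes, not a per-character test.
    """
    max_len = max((len(u) for u in unichars), default=0)
    reachable = [False] * (len(text) + 1)
    reachable[0] = True
    for i in range(len(text)):
        if not reachable[i]:
            continue
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if text[i:j] in unichars:
                reachable[j] = True
    return reachable[len(text)]


def uncovered_chars(text, unichars):
    """Return the code points of text that appear in no unicharset entry."""
    covered = set("".join(unichars))
    return sorted({ch for ch in text if ch not in covered})
```

To try it, read the unicharset with open(path, encoding="utf-8").readlines(), pass the result to load_unichars, and then call can_segment and uncovered_chars on the transcription quoted in the error. Any code points reported by uncovered_chars would be the first thing to look at.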




<https://lh3.googleusercontent.com/-OI3Fa2QpWgk/WllqKRXYOBI/AAAAAAAAB1g/6gGg9l6txgItGlpGaAfPa4sNKfHYgL75QCLcBGAs/s1600/Screenshot%2Bfrom%2B2018-01-09%2B21-29-43.png>

I also tried the same training with English, and it worked well.

How can I resolve this issue?

Platform:
Linux Ubuntu 16.04 LTS

Tesseract Version: 
tesseract 4.00.00alpha
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
