[tesseract-ocr] Need advice for training_text.txt

Aaron Shieh Wed, 10 Apr 2019 08:37:34 -0700

Hi,

I noticed that in the langdata_lstm/chi_tra repo the training text contains long
lines of text. My application only needs to recognize single lines of text
with at most 15 Chinese characters, so how should I build my training text?


I was thinking of something like this, where each row of the training text is
close to what my final application will see:

一二三四五六七八九十年月日
一二三四五六七八九十年月日
一二三四五六七八九十年月日
...

Or should I do it like this, with spaces between the "blocks" on each row:

一二三四五六七八九十年月日 一二三四五六七八九十年月日   一二三四五六七八九十年月日
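To make the two layouts concrete, here is a minimal sketch that generates either one, assuming random strings of 1 to 15 characters drawn from a small character pool are acceptable. The names `CHARSET`, `make_training_lines` and `write_training_text` are hypothetical helpers, not part of Tesseract or tesstrain. With `blocks_per_line=1` each row holds one short block; with a larger value several blocks share a row, separated by single spaces.

```python
import random

# Hypothetical character pool: replace it with the characters your
# application actually needs (these are the ones from the example above).
CHARSET = "一二三四五六七八九十年月日"


def make_training_lines(charset, n_lines, max_len=15, blocks_per_line=1, seed=0):
    """Return n_lines rows, each made of `blocks_per_line` random blocks.

    Each block has between 1 and `max_len` characters drawn from `charset`.
    blocks_per_line=1 gives the one-block-per-row layout; a larger value
    puts several blocks on one row, joined by single spaces.
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    lines = []
    for _ in range(n_lines):
        blocks = [
            "".join(rng.choice(charset) for _ in range(rng.randint(1, max_len)))
            for _ in range(blocks_per_line)
        ]
        lines.append(" ".join(blocks))
    return lines


def write_training_text(path, lines):
    """Write one row per line, as UTF-8 so the Chinese characters survive."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `write_training_text("training_text.txt", make_training_lines(CHARSET, 1000))` writes 1000 single-block rows. Purely random strings ignore which characters tend to appear together; if the real labels follow a pattern (dates, for instance), sampling from real examples is probably closer to what the application will see.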

Any suggestions? Thanks in advance.

