After some research on Korean, I found that it does use Chinese characters, so setting Chinese as a sublanguage is correct. The problem is that kor.training_text doesn't contain any Chinese characters, so the training only covers Korean and ignores the Chinese. As a result, if I run tesseract on an image that has both Korean and Chinese, it will recognize some Korean characters as Chinese and some Chinese characters as Korean.
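
In case it helps anyone check their own training text, here is a rough sketch of how one could count Korean versus Chinese characters in a file. The function name `count_scripts` is hypothetical, and the two Unicode ranges are a simplifying assumption: they cover only the main Hangul Syllables and CJK Unified Ideographs blocks, not Jamo or the CJK extension blocks.

```python
import sys


def count_scripts(text):
    """Count Hangul syllables and Han ideographs in a string.

    Assumption: only the main Unicode blocks are checked
    (Hangul Syllables U+AC00..U+D7A3, CJK Unified Ideographs U+4E00..U+9FFF).
    """
    counts = {"hangul": 0, "han": 0}
    for ch in text:
        cp = ord(ch)
        if 0xAC00 <= cp <= 0xD7A3:
            counts["hangul"] += 1
        elif 0x4E00 <= cp <= 0x9FFF:
            counts["han"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is passed on the command line, e.g. a local copy of
    # kor.training_text (its location is an assumption and depends on your setup).
    with open(sys.argv[1], encoding="utf-8") as f:
        print(count_scripts(f.read()))
```

If the "han" count comes back as zero for the training text, that would be consistent with what I described above: the text has no Chinese characters in it.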

On Monday, 9 April 2018 05:15:57 UTC-3, shree wrote:
>
> Leftover from 3.04, my guess.
>
> On Mon 9 Apr, 2018, 12:52 PM Fanatico, <fanati...@gmail.com> wrote:
>
>> It worked, thanks.
>>
>> Any reason for this chi_tra there?
>>
>> On Monday, 9 April 2018 03:24:44 UTC-3, shree wrote:
>>>
>>> Please remove the sub language line from config file, and use
>>> combine_tessdata to overwrite it.
>>>
>>> Right now it seems to be using chi_tra also.
>>>
>>> On Mon 9 Apr, 2018, 11:48 AM Fanatico, <fanati...@gmail.com> wrote:
>>>
>>>> I used one traineddata that I created on removing the top layer from
>>>> the kor.traineddata from "tessdata_best", after this I replaced this
>>>> traineddata with the one from "tessdata_best" and got the same problem.
>>>>
>>>> Yes, it includes chi_tra as sublanguage
>>>> tessedit_load_sublangs chi_tra
>>>>
>>>> lstm-unicharset only has Korean characters

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/d20b1468-9b36-49a5-9b96-3a8ed2df3e71%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.