Please see https://github.com/tesseract-ocr/tesseract/issues/1579 and continue further discussion there.
On Tue, Oct 2, 2018 at 9:52 AM Shree Devi Kumar <shreesh...@gmail.com> wrote:

> There is an open issue describing a similar problem in the issue tracker.
> It will help to move the discussion there.
>
> I will test with your sample image and also post a link to the issue.
>
> On Tue, 2 Oct 2018, 01:01 Rujrawee K, <hevalinatroot...@gmail.com> wrote:
>
>> OK, Shree, I miscommunicated with my colleague. He said this problem
>> occurs with both the default and the custom-trained models. That is, no
>> matter which models are used, if a model was trained on a single language
>> (with no other language in the training process) and is combined with
>> another model via "-l", then a line that contains both languages is read
>> in only one language. A line that contains a single language works fine
>> (please see the result below for a clearer explanation).
>>
>> My answers are as follows:
>>
>> 1. We trained for use with LSTM.
>> 2. We used "tessdata_best".
>> 3. The code is shown below:
>>
>>     import cv2
>>     import pytesseract
>>
>>     config_name = '-l eng+tha --oem 1 --psm 3 -c preserve_interword_spaces=1'
>>     im_name = cv2.imread(img_path_name, cv2.IMREAD_COLOR)
>>     text_name = pytesseract.image_to_string(im_name, config=config_name)
>>     print(text_name)
>>
>> [image: en_th.jpg]
>>
>> *The result is:* [image: result.jpg]
>>
>> As you can see, if the input image has both languages (eng+thai) on the
>> same line, it is read in only one language, but when a line has a single
>> language it is read in the correct language. These results are from the
>> default models (the custom model gives the same result).
>>
>> On Tuesday, October 2, 2018 at 10:14:11 UTC+7, shree wrote:
>>>
>>> 1. Have you trained for the legacy Tesseract engine or for LSTM?
>>>
>>> 2. Which default traineddata are you using?
>>>
>>> 3. For us to test, please provide an image, the commands used for
>>> testing, and the output you got.
>>>
>>> On Mon, Oct 1, 2018 at 11:08 PM Rujrawee K <hevalina...@gmail.com>
>>> wrote:
>>>
>>>> Hi Shree,
>>>> Yes, we tried that and it works OK, but my problem is that when I
>>>> train a new Thai model and then use it with the default eng model
>>>> from tess4, as in "-l custom_tha+eng", it only reads the language
>>>> that comes first in the command, in this case "custom_tha". The
>>>> result is the same for "-l eng+custom_tha": it only reads "eng". But
>>>> when I use the default tess4 models for both languages, it can read
>>>> both languages at the same time without a problem, apart from the
>>>> accuracy. Did I miss something?
>>>>
>>>> On Tuesday, October 2, 2018 at 8:26:48 UTC+7, shree wrote:
>>>>>
>>>>> Have you tried
>>>>>
>>>>> https://github.com/tesseract-ocr/tessdata_fast/blob/master/script/Thai.traineddata
>>>>>
>>>>> which is supposed to support both Thai and English?
>>>>>
>>>>> On Mon, Oct 1, 2018 at 5:33 AM Rujrawee K <hevalina...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> After I trained my custom Thai language model for use with
>>>>>> Tesseract 4, it works fine (accuracy aside), but it cannot read
>>>>>> English because English is not included in the model. So I tried
>>>>>> to combine my custom tha model with the default eng model using
>>>>>> "-l custom_tha+eng". The output shows that Tesseract still cannot
>>>>>> read English text. When I swap to "-l eng+custom_tha", it can read
>>>>>> English text but not the Thai text. It is as if Tesseract only
>>>>>> uses one model to read the text. But when using both the default
>>>>>> tha and eng models from Tesseract 4, it works fine.
>>>>>> *My question is:* why, and is there any solution or suggestion for
>>>>>> this problem?
>>>>>>
>>>>>> Regards

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWW4tukWCEkvtN86te%2BqJrNY_R_N0RvoiwG8xU--SVhWA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
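
For anyone who wants to reproduce the behaviour described in this thread before commenting on the issue, a minimal sketch along the following lines may help. It runs the same image through each language on its own and through both "-l" orderings, so the outputs can be compared side by side. The helper names (build_config, language_orderings, compare_orderings, run_pytesseract) are hypothetical and not part of pytesseract; the option string simply mirrors the one quoted in the thread, and "tha" can be swapped for a custom model name such as "custom_tha". This only helps to document the problem; it is not a fix.

```python
from typing import Callable, Dict, Iterable, List


def build_config(languages: Iterable[str], oem: int = 1, psm: int = 3) -> str:
    """Build an option string like the one quoted in the thread."""
    lang_arg = "+".join(languages)
    return f"-l {lang_arg} --oem {oem} --psm {psm} -c preserve_interword_spaces=1"


def language_orderings(first: str, second: str) -> List[List[str]]:
    """Each language alone, then both orderings of the pair."""
    return [[first], [second], [first, second], [second, first]]


def compare_orderings(
    image: object,
    ocr: Callable[[object, str], str],
    first: str = "eng",
    second: str = "tha",
) -> Dict[str, str]:
    """Run `ocr` once per language combination and collect the text."""
    results: Dict[str, str] = {}
    for langs in language_orderings(first, second):
        results["+".join(langs)] = ocr(image, build_config(langs))
    return results


def run_pytesseract(image: object, config: str) -> str:
    """OCR backend using pytesseract (assumed to be installed)."""
    import pytesseract  # imported lazily so the helpers above work without it

    return pytesseract.image_to_string(image, config=config)
```

To use it, load the image as in the thread (cv2.imread(path, cv2.IMREAD_COLOR)), call compare_orderings(image, run_pytesseract, "eng", "custom_tha"), and print each entry of the returned dictionary. Attaching the four outputs together with the sample image should make the report easier to act on.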