There is an open issue describing a similar problem in the issue tracker. It would help to move the discussion there.
I will test with your sample image and also post a link to the issue.

On Tue, 2 Oct 2018, 01:01 Rujrawee K, <hevalinatroot...@gmail.com> wrote:
>
> OK, Shree, I miscommunicated with my colleague. He said this problem
> occurs with both the default and the custom-trained models. That is, no
> matter which models are used, if each was trained on a single language
> (with no other language in the training process) and they are combined
> with "-l", a line containing both languages is read in only one
> language, while a line containing a single language is read correctly
> (please see the result below for a clearer explanation).
>
> My answers are as follows:
>
> 1. We trained for use with LSTM.
> 2. We used "tessdata_best".
> 3. The code is shown below:
>
> import cv2
> import pytesseract
>
> # img_path_name is the path to the input image (defined elsewhere)
> config_name = '-l eng+tha --oem 1 --psm 3 -c preserve_interword_spaces=1'
> im_name = cv2.imread(img_path_name, cv2.IMREAD_COLOR)
> text_name = pytesseract.image_to_string(im_name, config=config_name)
> print(text_name)
>
> [image: en_th.jpg]
>
> *The result is:* [image: result.jpg]
>
> As you can see, if a line in the input image contains both languages
> (eng+thai), it is read in only one language, but a line containing a
> single language is read in the correct language. These results are from
> the default models (the custom model gives the same result).
>
> On Tuesday, 2 October 2018 at 10:14:11 UTC+7, shree wrote:
>>
>> 1. Have you trained for legacy tesseract engine or for LSTM?
>>
>> 2. Which default traineddata are you using?
>>
>> 3. For us to test, please provide an image and the commands used for
>> testing and the output you got.
>>
>> On Mon, Oct 1, 2018 at 11:08 PM Rujrawee K <hevalina...@gmail.com> wrote:
>>
>>> Hi Shree,
>>> Yes, we tried that and it works OK, but my problem is this: when I
>>> train a new Thai model and then use it with the default eng model from
>>> tess4, as in "-l custom_tha+eng", it reads only the language that comes
>>> first in the command, in this case "custom_tha". The result is the same
>>> for "-l eng+custom_tha": it reads only "eng". But when using the
>>> default tess4 models for both languages, it can read both languages at
>>> the same time without a problem, apart from the accuracy. Did I miss
>>> something?
>>>
>>> On Tuesday, 2 October 2018 at 08:26:48 UTC+7, shree wrote:
>>>>
>>>> Have you tried
>>>>
>>>> https://github.com/tesseract-ocr/tessdata_fast/blob/master/script/Thai.traineddata
>>>>
>>>> which is supposed to support both Thai and English?
>>>>
>>>> On Mon, Oct 1, 2018 at 5:33 AM Rujrawee K <hevalina...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> After I trained my custom Thai language model for use with Tesseract
>>>>> 4, it works fine (accuracy aside), but it cannot read English because
>>>>> English was not included in the model. So I am trying to combine my
>>>>> custom tha model with the default eng model using "-l custom_tha+eng".
>>>>> The output shows that Tesseract still cannot read the English text,
>>>>> but when I swap to "-l eng+custom_tha" it reads the English text and
>>>>> not the Thai text. It is as if Tesseract uses only one model to read
>>>>> the text. But when using both the default tha and eng models from
>>>>> Tesseract 4, it works fine.
>>>>> *My question is:* why, and is there any solution or suggestion for
>>>>> this problem?
>>>>>
>>>>> Regards
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5cd91f67-0aa1-40a3-a605-4b90d413b2cd%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5cd91f67-0aa1-40a3-a605-4b90d413b2cd%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
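
For anyone trying to reproduce the behaviour reported in this thread, the following is a rough sketch rather than a fix. It runs the same image through each ordering of the models (for example "eng+tha" and "tha+eng") and counts how many output lines contain Latin letters, Thai letters, or both. The helper names (build_lang, scripts_in_line, summarize, ocr_with_orders) are made up for illustration, and it assumes pytesseract and the named traineddata files are installed. It passes the image path and the languages through pytesseract's `lang` argument instead of putting "-l" in the config string.

```python
import itertools
import string

# Unicode Thai block; any character in this range is treated as Thai.
THAI_FIRST, THAI_LAST = 0x0E00, 0x0E7F


def build_lang(*models):
    """Join model names into the value Tesseract expects for -l, e.g. 'eng+tha'."""
    return "+".join(models)


def scripts_in_line(line):
    """Return the set of scripts ('latin', 'thai') present in one line of text."""
    found = set()
    for ch in line:
        if ch in string.ascii_letters:
            found.add("latin")
        elif THAI_FIRST <= ord(ch) <= THAI_LAST:
            found.add("thai")
    return found


def summarize(text):
    """Count non-empty lines by the scripts they contain."""
    counts = {"latin": 0, "thai": 0, "mixed": 0, "none": 0}
    for line in text.splitlines():
        if not line.strip():
            continue
        scripts = scripts_in_line(line)
        if len(scripts) == 2:
            counts["mixed"] += 1
        elif scripts:
            counts[next(iter(scripts))] += 1
        else:
            counts["none"] += 1
    return counts


def ocr_with_orders(image_path, models=("eng", "tha")):
    """Run OCR once per model ordering and summarize each result.

    Assumption: pytesseract is installed and every name in `models`
    has a matching .traineddata file in the tessdata directory.
    """
    import pytesseract  # imported here so the helpers above work without it

    config = "--oem 1 --psm 3 -c preserve_interword_spaces=1"
    results = {}
    for order in itertools.permutations(models):
        lang = build_lang(*order)
        text = pytesseract.image_to_string(image_path, lang=lang, config=config)
        results[lang] = summarize(text)
    return results
```

Calling ocr_with_orders("en_th.jpg", ("eng", "custom_tha")) would return one summary per ordering. If the "mixed" count stays at zero for an image that does contain mixed lines, that matches the behaviour described above, and the summaries can be attached to the issue together with the image.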