There is an open issue in the issue tracker describing a similar problem. It
would help to move the discussion there.

I will test with your sample image and also post a link to the issue.
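
For anyone who wants to reproduce this in the meantime, here is a minimal sketch of the comparison, assuming the pytesseract wrapper used in the quoted message below. The helper names (`build_configs`, `run_comparison`) are made up for illustration, and `ocr` is any callable shaped like `pytesseract.image_to_string`; the flags are copied from the quoted report.

```python
from itertools import permutations


def build_configs(langs, oem=1, psm=3):
    """Map each language spec to a Tesseract config string.

    Covers every single language plus every full ordering of the
    languages joined with "+". Hypothetical helper; the flags mirror
    the ones in the quoted report.
    """
    specs = list(langs)
    if len(langs) > 1:
        specs += ["+".join(p) for p in permutations(langs)]
    return {
        spec: f"-l {spec} --oem {oem} --psm {psm} -c preserve_interword_spaces=1"
        for spec in specs
    }


def run_comparison(image, langs, ocr):
    """Call ocr(image, config=...) once per language spec and collect the text."""
    return {spec: ocr(image, config=cfg) for spec, cfg in build_configs(langs).items()}
```

Calling `run_comparison(im_name, ["eng", "tha"], pytesseract.image_to_string)` should give four outputs (eng, tha, eng+tha, tha+eng) to compare side by side.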

On Tue, 2 Oct 2018, 01:01 Rujrawee K, <hevalinatroot...@gmail.com> wrote:

>
> OK, Shree, I miscommunicated with my colleague. He says this problem
> occurs with both the default and the custom-trained models. That is, no
> matter which models are used, if each was trained on a single language
> (with no other language in the training process) and they are combined
> with "-l", a line containing both languages is read in only one language,
> while a line containing a single language is read fine (please see the
> result below for a clearer explanation).
>
> My answers are below:
>
>    1. We trained for use with LSTM.
>    2. We used "tessdata_best".
>    3. The code is shown below.
>
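> import cv2
> import pytesseract
>
> # LSTM engine (--oem 1), automatic page segmentation (--psm 3), English + Thai
> config_name = '-l eng+tha --oem 1 --psm 3 -c preserve_interword_spaces=1'
> im_name = cv2.imread(img_path_name, cv2.IMREAD_COLOR)  # img_path_name: path to the test image
> text_name = pytesseract.image_to_string(im_name, config=config_name)
> print(text_name)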
>
>
> [image: en_th.jpg]
>
>
> *The result is : * [image: result.jpg]
>
> As you can see, if a line in the input image contains both languages
> (eng+tha), it is read in only one language, but a line containing a single
> language is read in the correct language. Both results are from the default
> models (the custom model gives the same result).
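
One way to check this symptom without reading the output by eye is to count Thai and Latin characters on each output line. This is only a minimal sketch: the function names are hypothetical, Thai is assumed to mean the Unicode Thai block (U+0E00 to U+0E7F), and only ASCII letters are counted as Latin.

```python
def script_counts(text):
    """Return (line, thai_count, latin_count) for each non-empty line of text."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: Thai characters are those in the Unicode Thai block.
        thai = sum(1 for ch in line if "\u0e00" <= ch <= "\u0e7f")
        # Assumption: "Latin" here means ASCII letters only.
        latin = sum(1 for ch in line if ch.isascii() and ch.isalpha())
        rows.append((line, thai, latin))
    return rows


def single_script_lines(text):
    """Return the lines in which exactly one of the two scripts appears."""
    return [
        line
        for line, thai, latin in script_counts(text)
        if (thai == 0) != (latin == 0)
    ]
```

A line that is mixed in the image but shows up in `single_script_lines(text_name)` is a candidate for the behaviour described above.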
>
> On Tuesday, 2 October 2018 at 10:14:11 UTC+7, shree wrote:
>>
>> 1. Have you trained for the legacy Tesseract engine or for LSTM?
>>
>> 2. Which default traineddata are you using?
>>
>> 3. For us to test, please provide an image and the commands used for
>> testing and the output you got.
>>
>> On Mon, Oct 1, 2018 at 11:08 PM Rujrawee K <hevalina...@gmail.com> wrote:
>>
>>> Hi Shree,
>>> Yes, we tried that and it works OK. My problem is that when I train a
>>> new Thai model and then use it with the default eng model from Tesseract 4,
>>> as in "-l custom_tha+eng", only the language that comes first in the
>>> command is read, in this case "custom_tha". Likewise, "-l eng+custom_tha"
>>> reads only "eng". When both languages use the default models from
>>> Tesseract 4, both are read at the same time without a problem, apart from
>>> the accuracy. Did I miss something?
>>>
>>> On Tuesday, 2 October 2018 at 08:26:48 UTC+7, shree wrote:
>>>>
>>>> Have you tried
>>>>
>>>> https://github.com/tesseract-ocr/tessdata_fast/blob/master/script/Thai.traineddata
>>>>
>>>> which is supposed to support both Thai and English?
>>>>
>>>> On Mon, Oct 1, 2018 at 5:33 AM Rujrawee K <hevalina...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> After I trained my custom Thai language model for use with Tesseract
>>>>> 4, it works fine (leaving accuracy aside), but it cannot read English
>>>>> because English is not included in the model. So I am trying to combine
>>>>> my custom tha model with the default eng model using "-l custom_tha+eng".
>>>>> The output shows that Tesseract still cannot read the English text. When I
>>>>> swap to "-l eng+custom_tha", it reads the English text but not the Thai
>>>>> text, as if Tesseract uses only one model to read the text. When using
>>>>> both the tha and eng default models from Tesseract 4, it works fine.
>>>>> *My question is:* why does this happen, and is there any solution or
>>>>> suggestion for this problem?
>>>>>
>>>>> Regards
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/5cd91f67-0aa1-40a3-a605-4b90d413b2cd%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5cd91f67-0aa1-40a3-a605-4b90d413b2cd%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
