OK, Shree, I miscommunicated with my colleague. He said this problem
occurs with both the default and the custom-trained models. In other words,
no matter which models are used: if a model was trained on a single language
(with no other language in the training data) and is combined with another
model via "-l", a line that contains both languages is read in only one
language, while a line that contains a single language is read correctly
(please see the result below for a clearer explanation).
My answers are below:
1. We trained for the LSTM engine.
2. We used "tessdata_best".
3. The code is shown below:
import cv2
import pytesseract

# img_path_name is the path to the input image (en_th.jpg in this example)
config_name = '-l eng+tha --oem 1 --psm 3 -c preserve_interword_spaces=1'
im_name = cv2.imread(img_path_name, cv2.IMREAD_COLOR)
text_name = pytesseract.image_to_string(im_name, config=config_name)
print(text_name)
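To compare how the language order affects the output, the call above could be repeated once per language combination. This is only a minimal sketch: the helper names (build_config, configs_to_compare, run_comparison) are hypothetical and not part of pytesseract, and the flags are the same ones used in the snippet above.

```python
from typing import Dict, List


def build_config(languages: List[str], oem: int = 1, psm: int = 3) -> str:
    """Join language codes with '+' and build a Tesseract config string."""
    if not languages:
        raise ValueError("at least one language is required")
    lang_spec = "+".join(languages)
    return f"-l {lang_spec} --oem {oem} --psm {psm} -c preserve_interword_spaces=1"


def configs_to_compare(first: str, second: str) -> Dict[str, str]:
    """Return configs for each language alone and for both orderings."""
    specs = [[first], [second], [first, second], [second, first]]
    return {"+".join(spec): build_config(spec) for spec in specs}


def run_comparison(image_path: str, first: str = "eng", second: str = "tha") -> Dict[str, str]:
    """Run OCR once per config; needs opencv-python, pytesseract and Tesseract installed."""
    import cv2  # imported lazily so the helpers above work without the OCR dependencies
    import pytesseract

    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    if image is None:
        raise FileNotFoundError(image_path)
    return {
        name: pytesseract.image_to_string(image, config=config)
        for name, config in configs_to_compare(first, second).items()
    }
```

Calling run_comparison("en_th.jpg") would return one OCR result per key ("eng", "tha", "eng+tha", "tha+eng"), which makes it easier to see whether only the first-listed language is being used.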
[image: en_th.jpg]
*The result is:* [image: result.jpg]
As you can see, if a line in the input image contains both languages
(English + Thai), it is read in only one language, but a line that contains
a single language is read in the correct language. Both models here are the
default ones (the custom model gives the same result).
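One way to check this on the OCR output, rather than by eye, is to label each output line by the scripts it contains. The sketch below is an assumption-laden illustration: the function names are hypothetical, "latin" means basic ASCII letters only, and "thai" means the Unicode Thai block (U+0E00 to U+0E7F). It only describes what the OCR text contains, not what the image actually shows.

```python
from typing import List, Set


def scripts_in_line(line: str) -> Set[str]:
    """Return the set of scripts ('thai', 'latin') found in one line of text."""
    found = set()
    for ch in line:
        if "\u0e00" <= ch <= "\u0e7f":  # Unicode Thai block
            found.add("thai")
        elif ch.isascii() and ch.isalpha():  # basic Latin letters only
            found.add("latin")
    return found


def summarize_scripts(ocr_text: str) -> List[str]:
    """Label each non-empty line as 'thai', 'latin', 'mixed', or 'other'."""
    labels = []
    for line in ocr_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        found = scripts_in_line(line)
        if found == {"thai", "latin"}:
            labels.append("mixed")
        elif found:
            labels.append(next(iter(found)))
        else:
            labels.append("other")  # digits, punctuation, other scripts
    return labels
```

If the input image has lines with both languages but summarize_scripts(text_name) never returns "mixed", that would match the behaviour described above.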
On Tuesday, October 2, 2018 at 10:14:11 UTC+7, shree wrote:
>
> 1. Have you trained for legacy tesseract engine or for LSTM?
>
> 2. Which default traineddata are you using?
>
> 3. For us to test, please provide an image and the commands used for
> testing and the output you got.
>
> On Mon, Oct 1, 2018 at 11:08 PM Rujrawee K <hevalina...@gmail.com> wrote:
>
>> Hi Shree,
>> Yes, we tried that and it works OK. My problem is that when I train a new
>> Thai model and then use it with the default eng model from Tesseract 4,
>> e.g. "-l custom_tha+eng", it only reads the language that comes first in
>> the command, in this case "custom_tha". The result is the same for "-l
>> eng+custom_tha": it only reads "eng". When both languages use the default
>> models from Tesseract 4, it reads both languages at the same time without
>> a problem, apart from the accuracy. Did I miss something?
>>
>> On Tuesday, October 2, 2018 at 8:26:48 UTC+7, shree wrote:
>>>
>>> Have you tried
>>>
>>> https://github.com/tesseract-ocr/tessdata_fast/blob/master/script/Thai.traineddata
>>>
>>> which is supposed to support both Thai and English
>>>
>>> On Mon, Oct 1, 2018 at 5:33 AM Rujrawee K <hevalina...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> After I trained my custom Thai language model for use with Tesseract 4,
>>>> it works fine (leaving accuracy aside), but it cannot read English
>>>> because English is not included in the model. So I tried to combine my
>>>> custom tha model with the default eng model using "-l custom_tha+eng".
>>>> The output shows that Tesseract still cannot read the English text. When
>>>> I swap to "-l eng+custom_tha", it reads the English text but not the Thai
>>>> text, as if Tesseract uses only one model to read the text. When using
>>>> both the default tha and eng models from Tesseract 4, it works fine.
>>>> *My question is:* why does this happen, and is there any solution or
>>>> suggestion for this problem?
>>>>
>>>> Regards
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To post to this group, send email to tesser...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/5cd91f67-0aa1-40a3-a605-4b90d413b2cd%40googlegroups.com
>>>>
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/5cd91f67-0aa1-40a3-a605-4b90d413b2cd%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>> --
>>>
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>
>
>