Have you solved the problem?

On Friday, March 26, 2021 at 09:55:53 UTC+8, <charles...@gmail.com> wrote:

> Hi, 
>
> >>>The OSD module does not detect language - it detects script, as you also
> >>>noted earlier:
> It detects the language using OSD in Tesseract, and Tesseract also provides
> the DetectOrientationScript function.
>
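> // Load only the OSD model; pix is a Leptonica Pix* loaded elsewhere.
> tesseract::TessBaseAPI api;
> int orient_deg;
> float orient_conf, script_conf;
> const char* script_name;
> api.Init("/Users/renard/devel/textfairy/tessdata", "osd", tesseract::OcrEngineMode::OEM_DEFAULT);
> api.SetPageSegMode(tesseract::PageSegMode::PSM_OSD_ONLY);
> api.SetImage(pix);
> api.DetectOrientationScript(&orient_deg, &orient_conf, &script_name, &script_conf);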
>
> After this, script_name holds the language name and script_conf holds the
> confidence value.
> I tested several languages, and script_name got the following values:
> English -> 'Latin'
> French -> 'Latin'
> German -> 'Latin'
> Chinese_Sim -> 'Han'
> Chinese_Tra -> 'Han'
> Korean -> 'Korean'
> Japanese -> 'Japanese'
> Russian -> 'Cyrillic'
>
> So the problem is that I want to distinguish between Latin-script languages
> exactly, and I want to detect several languages at once in a single image.
>
> Thanks.
> Best,
> Charles.
> On Friday, March 26, 2021 at 2:33:26 AM UTC+8 Merlijn Wajer wrote:
>
>> Hi, 
>>
>> On 25/03/2021 19:04, Charles Cho wrote: 
>> > Hi. 
>> > 
>> > Thank you very much for your kind help, shree. 
>> > I tried detecting the script with your help and it worked. Great.
>> > 
>> > I have some questions. 
>> > 1. If the image contains text in different languages on one page, is there
>> > any way to detect all of the languages? Now it detects only one language.
>> > 2. It detects English, German, French as 'Latin'. So how can I distinguish
>> > the languages exactly?
>>
>> The OSD module does not detect language - it detects script, as you also
>> noted earlier: 
>>
>> >>> So in my analysis, it used the OSD of the tesseract engine to detect
>> >>> layout and script.
>> >>> After detecting the script, it detects the languages for that script.
>>
>> What's missing is performing OCR using just the script - and then 
>> analysing the corpus to detect the language. 
>>
>> You could use something like this: https://github.com/saffsd/langid.c 
>>
>> Regards, 
>> Merlijn 
>>
>
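
For anyone finding this thread later: Merlijn's suggestion (OCR with just the script, then analyse the recognised text) can be sketched roughly as below. This is only a minimal illustration under some assumptions, not what langid.c or Tesseract actually do. The stop-word lists and the function names (StopWords, Tokenize, GuessLanguage, GuessLanguagePerLine) are made up for the example, and the input string is assumed to be the UTF-8 text you already got from OCR with a script-level model. Guessing per line is one simple way to handle several languages on one page.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Illustrative stop-word lists (an assumption for this sketch; a trained
// model such as langid.c would be far more reliable on real text).
const std::map<std::string, std::set<std::string>>& StopWords() {
    static const std::map<std::string, std::set<std::string>> kWords = {
        {"deu", {"der", "die", "das", "und", "ist", "nicht", "ein", "mit"}},
        {"eng", {"the", "and", "is", "of", "to", "in", "that", "it"}},
        {"fra", {"le", "la", "les", "et", "est", "des", "une", "que"}},
    };
    return kWords;
}

// Lower-cases the text and splits it into runs of ASCII letters.
// Bytes of accented UTF-8 letters act as separators, which is crude
// but sufficient for matching the ASCII stop words above.
std::vector<std::string> Tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Returns the language code with the most stop-word hits,
// or "und" (undetermined) when nothing matches.
std::string GuessLanguage(const std::string& text) {
    std::map<std::string, int> hits;
    for (const std::string& word : Tokenize(text)) {
        for (const auto& entry : StopWords()) {
            if (entry.second.count(word)) ++hits[entry.first];
        }
    }
    std::string best = "und";
    int best_hits = 0;
    for (const auto& entry : hits) {
        if (entry.second > best_hits) {
            best = entry.first;
            best_hits = entry.second;
        }
    }
    return best;
}

// Guesses one language per non-empty line, so a page that mixes
// languages yields several results instead of a single one.
std::vector<std::string> GuessLanguagePerLine(const std::string& text) {
    std::vector<std::string> result;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (!Tokenize(line).empty()) result.push_back(GuessLanguage(line));
    }
    return result;
}
```

You would call GuessLanguagePerLine on the recognised text, collect the distinct codes, and then (if you want better accuracy) run OCR again with those specific language models. Short lines give very few stop-word hits, so per-block or per-paragraph guessing is probably more stable than per-line.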
