Have you solved the problem?

On Friday, March 26, 2021 at 09:55:53 UTC+8, <charles...@gmail.com> wrote:
> Hi,
>
> >>> The OSD module does not detect language - it detects script, as you also
> >>> noted earlier:
>
> It detects language by using OSD in Tesseract, and Tesseract also provides a
> DetectOrientationScript function.
>
> api.Init("/Users/renard/devel/textfairy/tessdata", "osd",
>          tesseract::OcrEngineMode::OEM_DEFAULT);
> api.SetPageSegMode(tesseract::PageSegMode::PSM_OSD_ONLY);
> api.SetImage(pix);
> api.DetectOrientationScript(&orient_deg, &orient_conf, &script_name,
>                             &script_conf);
>
> After this, script_name will get the language name and script_conf will get
> the confidence value.
> When I tested several languages, script_name got the following values:
> English -> 'Latin'
> French -> 'Latin'
> German -> 'Latin'
> Chinese_Sim -> 'Han'
> Chinese_Tra -> 'Han'
> Korean -> 'Korean'
> Japanese -> 'Japanese'
> Russian -> 'Cyrillic'
>
> So the problem is that I want to distinguish Latin-script languages exactly,
> and I want to detect several languages at once from an image.
>
> Thanks.
> Best,
> Charles.
>
> On Friday, March 26, 2021 at 2:33:26 AM UTC+8 Merlijn Wajer wrote:
>
>> Hi,
>>
>> On 25/03/2021 19:04, Charles Cho wrote:
>> > Hi.
>> >
>> > Thank you very much for your kind help, shree.
>> > I tried to detect the script with your help and it worked. Great.
>> >
>> > I have some questions.
>> > 1. If an image contains text in different languages on one page, is there
>> > any way to detect all of the languages? Now it detects only one language.
>> > 2. It detects English, German, and French as 'Latin'. So how can I
>> > distinguish the languages exactly?
>>
>> The OSD module does not detect language - it detects script, as you also
>> noted earlier:
>>
>> >>> So in my analysis, it used the OSD of the Tesseract engine to detect
>> >>> layout and script.
>> >>> After detecting the script, it detects languages within that script.
>>
>> What's missing is performing OCR using just the script - and then
>> analysing the corpus to detect the language.
>>
>> You could use something like this: https://github.com/saffsd/langid.c
>>
>> Regards,
>> Merlijn
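Merlijn's suggestion (OCR the page with a script-level model, then run a language identifier over the recognized text) can be sketched roughly as below. This is a toy stand-in, not langid.c's API: the stopword lists and the function names `tokenize`, `guess_language`, and `languages_on_page` are hypothetical, and the sketch assumes the OCR text has already been produced, for example with a `script/Latin` traineddata model, assuming that model is installed in your tessdata directory.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, tiny stopword lists. A real identifier such as langid.c
// uses n-gram models trained on large corpora instead.
const std::map<std::string, std::set<std::string>>& stopwords() {
  static const std::map<std::string, std::set<std::string>> lists = {
      {"english", {"the", "and", "is", "of", "to", "in", "that", "it", "with"}},
      {"french", {"le", "la", "les", "et", "est", "des", "une", "dans", "pour"}},
      {"german", {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit"}},
  };
  return lists;
}

// Split text into lowercase words. ASCII punctuation, digits and whitespace
// act as separators; bytes >= 0x80 (UTF-8 accented letters) stay in words.
std::vector<std::string> tokenize(const std::string& text) {
  std::vector<std::string> words;
  std::string current;
  for (unsigned char c : text) {
    if (std::isalpha(c) || c >= 0x80) {
      current += static_cast<char>(std::tolower(c));
    } else if (!current.empty()) {
      words.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) words.push_back(current);
  return words;
}

// Return the language whose stopwords occur most often in the text,
// or "unknown" if no stopword matches at all.
std::string guess_language(const std::string& text) {
  const std::vector<std::string> words = tokenize(text);
  std::string best = "unknown";
  int best_hits = 0;
  for (const auto& [lang, list] : stopwords()) {
    int hits = 0;
    for (const auto& w : words) hits += static_cast<int>(list.count(w));
    if (hits > best_hits) {
      best = lang;
      best_hits = hits;
    }
  }
  return best;
}

// Guess a language for each line of OCR output and collect the distinct
// results, so a page that mixes languages can report more than one.
std::set<std::string> languages_on_page(const std::string& page_text) {
  std::set<std::string> found;
  std::istringstream lines(page_text);
  std::string line;
  while (std::getline(lines, line)) {
    const std::string lang = guess_language(line);
    if (lang != "unknown") found.insert(lang);
  }
  return found;
}
```

Classifying line by line is one rough way to approach the "several languages on one page" question; short lines will often come back as `unknown`, and a trained n-gram identifier is far more robust than a handful of stopwords.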