Hi.

Thank you very much for your kind help, shree. I tried script detection the way you showed and it worked. Great.
I have some questions.

1. If a page image contains text in several languages, is there any way to detect all of them? Right now it detects only one language.
2. It reports English, German, and French all as 'Latin'. How can I tell these languages apart?

Thanks.

Best,
Charles.

On Thursday, March 25, 2021 at 9:49:10 PM UTC+8 shree wrote:

> See
> https://github.com/tesseract-ocr/tessdoc/blob/master/examples/OSD_example.cc
>
> //Get OSD - new code
> int orient_deg;
> float orient_conf;
> const char* script_name;
> float script_conf;
> api->DetectOrientationScript(&orient_deg, &orient_conf, &script_name,
>                              &script_conf);
> printf("************\n Orientation in degrees: %d\n Orientation confidence: %.2f\n"
>        " Script: %s\n Script confidence: %.2f\n",
>        orient_deg, orient_conf,
>        script_name, script_conf);
>
> On Thursday, March 25, 2021 at 2:11:42 PM UTC+5:30 charles...@gmail.com wrote:
>
>> Hi,
>>
>> I have been looking into detecting the language automatically.
>> I referred to these links. Thank you, Merlijn.
>> https://archive.org/services/docs/api/ocr.html#autonomous-mode
>> https://git.archive.org/www/tesseract/-/blob/master/main.py#L757
>>
>> As far as I can tell, it uses the OSD of the Tesseract engine to detect
>> the layout and script.
>> After detecting the script, it detects the languages within that script.
>>
>> So I tried to use the OSD engine mode, building on textfairy, which is an
>> Android OCR app based on Tesseract 4.1.1.
>> But it doesn't work, and I can't figure out how to use the OSD engine mode
>> on Android.
>> I set 'osd' as the language option string, used osd.traineddata, and set
>> 'OEM_OSD_ONLY' as the engine mode.
>> But it doesn't work.
>>
>> I hope someone can help me use the OSD engine mode on Android.
>>
>> Thank you.
>> Best,
>> Charles.
>>
>> On Monday, March 22, 2021 at 10:28:38 AM UTC+8 Charles Cho wrote:
>>
>>> Hi, Merlijn.
>>>
>>> Thanks for your kind response.
>>>
>>> Regarding autonomous mode, I'm trying to find such a module for Android.
>>> But I found nothing.
>>> I will try more.
>>>
>>> >I am not sure what you're finding on google play store, but I have found
>>> >there to be no limitation to the amount of languages that can be used
>>> >during OCR. Keep in mind that using more languages will slow down the
>>> >OCR process.
>>> It's textfairy, an open source app.
>>> https://play.google.com/store/apps/details?id=com.renard.ocr
>>>
>>> Your response is really helpful.
>>>
>>> Best,
>>> Charles.
>>> On Sunday, March 21, 2021 at 8:29:13 AM UTC+8 Merlijn Wajer wrote:
>>>
>>>> Hi,
>>>>
>>>> On 19/03/2021 10:11, Charles Cho wrote:
>>>> > Hello,
>>>> > I'm working on an OCR Android app based on Tesseract.
>>>> > I want to add a feature that detects the language automatically and
>>>> > recognizes at least 2 languages at once.
>>>> > I have investigated this for a while, so I know that I have to specify
>>>> > the language for Tesseract.
>>>> > Then how can I implement automatic detection of the language?
>>>>
>>>> Not exactly a mobile use case, but you can read how the Internet Archive
>>>> does this (I coined it "autonomous mode", where the software just
>>>> figures out the scripts and languages):
>>>>
>>>> https://archive.org/services/docs/api/ocr.html#autonomous-mode
>>>>
>>>> And the code is available here (I plan to split out the archive.org
>>>> specific code from the Python code that invokes Tesseract and performs
>>>> heuristics like script detection):
>>>>
>>>> https://git.archive.org/www/tesseract/-/blob/master/main.py#L757
>>>>
>>>> The tl;dr is to first perform script detection, and use the detected
>>>> script to OCR the page - then use language detection libraries to guess
>>>> the languages on the page.
>>>>
>>>> > And tesseract on google play store can recognize 3 languages at once.
>>>> > Is it maximum?
>>>>
>>>> I am not sure what you're finding on google play store, but I have found
>>>> there to be no limitation to the amount of languages that can be used
>>>> during OCR.
>>>> Keep in mind that using more languages will slow down the
>>>> OCR process.
>>>>
>>>> > Any help and advice would be really appreciated.
>>>>
>>>> Hope this helps.
>>>>
>>>> Cheers,
>>>> Merlijn
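
The tl;dr in Merlijn's message (detect the script first, then OCR the page with languages for that script) leaves open how a script name such as 'Latin' becomes a language string for the second pass. Below is a minimal C++ sketch of that one step, under stated assumptions. The function name `LanguagesForScript`, the table contents, and the "eng" fallback are all hypothetical and not part of Tesseract. The only Tesseract convention relied on is that several languages can be requested by joining their codes with '+', as in "eng+deu+fra". Telling English, German, and French apart would still need a language detection library run on the recognized text, which is not shown here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical table: OSD script name -> language codes the app ships
// traineddata files for. The entries are illustrative assumptions, not a
// list defined by Tesseract.
static const std::map<std::string, std::vector<std::string>>& ScriptTable() {
  static const std::map<std::string, std::vector<std::string>> table = {
      {"Latin", {"eng", "deu", "fra"}},
      {"Cyrillic", {"rus", "ukr"}},
      {"Hangul", {"kor"}},
      {"Han", {"chi_sim", "chi_tra"}},
  };
  return table;
}

// Builds the language string for the OCR pass that follows script detection,
// e.g. "eng+deu+fra" for "Latin". Returns `fallback` (assumed non-empty)
// when the script is not in the table.
std::string LanguagesForScript(const std::string& script_name,
                               const std::string& fallback = "eng") {
  assert(!fallback.empty());
  const auto& table = ScriptTable();
  const auto it = table.find(script_name);
  if (it == table.end() || it->second.empty()) {
    return fallback;
  }
  std::string joined;
  for (const std::string& lang : it->second) {
    if (!joined.empty()) {
      joined += '+';  // Tesseract's separator for multiple languages
    }
    joined += lang;
  }
  return joined;
}
```

In use, `script_name` would come from `DetectOrientationScript` as in shree's snippet, and the returned string would be given as the language when initializing the engine for the OCR pass. As Merlijn notes, each extra language slows recognition, so a short per-script list is likely preferable to loading everything.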