Hi,

On 19/03/2021 10:11, Charles Cho wrote:
> Hello,
> I'm working on an OCR Android app based on Tesseract.
> I want to add a feature that detects the language automatically and
> recognizes at least 2 languages at once.
> I have investigated this for a while, so I know that I have to specify
> the language for Tesseract.
> Then how can I implement auto detection of language?

Not exactly a mobile use case, but you can read how the Internet Archive
does this (I coined it "autonomous mode", where the software just figures
out the scripts and languages):

https://archive.org/services/docs/api/ocr.html#autonomous-mode

The code is available here (I plan to split out the archive.org-specific
code from the Python code that invokes Tesseract and performs heuristics
like script detection):

https://git.archive.org/www/tesseract/-/blob/master/main.py#L757

The tl;dr is to first perform script detection and use the detected
script to OCR the page, then use language detection libraries to guess
the languages on the page.

> And tesseract on google play store can recognize 3 languages at once.
> Is it maximum?

I am not sure what you're finding on the Google Play Store, but I have
found no limit on the number of languages that can be used during OCR.
Keep in mind that using more languages will slow down the OCR process.

> Any help and advice would be really appreciated.

Hope this helps.

Cheers,
Merlijn
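
P.S. A minimal sketch of the tl;dr above, in Python. This is not the
Internet Archive code, just an illustration under stated assumptions: the
helper names (script_model, lang_arg, autonomous_ocr) and the small
ISO-to-Tesseract mapping are hypothetical, script models are assumed to be
installed under a script/ subdirectory of tessdata, and pytesseract,
Pillow and langdetect are stand-ins I picked for "invoke Tesseract" and
"language detection library".

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical, deliberately tiny mapping from ISO 639-1 codes (what many
# language detection libraries return) to Tesseract traineddata names.
ISO_TO_TESSERACT = {
    "en": "eng", "de": "deu", "fr": "fra", "nl": "nld", "ko": "kor",
}


def script_model(script: str) -> str:
    """Name of a script-level model, assuming tessdata has a script/ subdirectory."""
    return f"script/{script}"


def lang_arg(iso_codes: Iterable[str]) -> str:
    """Build a Tesseract language argument such as 'eng+deu'.

    Unknown codes are skipped and duplicates dropped, keeping first-seen order.
    """
    names: List[str] = []
    for code in iso_codes:
        name = ISO_TO_TESSERACT.get(code)
        if name and name not in names:
            names.append(name)
    return "+".join(names)


def autonomous_ocr(
    image,
    detect_script: Callable,
    ocr: Callable,
    detect_languages: Callable,
) -> Tuple[str, str]:
    """Script detection, then OCR with the script model, then language guessing.

    Returns the recognized text and a '+'-joined language guess ('' if none).
    """
    script = detect_script(image)
    text = ocr(image, script_model(script))
    return text, lang_arg(detect_languages(text))


def autonomous_ocr_with_pytesseract(path: str) -> Tuple[str, str]:
    """Wire the steps to pytesseract, Pillow and langdetect (imported lazily)."""
    import pytesseract
    from langdetect import detect_langs
    from PIL import Image

    def detect_script(image):
        # Orientation and script detection; may fail on pages with little text.
        osd = pytesseract.image_to_osd(image, output_type=pytesseract.Output.DICT)
        return osd["script"]

    def ocr(image, lang):
        return pytesseract.image_to_string(image, lang=lang)

    def detect_languages(text):
        if not text.strip():
            return []
        # Keep guesses above an arbitrary 0.2 probability threshold.
        return [guess.lang for guess in detect_langs(text) if guess.prob > 0.2]

    with Image.open(path) as image:
        return autonomous_ocr(image, detect_script, ocr, detect_languages)
```

The '+'-joined value (for example "eng+deu") is the form Tesseract accepts
for several languages at once, so it could be fed into a second OCR pass.
That second pass is my own assumption and not something described above.
On Android the same steps would have to go through whatever Tesseract
wrapper the app uses.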