Try PageSegmentationMode.AUTO. You may also need to enlarge the image to 300 DPI; what is its original DPI?
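Tesseract generally does best when text is captured at about 300 DPI, so a lower-resolution image needs to be enlarged, not just relabelled. The target size is the pixel size multiplied by 300 / originalDpi. Here is a minimal sketch of that arithmetic. TargetSize is a hypothetical helper, and the 1240x1754 page at 150 DPI (roughly A4) is an assumed example, not a value from the thread. The resampling itself would be done with an imaging library, such as the KalikoImage library mentioned further down the thread. No such call is shown here.

```csharp
using System;

// Hypothetical helper: how large must the image become so that text scanned
// at originalDpi ends up at targetDpi (300 by default)?
// It only computes sizes; it does not resample anything.
static (double Scale, int Width, int Height) TargetSize(
    int width, int height, double originalDpi, double targetDpi = 300.0)
{
    if (originalDpi <= 0)
        throw new ArgumentOutOfRangeException(nameof(originalDpi));

    double scale = targetDpi / originalDpi;
    // Round each scaled dimension to a whole number of pixels.
    int newWidth = (int)Math.Round(width * scale);
    int newHeight = (int)Math.Round(height * scale);
    return (scale, newWidth, newHeight);
}

// Example input (an assumption, not from the thread): a page scanned at 150 DPI.
var (s, w, h) = TargetSize(1240, 1754, originalDpi: 150);
Console.WriteLine($"scale={s}, size={w}x{h}");
```

Running it prints scale=2, size=2480x3508. At 300 DPI or above, the scale is 1 or less, and enlarging would not add detail.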
From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com] On Behalf Of MariamHi
Sent: 16 October 2018 11:07
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

Yes, I tried exactly that. My code is:

    string dataPath = ConfigurationManager.AppSettings["DataSet"].ToString();
    string language = "eng+ara"; // also tried "ara+eng"
    OcrEngineMode oem = OcrEngineMode.DEFAULT;
    PageSegmentationMode psm = PageSegmentationMode.SINGLE_BLOCK;
    TessBaseAPI tessBaseAPI = new TessBaseAPI(dataPath, language, oem, psm);

    Pix pix = Pix.Read(imageFilePath);
    if (pix != null)
    {
        // Check for null before touching pix. Setting XRes/YRes presumably only
        // changes the resolution metadata (as Leptonica's pixSetXRes does);
        // it does not resample the pixels to 300 DPI.
        pix.XRes = 300;
        pix.YRes = 300;

        tessBaseAPI.SetImage(pix);
        tessBaseAPI.Recognize();
        string text = tessBaseAPI.GetUTF8Text();
        textBox2.Text = text;
    }

My image is attached. It is an Arabic news article from Al-Eqtisadiah, Riyadh, dated Sunday 9 September 2018, with the headline "Google is preparing to break into our brains". Some English words, such as Google, Android and Skynews, are embedded in the Arabic text.

The result with eng+ara:

    SimplifiedArabic 1 1 مه 200 ?Google ( 15 93? يستعد لاقتحام أدمغتنا ?الاحد 9 سبتمبر 2018 3 ?الاقتصادية" من الرياض" ?2 9 ?هل حدث وأن بحثت عن منتج معين عبر الإنترنت وتفاجأت باقتراحات عديدة لاحقا على حسابك في فيسبوك ع لشركات توفر منتجات مشابهة؟ الأمر لیس صدفة؛ ولكن ذلك يندرج ضمن استراتيجيات يتبناها عمالقة التقنية للتأثير في قراراتنا الشرائية. ?وساهمت الشبكة العنكبوتية في جعل عملية جمع بيانات المستخدمين أكثر سهولة من أي وقت مضى؛ مع ترك هؤلاء لآثارهم الرقمية في العديد من المواقع والتطبیقات» والتي تكون هدفا لشركات التكنولوجيا المدعومة بترسانة من تقنیات الذکاء الاصطناعي ?intelligence? 4- ?وبيّنت دراسة صادرة عن باحثين في جامعة 'برينستون" الأميركيةء أن ?Google? ترصد تحرکات ما یزید عن ملياري شخص حول العالم» ممن يستعملون أجهزة وهواتف تعمل بنظام التشغيل الشهير "8001010" بحسب ما ذكرت ?.'Skynews"? ?وأكد تقرير لوكالة الأسوشيتد برس 016558 0551018160 أن الكثير من خدمات جوجل على أجهزة آيفون 06 وآندرويد ?Android? تخزّن بيانات مواقع المستخدمین» حتى وإن قاموا بإيقاف تشغيل خدمات تحديد الموقع الجغرافي بتغيير إعدادات الخصوصية المتوفرة في تلك الأجهزة.

The result with ara+eng:

    SimplifiedArabic 200 ao oe 1 1 جو جا 18 9 ?Goo? يستعد لاقتحام أدمغتنا الاحد 9 سبتمبر 2018 الاقتصادية" من الرياض" 2 9 هل حدث وأن بحثت عن منتج معين عبر الإنترنت وتفاجأت باقتراحات عديدة لاحقا على حسابك في فيسبوك ع لشركات توفر منتجات مشابهة؟ الأمر لیس صدفة؛ ولكن ذلك يندرج ضمن استراتيجيات يتبناها عمالقة التقنية للتأثير في قراراتنا الشرائية. وساهمت الشبكة العنكبوتية في جعل عملية جمع بيانات المستخدمين أكثر سهولة من أي وقت مضى؛ مع ترك هؤلاء لآثارهم الرقمية في العديد من المواقع والتطبيقات» والتي تكون هدفا لشركات التكنولوجيا المدعومة بترسانة من تقنيات الذكاء الاصطناعي ?intelligence? 4- وبيّنت دراسة صادرة عن باحثين في جامعة 'برينستون" الأميركية» أن 6ا6009 ترصد تحركات ما يزيد عن ملياري شخص حول العالم» ممن يستعملون أجهزة وهواتف تعمل بنظام التشغيل الشهير "8001010" بحسب ما ذكرت ?.'Skynews"? وأكد تقرير لوكالة الأسوشيتد برس 016558 0551018160 أن الكثير من خدمات جوجل على أجهزة آيفون 06 وآندرويد ?Android? تخزّن بيانات مواقع المستخدمین» حتى وإن قاموا بإيقاف تشغيل خدمات تحديد الموقع الجغرافي بتغيير إعدادات الخصوصية المتوفرة في تلك الأجهزة.

From: Adrian Owen <adrian.o...@eesm.com>
Sent: Tuesday, October 16, 2018 12:42 PM
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

Try changing the order: English+Arabic. Any better?

From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com] On Behalf Of MariamHi
Sent: 16 October 2018 08:27
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

When I did the pre-processing, the results got worse. My situation is this:
- Arabic documents are recognized almost correctly.
- English documents are recognized correctly.
- Mixed Arabic+English documents come out with all the English words as digits.

How can I fix this?
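Tesseract takes multiple languages as one string of language codes joined with +, like the eng+ara and ara+eng strings used above. Each code needs a matching <code>.traineddata file. The sketch below builds both orderings so their OCR output can be compared, and it reports any code whose data file is missing. BuildLanguageString is a hypothetical helper, not part of any Tesseract API. The sketch also assumes tessdataDir is the folder that directly contains the .traineddata files. That may differ from what the TessBaseAPI wrapper in the thread expects for dataPath.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: join Tesseract language codes with '+' in the order given
// and list any code whose "<code>.traineddata" file is missing. ASSUMPTION:
// tessdataDir is the folder that directly contains the .traineddata files.
static (string LangString, List<string> Missing) BuildLanguageString(
    string tessdataDir, params string[] codes)
{
    var missing = codes
        .Where(code => !File.Exists(Path.Combine(tessdataDir, code + ".traineddata")))
        .ToList();
    return (string.Join("+", codes), missing);
}

// Demo with a throwaway folder containing only an empty placeholder ara.traineddata.
string dir = Path.Combine(Path.GetTempPath(), "tessdata-sketch-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
File.WriteAllText(Path.Combine(dir, "ara.traineddata"), "");

var (arabicFirst, missingCodes) = BuildLanguageString(dir, "ara", "eng");
var (englishFirst, _) = BuildLanguageString(dir, "eng", "ara");
Console.WriteLine(arabicFirst);                                    // ara+eng
Console.WriteLine(englishFirst);                                   // eng+ara
Console.WriteLine("missing: " + string.Join(", ", missingCodes)); // missing: eng

Directory.Delete(dir, recursive: true);
```

In practice, the resulting string is what the thread's code passes as the language argument to the TessBaseAPI constructor. Running the same image once with each ordering shows whether the order matters for a given document.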
From: Adrian Owen <adrian.o...@eesm.com>
Sent: Monday, October 15, 2018 3:42 PM
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

GIMP is your friend: https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy

If you're programming, use the KalikoImage library to replicate the manual GIMP steps; that's easy. I found greyscale didn't help.

YES: long-line removal with OpenCV (may not apply to you)
YES: resize to 300 DPI
YES: apply filters

Hope this helps,
Adrian

From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com] On Behalf Of MariamHi
Sent: 15 October 2018 13:38
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

I did this, but recognition of the English words is bad. What accuracy can I expect with multiple languages, and how can I improve it?

From: Adrian Owen <adrian.o...@eesm.com>
Sent: Monday, October 15, 2018 3:35 PM
To: tesseract-ocr <tesseract-ocr@googlegroups.com>
Subject: Re: [tesseract-ocr] Multiple Languages

Just list the language codes joined with the + delimiter (for example eng+ara).

Sent from my Huawei Mobile

-------- Original Message --------
Subject: [tesseract-ocr] Multiple Languages
From: Mariam Hijazi
To: tesseract-ocr
CC:

Does Tesseract support recognizing multiple languages in one document? And how would I do that?

Regards.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/73c903ac-d23c-4396-84b3-c0fbfb9f8923%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
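One pre-processing step Adrian lists is long-line removal, which he does with OpenCV. The idea is to erase horizontal runs of ink, such as table rules or underlines, that are much longer than any character stroke. The sketch below is a plain C# stand-in, not OpenCV. It works on a hypothetical bool[rows, cols] binary image (true = ink) and clears every horizontal run of at least minRunLength pixels. Apart from edge handling, that matches subtracting a morphological opening with a 1 x minRunLength horizontal kernel, the usual OpenCV approach. RemoveHorizontalLines is a hypothetical name, and minRunLength is an assumption to tune to your scan resolution. Also, a rule that runs through letters will erase the letter pixels on that row.

```csharp
using System;

// Hypothetical helper: clear every horizontal run of ink that is at least
// minRunLength pixels long. The input array is left unchanged; a copy is returned.
static bool[,] RemoveHorizontalLines(bool[,] img, int minRunLength)
{
    int rows = img.GetLength(0), cols = img.GetLength(1);
    var result = (bool[,])img.Clone();
    for (int y = 0; y < rows; y++)
    {
        int x = 0;
        while (x < cols)
        {
            if (!img[y, x]) { x++; continue; }
            int start = x;
            while (x < cols && img[y, x]) x++; // x is now one past the end of the run
            if (x - start >= minRunLength)
                for (int i = start; i < x; i++) result[y, i] = false;
        }
    }
    return result;
}

// Tiny example: row 1 is a 10-pixel rule; rows 0 and 2 hold short strokes.
var image = new bool[3, 10];
for (int x = 0; x < 10; x++) image[1, x] = true;
image[0, 2] = image[0, 3] = true;
image[2, 5] = true;

var cleaned = RemoveHorizontalLines(image, minRunLength: 6);
for (int y = 0; y < 3; y++)
{
    var row = new char[10];
    for (int x = 0; x < 10; x++) row[x] = cleaned[y, x] ? '#' : '.';
    Console.WriteLine(new string(row));
}
```

It prints `..##......`, then `..........`, then `.....#....`. The rule is gone and the short strokes survive. Vertical lines could be handled the same way by scanning columns instead of rows.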