Try PageSegmentationMode.AUTO.

You may need to enlarge the image to 300 DPI. What is the original DPI?
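
As a rough sketch of what enlarging to 300 DPI means in numbers: the resize factor is the target DPI divided by the DPI the image currently has. The names below (DpiScaling, ScaleFactor, ScaledSize) are made up for illustration and are not part of any Tesseract wrapper. The actual resampling would still have to be done with an image library.

```csharp
using System;

public static class DpiScaling
{
    // Factor by which an image must be enlarged to reach targetDpi.
    public static double ScaleFactor(double originalDpi, double targetDpi = 300.0)
    {
        if (originalDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(originalDpi), "DPI must be positive.");
        return targetDpi / originalDpi;
    }

    // Pixel dimensions after scaling, rounded to whole pixels.
    public static (int Width, int Height) ScaledSize(
        int width, int height, double originalDpi, double targetDpi = 300.0)
    {
        double factor = ScaleFactor(originalDpi, targetDpi);
        return ((int)Math.Round(width * factor), (int)Math.Round(height * factor));
    }
}
```

For example, an 800x600 image at 96 DPI gives ScaleFactor(96) = 3.125 and ScaledSize(800, 600, 96) = (2500, 1875).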

From: tesseract-ocr@googlegroups.com On Behalf Of MariamHi
Sent: 16 October 2018 11:07
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

Yes, I tried that and the result is the same.
My code is:
string dataPath = ConfigurationManager.AppSettings["DataSet"].ToString();
string language = "eng+ara"; // Also tried "ara+eng"
OcrEngineMode oem = OcrEngineMode.DEFAULT;
PageSegmentationMode psm = PageSegmentationMode.SINGLE_BLOCK;
TessBaseAPI tessBaseAPI = new TessBaseAPI(dataPath, language, oem, psm);
Pix pix = Pix.Read(imageFilePath);
// Check for null before touching pix; the original set XRes/YRes first.
if (pix != null)
{
    // Tag the image as 300 DPI. This presumably only sets the resolution
    // metadata and does not enlarge the pixels.
    pix.XRes = 300;
    pix.YRes = 300;
    tessBaseAPI.SetImage(pix);
    tessBaseAPI.Recognize();
    string recognizedText = tessBaseAPI.GetUTF8Text();
    textBox2.Text = recognizedText;
}
My image is attached.
The result with eng+ara:
SimplifiedArabic

1 1 مه 200

?Google ( 15 93? يستعد لاقتحام أدمغتنا

?الاحد 9 سبتمبر 2018 3

?الاقتصادية" من الرياض"

?2 9

?هل حدث وأن بحثت عن منتج معين عبر الإنترنت وتفاجأت باقتراحات عديدة لاحقا على 
حسابك في فيسبوك ع لشركات توفر منتجات مشابهة؟ الأمر
لیس صدفة؛ ولكن ذلك يندرج ضمن استراتيجيات يتبناها عمالقة التقنية للتأثير في 
قراراتنا الشرائية.

?وساهمت الشبكة العنكبوتية في جعل عملية جمع بيانات المستخدمين أكثر سهولة من أي 
وقت مضى؛ مع ترك هؤلاء لآثارهم الرقمية في العديد من المواقع
والتطبیقات» والتي تكون هدفا لشركات التكنولوجيا المدعومة بترسانة من تقنیات 
الذکاء الاصطناعي ?intelligence? 4-

?وبيّنت دراسة صادرة عن باحثين في جامعة 'برينستون" الأميركيةء أن ?Google? ترصد 
تحرکات ما یزید عن ملياري شخص حول العالم» ممن يستعملون
أجهزة وهواتف تعمل بنظام التشغيل الشهير "8001010" بحسب ما ذكرت ?.'Skynews"?

?وأكد تقرير لوكالة الأسوشيتد برس 016558 0551018160 أن الكثير من خدمات جوجل على 
أجهزة آيفون 06 وآندرويد ?Android? تخزّن بيانات
مواقع المستخدمین» حتى وإن قاموا بإيقاف تشغيل خدمات تحديد الموقع الجغرافي بتغيير 
إعدادات الخصوصية المتوفرة في تلك الأجهزة. –


The result with ara+eng:
SimplifiedArabic

200 ao oe 1 1

جو جا 18 9 ?Goo? يستعد لاقتحام أدمغتنا

الاحد 9 سبتمبر 2018

الاقتصادية" من الرياض"

2 9

هل حدث وأن بحثت عن منتج معين عبر الإنترنت وتفاجأت باقتراحات عديدة لاحقا على 
حسابك في فيسبوك ع لشركات توفر منتجات مشابهة؟ الأمر
لیس صدفة؛ ولكن ذلك يندرج ضمن استراتيجيات يتبناها عمالقة التقنية للتأثير في 
قراراتنا الشرائية.

وساهمت الشبكة العنكبوتية في جعل عملية جمع بيانات المستخدمين أكثر سهولة من أي 
وقت مضى؛ مع ترك هؤلاء لآثارهم الرقمية في العديد من المواقع
والتطبيقات» والتي تكون هدفا لشركات التكنولوجيا المدعومة بترسانة من تقنيات 
الذكاء الاصطناعي ?intelligence? 4-

وبيّنت دراسة صادرة عن باحثين في جامعة 'برينستون" الأميركية» أن 6ا6009 ترصد 
تحركات ما يزيد عن ملياري شخص حول العالم» ممن يستعملون
أجهزة وهواتف تعمل بنظام التشغيل الشهير "8001010" بحسب ما ذكرت ?.'Skynews"?

وأكد تقرير لوكالة الأسوشيتد برس 016558 0551018160 أن الكثير من خدمات جوجل على 
أجهزة آيفون 06 وآندرويد ?Android? تخزّن بيانات
مواقع المستخدمین» حتى وإن قاموا بإيقاف تشغيل خدمات تحديد الموقع الجغرافي بتغيير 
إعدادات الخصوصية المتوفرة في تلك الأجهزة. -
From: Adrian Owen <adrian.o...@eesm.com>
Sent: Tuesday, October 16, 2018 12:42 PM
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

Try changing the order: English+Arabic.

Any better?

From: tesseract-ocr@googlegroups.com On Behalf Of MariamHi
Sent: 16 October 2018 08:27
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

When I did pre-processing, the result got worse. The issue is this: when I 
recognize a document in Arabic only, the result is almost correct, and when I 
recognize a document in English only, the result is correct. But when I 
recognize a document in Arabic+English (multiple languages), all the English 
words come out as digits. How can I fix it?
From: Adrian Owen <adrian.o...@eesm.com>
Sent: Monday, October 15, 2018 3:42 PM
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

Gimp is your friend: 
https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy

If you're programming, use the KalikoImage library to replicate the manual 
GIMP steps; that's easy.

I found greyscale didn't help.
YES: Long line removal (may not apply to you) (OpenCV)
YES: Resize to 300 DPI
YES: Apply filters

Hope this helps, Adrian

From: tesseract-ocr@googlegroups.com On Behalf Of MariamHi
Sent: 15 October 2018 13:38
To: tesseract-ocr@googlegroups.com
Subject: RE: [tesseract-ocr] Multiple Languages

I did this, but recognition of the English words is bad. What accuracy should 
I expect for multiple languages, and how can I improve it?
From: Adrian Owen <adrian.o...@eesm.com>
Sent: Monday, October 15, 2018 3:35 PM
To: tesseract-ocr <tesseract-ocr@googlegroups.com>
Subject: Re: [tesseract-ocr] Multiple Languages

Just list the languages using the + delimiter (for example, eng+ara).
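
A minimal sketch of building that "+"-delimited string in C#. The helper name TessLanguages.Combine is hypothetical and not part of Tesseract or any wrapper; it only joins the codes, and the result would be passed wherever the engine expects its language argument.

```csharp
using System;
using System.Linq;

public static class TessLanguages
{
    // Joins language codes into the "+"-delimited form, e.g. "eng+ara".
    public static string Combine(params string[] codes)
    {
        if (codes == null || codes.Length == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));
        // Trim stray whitespace so " ara" does not end up inside the string.
        return string.Join("+", codes.Select(c => c.Trim()));
    }
}
```

For example, TessLanguages.Combine("eng", "ara") returns "eng+ara".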

-------- Original Message --------
Subject: [tesseract-ocr] Multiple Languages
From: Mariam Hijazi
To: tesseract-ocr
CC:
Does Tesseract support recognizing multiple languages in one document? If so, 
how would I do that?
Regards.
--
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/73c903ac-d23c-4396-84b3-c0fbfb9f8923%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.