[tesseract-ocr] Re: Need Help with extracting info from Invoice

2018-01-10 Thread saumitra mallick
Hello all,
I'm working on a similar project; in my case I'm reading bank statements. I noticed the following:
1. When you have a single line of text, Tesseract performs much better.
2. I'm using OpenCV to cut individual cells from a table (you always know the order of cells since you cut the
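A minimal sketch of the cell-cutting idea in point 2, under stated assumptions: the cell boxes are taken to be `(x, y, width, height)` tuples (the shape OpenCV's `cv2.boundingRect` returns), and the image is a plain list of pixel rows so the example runs without OpenCV. The names `reading_order`, `crop`, and `row_tolerance` are hypothetical and do not come from the thread.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x, y, width, height), as cv2.boundingRect would return.
Box = Tuple[int, int, int, int]


def reading_order(boxes: Sequence[Box], row_tolerance: int = 10) -> List[Box]:
    """Sort cell boxes top-to-bottom, then left-to-right within each row."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        # A box joins the current row if its top edge is within row_tolerance
        # pixels of the top edge of that row's first box; otherwise a new row starts.
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut one cell out of an image stored as a list of pixel rows."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


cells = [(120, 52, 100, 30), (10, 50, 100, 30), (10, 90, 100, 30), (120, 88, 100, 30)]
ordered = reading_order(cells)
```

With a NumPy image loaded through OpenCV, the crop would be the slice `image[y:y + h, x:x + w]`. Each cropped cell could then be passed to Tesseract on its own, which fits the observation in point 1 that single lines of text are recognized better.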

Re: [tesseract-ocr] Re: Need Help with extracting info from Invoice

2018-01-10 Thread ShreeDevi Kumar
See https://github.com/tesseract-ocr/tesseract/wiki/APIExample for an example of using Tesseract in a program. The training tutorial you refer to is old. See tesstrain.sh for creating synthetic training data.

On 10-Jan-2018 2:54 PM, "saumitra mallick" wrote:
> Hello all,
> I'm working on simi

Re: [tesseract-ocr] Traineddata always ended in same size and did not match with wordlist

2018-01-10 Thread easymavinmind
It works! I modified your bash script and ran it, and I finally get a different traineddata size. But can I train it from scratch? It needs a starter traineddata, which I can get from combine_lang_model, doesn't it?

On Tuesday, January 9, 2018 at 7:36:08 PM UTC+7, shree wrote:
>> My reason

Re: [tesseract-ocr] Traineddata always ended in same size and did not match with wordlist

2018-01-10 Thread ShreeDevi Kumar
On Wed, Jan 10, 2018 at 3:56 PM, wrote:
> It works !!
> I modified your bash script and executed it. Finally I get different
> traineddata size.
>
> But, can I train it from scratch?
> It needs starting traineddata which I can get from combine_lang_model,
> isn't it?

Starter traineddata will

[tesseract-ocr] Variables having no effect on C# Tesseract.net 4.0.0.6 wrapper

2018-01-10 Thread James Q
Here is my code:

    string text = "";
    string tessDataPath = ConfigurationManager.AppSettings["TessPath"];
    using (var engine = new TessBaseAPI(@tessDataPath, @"eng"))
    {
        engine.SetVariable("tessedit_ocr_engine_mode", "0");
        engine.SetPageSegMode(PageSegmentationMode.SINGLE_LINE);
        engine.SetV

Re: [tesseract-ocr] VietOCR 5.0 alpha availability

2018-01-10 Thread Quan Nguyen
Just updated again to use Tesseract 4.00 fast data.

On Monday, January 8, 2018 at 5:16:50 PM UTC-6, Quan Nguyen wrote:
> Just updated the alpha versions with latest Tesseract 4.00alpha
> executables.
>
> https://sourceforge.net/projects/vietocr/files/
>
> On Monday, April 3, 2017 at 6:26:37 AM

Re: [tesseract-ocr] Re: Need Help with extracting info from Invoice

2018-01-10 Thread Afreen Ferdoash
I am trying to solve a similar problem, that of reading forms. Tesseract 4 is doing well but is DROPPING lots of words within boxes. I thought this problem of dropping words existed only with Indic languages, but here I am having this issue for English too! I tried to fool around with some paramet

Re: [tesseract-ocr] Re: Need Help with extracting info from Invoice

2018-01-10 Thread ShreeDevi Kumar
On Wed, Jan 10, 2018 at 8:07 PM, Afreen Ferdoash wrote:
> I am trying to solve a similar problem, that of reading forms. Tesseract
> 4 is doing well but is DROPPING lots of words within boxes. I thought
> this problem of dropping words existed with Indic languages but here I am
> having this i

[tesseract-ocr] Working on degraded test data

2018-01-10 Thread gale
Hi guys, I am working on some degraded text images (Japanese). Is there any way to adjust for degraded images in the training set? And should I do this? Regards

Re: [tesseract-ocr] Re: Need Help with extracting info from Invoice

2018-01-10 Thread Afreen Ferdoash
It is still not making any difference.

On Wednesday, January 10, 2018 at 9:27:20 PM UTC+5:30, shree wrote:
> On Wed, Jan 10, 2018 at 8:07 PM, Afreen Ferdoash wrote:
>> I am trying to solve a similar problem, that of reading forms. Tesseract
>> 4 is doing well but is DROPPING lots of wor

[tesseract-ocr] Re: Invalid Digit recognition

2018-01-10 Thread mark
Hi, I just stumbled on this forum while looking for answers as to why the Tesseract demo on the site would fail with my images (using a very similar approach of single digits in images, etc.). I found that scaling the image height by 50% worked a charm, thanks! I never thought to do that! Also cropp

Re: [tesseract-ocr] Re: How can I do the training using my own image in Tesseract 4.0

2018-01-10 Thread Anubhav Rohatgi
Hi Shree, the box file you uploaded as an attachment seems to contradict the LSTM 4.0 training tutorial guidelines, which state that the boxes should be at line level rather than character level. Please correct me if I am wrong. I still am not able to understand ho