[tesseract-ocr] Re: Need Help with extracting info from Invoice

Ha Hien Thu, 04 Jan 2018 04:16:45 -0800

Hi Djibril,
I am afraid this is an old thread, and Vinay may no longer be working with 
invoices. I am also interested in extracting information from invoices. Have 
you tried giving Tesseract a user dictionary to improve accuracy? It may help 
because invoices contain a fairly fixed set of field labels. The manual 
describes how to set this up: 
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#config-files-and-augmenting-with-user-data
Let me know if you get better results. I will do the same if I make progress.
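
A minimal sketch of the dictionary idea, assuming a Tesseract version whose 
manual lists the `--user-words` option (Tesseract 3.02 from the original 
message may predate it; the config-file route described in the manual is the 
alternative there). The word list, the file names, and the helper names 
`write_user_words` and `build_command` are made up for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical vocabulary: field labels that are likely to appear on the invoices.
INVOICE_WORDS = ["Invoice", "Advisor", "License", "Mileage"]


def write_user_words(path, words):
    """Write one word per line, which is the layout of a user-words file."""
    target = Path(path)
    target.write_text("\n".join(words) + "\n", encoding="utf-8")
    return target


def build_command(image, output_base, words_file, lang="eng"):
    """Build the tesseract command line as a list, without running it."""
    return [
        "tesseract", str(image), str(output_base),
        "-l", lang,
        "--user-words", str(words_file),
    ]


def run_ocr(image, output_base, words_file):
    """Run tesseract; raises CalledProcessError if the command fails."""
    subprocess.run(build_command(image, output_base, words_file), check=True)
```

Calling `write_user_words("invoice.user-words", INVOICE_WORDS)` and then 
`run_ocr("invoice.png", "invoice_out", "invoice.user-words")` would write the 
recognized text to `invoice_out.txt`. Whether the extra words actually improve 
the result depends on the engine and version, so it is worth comparing the 
output with and without them.
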
Best,

On Wednesday, December 6, 2017 at 8:32:50 PM UTC+1, Djibril Kaba wrote:
>
> Hi Vinay,
>
> I am trying to solve the same problem here. Have you managed to find a 
> solution to your problem? Your help would be greatly appreciated. Looking 
> forward to hearing from you.
>
> Many thanks!!
>
> On Tuesday, November 18, 2014 at 8:53:08 PM UTC+1, Vinay Matam wrote:
>>
>> Hi All,
>>
>> I really need your help with one of the projects that I am working on. I 
>> am using Tesseract 3.02 on an Ubuntu machine.
>>
>> I have an invoice (please see the attached file). I want to extract some 
>> information from that invoice like Advisor Name, Invoice Number, Invoice 
>> Date, License No, Mileage, etc.
>>
>> I have tried extracting all the text from the image into a text file. 
>> After some pre-processing of the image with ImageMagick, I was able to 
>> extract the information to some extent, but I am not satisfied with the 
>> output. 
>> I need your input on how I should extract the information. Should I first 
>> crop the relevant portions of the image into separate rectangles and then 
>> OCR them individually? I tried this and got very good results. However, 
>> the images do not all have the same size and resolution, so fixed 
>> rectangle coordinates will not work in every case. I do not think this 
>> method will work on all images (scanned, taken with a mobile phone, or 
>> PDF files).
>>
>> Then I thought of running regular expressions over the extracted text 
>> to pick out the data I need from the whole text file. But this method 
>> does not seem to work either. 
>>
>> I am totally confused now. Any help or input would be much 
>> appreciated. :) I have attached a sample image and the extracted output.
>>
>> Thanks,
>> Vinay.
>>
>
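
On the cropping idea in the original message: one possible way to make the 
rectangles survive different image sizes is to store them as fractions of the 
page width and height and convert them to pixels per image. This is only a 
sketch. It handles differences in scale, not different layouts, skew, or the 
perspective distortion of phone photos. The field names and fractions below 
are invented.

```python
# Hypothetical field regions as (left, top, right, bottom) fractions of the page.
FIELD_BOXES = {
    "invoice_number": (0.60, 0.05, 0.95, 0.10),
    "invoice_date": (0.60, 0.10, 0.95, 0.15),
}


def to_pixel_box(fraction_box, width, height):
    """Scale one fractional box to integer pixel coordinates."""
    left, top, right, bottom = fraction_box
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError(f"invalid fractional box: {fraction_box}")
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


def pixel_boxes(width, height, boxes=FIELD_BOXES):
    """Return a pixel box per field for an image of the given size."""
    return {name: to_pixel_box(box, width, height) for name, box in boxes.items()}
```

Each resulting tuple is in (left, upper, right, lower) order, which is the 
order a crop call such as Pillow's `Image.crop` takes, if Pillow is used for 
the cropping step instead of ImageMagick.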

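On the regular-expression idea in the original message: a sketch of matching 
each label together with its value, with some tolerance for the stray spaces 
and punctuation that OCR output tends to contain. The patterns, label 
spellings, and value formats are guesses and would need adjusting to the real 
invoices.

```python
import re

# Hypothetical patterns, one per field; case-insensitive, whitespace-tolerant.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:Number|No)\.?\s*[:#]?\s*([A-Z0-9-]+)", re.I
    ),
    "invoice_date": re.compile(
        r"Invoice\s*Date\s*:?\s*(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})", re.I
    ),
    "mileage": re.compile(r"Mileage\s*:?\s*([\d,]+)", re.I),
}


def extract_fields(ocr_text):
    """Return the first match for each field, or None when nothing matches."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[name] = match.group(1) if match else None
    return result
```

Fields that come back as `None` show which labels the OCR step misread, which 
can help decide whether the problem is in the patterns or in the image 
pre-processing.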
