Hi Vinay,

I am trying to solve the same problem here. Have you managed to find a 
solution to your problem? Your help would be greatly appreciated. Looking 
forward to hearing from you.

Many thanks!!

On Tuesday, November 18, 2014 at 8:53:08 PM UTC+1, Vinay Matam wrote:
>
> Hi All,
>
> I really need your help with one of the projects that I am working on. I 
> am using Tesseract 3.02 on an Ubuntu machine.
>
> I have an invoice (please see the attached file). I want to extract some 
> information from that invoice, such as Advisor Name, Invoice Number, 
> Invoice Date, License No, Mileage, etc.
>
> I have tried to extract the whole data from the image to a text file. By 
> doing some pre-processing on the image using ImageMagick, I was able to 
> extract the info to some extent. However, I am not totally satisfied with 
> the output.
>
> I need your input on how I should extract the information. Should I first 
> crop specific portions of the image into separate rectangles and then OCR 
> them individually? I tried this and got great results. However, not all 
> the images have the same size and resolution, so the rectangle coordinates 
> will not work in all cases. I concluded that this method will not work on 
> all images (scanned, taken with a mobile phone, or from PDF files).
>
> Then I thought of applying regular expressions to the extracted text and 
> picking out the data that I require from the whole text file. But this 
> method does not seem to be working either.
>
> I am totally confused now. Any help or input is much appreciated. :) I 
> have attached a sample image and the extracted output.
>
> Thanks,
> Vinay.
>
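
For anyone else landing on this thread, the two ideas described above (crop 
rectangles that survive different image sizes, and regular expressions over 
the OCR text) can be sketched roughly as follows. This is only a minimal 
sketch under assumptions: the field labels, the patterns, and the fractional 
box values are hypothetical and are not taken from the attached invoice. It 
stores each rectangle as fractions of the page width and height, so the same 
box can be converted to pixels for any resolution, and it anchors each regex 
on a label rather than on a position in the text.

```python
import re

# Hypothetical crop regions stored as fractions of the page size:
# (left, top, right, bottom). The values are placeholders only.
RELATIVE_BOXES = {
    "invoice_number": (0.5, 0.25, 1.0, 0.5),
}


def to_pixel_box(rel_box, width, height):
    """Convert a fractional box to pixel coordinates for one image size."""
    left, top, right, bottom = rel_box
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


# Hypothetical label-anchored patterns. [ \t]* keeps each match on one line,
# and [.:]* tolerates a missing or doubled separator in the OCR output.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice[ \t]*(?:no|number|#)[ \t]*[.:]*[ \t]*([A-Z0-9-]+)", re.I
    ),
    "invoice_date": re.compile(
        r"invoice[ \t]*date[ \t]*[.:]*[ \t]*(\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4})",
        re.I,
    ),
    "mileage": re.compile(r"mileage[ \t]*[.:]*[ \t]*([\d,]+)", re.I),
}


def extract_fields(text):
    """Return the first match for each known field found in the OCR text."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found
```

The pixel box returned by to_pixel_box uses the (left, top, right, bottom) 
order, so it could be handed to whatever cropping tool is in use before 
running OCR on that region. Fractional boxes only help when the invoices 
share a layout and are roughly aligned; photos taken at an angle would 
still need deskewing first.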
