Training won't fix that.

See
https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/

https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
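
As suggested earlier in this thread (select the areas of interest and OCR only those sections), here is a minimal sketch of that idea. It assumes the Pillow and pytesseract packages are installed. The helper names (split_into_columns, ocr_regions), the two-column split, the 10-pixel overlap, and the --psm 6 setting are illustrative choices of mine, not something taken from your notebook; for a real invoice you would probably replace the column split with boxes that match your table layout.

```python
from typing import List, Tuple

# A crop box in pixels: (left, upper, right, lower), as used by Pillow.
Box = Tuple[int, int, int, int]


def split_into_columns(width: int, height: int, n_cols: int = 2,
                       overlap: int = 10) -> List[Box]:
    """Split a page of the given size into n_cols vertical strips.

    Neighbouring strips overlap by `overlap` pixels so that text sitting
    on a strip boundary is not cut in half. (Hypothetical helper.)
    """
    if n_cols < 1:
        raise ValueError("n_cols must be at least 1")
    boxes = []
    for i in range(n_cols):
        left = max(0, i * width // n_cols - overlap)
        right = min(width, (i + 1) * width // n_cols + overlap)
        boxes.append((left, 0, right, height))
    return boxes


def ocr_regions(image_path: str, boxes: List[Box], psm: int = 6) -> List[str]:
    """OCR each box separately with the stock eng model. (Hypothetical helper.)"""
    # Imported lazily so split_into_columns works without these packages.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    texts = []
    for box in boxes:
        region = image.crop(box)
        # --psm 6 treats the crop as a single uniform block of text
        # (an assumption; other page segmentation modes may suit your data better).
        texts.append(
            pytesseract.image_to_string(region, lang="eng", config=f"--psm {psm}")
        )
    return texts


# Example usage ("invoice.png" is a placeholder file name):
#   from PIL import Image
#   with Image.open("invoice.png") as im:
#       width, height = im.size
#   for text in ocr_regions("invoice.png", split_into_columns(width, height)):
#       print(text)
```

If one side of the image is always missed when the whole page is processed at once, running each side as its own crop like this is a cheap way to check whether the problem is page segmentation rather than recognition.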

On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani <kumarrajwani1...@gmail.com>
wrote:

> I have tried a lot of images where it gets about 90% accuracy but always
> misses one side of the image. That's why I want to train the model; if it
> can improve even a little, that would be great.
> If you can provide a script or steps that would help me, I would
> appreciate it.
>
> On Friday, February 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani wrote:
>
>> The main thing is that I want to learn about training Tesseract at the
>> image level, so can you please tell me how I can proceed further? I want
>> to know where the main problem is.
>>
>>
>> On Friday, February 5, 2021 at 5:46:22 PM UTC+5:30 shree wrote:
>>
>>> I see the tabular image that you shared.  I don't think training is
>>> going to help you in this. eng.traineddata should be able to recognize it
>>> quite well. You should select the different areas of interest and just OCR
>>> those sections.
>>>
>>> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani <kumarraj...@gmail.com>
>>> wrote:
>>>
>>>> I tried to do the same thing in Tesseract 4, which gets stuck at the
>>>> following line:
>>>> Compute CTC targets failed!
>>>>
>>>> On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani wrote:
>>>>
>>>>> !tesseract -v
>>>>> tesseract 5.0.0-alpha-20201231-171-g04173
>>>>>  leptonica-1.78.0
>>>>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 :
>>>>> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>>  Found AVX2
>>>>>  Found AVX
>>>>>  Found FMA
>>>>>  Found SSE
>>>>>  Found OpenMP 201511
>>>>>  Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6
>>>>> liblz4/1.7.1
>>>>>
>>>>> Image example:
>>>>> I have added one image from my training data.
>>>>>
>>>>> I am using Google Colab, which runs Ubuntu.
>>>>>
>>>>> https://colab.research.google.com/drive/1_Bn4wbK6dE5zYAuFyC4Eczq_eNU2shuz?usp=sharing
>>>>> This is my notebook; you can see the complete process in the "finetune 2" section.
>>>>>
>>>>>
>>>>> On Friday, February 5, 2021 at 4:55:43 PM UTC+5:30 shree wrote:
>>>>>
>>>>>> On Fri, Feb 5, 2021 at 4:44 PM Kumar Rajwani <kumarraj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I tried -1 and got the following result:
>>>>>>> Iteration 0: GROUND  TRUTH : ) @®
>>>>>>> Iteration 0: BEST OCR TEXT : Yo
>>>>>>> File eng.arial.exp0.lstmf line 0 :
>>>>>>
>>>>>> What's your version of tesseract? What o/s?
>>>>>>
>>>>>> Without your files, it's difficult to know what's causing the issue.
>>>>>>
>>>>>> With debug_interval set to -1, you should get the info for every iteration.
>>>>>>
>>>>> --
>>>>
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>>
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/342f3faf-b107-4243-845e-ba8a16274122n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/342f3faf-b107-4243-845e-ba8a16274122n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>
>>>
>>> --
>>>
>>> ____________________________________________________________
>>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>>>


