Hey, can you please tell me how I can improve text detection for the 
same kind of images?

On Friday, February 5, 2021 at 8:38:31 PM UTC+5:30 Kumar Rajwani wrote:

> Thanks for this. I know how to use Tesseract. I have multiple 
> images whose quality I can't improve, so I want to improve my model to 
> get the text from them.
> Are you saying that text detection will not improve with training?
> I don't have an issue with text recognition; most of the time it is right.
> Can you tell me how I can improve the model to get more text from the 
> image? I am using psm 11, where it finds lots of text but some is missing.
>
>
> On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:
>
>> Training won't fix that.
>>
>> See 
>> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>>
>>
>> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>>
>> On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani <kumarraj...@gmail.com> 
>> wrote:
>>
>>> I have tried a lot of images where it gets 90% accuracy and always 
>>> misses one side of the image. That's the reason I want to train the model; 
>>> if it can improve even a little bit, that would be great.
>>> If you can provide a script or steps that can help me, it would be good 
>>> for me.
>>>
>>> On Friday, February 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani wrote:
>>>
>>>> The main thing is that I want to learn about training Tesseract at the 
>>>> image level, so can you please tell me how I can proceed further? I want 
>>>> to know where the main problem is.
>>>>
>>>>
>>>> On Friday, February 5, 2021 at 5:46:22 PM UTC+5:30 shree wrote:
>>>>
>>>>> I see the tabular image that you shared. I don't think training is 
>>>>> going to help you with this. eng.traineddata should be able to recognize 
>>>>> it quite well. You should select the different areas of interest and 
>>>>> OCR just those sections.
>>>>>
>>>>> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani <kumarraj...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> I have tried to do the same thing in Tesseract 4, which gets stuck at 
>>>>>> the following line:
>>>>>> Compute CTC targets failed!
>>>>>>
>>>>>> On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani 
>>>>>> wrote:
>>>>>>
>>>>>>> !tesseract -v
>>>>>>> tesseract 5.0.0-alpha-20201231-171-g04173
>>>>>>>  leptonica-1.78.0
>>>>>>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : 
>>>>>>> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>>>>  Found AVX2
>>>>>>>  Found AVX
>>>>>>>  Found FMA
>>>>>>>  Found SSE
>>>>>>>  Found OpenMP 201511
>>>>>>>  Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 
>>>>>>> liblz4/1.7.1
>>>>>>>
>>>>>>> Image example:
>>>>>>> I have added one image from my training data.
>>>>>>>
>>>>>>> I am using Google Colab, which runs Ubuntu.
>>>>>>>
>>>>>>> https://colab.research.google.com/drive/1_Bn4wbK6dE5zYAuFyC4Eczq_eNU2shuz?usp=sharing
>>>>>>> This is my notebook; you can see the complete process in the 
>>>>>>> "finetune 2" section.
>>>>>>>
>>>>>>>
>>>>>>> On Friday, February 5, 2021 at 4:55:43 PM UTC+5:30 shree wrote:
>>>>>>>
>>>>>>>> On Fri, Feb 5, 2021 at 4:44 PM Kumar Rajwani <kumarraj...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have tried minus 1 and got the following result:
>>>>>>>>> Iteration 0: GROUND  TRUTH : ) @®
>>>>>>>>> Iteration 0: BEST OCR TEXT : Yo
>>>>>>>>> File eng.arial.exp0.lstmf line 0 :
>>>>>>>>>
>>>>>>>>
>>>>>>>> What's your version of Tesseract? What OS?
>>>>>>>>
>>>>>>>> Without your files, it's difficult to know what's causing the issue.
>>>>>>>>
>>>>>>>> With debug_interval set to -1, you should get the info for every 
>>>>>>>> iteration.
>>>>>>>>
>>>>>>> -- 
>>>>>>
>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>
>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/342f3faf-b107-4243-845e-ba8a16274122n%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/342f3faf-b107-4243-845e-ba8a16274122n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>
>>
>>
>>
>
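
For reference, the debug_interval advice quoted above could be wired into a training call roughly like this. It is only a sketch: the flags are standard lstmtraining options, but every file name in the usage note is a hypothetical placeholder, not a path from the notebook.

```python
from typing import List


def lstmtraining_command(
    continue_from: str,
    traineddata: str,
    train_listfile: str,
    model_output: str,
    max_iterations: int = 400,
    debug_interval: int = -1,
) -> List[str]:
    """Build an lstmtraining fine-tuning command as an argument list.

    debug_interval=-1 asks the trainer for verbose text debug output on
    every iteration, as suggested in the thread.
    """
    return [
        "lstmtraining",
        "--debug_interval", str(debug_interval),
        "--continue_from", continue_from,
        "--traineddata", traineddata,
        "--train_listfile", train_listfile,
        "--model_output", model_output,
        "--max_iterations", str(max_iterations),
    ]
```

The list can be passed to subprocess.run(cmd, check=True), for example with lstmtraining_command("eng.lstm", "eng.traineddata", "train_list.txt", "output/finetuned") (all placeholder names).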

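The suggestion quoted above (select the areas of interest and OCR just those sections) might look something like the sketch below. It assumes Pillow and the tesseract command-line tool are installed; the region coordinates, the file names, and the choice of psm 6 for each crop are illustrative assumptions, not values taken from the shared image.

```python
import os
import subprocess
import tempfile
from typing import List, Tuple

# A region is (left, top, width, height) in pixels.
Region = Tuple[int, int, int, int]


def to_crop_box(region: Region) -> Tuple[int, int, int, int]:
    """Convert (left, top, width, height) to Pillow's (left, upper, right, lower)."""
    left, top, width, height = region
    if width <= 0 or height <= 0:
        raise ValueError("region width and height must be positive")
    return (left, top, left + width, top + height)


def tesseract_command(image_path: str, psm: int = 6, lang: str = "eng") -> List[str]:
    """Build a Tesseract CLI call that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_regions(image_path: str, regions: List[Region], psm: int = 6) -> List[str]:
    """Crop each region to a temporary PNG and OCR it on its own."""
    from PIL import Image  # assumption: Pillow is available

    texts = []
    with Image.open(image_path) as page:
        for region in regions:
            crop = page.crop(to_crop_box(region))
            with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
                tmp_path = tmp.name
            try:
                crop.save(tmp_path)
                result = subprocess.run(
                    tesseract_command(tmp_path, psm),
                    capture_output=True, text=True, check=True,
                )
                texts.append(result.stdout.strip())
            finally:
                os.remove(tmp_path)
    return texts
```

For example, ocr_regions("invoice.png", [(0, 0, 800, 200), (0, 200, 800, 600)]) would OCR a header band and a table band separately (hypothetical file name and coordinates). Each crop is read as a single block (psm 6) instead of relying on sparse-text mode (psm 11) over the whole page; whether this recovers the missing side of the image would need to be checked on the real data.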
