Hey, both of the pages describe steps that run after some text has already been detected, right?
https://i.pinimg.com/564x/bd/a3/d4/bda3d4bf11b0f727db1f9d81faac1b5d.jpg
All my images are of this type, and I am not able to detect the date at the 
top right. The contact name, phone, and fax are also either read incorrectly 
or missed in the detection step.
That is why I am asking: my documents all share a similar format, so if 
I train the model on them, will that help the model in the detection and 
recognition steps?
I don't know how Tesseract detects the text across the whole form.
I have tried thresholding, scaling, and sharpening, but these don't give me 
good results every time.
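
To make the question concrete, here is a minimal sketch of the approach suggested earlier in this thread: select each area of interest and OCR only that crop. It is an illustration under assumptions, not a tested pipeline. The field names and fractional coordinates in FIELD_REGIONS are hypothetical placeholders that would have to be measured on a real scan, and ocr_fields assumes Pillow and pytesseract are installed.

```python
from typing import Dict, Tuple

# (left, top, right, bottom) as fractions of the page width and height.
Region = Tuple[float, float, float, float]
Box = Tuple[int, int, int, int]

# Hypothetical layout: these numbers are placeholders, not measured
# from the real form. Measure them once on a sample scan.
FIELD_REGIONS: Dict[str, Region] = {
    "date": (0.70, 0.00, 1.00, 0.12),     # top-right corner
    "contact": (0.00, 0.12, 0.50, 0.30),  # name / phone / fax block
}


def region_to_pixels(region: Region, width: int, height: int, pad: int = 0) -> Box:
    """Convert a fractional region to a pixel box, padded and clamped to the page."""
    left, top, right, bottom = region
    return (
        max(0, round(left * width) - pad),
        max(0, round(top * height) - pad),
        min(width, round(right * width) + pad),
        min(height, round(bottom * height) + pad),
    )


def ocr_fields(image_path: str, regions: Dict[str, Region] = FIELD_REGIONS) -> Dict[str, str]:
    """Crop each region and run Tesseract on the crop only."""
    # Imported here so the geometry helper works without the OCR libraries.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    width, height = image.size
    results = {}
    for name, region in regions.items():
        crop = image.crop(region_to_pixels(region, width, height, pad=4))
        # --psm 6 treats the crop as a single uniform block of text.
        results[name] = pytesseract.image_to_string(crop, config="--psm 6").strip()
    return results
```

Calling ocr_fields("form.jpg") (the file name is a placeholder) would return a dict mapping each field name to the text read from its crop. Because all forms share one layout, the regions only need to be defined once.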

On Friday, February 12, 2021 at 1:04:29 AM UTC+5:30 g...@hobbelt.com wrote:

> Have you read the two pages linked in the answer from February 5th?
> Have you executed those procedures, or anything similar, to extract the 
> individual table cell images and feed those to Tesseract?
> So far you have not shown images or any results that show you have used a 
> tabular recognition and cell extraction process at all (which is a 
> preprocessing step required by the type of input image you have provided so 
> far if you want to significantly improve OCR output quality), so, *hey*, what 
> are your results so far following the sage advice (Feb 5)?
> (quoted below for convenience:)
>
>
> On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:
>
>>
>> See 
>> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>>
>>
>> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>>
>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/e663d426-2c32-432b-80b3-4ff9d8fe86d4n%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/tesseract-ocr/e663d426-2c32-432b-80b3-4ff9d8fe86d4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>> --
>>
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
> --  
>
>
> Met vriendelijke groeten / Best regards,
>
> Ger Hobbelt
>
> --------------------------------------------------
> web:    http://www.hobbelt.com/
>         http://www.hebbut.net/
> mail:   g...@hobbelt.com
> mobile: +31-6-11 120 978
> --------------------------------------------------
>
>
> On Mon, Feb 8, 2021 at 1:47 PM Kumar Rajwani <kumarraj...@gmail.com> 
> wrote:
>
>> Hey, I am still waiting for your reply. Can you please resolve my doubts? 
>> On Sunday, February 7, 2021 at 8:13:56 AM UTC+5:30 Kumar Rajwani wrote:
>>
>>> Hey, can you please tell me how I can improve the text detection for the 
>>> same kind of images?
>>>
>>> On Friday, February 5, 2021 at 8:38:31 PM UTC+5:30 Kumar Rajwani wrote:
>>>
>>>> Thanks for this. I know how to use Tesseract. I have 
>>>> multiple images whose quality I can't improve, so I want to improve 
>>>> my model to get the text from them.
>>>> Are you saying that text detection will not improve with training?
>>>> I don't have an issue with text recognition; most of the time it is 
>>>> right.
>>>> Can you tell me how I can improve the model to get more text from the 
>>>> image? I am using psm 11, where it finds lots of text but some is missing.
>>>>
>>>>
>>>> On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:
>>>>
>>>>> Training won't fix that.
>>>>>
>>>>> See 
>>>>> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>>>>>
>>>>>
>>>>> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>>>>>
>>>>> On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani <kumarraj...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> I have tried a lot of images where it gets 90% accuracy but 
>>>>>> always misses one side of the image. That's the reason I want to train 
>>>>>> the model; if it can improve even a little bit, it would be great.
>>>>>> If you can provide a script or steps that can help me, it would be 
>>>>>> good for me.
>>>>>>
>>>>>> On Friday, February 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani 
>>>>>> wrote:
>>>>>>
>>>>>>> The main thing is that I want to learn about training Tesseract at the 
>>>>>>> image level, so can you please tell me how I can proceed further? I 
>>>>>>> want to know where the main problem is.
>>>>>>>
>>>>>>>
>>>>>>> On Friday, February 5, 2021 at 5:46:22 PM UTC+5:30 shree wrote:
>>>>>>>
>>>>>>>> I see the tabular image that you shared. I don't think training is 
>>>>>>>> going to help you in this. eng.traineddata should be able to recognize 
>>>>>>>> it quite well. You should select the different areas of interest and 
>>>>>>>> just OCR those sections.
>>>>>>>>
>>>>>>>> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani <kumarraj...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I tried to do the same thing in Tesseract 4, which gets stuck at the 
>>>>>>>>> following line:
>>>>>>>>> Compute CTC targets failed!
>>>>>>>>>
>>>>>>>>> On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> !tesseract -v
>>>>>>>>>> tesseract 5.0.0-alpha-20201231-171-g04173
>>>>>>>>>>  leptonica-1.78.0
>>>>>>>>>>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 
>>>>>>>>>> : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>>>>>>>  Found AVX2
>>>>>>>>>>  Found AVX
>>>>>>>>>>  Found FMA
>>>>>>>>>>  Found SSE
>>>>>>>>>>  Found OpenMP 201511
>>>>>>>>>>  Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 
>>>>>>>>>> liblz4/1.7.1
>>>>>>>>>>
>>>>>>>>>> image example
>>>>>>>>>> I have added one image from my training data.
>>>>>>>>>>
>>>>>>>>>> I am using Google Colab, which runs Ubuntu.
>>>>>>>>>>
>>>>>>>>>> https://colab.research.google.com/drive/1_Bn4wbK6dE5zYAuFyC4Eczq_eNU2shuz?usp=sharing
>>>>>>>>>> This is my notebook; you can see the complete process in the 
>>>>>>>>>> "finetune 2" section.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Friday, February 5, 2021 at 4:55:43 PM UTC+5:30 shree wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 5, 2021 at 4:44 PM Kumar Rajwani <
>>>>>>>>>>> kumarraj...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> hi,
>>>>>>>>>>>
>>>>>>>>>>> I tried -1 and got the following result:
>>>>>>>>>>>> Iteration 0: GROUND  TRUTH : ) @®
>>>>>>>>>>>> Iteration 0: BEST OCR TEXT : Yo
>>>>>>>>>>>> File eng.arial.exp0.lstmf line 0 :
>>>>>>>>>>>>
>>>>>>>>>>>  
>>>>>>>>>>>
>>>>>>>>>>>> What's your version of tesseract? What o/s?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Without your files, it's difficult to know what's causing the 
>>>>>>>>>>> issue.
>>>>>>>>>>>
>>>>>>>>>>> With debug_interval set to -1, you should get the info for every 
>>>>>>>>>>> iteration.
>>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>
>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/342f3faf-b107-4243-845e-ba8a16274122n%40googlegroups.com
>>>>>>>>>  
>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/342f3faf-b107-4243-845e-ba8a16274122n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>>
>>>>>>>> ____________________________________________________________
>>>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>>>
>>>>>>> -- 
>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>
>>>> -- 
>>
>

