Have you read the two pages linked in the answer from February 5th? Have you run those procedures, or anything similar, to extract the individual table cell images and feed those to tesseract? So far you have not shown any images or results indicating that you have used a table recognition and cell extraction step at all. For the type of input image you have provided so far, that preprocessing step is required if you want to significantly improve OCR output quality. So, *hey*, what are your results so far from following the sage advice of Feb 5? (Quoted below for convenience.)
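To make the cell extraction step a bit more concrete, here is a minimal sketch of the idea. It is not the method from the two linked pages (those use OpenCV); it is a simplified stand-in that assumes an already binarized image with clean, full-width ruling lines, and finds them with plain row/column projection profiles. The names `line_runs`, `find_cells`, `crop` and `make_demo_grid` are made up for this illustration, and the 0.9 fill threshold is an assumption you would have to tune.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def line_runs(profile: List[float], min_fill: float) -> List[Tuple[int, int]]:
    """Group consecutive indices whose ink fraction is >= min_fill into
    (start, end) runs, with end exclusive. Each run is one ruling line."""
    runs = []
    start = None
    for i, fill in enumerate(profile):
        if fill >= min_fill:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def find_cells(binary: List[List[int]], min_fill: float = 0.9) -> List[Box]:
    """Return the boxes between neighbouring ruling lines.

    `binary` is a list of rows, 1 = ink, 0 = background. This assumes the
    table lines span (almost) the full width/height of the image.
    """
    height = len(binary)
    width = len(binary[0])
    # Fraction of ink pixels per row and per column.
    row_fill = [sum(row) / width for row in binary]
    col_fill = [sum(binary[y][x] for y in range(height)) / height
                for x in range(width)]
    h_lines = line_runs(row_fill, min_fill)
    v_lines = line_runs(col_fill, min_fill)
    cells = []
    # A cell starts where one line ends and stops where the next one begins.
    for (_, top), (bottom, _) in zip(h_lines, h_lines[1:]):
        for (_, left), (right, _) in zip(v_lines, v_lines[1:]):
            if bottom > top and right > left:
                cells.append((top, left, bottom, right))
    return cells


def crop(binary: List[List[int]], box: Box) -> List[List[int]]:
    """Cut one cell out of the image."""
    top, left, bottom, right = box
    return [row[left:right] for row in binary[top:bottom]]


def make_demo_grid() -> List[List[int]]:
    """Synthetic 7x9 'table': a 2x2 grid of cells plus one ink pixel."""
    height, width = 7, 9
    grid = [[0] * width for _ in range(height)]
    for y in (0, 3, 6):
        grid[y] = [1] * width
    for x in (0, 4, 8):
        for y in range(height):
            grid[y][x] = 1
    grid[1][2] = 1  # stand-in for text inside the first cell
    return grid


if __name__ == "__main__":
    demo = make_demo_grid()
    for box in find_cells(demo):
        print(box, crop(demo, box))
```

On a real scan you would binarize and deskew first, and then hand each cropped cell to tesseract separately, probably with a single-block or single-line page segmentation mode (`--psm 6` or `--psm 7`) rather than psm 11. Real tables with broken or partial lines need something more robust than this, which is what the linked pages cover.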
On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:

> See
> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>
> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>
> --
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------

On Mon, Feb 8, 2021 at 1:47 PM Kumar Rajwani <kumarrajwani1...@gmail.com> wrote:

> hey, i am still waiting for your reply. can you please solve my doubts.
>
> On Sunday, February 7, 2021 at 8:13:56 AM UTC+5:30 Kumar Rajwani wrote:
>
>> hey can you please tell me how can i improve the text detection for the
>> same kind of images?
>>
>> On Friday, February 5, 2021 at 8:38:31 PM UTC+5:30 Kumar Rajwani wrote:
>>
>>> Thanks for this. i know about the usage of the tesseract. i have
>>> multiple images where i can't improve image quality so i want to improve my
>>> model to get text from it.
>>> are you saying that text detection will not improve by training?
>>> Because i don't have an issue with text recognition most of time it
>>> right.
>>> can you tell me how can i improve the model to get more text from the
>>> image? I am using psm 11 where it find lot's of text but some are missing.
>>>
>>> On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:
>>>
>>>> Training won't fix that.
>>>>
>>>> See
>>>> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>>>>
>>>> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>>>>
>>>> On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani <kumarraj...@gmail.com>
>>>> wrote:
>>>>
>>>>> i have tried a lot of images where it getting 90% accuracy and missing
>>>>> always one side of image. that's the reason i want to train model if it
>>>>> can improve a little a bit it would be great.
>>>>> if you can provide a script or steps that can help me it would be good
>>>>> for me.
>>>>>
>>>>> On Friday, February 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani wrote:
>>>>>
>>>>>> main thing is i want to learn about training tesseract on image level
>>>>>> so can you please tell me how can i procced further. i want to know
>>>>>> where is the main problem.
>>>>>>
>>>>>> On Friday, February 5, 2021 at 5:46:22 PM UTC+5:30 shree wrote:
>>>>>>
>>>>>>> I see the tabular image that you shared. I don't think training is
>>>>>>> going to help you in this. eng.traineddata should be able to recognize
>>>>>>> it quite well. You should select the different areas of interest and
>>>>>>> just OCR those sections.
>>>>>>>
>>>>>>> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani <kumarraj...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> i have tried to do same thing in tesseract 4 which stuck at
>>>>>>>> following line.
>>>>>>>> Compute CTC targets failed!
>>>>>>>>
>>>>>>>> On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> !tesseract -v
>>>>>>>>> tesseract 5.0.0-alpha-20201231-171-g04173
>>>>>>>>> leptonica-1.78.0
>>>>>>>>> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>>>>>> Found AVX2
>>>>>>>>> Found AVX
>>>>>>>>> Found FMA
>>>>>>>>> Found SSE
>>>>>>>>> Found OpenMP 201511
>>>>>>>>> Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 liblz4/1.7.1
>>>>>>>>>
>>>>>>>>> image example
>>>>>>>>> i have added one image from my training data.
>>>>>>>>>
>>>>>>>>> i am using the colab system which have ubuntu os.
>>>>>>>>>
>>>>>>>>> https://colab.research.google.com/drive/1_Bn4wbK6dE5zYAuFyC4Eczq_eNU2shuz?usp=sharing
>>>>>>>>> this is my notebook you can see complete process in finetune 2
>>>>>>>>> section.
>>>>>>>>>
>>>>>>>>> On Friday, February 5, 2021 at 4:55:43 PM UTC+5:30 shree wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Feb 5, 2021 at 4:44 PM Kumar Rajwani <
>>>>>>>>>> kumarraj...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> hi,
>>>>>>>>>>
>>>>>>>>>> i have tried minus 1 and got following result
>>>>>>>>>>> Iteration 0: GROUND TRUTH : ) @®
>>>>>>>>>>> Iteration 0: BEST OCR TEXT : Yo
>>>>>>>>>>> File eng.arial.exp0.lstmf line 0 :
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> What's your version of tesseract? What o/s?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Without your files, it's difficult to know what's causing the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>> with -1 debug_interval you should get the info for every
>>>>>>>>>> iteration.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAFP60fqqvG7kgiQHUOCt3H0GUiayZAxhEeZrS39jY6Hh32CX9w%40mail.gmail.com.