Hey, both of the linked pages cover the steps that come after some text has already been detected, right?

https://i.pinimg.com/564x/bd/a3/d4/bda3d4bf11b0f727db1f9d81faac1b5d.jpg

All of my images are of this type, and I am not able to detect the date at the top right. The contact name, phone, and fax are also not read correctly every time, or are missed in the detection step. That is why I am asking: my documents all share a similar format, so if I train the model on them, will that help with both detection and recognition? I don't know how Tesseract detects the text across the whole form. I have tried thresholding, scaling, and sharpening, but these do not give me good results every time.
On Friday, February 12, 2021 at 1:04:29 AM UTC+5:30 g...@hobbelt.com wrote:

> Have you read the two pages linked to in the answer from February 5th?
> Have you executed those procedures, or anything similar, to extract the
> individual table cell images, to feed those to tesseract?
>
> So far you have not shown images or any results that show you have used a
> tabular recognition and cell extraction process at all (which is a
> preprocessing step required by the type of input image you have provided
> so far if you want to significantly improve OCR output quality), so,
> *hey*, what are your results so far following the sage advice (Feb 5)?
> (quoted below for convenience:)
>
> On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:
>
>> See
>> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>>
>> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>>
>> --
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> --
> Met vriendelijke groeten / Best regards,
>
> Ger Hobbelt
>
> --------------------------------------------------
> web:    http://www.hobbelt.com/
>         http://www.hebbut.net/
> mail:   g...@hobbelt.com
> mobile: +31-6-11 120 978
> --------------------------------------------------
>
> On Mon, Feb 8, 2021 at 1:47 PM Kumar Rajwani <kumarraj...@gmail.com>
> wrote:
>
>> Hey, I am still waiting for your reply. Can you please resolve my doubts?
>>
>> On Sunday, February 7, 2021 at 8:13:56 AM UTC+5:30 Kumar Rajwani wrote:
>>
>>> Hey, can you please tell me how I can improve the text detection for
>>> the same kind of images?
>>>
>>> On Friday, February 5, 2021 at 8:38:31 PM UTC+5:30 Kumar Rajwani wrote:
>>>
>>>> Thanks for this. I know how to use tesseract. I have multiple images
>>>> where I can't improve the image quality, so I want to improve my
>>>> model to get text from them.
>>>> Are you saying that text detection will not improve by training?
>>>> I don't have an issue with text recognition; most of the time it is
>>>> right.
>>>> Can you tell me how I can improve the model to get more text from the
>>>> image? I am using psm 11, where it finds lots of text but some is
>>>> missing.
>>>>
>>>> On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:
>>>>
>>>>> Training won't fix that.
>>>>>
>>>>> See
>>>>> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>>>>>
>>>>> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>>>>>
>>>>> On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani <kumarraj...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I have tried a lot of images where it gets 90% accuracy and always
>>>>>> misses one side of the image. That's the reason I want to train
>>>>>> the model; if it can improve even a little bit, it would be great.
>>>>>> If you can provide a script or steps that can help me, it would be
>>>>>> good for me.
>>>>>>
>>>>>> On Friday, February 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani
>>>>>> wrote:
>>>>>>
>>>>>>> The main thing is that I want to learn about training tesseract
>>>>>>> at the image level, so can you please tell me how I can proceed
>>>>>>> further? I want to know where the main problem is.
>>>>>>>
>>>>>>> On Friday, February 5, 2021 at 5:46:22 PM UTC+5:30 shree wrote:
>>>>>>>
>>>>>>>> I see the tabular image that you shared. I don't think training
>>>>>>>> is going to help you in this. eng.traineddata should be able to
>>>>>>>> recognize it quite well. You should select the different areas
>>>>>>>> of interest and just OCR those sections.
>>>>>>>>
>>>>>>>> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani
>>>>>>>> <kumarraj...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I have tried to do the same thing in tesseract 4, which gets
>>>>>>>>> stuck at the following line:
>>>>>>>>> Compute CTC targets failed!
>>>>>>>>>
>>>>>>>>> On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> !tesseract -v
>>>>>>>>>> tesseract 5.0.0-alpha-20201231-171-g04173
>>>>>>>>>> leptonica-1.78.0
>>>>>>>>>> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34
>>>>>>>>>> : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>>>>>>> Found AVX2
>>>>>>>>>> Found AVX
>>>>>>>>>> Found FMA
>>>>>>>>>> Found SSE
>>>>>>>>>> Found OpenMP 201511
>>>>>>>>>> Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6
>>>>>>>>>> liblz4/1.7.1
>>>>>>>>>>
>>>>>>>>>> Image example: I have added one image from my training data.
>>>>>>>>>>
>>>>>>>>>> I am using the Colab system, which runs Ubuntu.
>>>>>>>>>>
>>>>>>>>>> https://colab.research.google.com/drive/1_Bn4wbK6dE5zYAuFyC4Eczq_eNU2shuz?usp=sharing
>>>>>>>>>> This is my notebook; you can see the complete process in the
>>>>>>>>>> "finetune 2" section.
>>>>>>>>>>
>>>>>>>>>> On Friday, February 5, 2021 at 4:55:43 PM UTC+5:30 shree wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 5, 2021 at 4:44 PM Kumar Rajwani
>>>>>>>>>>> <kumarraj...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have tried minus 1 and got the following result:
>>>>>>>>>>>> Iteration 0: GROUND TRUTH : ) @®
>>>>>>>>>>>> Iteration 0: BEST OCR TEXT : Yo
>>>>>>>>>>>> File eng.arial.exp0.lstmf line 0 :
>>>>>>>>>>>
>>>>>>>>>>> What's your version of tesseract? What o/s?
>>>>>>>>>>>
>>>>>>>>>>> Without your files, it's difficult to know what's causing the
>>>>>>>>>>> issue.
>>>>>>>>>>>
>>>>>>>>>>> With -1 debug_interval you should get the info for every
>>>>>>>>>>> iteration.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/f839b8b5-0996-445b-8607-9cc63c1a0d32n%40googlegroups.com.
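
For readers of this thread, the advice quoted above ("select the different areas of interest and just OCR those sections") could be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's actual script from the thread: it assumes Pillow and pytesseract are installed, the field names and fractional coordinates in FIELDS are hypothetical placeholders that would have to be measured on the real form, and --psm 6 (treat each crop as a single uniform block of text) is a guess at a reasonable mode for small crops.

```python
from typing import Dict, Tuple

# A box is (left, top, right, bottom), each given as a fraction of the page size.
Box = Tuple[float, float, float, float]

# Hypothetical layout: these regions are placeholders, not measured from the
# form discussed in the thread. Adjust them to the real document.
FIELDS: Dict[str, Box] = {
    "date": (0.75, 0.0, 1.0, 0.125),    # assumed top-right corner
    "contact": (0.0, 0.25, 0.5, 0.5),   # assumed contact/phone/fax block
}


def to_pixels(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional box into pixel coordinates for a given page size."""
    left, top, right, bottom = box
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


def ocr_fields(image_path: str,
               fields: Dict[str, Box] = FIELDS,
               psm: int = 6) -> Dict[str, str]:
    """Crop each region of interest and OCR it separately."""
    # Imported here so the geometry helper above works without the OCR stack.
    from PIL import Image
    import pytesseract

    results: Dict[str, str] = {}
    with Image.open(image_path) as img:
        width, height = img.size
        for name, box in fields.items():
            crop = img.crop(to_pixels(box, width, height))
            text = pytesseract.image_to_string(crop, config=f"--psm {psm}")
            results[name] = text.strip()
    return results
```

Calling ocr_fields("form.jpg") (a hypothetical file name) would return a dict mapping each field name to its recognized text. Using fractional boxes rather than pixel boxes is a design choice so the same layout can be reused across scans of the same form at different resolutions.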