Hey, I am still waiting for your reply. Could you please answer my questions?

On Sunday, February 7, 2021 at 8:13:56 AM UTC+5:30 Kumar Rajwani wrote:
> Hey, can you please tell me how I can improve the text detection for the
> same kind of images?
>
> On Friday, February 5, 2021 at 8:38:31 PM UTC+5:30 Kumar Rajwani wrote:
>
>> Thanks for this. I know how to use Tesseract. I have multiple images
>> whose quality I can't improve, so I want to improve my model to get
>> text from them.
>> Are you saying that text detection will not improve with training?
>> I don't have an issue with text recognition; most of the time it is right.
>> Can you tell me how I can improve the model to get more text from the
>> image? I am using psm 11, where it finds lots of text but some is missing.
>>
>> On Friday, February 5, 2021 at 7:53:26 PM UTC+5:30 shree wrote:
>>
>>> Training won't fix that.
>>>
>>> See
>>> https://www.pyimagesearch.com/2020/09/07/ocr-a-document-form-or-invoice-with-tesseract-opencv-and-python/
>>>
>>> https://stackoverflow.com/questions/61265666/how-to-extract-data-from-invoices-in-tabular-format
>>>
>>> On Fri, Feb 5, 2021 at 6:14 PM Kumar Rajwani <kumarraj...@gmail.com>
>>> wrote:
>>>
>>>> I have tried a lot of images where it gets 90% accuracy but always
>>>> misses one side of the image. That's why I want to train the model;
>>>> if it can improve even a little bit, that would be great.
>>>> If you can provide a script or steps that can help me, that would be
>>>> good.
>>>>
>>>> On Friday, February 5, 2021 at 5:50:30 PM UTC+5:30 Kumar Rajwani wrote:
>>>>
>>>>> The main thing is that I want to learn about training Tesseract at
>>>>> the image level, so can you please tell me how I can proceed
>>>>> further? I want to know where the main problem is.
>>>>>
>>>>> On Friday, February 5, 2021 at 5:46:22 PM UTC+5:30 shree wrote:
>>>>>
>>>>>> I see the tabular image that you shared. I don't think training is
>>>>>> going to help you in this. eng.traineddata should be able to
>>>>>> recognize it quite well. You should select the different areas of
>>>>>> interest and just OCR those sections.
>>>>>>
>>>>>> On Fri, Feb 5, 2021 at 5:33 PM Kumar Rajwani <kumarraj...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I have tried to do the same thing in Tesseract 4, which gets
>>>>>>> stuck at the following line:
>>>>>>> Compute CTC targets failed!
>>>>>>>
>>>>>>> On Friday, February 5, 2021 at 5:04:42 PM UTC+5:30 Kumar Rajwani
>>>>>>> wrote:
>>>>>>>
>>>>>>>> !tesseract -v
>>>>>>>> tesseract 5.0.0-alpha-20201231-171-g04173
>>>>>>>> leptonica-1.78.0
>>>>>>>> libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 :
>>>>>>>> libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>>>>>>>> Found AVX2
>>>>>>>> Found AVX
>>>>>>>> Found FMA
>>>>>>>> Found SSE
>>>>>>>> Found OpenMP 201511
>>>>>>>> Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6
>>>>>>>> liblz4/1.7.1
>>>>>>>>
>>>>>>>> image example
>>>>>>>> I have added one image from my training data.
>>>>>>>>
>>>>>>>> I am using Colab, which runs Ubuntu.
>>>>>>>>
>>>>>>>> https://colab.research.google.com/drive/1_Bn4wbK6dE5zYAuFyC4Eczq_eNU2shuz?usp=sharing
>>>>>>>> This is my notebook; you can see the complete process in the
>>>>>>>> finetune 2 section.
>>>>>>>>
>>>>>>>> On Friday, February 5, 2021 at 4:55:43 PM UTC+5:30 shree wrote:
>>>>>>>>
>>>>>>>>> On Fri, Feb 5, 2021 at 4:44 PM Kumar Rajwani
>>>>>>>>> <kumarraj...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> I have tried minus 1 and got the following result:
>>>>>>>>>> Iteration 0: GROUND TRUTH : ) @®
>>>>>>>>>> Iteration 0: BEST OCR TEXT : Yo
>>>>>>>>>> File eng.arial.exp0.lstmf line 0 :
>>>>>>>>>
>>>>>>>>> What's your version of tesseract? What o/s?
>>>>>>>>>
>>>>>>>>> Without your files, it's difficult to know what's causing the
>>>>>>>>> issue.
>>>>>>>>>
>>>>>>>>> With -1 debug_interval you should get the info for every
>>>>>>>>> iteration.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/5cb397af-eedd-40bb-979d-d7128ab7c64en%40googlegroups.com.
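
The advice in the thread to "select the different areas of interest and just OCR those sections" could be sketched roughly as follows. This is only an illustration under stated assumptions, not a method anyone in the thread posted: `grid_boxes` and `ocr_regions` are hypothetical helper names, a plain grid of overlapping tiles stands in for real areas of interest, and `--psm 6` (a single uniform block of text) is a guess at a suitable mode for a cropped table region. It assumes the Pillow and pytesseract packages are installed alongside Tesseract itself.

```python
from typing import List, Tuple

# (left, upper, right, lower), the order Pillow's Image.crop expects.
Box = Tuple[int, int, int, int]


def grid_boxes(width: int, height: int, rows: int, cols: int,
               overlap: int = 0) -> List[Box]:
    """Split a width x height image into rows x cols boxes.

    Each box is grown by `overlap` pixels on every side (clamped to the
    image bounds) so text sitting on a tile border is not cut in half.
    Hypothetical stand-in for hand-picked areas of interest.
    """
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            left = max(0, c * width // cols - overlap)
            upper = max(0, r * height // rows - overlap)
            right = min(width, (c + 1) * width // cols + overlap)
            lower = min(height, (r + 1) * height // rows + overlap)
            boxes.append((left, upper, right, lower))
    return boxes


def ocr_regions(image_path: str, boxes: List[Box], psm: int = 6) -> List[str]:
    """Crop each box out of the image and OCR it separately."""
    # Imported lazily so grid_boxes can be used without the OCR dependencies.
    from PIL import Image
    import pytesseract

    texts: List[str] = []
    with Image.open(image_path) as image:
        for box in boxes:
            region = image.crop(box)
            texts.append(
                pytesseract.image_to_string(
                    region, lang="eng", config=f"--psm {psm}"
                )
            )
    return texts
```

For an image where one side is consistently missed, something like `grid_boxes(width, height, rows=1, cols=2, overlap=20)` would produce a left and a right tile to pass to `ocr_regions`. Because the tiles overlap, text near the seam may be returned twice and would need de-duplicating.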