First of all: unless you share the input image, it does not make sense to share the output.
Next, read the docs. You can start here: https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md

If image preprocessing and document analysis/text detection fail, training will not help you.

If you need to know the model in detail, I am afraid you will have to read the source code.

Zdenko

On Tue, 5 Oct 2021 at 16:51, Ruchika Tyagi <ruve...@gmail.com> wrote:

> hi Zdenko,
>
> Thanks for your feedback!
>
> I have implemented the following things in Colab:
> 1/ installed tesseract ocr and pytesseract
> 2/ Used pytesseract.image_to_string to convert the image of scanned
> document to text.
>
> The output text is like:
>
> sae S\Pewnowet refer Yo We Uniovetha, Bops don't a where MWAH ple
> Commvadityer gre. Avediarie tee wode Onden OMe wol ' and On Wigs kcale. of
> Oferakin, nee. es: [rer Bat Chain in Prd Vegelanie “roger | SP in Pst
> Vegelasie “Wieder | ; AD Me ]8 inc ug Maer Contumneg hom Nes “I —> ty Uae |
> . Mere ed Serigh Soma)
>
> Which is not making sense.
>
> So I was asking if there are ways to dig deeper into tesseract built in
> model and understand the output of each layer. And then try some
> enhancements to decode this better.
>
> But for that, I need to know the model in detail and should be able to use
> it in Colab. and I am not able to find any relevant text around it. All I
> could find is tuning of model from command line that too on Linux machines.
>
> So if there is any, would request you to provide a reference.
>
> Ruchika
>
> On Tuesday, October 5, 2021 at 3:12:02 PM UTC+5:30 zdenop wrote:
>
>> Generally: new user + "i want to train tesseract" = fail
>>
>> If you are asking for help/support, provide information about what you
>> have already tried, some examples of input images, tools you are able/plan
>> to use...
>>
>> Zdenko
>>
>> On Tue, 5 Oct 2021 at 11:36, Ruchika Tyagi <ruv...@gmail.com> wrote:
>>
>>> hello,
>>>
>>> I am new to Tesseract and trying to use it for one of the use case.
>>>
>>> I wonder if there is any way to use the already trained models through
>>> Colab? And further train them if required.
>>>
>>> I am actually looking for outputs after layers and may be remove the top
>>> layer for further processing. However, till now I have not found anything
>>> relevant around this.
>>>
>>> Can anyone please help?
>>>
>>> Thanks
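
For readers who want a concrete starting point for the image preprocessing mentioned in the reply, here is a minimal sketch of one common step, binarisation with an Otsu-style global threshold, applied before calling pytesseract.image_to_string. It is not taken from the thread. The function names otsu_threshold and ocr_with_binarisation are hypothetical, the image path is whatever file you supply, and it assumes Pillow and pytesseract are installed (for example in Colab). Whether this particular step helps depends entirely on the input image.

```python
def otsu_threshold(histogram):
    """Return the grey level t that maximises the between-class variance.

    `histogram` is a list of pixel counts indexed by grey level (0..255).
    Pixels with value <= t form the dark class, the rest the light class.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey levels in the dark class so far
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue          # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break             # no light pixels left
        m0 = sum0 / w0                  # mean of the dark class
        m1 = (sum_all - sum0) / w1      # mean of the light class
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def ocr_with_binarisation(path):
    """Hypothetical helper: greyscale, binarise, then run Tesseract."""
    # Imported lazily so otsu_threshold() can be used without these packages.
    from PIL import Image
    import pytesseract

    grey = Image.open(path).convert("L")       # 8-bit greyscale
    t = otsu_threshold(grey.histogram())       # 256-bin histogram for mode "L"
    binary = grey.point(lambda p: 255 if p > t else 0)
    return pytesseract.image_to_string(binary)
```

Comparing the text returned for the raw image and for the binarised one, next to the input image itself, gives a much better basis for asking for help than the output text alone.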