[tesseract-ocr] Tesseract Guide for newbies (first draft)

2019-02-07 Kristóf Horváth
Hi, I set out to make a newbie-friendly guide. I already have some material that might help people, but it's not complete yet. I would like people to read it and help out with comments where they can. I left some places empty or added notes of my own; please feel free to figure out what should go there.

Re: [tesseract-ocr] Tesseract Guide for newbies (first draft)

2019-02-07 Lorenzo Bolzani
Hi Kristof, good work, I have thought about doing this a few times myself. I took a quick look and have just a couple of quick notes; I'll try to read it more carefully when I get time. This thread about the font size is where I got the 30/40px indication: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tes
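Lorenzo's 30/40px figure is a rule of thumb for how large the text should be in the image. A minimal sketch of the arithmetic, assuming the figure refers to the height of the text in pixels and that you have already measured that height somehow: the resize factor is just the ratio between the target and the measured height. `text_height_scale` is a hypothetical helper, not part of Tesseract.

```cpp
// Hypothetical helper (not part of Tesseract): returns the factor by which an
// image should be resized so that text measured at `measured_px` pixels tall
// ends up between `min_px` and `max_px`. Assumes the 30/40 px rule of thumb
// refers to text height.
double text_height_scale(double measured_px, double min_px = 30.0,
                         double max_px = 40.0) {
  if (measured_px <= 0.0) return 1.0;                     // nothing measured: leave unchanged
  if (measured_px < min_px) return min_px / measured_px;  // too small: upscale
  if (measured_px > max_px) return max_px / measured_px;  // too large: downscale
  return 1.0;                                             // already in range
}
```

With these defaults, text measured at 15 px would be upscaled by a factor of 2.0 and text measured at 80 px would be scaled by 0.5; the resizing itself is left to whatever image library you use.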

Re: [tesseract-ocr] Ocr-d train - Tesseract 4.0 Training

2019-02-07 Lorenzo Bolzani
You do not need any font or font data, just the images and the corresponding text. As a bare minimum, 500 to 1000 samples. On Thu, 7 Feb 2019 at 05:10 wrote: > Thanks for your response. Since these are handwritten digits, I don't have > font data, and what I have is cropped image blocks
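Lorenzo's point is that training on handwriting needs image/transcription pairs rather than fonts. A minimal sketch for checking a data set against his rough minimum, assuming the ocrd-train convention where each line image `name.tif` has its transcription in `name.gt.txt`; the helper names and the default threshold of 500 are illustrative, not part of ocrd-train.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Returns true if `s` ends with `suffix` and has something in front of it.
static bool has_suffix(const std::string& s, const std::string& suffix) {
  return s.size() > suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: counts line images that have a matching transcription,
// assuming the pairing convention "name.tif" + "name.gt.txt".
std::size_t count_training_pairs(const std::vector<std::string>& files) {
  const std::string img_ext = ".tif";
  const std::string gt_ext = ".gt.txt";
  std::set<std::string> images, transcripts;
  for (const auto& f : files) {
    if (has_suffix(f, gt_ext)) {
      transcripts.insert(f.substr(0, f.size() - gt_ext.size()));
    } else if (has_suffix(f, img_ext)) {
      images.insert(f.substr(0, f.size() - img_ext.size()));
    }
  }
  std::size_t pairs = 0;
  for (const auto& stem : images) {
    if (transcripts.count(stem) != 0) ++pairs;  // image has its text
  }
  return pairs;
}

// Lorenzo's rough lower bound ("as a bare minimum 500/1000").
bool enough_samples(std::size_t pairs, std::size_t minimum = 500) {
  return pairs >= minimum;
}
```

For example, given `d1.tif`, `d1.gt.txt`, `d2.tif` and `d3.gt.txt`, only `d1` counts as a usable pair, since `d2` has no text and `d3` has no image.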

Re: [tesseract-ocr] Tesseract Guide for newbies (first draft)

2019-02-07 Kristóf Horváth
Dear Lorenzo, thank you for your input, it is very much appreciated. I will go through your suggestions, as I have some questions and points to clarify. > This thread about the font size is where I got the 30/40px indication: > https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!ms

Re: [tesseract-ocr] Tesseract Guide for newbies (first draft)

2019-02-07 Shree Devi Kumar
You may want to see the following guide (found using a Google search): https://www.endpoint.com/blog/2018/07/09/training-tesseract-models-from-scratch On Thu, 7 Feb 2019, 19:44 Kristóf Horváth wrote: > Dear Lorenzo, > > thank you for your input it is very much appreciated. I will go through > your suggesti

Re: [tesseract-ocr] Tesseract Guide for newbies (first draft)

2019-02-07 Kristóf Horváth
Thanks, Shree. I will check it out tomorrow, but could you please also give me your personal feedback? Also, I left out training from scratch because it requires a serious amount of sample data, which a newbie won't have, but I will definitely dig into this guide myself. On Thursday, 7 February 2019, 16:43:11 UTC+1, shree

Re: [tesseract-ocr] Tesseract Guide for newbies (first draft)

2019-02-07 Shree Devi Kumar
>> iteration 31/100/100
see https://github.com/tesseract-ocr/tesseract/blob/3a7f5e4de459f4c64f36e08b18ce1b66b1fbc876/src/lstm/lstmtrainer.cpp#L410
// Appends iteration learning_iteration()/training_iteration()/
// sample_iteration() to the log_msg.
void LSTMTrainer::LogIterations(const char* intr
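The comment Shree quotes says the log shows `learning_iteration()/training_iteration()/sample_iteration()` in that order. A minimal sketch that splits such a fragment into its three counters, assuming the fragment appears as `iteration a/b/c` somewhere in the log line; `parse_iteration_field` and `IterationCounts` are hypothetical names, not Tesseract API.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// The three counters that, per the comment on LSTMTrainer::LogIterations,
// appear as "iteration a/b/c" in the training log.
struct IterationCounts {
  int learning;  // learning_iteration()
  int training;  // training_iteration()
  int sample;    // sample_iteration()
};

// Hypothetical helper (not Tesseract API): finds "iteration a/b/c" in a log
// line and returns the three numbers, or std::nullopt if there is no match.
std::optional<IterationCounts> parse_iteration_field(const std::string& line) {
  const auto pos = line.find("iteration ");
  if (pos == std::string::npos) return std::nullopt;
  IterationCounts c{};
  if (std::sscanf(line.c_str() + pos, "iteration %d/%d/%d", &c.learning,
                  &c.training, &c.sample) == 3) {
    return c;
  }
  return std::nullopt;
}
```

Applied to `iteration 31/100/100`, it yields learning_iteration 31, training_iteration 100 and sample_iteration 100. The quoted comment only names the counters; lstmtrainer.cpp is the place to check what each one counts.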

[tesseract-ocr] Re: how to train using .box/tif file?

2019-02-07 Rich Hart
Try this link: https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0 On Tuesday, February 5, 2019 at 2:29:26 PM UTC-5, Shailesh Barve wrote: > > I have a box file and an image file (.tif). How do I generate my training data > using these files? Please help. > > Also I corrected my box f
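For reference, a classic Tesseract box file has one line per symbol: the symbol, then the left, bottom, right and top pixel coordinates of its bounding box, then the page number, with y measured from the bottom of the image. A minimal parser sketch under those assumptions; `BoxEntry`, `parse_box_line` and `top_edge_from_image_top` are hypothetical names, and it does not handle line-level `WordStr` entries or symbols containing spaces, so the wiki page linked above remains the authority on the 4.0 format.

```cpp
#include <optional>
#include <sstream>
#include <string>

// One line of a classic Tesseract box file:
//   <symbol> <left> <bottom> <right> <top> <page>
// with y coordinates measured from the bottom of the image.
struct BoxEntry {
  std::string symbol;
  int left = 0, bottom = 0, right = 0, top = 0, page = 0;
};

// Hypothetical helper: parses one single-symbol box line. Line-level
// "WordStr" entries and symbols containing spaces are not handled.
std::optional<BoxEntry> parse_box_line(const std::string& line) {
  std::istringstream in(line);
  BoxEntry b;
  if (!(in >> b.symbol >> b.left >> b.bottom >> b.right >> b.top >> b.page)) {
    return std::nullopt;  // missing or non-numeric field
  }
  if (b.right < b.left || b.top < b.bottom) return std::nullopt;  // inverted box
  return b;
}

// Converts the box's top edge to a distance from the top of the image,
// the convention most image viewers use.
int top_edge_from_image_top(const BoxEntry& b, int image_height) {
  return image_height - b.top;
}
```

For instance, `A 10 20 30 50 0` describes an `A` whose top edge sits 50 px above the bottom of the image, which is 50 px below the top of a 100 px tall image.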

[tesseract-ocr] ERROR: shared library version mismatch (was 4.0.0-279-gec8f, expected 4.0.0-255-gfc55

2019-02-07 한정협
I was trying to use /src/training/tesstrain.sh with my own .tif/.box files. My tesseract version is:
tesseract 4.0.0-279-gec8f
leptonica-1.74.4
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
=== Starting training for language 'eng'
ERROR: shared library ve
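The two strings in the error message look like `git describe` output: a release tag, the number of commits after it, and an abbreviated commit hash prefixed with `g`. Read that way (an assumption, not something the error states), both the training tools and the libtesseract they loaded are 4.0.0 development builds, but from different commits, 279 versus 255 commits after the tag. A minimal sketch that splits such strings; `BuildVersion`, `parse_build_version` and `same_build` are hypothetical names.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Parts of a version string assumed to have the `git describe` form
// "<tag>-<commits>-g<hash>", e.g. "4.0.0-279-gec8f".
struct BuildVersion {
  std::string tag;        // release tag, e.g. "4.0.0"
  int commits_since = 0;  // commits after that tag
  std::string hash;       // abbreviated commit hash (without the "g")
};

// Hypothetical helper: returns std::nullopt if `v` does not have that form.
std::optional<BuildVersion> parse_build_version(const std::string& v) {
  const auto g = v.rfind("-g");
  if (g == std::string::npos || g == 0 || g + 2 >= v.size()) return std::nullopt;
  const auto dash = v.rfind('-', g - 1);
  if (dash == std::string::npos || dash == 0) return std::nullopt;
  const std::string count = v.substr(dash + 1, g - dash - 1);
  if (count.empty()) return std::nullopt;
  for (unsigned char ch : count) {
    if (!std::isdigit(ch)) return std::nullopt;  // commit count must be numeric
  }
  BuildVersion b;
  b.tag = v.substr(0, dash);
  b.commits_since = std::stoi(count);
  b.hash = v.substr(g + 2);
  return b;
}

// Two builds are the same only if tag, commit count and hash all agree.
bool same_build(const BuildVersion& a, const BuildVersion& b) {
  return a.tag == b.tag && a.commits_since == b.commits_since && a.hash == b.hash;
}
```

`same_build` returns false for the two versions in the error above, consistent with the tools and the library coming from different builds.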

Re: [tesseract-ocr] Tesseract Guide for newbies (first draft)

2019-02-07 Kristóf Horváth
Thank you, Shree, that helps. On Thursday, 7 February 2019, 17:31:24 UTC+1, shree wrote: > >> iteration 31/100/100 > > see > https://github.com/tesseract-ocr/tesseract/blob/3a7f5e4de459f4c64f36e08b18ce1b66b1fbc876/src/lstm/lstmtrainer.cpp#L410 > > // Appends iteration learni

[tesseract-ocr] What is the benefit of including old_traineddata in training?

2019-02-07 Kristóf Horváth
Hi, I would just like some information on what exactly the difference is between training with old_traineddata and training without it. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it,