Hello!
I just found Tesseract, and it seems to be a great and powerful tool,
but as we say in Russia, equipment in the hands of a fool is
scrap metal...
So I would be grateful if somebody could kindly give me step-by-step
advice:
1. What to do
2. What to read/watch
3. Take a look at the result and give me a hint about where to go next
My situation is that I have a lot of scanned (and many not yet scanned)
books in mixed languages,
such as English, Russian, Hindi, and Bengali, sometimes with diacritic
marks, etc.
For most of them, I have no idea whether the fonts they were printed with
are available...
But I'm ready to start by selecting some letters, words, etc. on the image
myself,
and then tell the program which Unicode character each selected letter
corresponds to (I'm not sure what this is correctly called).
Maybe the missing fonts could be created this way.
As I understand it, training the neural network is a kind of spiral process:
1. We have an image.
2. We tell the network which part of the image is a symbol and what
that symbol is (its character code).
This becomes training material.
3. Based on this first small experience (let's say 1 page), the network
tries to recognize the 2nd page.
4. We verify and correct where needed. This becomes more training material.
And so on: steps 3-4 repeat until the whole book is recognized.
Sometimes step 2 will be invoked again for new characters or patterns, etc.
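To make the loop above concrete, here is a minimal Python sketch of how I
imagine it. It is not Tesseract code and does not call any Tesseract API:
the "model" is just a dictionary from glyph ids to characters, and the
names recognize, human_correct, process_book and the glyph ids are all
hypothetical, made up only to show the control flow of steps 2-4.

```python
# Toy sketch of the "spiral" loop: NOT Tesseract, only the control flow.
# A page is a list of (glyph_id, true_char) pairs. glyph_id stands in for
# the image of one symbol; true_char is what a human would say it is.

def recognize(model, page):
    """Step 3: guess each glyph; None marks a glyph the model has not seen."""
    return [model.get(glyph_id) for glyph_id, _ in page]


def human_correct(page, guesses):
    """Steps 2 and 4: a human checks the guesses and returns the fixes."""
    return {
        glyph_id: true_char
        for (glyph_id, true_char), guess in zip(page, guesses)
        if guess != true_char
    }


def process_book(pages):
    """Run the loop over all pages; return the model and fixes per page."""
    model = {}            # accumulated training material
    fixes_per_page = []
    for page in pages:
        guesses = recognize(model, page)
        fixes = human_correct(page, guesses)
        model.update(fixes)   # corrections become more training material
        fixes_per_page.append(len(fixes))
    return model, fixes_per_page


# Three tiny "pages" mixing Latin, Cyrillic and Devanagari symbols.
pages = [
    [("g1", "a"), ("g2", "б"), ("g1", "a")],
    [("g1", "a"), ("g2", "б"), ("g3", "क")],
    [("g3", "क"), ("g2", "б"), ("g1", "a")],
]
model, fixes_per_page = process_book(pages)
print(fixes_per_page)  # [2, 1, 0]
```

In this toy run the number of manual corrections per page goes down as the
material accumulates, which is the behaviour I hope for from a real
training process, where the dictionary would be replaced by actual model
training.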
I think this should be enough to show my level of knowledge of the subject
and my goal.
So I ask anybody who would like to help me to establish a
process
for recognizing many rare books, so that one can search and navigate among
tons of scriptures which would otherwise be lost and buried by time...
Thank you all very much,
best regards, Alexander