Hello!

I just found Tesseract, and it seems to be a great and powerful tool,
but as we say in Russia, equipment in the hands of a fool is
scrap metal...

So please, could somebody kindly give me some
step-by-step advice:
1. What to do
2. What to read/watch
3. Take a look at the result and give me a hint on where to go next

My situation is that I have a lot of scanned (and many not yet
scanned) books in mixed languages,
such as English, Russian, Hindi, Bengali, sometimes with diacritic symbols,
etc...
For most of them, I have no idea whether the fonts they were printed with
are available anywhere...

But I'm ready to start by selecting some letters, words, etc. on the
image myself,
and then telling the program which Unicode character each letter in the
image corresponds to (I'm not sure what this is properly called).
Maybe this way it is possible to create the missing fonts.
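
To show what I mean by "telling the program which letter is which", here
is a small Python sketch. It assumes that what I am describing is what
Tesseract calls a box file, with one line per character in the form
"symbol left bottom right top page", measured from the bottom-left corner
of the image. That is only my reading so far, so please correct me if it
is wrong. The function name and its parameters are my own invention.

```python
def to_box_line(char, x, y, width, height, image_height, page=0):
    """Turn one selected rectangle into one box-file style line.

    Assumption: the selection comes from an image viewer, so (x, y) is the
    top-left corner of the rectangle with y growing downwards.
    """
    left = x
    right = x + width
    # Assumed box-file convention: origin at the bottom-left, so flip y.
    top = image_height - y
    bottom = image_height - (y + height)
    return f"{char} {left} {bottom} {right} {top} {page}"


# One Cyrillic letter selected on a page image that is 1000 pixels high.
print(to_box_line("Я", x=10, y=20, width=30, height=40, image_height=1000))
```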

As I understand it, training the neural network is a kind of spiral process:
1. We have an image
2. We tell the network which part of the image is a symbol and what
that symbol is (its character code).
    This becomes training material
3. Based on this first small experience (let's say 1 page), the network
tries to recognize the 2nd page
4. We verify and correct where needed. This becomes more training material

And so on: steps 3-4 repeat until the whole book is recognized.
Sometimes step 2 is invoked again for new characters or patterns, etc.
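
To make the loop above concrete, here is a toy Python sketch of how I
imagine it. It does not call Tesseract at all: the "model" is just a
lookup table from a glyph id to a character, standing in for the real
network, and the "human check" is simulated by comparing against a known
correct transcription. All names (recognize, correct, bootstrap) are
hypothetical.

```python
def recognize(model, page):
    """Step 3: guess a character for every glyph; None means unknown."""
    return [model.get(glyph) for glyph in page]


def correct(guesses, truth):
    """Step 4 (simulated human): list (position, right character) fixes."""
    return [(i, t) for i, (g, t) in enumerate(zip(guesses, truth)) if g != t]


def bootstrap(pages, truths):
    """Run the recognize/correct loop over a book, page by page."""
    model = {}                  # toy stand-in for the trained network
    corrections_per_page = []   # how much manual work each page needed
    for page, truth in zip(pages, truths):
        guesses = recognize(model, page)
        fixes = correct(guesses, truth)
        corrections_per_page.append(len(fixes))
        # Every correction becomes new training material (steps 2 and 4).
        for i, char in fixes:
            model[page[i]] = char
    return model, corrections_per_page


# Three tiny "pages"; each glyph id stands for one printed letter shape.
pages = [["g1", "g2", "g1"], ["g2", "g3"], ["g1", "g3", "g2"]]
truths = [["a", "b", "a"], ["b", "c"], ["a", "c", "b"]]
model, work = bootstrap(pages, truths)
print(work)  # number of manual corrections needed on each page
```

What I hope for is exactly what the toy shows: the number of corrections
per page should shrink as the book goes on.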

I think this should be enough to show my level on the subject
and my goal.
So I kindly ask: would anybody be willing to help me establish a
process
for recognizing many rare books, so that one can search and navigate among
tons of scriptures which would otherwise be lost and buried by time...

Thank you all very much,
best regards, Alexander
