Dear Shree,
I've tried it with the format below, with combined letter-and-sign symbols
(see attached file),
and with the WordStr format (see attached file),
but I still get the same error...
Kind regards, Jochen
On 18.04.19 at 17:40, Shree Devi Kumar wrote:
The following format (as in your box file) will
Hello Jochen,
I prefer the WordStr format since it is easier to correct the text against the
ground truth, so I have not tested with the lstmbox file.
A cursory glance shows that the lstmbox file does not have
lines with spaces between words.
Another point to remember when training with ima
zip file is too big. Let me do an alternative upload.
Training runs ok for me -
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Continuing from /home/ubuntu/tessdata_best/script/Devanagari.lstm
Loaded 13/13 lines (1-13) of document NKP/dp10.lstmf
Loaded 13/13 lines (1-13) of document NKP/dp1
Uploaded the files at https://github.com/Shreeshrii/tessdata_sanskrit
See NKP.sh and folder NKP
The first part of the script loops through the images and creates WordStr
box files for them using tesseract.
It then uses sed to replace the recognized text with the ground truth.
This corrected box file
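A minimal sketch of such a loop (the NKP/*.tif and *.gt.txt file names, the
wordstrbox config and the sed pattern are illustrative assumptions, not a copy
of NKP.sh):

    for img in NKP/*.tif; do
        base="${img%.tif}"
        # 1. have tesseract write a WordStr-format box file for the image
        tesseract "$img" "$base" --psm 6 wordstrbox
        # 2. replace the recognized text (everything after '#') with the
        #    ground truth; assumes one text line per image and no characters
        #    in the line that are special to sed
        gt=$(head -n 1 "$base.gt.txt")
        sed -i "s|#.*|#$gt|" "$base.box"
    done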
Thanks a lot.
The error seems to be the missing space after the tab character in the line
below »WordStr«!
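For reference, as I understand the format, a well-formed WordStr box file has
two lines per text line, e.g. (coordinates are placeholders, and <tab> stands
for a literal tab character):

    WordStr 36 92 1389 133 0 #ground truth text of the line
    <tab> 36 92 1389 133 0

The second line starts with the tab character followed by a space and the
end-of-line coordinates.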
Kind regards,
Jochen
On 23.04.19 at 12:02, Shree Devi Kumar wrote:
Uploaded the files at https://github.com/Shreeshrii/tessdata_sanskrit
See NKP.sh and folder NKP
The first part of the
Glad you figured out the problem.
Please consider sharing the improved traineddata file (when you complete
training) for the tessdata_contrib repo.
On Tue, 23 Apr 2019 at 16:24, Jochen Barth wrote:
> Thanks a lot.
>
> The error seems to be the missing space after the tab character in the line
> below »WordStr«!
It seems like there are three forms for a word stored in the
eng.lstm-word-dawg.
For example, the word 'book' has three different forms: lower case (book),
upper case (BOOK) and title case (Book).
When we check whether a word is in the dictionary or not, do we really care
about their forms?
Whe
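If one wants to check what is actually stored, the dawg can be turned back
into a plain word list with the standard tools (the file names here are
assumptions):

    # unpack the components of eng.traineddata
    # (creates eng.lstm-word-dawg, eng.lstm-unicharset, ...)
    combine_tessdata -u eng.traineddata eng.
    # convert the dawg back into a word list and look for all case forms of "book"
    dawg2wordlist eng.lstm-unicharset eng.lstm-word-dawg eng.wordlist
    grep -i '^book$' eng.wordlist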
Hi, Zdenko,
My ".configure" log is following and I think I found the issue. Let me
post my log file first:
"""
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking wheth
Thanks for getting back to me. When I run it I get an error; any ideas why
and how to resolve it?
pi@ShopFloorOCRReader:~ $ tesseract --psm 6 "test_images/cropped_image.jpg"
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
read_params_file: parameter not found:
On Sunday, 14 Ap
What about:
tesseract --help
;-)
Zdenko
On Tue, 23 Apr 2019 at 16:59, alex kelly wrote:
> Thanks for getting back to me. When I run it I get an error; any ideas
> why and how to resolve it?
>
> pi@ShopFloorOCRReader:~ $ tesseract --psm 6
> "test_images/cropped_image.jpg"
> Tesseract Open Source
Hi,
I used ocrd_train for fine-tune training; the start model is
eng.traineddata from tessdata_best.
The traineddata file I got after training is around 11-12 MB, less than the
original eng.traineddata, which is 15 MB.
Is that normal, or have I done something wrong during training? I have added
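One way to see where the size difference comes from is to list the components
packed into both files (combine_tessdata -d prints the contents of a
traineddata file; the name of the fine-tuned model is an assumption):

    # compare the components of the original and the fine-tuned model
    combine_tessdata -d eng.traineddata
    combine_tessdata -d eng_finetuned.traineddata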
Hi,
I suspect you did a cut and paste or some edits and now you have some
non-printable characters in your command line (the question mark boxes).
Write it again from scratch.
And you are missing one parameter in the command line, the output file; you
can use "-" for standard output.
$ tesseract
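Written out in full with the output argument added, the call could look like
this ("-" sends the recognized text to standard output; note that 3.04 spells
the page segmentation flag -psm rather than --psm):

    tesseract test_images/cropped_image.jpg - -psm 6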
Please provide the output of this command:
dpkg -l | cut -d " " -f 3 | grep "icu\|cairo\|pango\|pkg-config"
and the config.log file (you can compress it with e.g. gzip before sending).
Zdenko
On Tue, 23 Apr 2019 at 16:30, Tairen Chen wrote:
> Hi, Zdenko,
> My ".configure" log is following and I think I
Hi, Zdenko,
Thank you for your reply.
I uninstalled every package that I had installed and removed the unzipped
packages.
Then I followed the link to install again.
Now the training is working. :-)
Thank you for pointing me to the configure file, and I understand what
is "
> Why not just use ocrd for fine-tune training? Just set up your START_MODEL
> as chi_sim.
Because I have trained a chi_sim model from Tesseract-OCR, and I don't
have too many sample images.
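For comparison, a fine-tuning run set up that way might look roughly like this
(the variable names follow the ocrd-train Makefile; the model name, tessdata
path and iteration count are assumptions):

    # line images and transcriptions go into the ground-truth directory
    # that ocrd-train expects
    make training MODEL_NAME=chi_sim_ft START_MODEL=chi_sim \
        TESSDATA=/path/to/tessdata_best MAX_ITERATIONS=10000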
On Mon, 22 Apr 2019 at 20:34, Shanshan Wang wrote:
> Why not just use ocrd for fine tune training? Just set up your STA
I am using the Tess-Two library to train Punjabi and Hindi language data for the
Android platform (Android Studio 3.3.1) and am getting a spacing issue (recognized
words are not separated by spaces). While searching for a solution I found
that this issue is resolved in Tesseract 4.0, but how can I use the same
for