Hi Shree, so after running this command, will the model be in its
integer/fast version?
On Wed, Feb 24, 2021 at 10:27 AM shree wrote:
> You can create an integer/fast version of traineddata which cannot be used
> as START_MODEL for further training.
>
> `combine_tessdata -c myfile.traineddata`
>
>
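A minimal sketch of how the command quoted above might be wrapped, assuming `combine_tessdata -c` rewrites the file it is given in place (so the float model is copied first). The function name and the file names are placeholders, not part of Tesseract.

```shell
# Sketch: make an integer ("fast") copy of a traineddata file.
# Assumption: combine_tessdata -c converts the given file in place,
# so the original float model is copied and only the copy is converted.
make_fast_copy() {
    src=$1
    dst=$2
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    cp -- "$src" "$dst" || return 1
    combine_tessdata -c "$dst"
}

# Usage (file names are placeholders):
#   make_fast_copy myfile.traineddata myfile_fast.traineddata
```

Keeping the original file untouched means the float version is still available as a START_MODEL for later training.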
Does anyone have any idea how to make a traineddata file non-trainable,
that is, unusable for fine-tuning by other
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails f
Hi everyone,
Does anyone know how large the training text actually needs to be (perhaps
in number of words)? For example, for the Bengali traineddata (ben), the
training text is 34.7 MB (for the Tesseract LSTM version), but for Assamese
(asm) the training text is only 140 KB (this is also for tess
Does anyone have any links that describe in detail how Tesseract works
with LSTM, for example which feature extraction techniques it uses?
Please let me know.
Hi Shree, are there any tools associated with Tesseract that we can use for
preprocessing the images? Please advise.
tutorial2016/6ModernizationEfforts.pdf>
>, #7
>
> <https://github.com/tesseract-ocr/docs/blob/master/das_tutorial2016/7Building%20a%20Multi-Lingual%20OCR%20Engine.pdf>
> have
> information about LSTM integration in Tesseract 4.0.
>
>
> On Wed, Sep 11, 2019 a
Shree, do you have any other links that explain how LSTM works in
Tesseract OCR?
On Wed, Sep 11, 2019 at 6:33 PM Shree Devi Kumar
wrote:
> https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#documentation
>
>
>
>
> On Wed, Sep 11, 2019 at 6:29 PM Jennil Thiyam
Does anyone have a link that describes how Tesseract 4 works? I found a
paper that covers the processing steps of Tesseract 3, but could not find
any research paper that describes Tesseract 4. Please let me know.
Is it possible to add new traineddata to the repository, so that everyone
who knows the language can use it?
I did fine-tuning by adding some words that contain the new characters
that I want. What I want to know now is this: when we OCR a document that
is a scanned image rather than computer-generated print, the accuracy
drops. So I thought that if we trained the engine on scanned images as
well, then the accuracy won't be
dropp
Thanks, I will check it out.
On Thu, Jun 13, 2019 at 9:46 PM Jingjing Lin wrote:
> I think this link might be helpful although I didn't succeed for some
> reason:
> https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging
>
> On Thursday, June 13, 2019 at 8:57:43 AM UTC-4, Jennil Thiy
Let's say I have a file "test.tiff" that I want to OCR. Can we get the
box file for this image? I know we get a box file when creating training
data, but what I want is to see how the model's segmentation algorithm
performs on my test data. I want to know this because I have some
character w
> Bye
>
> Lorenzo
>
> On Sun, 9 Jun 2019 at 10:50, Jennil Thiyam <
> thiyamjen...@gmail.com> wrote:
>
>> ই 110 4657 137 4701 0
>> ম্ফা 131 4660 191 4693 0
>> ল 185 4660 217 4689 0
>> , 217 4654 226 4667 0
>> 226 4650 240 4689
ই 110 4657 137 4701 0
ম্ফা 131 4660 191 4693 0
ল 185 4660 217 4689 0
, 217 4654 226 4667 0
226 4650 240 4689 0
জু 240 4650 277 4689 0
ন 269 4660 298 4689 0
298 4660 316 4689 0
১ 316 4660 332 4689 0
৩ঃ 334 4661 376 4688 0
376 4655 394 4701 0
হৌ 394 4655 441 4701 0
জি 436 4660 482 4701 0
ক 477
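For reading box data like the lines above, here is a rough sketch that prints the width and height of each glyph box. It assumes the usual box file layout of `<glyph> <left> <bottom> <right> <top> <page>` with the origin at the bottom-left corner of the image. The function name `box_summary` is made up for illustration.

```shell
# Sketch: summarise a Tesseract box file.
# Assumed line layout: <glyph> <left> <bottom> <right> <top> <page>.
# A line with only five fields is taken to be a space glyph.
# Lines with any other field count (for example truncated ones) are skipped.
box_summary() {
    awk '
        NF == 6 { g = $1; l = $2; b = $3; r = $4; t = $5 }
        NF == 5 { g = "<space>"; l = $1; b = $2; r = $3; t = $4 }
        NF == 5 || NF == 6 {
            printf "%s width=%d height=%d\n", g, r - l, t - b
        }
    ' "$1"
}

# Usage (file name is a placeholder):
#   box_summary test.box
```

Boxes whose left edge falls before the previous box's right edge overlap horizontally, which can be a quick hint about how the text was segmented.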
ince sanskrit
> training text did not have samples of all letters. I then also added any
> new characters that I wanted to add.
>
> On Thu, 6 Jun 2019, 14:01 Jennil Thiyam, wrote:
>
>> Manipuri language has been using two scripts, among them one is bengali
>> script wit
The Manipuri language uses two scripts; one of them is the Bengali
script with some extra characters (these extra characters are also used in
the Assamese script). As Tesseract makes it possible to train an already
existing model by adding some extra characters, I tried performing fine
tuni
ained on bengali, Bengali with ben, asm and English.
>
>
> https://github.com/tesseract-ocr/langdata_lstm/blob/master/script/Bengali.langs.txt
>
>
> On Tue, 4 Jun 2019, 17:11 Jennil Thiyam, wrote:
>
>> What is the difference between ben.traineddata and Bengali.tra
What is the difference between ben.traineddata and Bengali.traineddata?
Some characters are not recognised by ben.traineddata but are
recognised by Bengali.traineddata.
Thank you so much for all your help
On Fri, May 31, 2019 at 11:26 PM Jennil Thiyam
wrote:
> So, your suggestion is perform fine tuning process to this
> bengali.traineddata?
>
> On Fri, May 31, 2019 at 11:16 PM Shree Devi Kumar
> wrote:
>
>> https://github.com/tesserac
So, your suggestion is to perform the fine-tuning process on this
bengali.traineddata?
On Fri, May 31, 2019 at 11:16 PM Shree Devi Kumar
wrote:
> https://github.com/tesseract-ocr/tessdata_best/tree/master/script
>
>
>
> On Fri, 31 May 2019, 23:01 Jennil Thiyam, wrote:
>
>> Wha
my guess is that the vowel maatraa that
> go on both sides of consonants may have been encoded as separate rather
> than one.
>
>
>
>
>
>
> On Fri, 31 May 2019, 22:40 Jennil Thiyam, wrote:
>
>> SHree Devi, any suggestions?
>>
>> On Fri, May 31, 2019
Shree Devi, any suggestions?
On Fri, May 31, 2019 at 5:45 PM Jennil Thiyam
wrote:
> Assamese used some extra characters which are not used in Bengali and our
> language, so I want to modify in ben.traineddata. I tried using
> asm.traineddata, it recognizes the character that I wante
;
> On Fri, 31 May 2019, 16:58 Shree Devi Kumar, wrote:
>
>> Please try the asm.traineddata which is for Assamese which is written in
>> Bengali script.
>>
>> On Fri, 31 May 2019, 16:55 Jennil Thiyam, wrote:
>>
>>> How come this character is in here??? I
I have followed the procedure (described in the Tesseract 4 training guide
for fine-tuning, which adds the plus-minus sign to eng.traineddata) to train
ben.traineddata (by adding one character that is not in the Bengali
alphabet, more than 30 times, to ben.training_text). After creating
starter trainin
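Before regenerating training data it may help to confirm that the new character really appears in the training text as often as intended. A small sketch, with a made-up function name and placeholder arguments:

```shell
# Sketch: count how many times a literal string occurs in a text file.
count_occurrences() {
    # -a: treat the file as text; -F: literal match; -o: one match per line
    grep -a -o -F -- "$1" "$2" | wc -l | tr -d ' '
}

# Usage (character and path are placeholders):
#   count_occurrences "$new_char" langdata/ben/ben.training_text
```

The count only shows that the character is present in the text. It does not show whether the chosen fonts can render it.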
The character that I added is still not recognized. Do you have any idea why?
On Thu, May 30, 2019 at 3:56 PM Shree Devi Kumar
wrote:
> You have to convert the checkpoint to traineddata - run lstmtraining with
> --stop_training flag
>
> On Thu, May 30, 2019 at 3:44 PM Jennil Thi
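The conversion step mentioned in the reply above could look roughly like this. It is a sketch, assuming the checkpoint is the `<model_output>_checkpoint` file written during training and that `--traineddata` takes the same starter traineddata used for training. The function name and the example output name are hypothetical.

```shell
# Sketch: turn an lstmtraining checkpoint into a traineddata file.
checkpoint_to_traineddata() {
    checkpoint=$1   # assumed: <model_output>_checkpoint written by lstmtraining
    starter=$2      # assumed: the traineddata passed to --traineddata in training
    output=$3       # path of the traineddata file to write
    lstmtraining --stop_training \
        --continue_from "$checkpoint" \
        --traineddata "$starter" \
        --model_output "$output"
}

# Usage (paths follow the thread's layout; the output name is a placeholder):
#   checkpoint_to_traineddata ~/tesstutorial/train_wa/wa_checkpoint \
#       ~/tesstutorial/train_wa/ben/ben.traineddata \
#       ~/tesstutorial/train_wa/wa.traineddata
```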
traineddata(that I got as an output of tesstrain.sh)
or is it the old traineddata?
On Thu, May 30, 2019 at 3:56 PM Shree Devi Kumar
wrote:
> You have to convert the checkpoint to traineddata - run lstmtraining with
> --stop_training flag
>
> On Thu, May 30, 2019 at 3:44 PM Jennil Thi
:
> --traineddata ~/tesstitorial/train_wa/ben/ben.traineddata \
>
> Typo there: check the spelling of "tutorial".
>
> On Thu, 30 May 2019, 12:05 Jennil Thiyam, wrote:
>
>> lstmtraining --model_output ~/tesstutorial/train_wa/wa \
>> > --continue_from ~/tesstutorial/train_wa/ben.lstm \
lstmtraining --model_output ~/tesstutorial/train_wa/wa \
> --continue_from ~/tesstutorial/train_wa/ben.lstm \
> --traineddata ~/tesstitorial/train_wa/ben/ben.traineddata \
> --old_traineddata tessdata/best/ben.traineddata \
> --train_listfile ~/tesstutorial/train_wa/ben.training_files.txt \
> --max
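A path typo such as `tesstitorial` in the command above is easy to miss. One way to catch it is to check every input path before starting a long training run. The helper below is a sketch with a made-up name.

```shell
# Sketch: report which of the given paths do not exist.
# Returns 0 when every path exists, 1 otherwise.
missing_paths() {
    status=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            status=1
        fi
    done
    return "$status"
}

# Usage with the input paths from the command in this thread:
#   missing_paths ~/tesstutorial/train_wa/ben.lstm \
#       ~/tesstitorial/train_wa/ben/ben.traineddata \
#       tessdata/best/ben.traineddata \
#       ~/tesstutorial/train_wa/ben.training_files.txt
```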
I added only one character, about 30 times, to ben.training_text (and only
at the end of the original training text), which means I did not modify the
original ben.training_text to any large extent. So why am I still getting
"normalization failed" for many of the words that are already in the
original
One simple question that confuses me every time. It is about
setting the TESSDATA_PREFIX environment variable.
Which path should I set?
*/usr/local/share/tessdata* (but I could not find any .traineddata there;
if this is the right path, can I just copy the .traineddata to this folder
"tess
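One way to settle the question is to look for `.traineddata` files in a candidate directory before exporting the variable. The sketch below assumes that Tesseract 4 expects TESSDATA_PREFIX to name the directory that directly contains the `.traineddata` files. That is worth confirming for your build, for example with `tesseract --list-langs`. The function name is hypothetical.

```shell
# Sketch: list the languages found in a candidate tessdata directory.
# Returns 0 if at least one .traineddata file is present, 1 otherwise.
list_traineddata() {
    dir=$1
    found=1
    for f in "$dir"/*.traineddata; do
        [ -f "$f" ] || continue
        basename "$f" .traineddata
        found=0
    done
    return "$found"
}

# Usage:
#   if list_traineddata /usr/local/share/tessdata; then
#       export TESSDATA_PREFIX=/usr/local/share/tessdata
#   fi
```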
aineddata for LSTM training of language 'ben'
Run 'lstmtraining' command to continue LSTM training for language 'ben'
*No error. Will this training data be good? I am asking because I feel
that many things are not happening the way they should, like it say
t of
> fonts.
>
> It all depends on what you want to accomplish with training.
>
> On Tue, May 28, 2019 at 5:59 PM Jennil Thiyam
> wrote:
>
>> training/tesstrain.sh \
>> --fonts_dir /c/Windows/Fonts \
>> --tessdata_dir ./tessdata \
>> --training_tex
s \
--exposures "0"\
--fontlist "Arial" \
"Arial Unicode MS" \
"Calibri" \
"Courier New" \
--output_dir ~/tesstutorial/araeval
Can anyone tell me why we need to create this eval data? I mean, it
is going to be the same as the training data.
On Tue, Ma
Tue, May 28, 2019 at 10:26 AM Jennil Thiyam
> wrote:
>
>> do you mean to change only the path of this old traineddata(in the
>> command, that I underlined) to the path of ben.traineddata(that i am going
>> to download from tessdata_best)? or do i need to perform the
the estimated time it will take for 1500
iterations?
Thank you
On Mon, May 27, 2019 at 10:20 PM Shree Devi Kumar
wrote:
> You can download ben.traineddata from tessdata_best in a different
> location and use that as part of lstmtraining command
>
> On Mon, May 27, 2019 at 6:24 PM J
els can be used for finetuning.
>
> On Mon, May 27, 2019 at 4:25 PM Jennil Thiyam
> wrote:
>
>> yes...i extracted with the command combine_tessdata
>>
>> On Mon 27 May, 2019, 4:23 PM Shree Devi Kumar > wrote:
>>
>>> Has /ben_extract/ben.lstm be
Yes... I extracted it with the combine_tessdata command.
On Mon 27 May, 2019, 4:23 PM Shree Devi Kumar wrote:
> Has /ben_extract/ben.lstm been extracted from
> /usr/share/tesseract-ocr/4.00/tessdata/ben.traineddata ?
>
> On Mon, May 27, 2019 at 2:55 PM Jennil Thiyam
> wrote:
>
>> I got
I got an error while trying to perform fine-tuning. The command I used is below:
lstmtraining --model_output /model \
--continue_from /ben_extract/ben.lstm \
--traineddata /tesstutorial_output/ben/ben.traineddata \
--old_traineddata /usr/share/tesseract-ocr/4.00/tessdata/ben.traineddata \
--tr
I want to perform fine-tuning on ben.traineddata by adding one character.
It is written that for fine-tuning we only need to add the
desired characters to langdata/ben/ben.training_text, but the folder
'ben' also contains other files, such as ben.config,
ben.params_model, ben.word.bigram,
>
> On Wed, 22 May 2019, 18:16 Jennil Thiyam, wrote:
>
>> The layout of writing is in some manner in the ben_training.txt, (i have
>> attached the sshot). could u please explain how do i put my character in
>> this file
>>
>> On Wed, May 22, 2019 at 5:35 PM Je
The text in ben_training.txt is laid out in a particular way (I have
attached a screenshot). Could you please explain how I should put my
character in this file?
On Wed, May 22, 2019 at 5:35 PM Jennil Thiyam
wrote:
> we used bengali script, but with one extra character, that is what i want
>
lready existing ben.traindata
> model.
>
> What character do you want to add?
>
> You should be able to do the same process as the plus-minus training for
> one character as shown in example for English.
>
> On Wed, May 22, 2019 at 1:51 PM Jennil Thiyam
> wrote:
>
>
I am planning to perform fine-tuning on ben.traineddata.
According to the documented procedure, "The training
requires a new unicharset/recoder, optional language models, and the old
traineddata file containing the old unicharset/recoder." Here I get the old
traindata, bu
I am new to Tesseract and Ubuntu, so please forgive me if my question does
not make sense. Will it work if I put this new model inside the
tessdata folder located in the Program Files folder?
On Sat, May 4, 2019 at 2:44 PM Shree Devi Kumar
wrote:
> Depends on where you keep the new tra
hat rather
> than the normal ones.
>
> Are you doing cut and paste from some word processor? This is probably
> causing all the errors...
>
>
>
> 2018-07-23 9:48 GMT+02:00 Jennil Thiyam :
>
>> I tried using Lohit Bengali and here is the command
>>
>> /usr
121: Meera
122: Mitra Mono
...
Lohit Bengali is in the list, so please tell me why the error occurs. Do I
need to do something else too?
On Sun, Jul 22, 2018 at 11:00 AM, Shree Devi Kumar
wrote:
> See https://github.com/tesseract-ocr/tesseract/wiki/Fonts
>
> On Sun 22 Jul, 2018, 8:20 PM Jenn
it-bengali”.exp0.box does not exist
or is not readable
ERROR: /tmp/tmp.pBWa4wRHmt/ben/ben.“lohit-bengali”.exp0.box does not exist
or is not readable
So, please tell me: are all the fonts in this FONTS folder
already installed for Tesseract or not?
On Sun, Jul 22, 2018 at 7:15 AM, Je
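The error above shows curly quotes (“ ”) inside the generated file name, which suggests the font name was pasted with typographic quotes rather than plain ones. A small sketch for spotting this in a command string before running it (the function name is made up):

```shell
# Sketch: detect typographic ("smart") quotes in a string.
# Returns 0 if any are found, 1 otherwise.
has_smart_quotes() {
    # UTF-8 bytes for U+201C, U+201D, U+2018, U+2019 as octal escapes
    ldq=$(printf '\342\200\234')
    rdq=$(printf '\342\200\235')
    lsq=$(printf '\342\200\230')
    rsq=$(printf '\342\200\231')
    case $1 in
        *"$ldq"*|*"$rdq"*|*"$lsq"*|*"$rsq"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (the argument is a placeholder for the pasted command text):
#   has_smart_quotes "$pasted_command" && echo "replace curly quotes with plain ones"
```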
Oh, sorry for the mistake... I put two dashes, but it still says unrecognised.
On Sun 22 Jul, 2018, 4:27 PM Shree Devi Kumar, wrote:
> needs two dashes,
>
> On Sun, Jul 22, 2018 at 12:29 PM wrote:
>
>> hello again, i modified the error in the way you said and there is no
>> error. but now the same e
48 matches