Re: [tesseract-ocr] fine tuning on images

2024-03-27 Thread Zdenko Podobny
You can easily test your hypothesis by modifying the Makefile [1] lines from
    tesseract "$<" $* --psm $(PSM) lstm.train
to
    tesseract "$<" $* --psm $(PSM) -l $(START_MODEL) lstm.train
[1] https://github.com/tesseract-ocr/tesstrain/blob/19f79e2d38dfeada41a96c8d87426c85a7eaa454/Makefile#L242-L255
Z
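
The two command lines being compared can be sketched as argument lists, which makes the difference easy to try outside of make. This is only an illustration: `build_lstmf_command` and its parameter names are hypothetical, and `heb` is assumed as the start model because the thread is about Hebrew. Only the tesseract arguments themselves come from the message above.

```python
def build_lstmf_command(image, output_base, psm, start_model=None):
    """Return the tesseract argument list for producing training data from one line image."""
    cmd = ["tesseract", image, output_base, "--psm", str(psm)]
    if start_model:
        # The suggested change: pass the start model as the language (-l).
        cmd += ["-l", start_model]
    # lstm.train is the config file named in the Makefile recipe.
    cmd.append("lstm.train")
    return cmd


# Original recipe versus the modified one (file names and PSM are placeholders).
print(build_lstmf_command("line.tif", "line", 13))
print(build_lstmf_command("line.tif", "line", 13, start_model="heb"))
```

The returned list could be handed to `subprocess.run` if tesseract is installed.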

[tesseract-ocr] fine tuning on images

2024-03-14 Thread roei shlezinger
Hello, I have relatively clear images in Hebrew and Tesseract produces reasonable but not perfect results. I thought about continuing to train the model to make them better but ran into a problem. Here is the command I run: "bash-4.4# make training MODEL_NAME=test11 GROUND_TRUTH_DIR=/home/tesst

[tesseract-ocr] Fine Tuning

2024-01-23 Thread Simon
Hello everybody, I just finished fine tuning according to Ray's tutorial. I did the following steps: 1. I used tesstrain.sh to create training data and the starter traineddata. The training data consists of the eng.training_text with the ± character added multiple times. 2.

[tesseract-ocr] Fine Tuning with image containing multiple languages

2022-12-16 Thread Jacob Pedersen
Hi, Consider an image containing a mix of English and German text. I extract wordstr boxes from it and fix mistakes. When fine-tuning the two languages, I get encoding errors for English, as it does not contain German characters. What is the correct approach here? 1. Ignore encoding errors? What

[tesseract-ocr] Fine tuning tha.traineddata with character that is not in original unichaset file

2022-09-19 Thread Unnop Paripunnang
I would like to fine-tune the Tesseract traineddata for the Thai language (tha). Unfortunately, after extracting the original tha.traineddata from the official tessdata_best, I found that some characters are missing from tha.unicharset, e.g. the Thai digits ๐ ๑ ๒ ๓ ๕ (0 1 2 3 5) appear in tha
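
A quick way to confirm which characters are affected is to compare the characters used in the ground truth against the model's character set. The sketch below assumes the character set has already been loaded into a Python set. `missing_characters` is a hypothetical helper, and the `known` set is a made-up stand-in, not the real contents of tha.unicharset.

```python
def missing_characters(ground_truth_lines, known_chars):
    """Return the sorted characters used in the text that the character set lacks."""
    used = set("".join(ground_truth_lines))
    used.discard(" ")  # spaces are not interesting here
    return sorted(used - set(known_chars))


# Hypothetical subset standing in for the model's character set.
known = set("๔๖๗๘๙กขค")
# Hypothetical ground-truth lines containing Thai digits.
lines = ["ก๐ข๑", "ค๒๓๕๔"]
print(missing_characters(lines, known))
```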

[tesseract-ocr] Fine-tuning a trained data in arabic.

2022-03-10 Thread Wolf Assi
I have noticed that the "ara-Scheherazade" trained data was trained for the "Traditional Arabic" font. I have tried it, it performs well but with low accuracy, and has a problem when it comes to arabic numerals as the numbers are inverted. I want to fix the issue. I have tried to fine-tune it fo

[tesseract-ocr] Fine- tuning on windows

2021-10-03 Thread Samruddhi Dhake
Hello, I have a diameter symbol in my image and I have to train that too. I am trying to fine-tune for the symbol. What are the required steps for fine-tuning on Windows? Thanks and regards, Samruddhi -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" grou

[tesseract-ocr] Fine-tuning error

2021-06-10 Thread Antonio Pardo de Santayana Navarro
Hello, I am attempting to train tesseract using the tesstrain repo, but I get this error message when I run make training. make: * [Makefile:279: data/foo/checkpoints/foo_checkpoint] Segmentation fault (core dumped) [core dumped to a file] I also get some warnings along the way that might

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-10-11 Thread Shree Devi Kumar
Tesseract will make a checkpoint, if needed, every 100 iterations, so I suggest a minimum of 50-100 line images to test fine-tuning. Also, one of your image samples has a lot of noise on the right side. Crop all extra parts. Also for `ben` you should choose the Indic language option in tesstrain. On S

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-10-10 Thread Fazle Rabbi
I did the process manually for 5-6 images. I attached some samples of the line images and ground truth. Then I ran >> make training MODEL_NAME= START_MODEL=ben TESSDATA= The resulting .traineddata file seems to not have any connection with the original 'ben' file. The OCR produces unreadable text

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-10-10 Thread Shree Devi Kumar
What command did you use? Difficult to help without seeing what training data you used. On Sat, Oct 10, 2020, 09:31 Fazle Rabbi wrote: > Hi. I have a similar goal in mind about finetuning the 'ben' traineddata > with the pictures i am working with. The picture will be an id so the names > of pe

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-10-09 Thread Fazle Rabbi
Hi. I have a similar goal in mind: fine-tuning the 'ben' traineddata with the pictures I am working with. The picture will be an ID, so the names of people have to be recognized correctly. I tried the (line image, ground truth) way of fine-tuning the traineddata with a very small number of images

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-27 Thread Shree Devi Kumar
Thank you for sharing the results of your trial with fine-tuning and getting better results with the official traineddata after pre-processing the images. Hope your notes will help other users with similar questions. On Sun, Sep 27, 2020, 20:51 Grad wrote: > @shree thank you for the advice, it

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-27 Thread Grad
@shree thank you for the advice, it was helpful. I managed to get everything working satisfactorily: after adding additional training images, I now get perfect results (446 pass, 0 fail)! Furthermore, these results come with using the built-in "eng" model. I ended up not needing to re-train or

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-20 Thread Shree Devi Kumar
Resize your images so that text is 36 pixels high. That's what is used for eng models. Since you are fine tuning, limit number of iterations to 400 or so (not 1 which is default). Use debug_interval of -1 during training so that you can see the details per iteration. On Sun, Sep 20, 2020, 00:
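
The resize advice comes down to a single scale factor. A minimal sketch, assuming the current text height has been measured by hand: `scale_to_text_height` is a hypothetical helper, the 130x54 image size is taken from the first post of this thread, and the 18-pixel text height is an invented example value.

```python
def scale_to_text_height(width, height, text_height, target=36):
    """Return the (width, height) that makes the text `target` pixels high."""
    factor = target / text_height
    return round(width * factor), round(height * factor)


# A 130x54 image whose text is assumed to be 18 px high needs a 2x upscale.
print(scale_to_text_height(130, 54, 18))
```

The resulting size can be passed to whatever image tool is already in use for the actual resampling.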

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-19 Thread Grad
What matters is the *contents* of each ground truth file, not the filename, correct? (so long as the ground truth filename matches the PNG image filename, not counting the extension) On Saturday, September 19, 2020 at 12:12:19 PM UTC-5 Grad wrote: > If it turns out to be that simple, I will fee

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-19 Thread Grad
If it turns out to be that simple, I will feel really relieved and really stupid at the same time. I cannot believe I didn't catch this before posting. Thank you for taking a look, I'll fix my ground-truth file creator script and try again. On Saturday, September 19, 2020 at 12:01:50 PM UTC-5 s

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-19 Thread Shree Devi Kumar
You will get better results when you fix your training data (I deleted all file names ending in -2 and -3). Mean rms=0.145%, delta=0.046%, train=0.214%(1.01%), skip ratio=0% Iteration 396: GROUND TRUTH : 5,500,000 File data/swtor-ground-truth/5,500,000.lstmf line 0 (Perfect): Mean rms=0.145%, del

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-19 Thread Shree Devi Kumar
> Each of my PNG files have file names that indicate ground truth, and I have a little script that generates ground-truth TXT files from the PNG file names. Please review your script. I notice a number of file names ending with -2. The gt.txt files for the same also contain -2 while the image only
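
One way such a script could avoid the problem is to strip a trailing `-N` counter from the file name before writing the ground truth. This is a sketch under an assumption: that the `-2` and `-3` suffixes only distinguish several images of the same number and are not part of the depicted text. `ground_truth_for` is a hypothetical name, not the poster's script.

```python
import re
from pathlib import Path

# A trailing "-<digits>" is assumed to be a duplicate counter, not image content.
DUPLICATE_SUFFIX = re.compile(r"-\d+$")


def ground_truth_for(image_name):
    """Derive the ground-truth text from an image file name."""
    stem = Path(image_name).stem  # drop the .png extension
    return DUPLICATE_SUFFIX.sub("", stem)


print(ground_truth_for("5,500,000-2.png"))
print(ground_truth_for("638,997.png"))
```

The file written next to `5,500,000-2.png` would still be named `5,500,000-2.gt.txt`; only its contents change.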

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-19 Thread Gradalajage
Absolutely! The following Google Drive link is for the "training_data.7z" archive for the training data itself: https://drive.google.com/file/d/1z8XH8JxOzlqol9ZU9hS8a4wWBR5Db9Km/view?usp=sharing Also, here is a link to "data.7z" which contains my "./tesstrain/data" directory contents, which incl

Re: [tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-19 Thread Shree Devi Kumar
Please share your training data so that we can test. Thanks.

[tesseract-ocr] Fine-tuning via tesstrain repo gives me poorer results than built-in eng model

2020-09-18 Thread Gradalajage
I have 395 PNG files depicting numbers with commas. The images are 130x54 pixels and are black text on white background. Here is an example of an image showing the number 638,997: [image: 638,997.png] I would like to use Tesseract to perform reliable OCR on these images and others like them. Out

Re: [tesseract-ocr] fine tuning from traineddata_best

2020-04-03 Thread Lorenzo Bolzani
Hi, tesstrain (https://github.com/tesseract-ocr/tesstrain) works very well. It is not the same thing as tesstrain.sh, it was called ocr-d before. tesstrain works only with single lines. You need only the images and the corresponding gt.txt files; it will create the tiff, box files and lstmf, unich

Re: [tesseract-ocr] fine tuning from traineddata_best

2020-04-03 Thread hmaster
1. So essentially, I need to create a box file and ground-truth file for each image I have, and run it with tesstrain repo. Which doesn't work 2. That's what I understood from the README as well. 3. Unfortunately, I've tried it already, and have not come too far with that e

Re: [tesseract-ocr] fine tuning from traineddata_best

2020-04-03 Thread Shree Devi Kumar
There are alternate approaches to training. tesstrain.sh in tesseract repo works on training text and fonts, creating synthetic training data as multi-page tifs. tesstrain repo uses a makefile for training from images with their corresponding ground truth. For fine-tuning for a font, both can wo

Re: [tesseract-ocr] fine tuning from traineddata_best

2020-04-03 Thread Shree Devi Kumar
As per the info given by Ray Smith, lead developer of tesseract, if you just need to fine-tune for a new font face, use the "Fine Tuning for Impact" approach. His example uses the training text from langdata repo (approx 80 lines) rendered with the font, generating lstmf files and then running lstmtraining on that

[tesseract-ocr] fine tuning from traineddata_best

2020-04-03 Thread hmaster
Hello, I am trying to improve accuracy for my use case, by fine tuning. Currently I'm getting between 80-90% accuracy on my scanned images, and around 60% for images taken via phone. I'm running on a Jetson Nano, using: ``` tesseract 4.1.1-rc2-21-gf4ef leptonica-1.78.0 libgif 5.1.4 : libjpeg

Re: [tesseract-ocr] Fine tuning existing model

2019-09-08 Thread Lorenzo Bolzani
Hi Ayush, usually images are denoised much more. I think the standard models are trained on pure black on pure white background, maybe with a little noise. I think it could work even on these images especially with fine tuning. But this is not the typical training data, I'm not surprised you have p

Re: [tesseract-ocr] Fine tuning existing model

2019-09-08 Thread Ayush Pandey
Hi Lorenzo, Shree - Here is the link of the images for which no lstmf files were generated -> https://drive.google.com/drive/folders/1VDBPB_k-oOXbWUI3zIlB3ljuyIlOkoMK?usp=sharing . - Here is the Makefile that I used for generating lstmf files -> https://drive.google.com/open?i

Re: [tesseract-ocr] Fine tuning existing model

2019-09-06 Thread Lorenzo Bolzani
Hi Ayush, psm 6 and 7 do some extra pre-processing of the image, 13 does much less. Unless your image contains text like this: I would not expect much difference between PSM 6/7 and 13. While PSM 13 solves some problems I got more "ghost letters" errors (letters that are repeated

Re: [tesseract-ocr] Fine tuning existing model

2019-09-06 Thread Ayush Pandey
Hi Lorenzo. The empty output was due to the fact that I was using 7 as PSM parameter. Using 13 as PSM parameter completely eliminated the problem. On Friday, September 6, 2019 at 12:34:22 PM UTC+5:30, Lorenzo Blz wrote: > > Can you please share an example? > > An empty output usually means that

Re: [tesseract-ocr] Fine tuning existing model

2019-09-06 Thread Lorenzo Bolzani
Can you please share an example? An empty output usually means that it failed to recognize the black parts as text, this could be because the text is too big or too small or a wrong dpi setting. Or the image is not reasonably clean. To better understand the problem you can try to downscale the im

Re: [tesseract-ocr] Fine tuning existing model

2019-09-05 Thread Ayush Pandey
Hi shree, Thank you so much for your response. I also wanted to ask, I do get an empty output on a lot of images, after training, the height and width of the image in pixels is usually > 100. Apart from changing the psm value, is there any other way to reduce this. On Thursday, Sep

Re: [tesseract-ocr] Fine tuning existing model

2019-09-05 Thread Shree Devi Kumar
See https://github.com/tesseract-ocr/tesstrain/wiki/GT4HistOCR#tesseract-fails-to-create-lstm-files On Thu, Sep 5, 2019 at 1:25 PM Ayush Pandey wrote: > Tesseract Version: 4.1.0 > > I am trying to fine tune tesseract on custom dataset with the following > Makefile: > > export > > SHELL := /bin/b

Re: [tesseract-ocr] Fine tuning existing model

2019-09-05 Thread Ayush Pandey
Tesseract Version: 4.1.0 I am trying to fine tune tesseract on custom dataset with the following Makefile: export SHELL := /bin/bash HOME := $(PWD) TESSDATA = $(HOME)/tessdata LANGDATA = $(HOME)/langdata # Train directory # TRAIN := $(HOME)/train_data TRAIN := /media/vimaan/Data/OCR/tesserac

[tesseract-ocr] Fine tuning without losing generalizability

2019-08-28 Thread Ayush Pandey
Hi, I am using the following Makefile to fine tune eng.traineddata from tessdata_best on my data. export SHELL := /bin/bash HOME := $(PWD) TESSDATA = $(HOME)/tessdata LANGDATA = $(HOME)/langdata # Train directory TRAIN := $(HOME)/train_data # Name of the model to be built MODEL_NAM

[tesseract-ocr] fine tuning a few characters generating training images error

2019-06-13 Thread Jingjing Lin
when I tried to create new training data using the command below for fine tuning a few characters: src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang chi_sim --linedata_only \ --noextract_font_properties --langdata_dir ../langdata \ --tessdata_dir ./tessdata --output_dir ~/tesstut

[tesseract-ocr] Fine tuning

2019-05-23 Thread Jennil Thiyam
I want to perform fine tuning over ben.traineddata by adding one character. It is written that for fine tuning we only need to add the desired characters to langdata/ben/ben.training_text, but the folder 'ben' also contains other files such as ben.config, ben.params_model, ben.word.bigram,

Re: [tesseract-ocr] Fine tuning existing model

2019-05-03 Thread Tairen Chen
Thank you for your further explanation, Shree!! On Friday, May 3, 2019 at 2:59:12 AM UTC-7, shree wrote: > > >There are three model sizes: best, normal and fast. Each of these can > also be converted to an integer model. > > Only `best` can be converted to integer and in fact the LSTM models in

Re: [tesseract-ocr] Fine tuning existing model

2019-05-03 Thread Tairen Chen
Hi, Lorenzo, Thank you very much for your reply. It really gives me more clues about the training. All the best, Tairen On Friday, May 3, 2019 at 2:30:12 AM UTC-7, Lorenzo Blz wrote: > > See answer inline. > > On Fri, 3 May 2019 at 03:48, Tairen Chen > wrote

Re: [tesseract-ocr] Fine tuning existing model

2019-05-03 Thread Lorenzo Bolzani
Shree, thanks for the clarification. On Fri, 3 May 2019 at 11:59, Shree Devi Kumar < shreesh...@gmail.com> wrote: > >There are three model sizes: best, normal and fast. Each of these can > also be converted to an integer model. > > Only `best` can be converted to integer and in fa

Re: [tesseract-ocr] Fine tuning existing model

2019-05-03 Thread Shree Devi Kumar
>There are three model sizes: best, normal and fast. Each of these can also be converted to an integer model. Only `best` can be converted to integer and in fact the LSTM models in `tessdata` are the integer versions of best along with the base/legacy models. `fast` models have been trained with

Re: [tesseract-ocr] Fine tuning existing model

2019-05-03 Thread Lorenzo Bolzani
See answer inline. On Fri, 3 May 2019 at 03:48, Tairen Chen wrote: > > 1. I define the "--max_iterations 2" but the training stops at > 5700, like below: > " At iteration 351/5700/5700, Mean rms=0.117%, delta=0%, char > train=0%, word train=0%, skip ratio=0%, wr

Re: [tesseract-ocr] Fine tuning existing model

2019-05-02 Thread Tairen Chen
Thank you very much for your quick answer, Lorenzo! You are right: there was an extra space at the beginning of the line where "TESSDATA" is defined, not on the "lstmtraining" line. I still have a few questions I would like to ask. 1. I define "--max_iterations 2" but the training

Re: [tesseract-ocr] Fine tuning existing model

2019-05-02 Thread Lorenzo Bolzani
Hi Tairen, the error is quite clear: Must provide a --traineddata see training wiki You say that it works if you run it as a single line so I suppose there is something wrong in the make file, probably a typo. Maybe there is a space or a tab after a "\" ? Maybe there are some extra characters fr
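
The "space or tab after a backslash" hypothesis is easy to check mechanically. The sketch below scans Makefile text for line continuations that are broken by trailing whitespace. `broken_continuations` is a hypothetical helper and the sample recipe is invented for illustration. It only covers this one hypothesis and would not catch other stray characters.

```python
import re

# A backslash followed by spaces or tabs at the end of a line no longer
# continues the line, so the following options end up in a separate command.
BROKEN_CONTINUATION = re.compile(r"\\[ \t]+$")


def broken_continuations(makefile_text):
    """Return 1-based line numbers where a backslash is followed by trailing whitespace."""
    return [
        number
        for number, line in enumerate(makefile_text.splitlines(), start=1)
        if BROKEN_CONTINUATION.search(line)
    ]


# Invented sample: the first line has a space after the backslash.
sample = "lstmtraining \\ \n\t--traineddata foo \\\n\t--max_iterations 400\n"
print(broken_continuations(sample))
```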

Re: [tesseract-ocr] Fine tuning existing model

2019-05-02 Thread Tairen Chen
Hi, Lorenzo and Shree Thanks for your sharing. I am trying to repeat what you have done here. I followed your posts and change the Makefile, but when I run $ make training, I got the following errors: mkdir -p data/checkpoints lstmtraining \ --contin

Re: [tesseract-ocr] Fine tuning existing model

2019-02-15 Thread Russia Aiyappa
Having a hard time training tesseract as I am new to this. Is it possible to get the updated code for fine-tuning now that langdata is not supported? https://github.com/OCR-D/ocrd-train/issues/49 On Friday, 29 June 2018 08:09:09 UTC-4, shree wrote: > > I modified the makefile for ocrd-train t

[tesseract-ocr] Fine tuning Tesseract page layout analysis

2019-01-24 Thread dmitri
Is there a way to fine tune the page layout engine that Tesseract is using in order to better segment very complex and domain specific layouts? Especially when it comes to training tesseract to properly extract tabular data.

Re: [tesseract-ocr] Fine tuning the Old traineddat file

2018-10-11 Thread Shree Devi Kumar
No. On Thu, 11 Oct 2018, 03:41 Mugunthan, wrote: > Hi, > > Is there any way to fine-tune the old trained data files (3.05) using the > new version 4.00 [LSTM]?

[tesseract-ocr] Fine tuning the Old traineddat file

2018-10-11 Thread Mugunthan
Hi, Is there any way to fine-tune the old trained data files (3.05) using the new version 4.00 [LSTM]?

Re: [tesseract-ocr] Fine tuning existing model

2018-09-19 Thread Varun Sab
Thank you so much.. That worked. :) On Tuesday, September 18, 2018 at 9:24:53 PM UTC+5:30, shree wrote: > > If you are getting error > > !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244 > !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244 > > You are probably usi

[tesseract-ocr] Fine tuning ocr model - Poor detection results

2018-09-18 Thread Varun Sab
Hi, I am trying to train *Tesseract OCR 4.0 using images* instead of fonts. I have used OCR-D to train on the images. But after 1 iterations the error rate remains at 100. When I increased iterations to 10 (although fewer iterations are preferred everywhere) the error rate drops to 7.8% but testin

Re: [tesseract-ocr] Fine tuning existing model

2018-09-18 Thread Shree Devi Kumar
If you are getting error !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244 !int_mode_:Error:Assert failed:in file weightmatrix.cpp, line 244 You are probably using the traineddata file which has an `integer` model. Please use tessdata_best as base for further training. On Tue,

Re: [tesseract-ocr] Fine tuning existing model

2018-09-18 Thread Varun Sab
HI @ Lorenzo Blz, I am also getting the same segmentation fault error. Can you please suggest how you solved it. On Friday, June 29, 2018 at 9:03:34 PM UTC+5:30, Lorenzo Blz wrote: > > Hi Shree, thanks for your answer. > > I tried the script setting: > > TESSDATA=extracted

Re: [tesseract-ocr] Fine tuning existing model

2018-07-02 Thread Lorenzo Bolzani
Hi Shree, I replaced the line: merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset "$@" with: cp "$(TRAIN)/my.unicharset" "data/unicharset" (I write this in case someone else is following this thread). And now I have a fine tuned brand new model with only t

Re: [tesseract-ocr] Fine tuning existing model

2018-06-29 Thread Shree Devi Kumar
> The problem was a "-gt.txt" rather than a ".gt.txt" as in my train files. Now I can run your script directly. Oh, I remember now. I had changed that for ease in renaming files for some reason. > In this way can I train a model that, for example, only recognize uppercase characters, or numbers

Re: [tesseract-ocr] Fine tuning existing model

2018-06-29 Thread Lorenzo Bolzani
I think I found the problem. Running directly the new Makefile I had this error: make: *** No rule to make target 'data/train/alexis_ruhe01_1852_0018_022.box', needed by 'data/all-boxes'. Stop. The problem was a "-gt.txt" rather than a ".gt.txt" as in my train files. Now I can run your script dir
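
A naming mismatch like this can be caught before running make by checking that every line image has a ground-truth file with the expected suffix. A minimal sketch, working on a plain list of file names: `missing_ground_truth` is a hypothetical helper, and the `.tif` and `.gt.txt` defaults are assumptions based on the message above.

```python
def missing_ground_truth(file_names, image_ext=".tif", gt_ext=".gt.txt"):
    """Return the images that have no ground-truth file with the expected suffix."""
    names = set(file_names)
    return sorted(
        name
        for name in names
        if name.endswith(image_ext)
        and name[: -len(image_ext)] + gt_ext not in names
    )


# Invented listing: b uses "-gt.txt", so its image is reported as unmatched.
files = ["a.tif", "a.gt.txt", "b.tif", "b-gt.txt"]
print(missing_ground_truth(files))
```

In practice the list could come from `os.listdir` on the training directory.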

Re: [tesseract-ocr] Fine tuning existing model

2018-06-29 Thread Shree Devi Kumar
You should be able to use the new makefile after you make changes for all the directory locations to match your setup. Change the language from frk to eng, though the sample training text seems to be non-english. In which case it is better for you to use the appropriate language traineddata eg. te

Re: [tesseract-ocr] Fine tuning existing model

2018-06-29 Thread Lorenzo Bolzani
Hi Shree, thanks for your answer. I tried the script setting: TESSDATA=extracted # here I have the eng.lstm and eng.traineddata LANGDATA=langdata-master # all langdata downloaded by OCR-D MODEL_NAME = eng CONTINUE_FROM = eng First I run the old Makefile to create the boxes.

Re: [tesseract-ocr] Fine tuning existing model

2018-06-29 Thread Shree Devi Kumar
I modified the makefile for ocrd-train to do fine-tuning. It is pasted below: export SHELL := /bin/bash LOCAL := $(PWD)/usr PATH := $(LOCAL)/bin:$(PATH) HOME := /home/ubuntu TESSDATA = $(HOME)/tessdata_best LANGDATA = $(HOME)/langdata # Name of the model to be built MODEL_NAME = frk # Name of

[tesseract-ocr] Fine tuning existing model

2018-06-29 Thread Lorenzo Bolzani
Hi, I'm trying to do fine tuning of an existing model using line images and text labels. I'm running this version: tesseract 4.0.0-beta.3-56-g5fda leptonica-1.76.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0

[tesseract-ocr] Fine Tuning for ± a few characters in jpn and sim_chi

2017-09-02 Thread Hoang Vu
As the wiki says: New feature It is possible to add a few new characters to the character set > and train for them by fine tuning, without a large amount of training data. > > I'm trying to add some symbols to jpn.traineddata by fine tuning for a few characters, but I'm wondering how many t

Re: [tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread Ibr
How can I know how many lines are in each lstmf file? I opened one with Notepad++ and it had about 7 lines, and that can't be correct since I tried 61 fonts with 10 iterations

Re: [tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread Ibr
thanks On Thursday, June 22, 2017 at 1:01:13 PM UTC+3, shree wrote: > > >what is the number of the iterations that will for sure cover the 40 > lstmf files? > > It will depend on number of lines in each file eg. If each file has 1000 > lines, then 40,000 iterations should cover all files once. >

Re: [tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread ShreeDevi Kumar
>what is the number of the iterations that will for sure cover the 40 lstmf files? It will depend on number of lines in each file eg. If each file has 1000 lines, then 40,000 iterations should cover all files once. You can use --target_error_rate 0.01 instead of number of iterations as a guide
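
The arithmetic in this reply can be written out directly. The sketch assumes, as the reply does, that one iteration consumes one line. `iterations_for_one_pass` is a hypothetical helper and the line counts are the example figures from the message, not measured values.

```python
def iterations_for_one_pass(lines_per_file):
    """Return the iterations needed to see every line of every lstmf file once."""
    return sum(lines_per_file)


# 40 files with 1000 lines each, as in the example above.
print(iterations_for_one_pass([1000] * 40))
```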

[tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread Ibr
Hi, if I want to run the command: training/lstmtraining --model_output ~/tesstutorial/full_japanese/new \ --continue_from ~/tesstutorial/extracted_lstm/jpn.lstm \ --train_listfile ~/tesstutorial/jpntrain/jpn.training_files.txt \ --max_iterations 10 how can I match the --max_iterations

[tesseract-ocr] Fine Tuning all Fonts List

2017-06-19 Thread Ibr
Hi, engtrain and engeval use almost the same command, but for eval you specify the font using the argument --fontlist, while for train the fonts are defined in language-specific.sh. I ran both commands and noticed that they produce the same result files, except in engtrain case ther

[tesseract-ocr] Fine tuning with existing box/tiff pairs in Tesseract 4.0

2017-05-06 Thread an-an-kondratjeva
Hello everyone, I'm experimenting with handwriting recognition using Tesseract 4.0. More concretely, I want to train Tesseract to recognize one particular Russian handwriting. So, I wanted to add the "new font" (based on a bunch of tiff images, which are part of a scanned archive, and box files) t