Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread Navaneetha Bitla
Yeah, I've tried to train with these images, but it gives DPI and other errors. I then switched to a TTF font, converted the TTF to TIFF, and finally trained on that data, but the output is very bad. I don't know whether the bad results come from the training process or the dataset. Still trying to make progress. On Thu, Jun 21, 2018 at

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread Shree Devi Kumar
Tesseract4 LSTM training is line based. On Thu 21 Jun, 2018, 12:25 PM chandra churh chatterjee, < chandrachurh.chatterje...@gmail.com> wrote: > Excuse me @Shree Devi Kumar can you please tell me whether data for > training tesseract 4.0 would be better if the data has images which have > paragrap

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread Shree Devi Kumar
I tried training with the handwriting font you mentioned in your first message. I think that font uses the same shapes for capital and lowercase letters, so recognition rates will be lower for it. On Thu 21 Jun, 2018, 1:49 PM Navaneetha Bitla, wrote: > yeah i've tried to train with these im

[tesseract-ocr] Tesseract installation rhel7

2018-06-21 Thread Vidur Malhotra
Can anyone help me with steps for installing Tesseract on rhel7?

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread James Q
Quite a few of these handwriting fonts are uppercase letters only (so lowercase letters come out as uppercase when typed). What is the best type of [lang].training_text data to use for training these - is it uppercase only? On Thursday, June 21, 2018 at 10:24:11 AM UTC+1, shree wrote: > > I had tried t

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread Shree Devi Kumar
> Quite a few of these handwriting fonts are uppercase letters only (so lowercase come out as uppercase when typed) . What is the best type of [lang].training_text data to use for training these - is it uppercase only? It would depend on the application where training is being used. If you want s

[tesseract-ocr] [Begginer] Tesseract doesn't recognize numbers. [C#]

2018-06-21 Thread Steven Jeanne
First of all, I'm not a native English speaker, but I hope you'll understand what I'm trying to explain. I just discovered Tesseract, and I'm trying to use it to read information from the screen. Specifically, I need to read decimal numbers such as: 5 / 84.12 / 45,74 (etc.) private void button1_Click(object sender, EventArg
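
A minimal sketch of one way to pull decimal values like those out of recognized text once OCR has run. It is written in Python rather than the poster's C#, and the function name `extract_decimals` and the choice to treat a comma as a decimal separator are assumptions for illustration, not part of the original post. It does not handle thousands separators.

```python
import re

# Integers and decimals that use either "." or "," as the decimal separator.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

def extract_decimals(ocr_text):
    """Return the numbers found in ocr_text as floats.

    Assumption: a comma between digits is a decimal separator ("45,74" -> 45.74).
    """
    return [float(m.group().replace(",", ".")) for m in _NUMBER.finditer(ocr_text)]

print(extract_decimals("5 / 84.12 / 45,74"))  # [5.0, 84.12, 45.74]
```

Post-processing like this only helps if the digits are recognized in the first place; it does not change what Tesseract returns.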

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread James Q
Hi Shree, I'm trying out the script you posted earlier which is great so thank you! I was wondering how many fonts I can specify at once in the 'fonts_for_training' list. I have run it with 9 fonts at once and that seems fine but I would like to do 100s or even 1000s if I can. Is this the best

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread Shree Devi Kumar
You can use ALL fonts at once. However, I have had errors where box files were not created for some fonts, and the tesstrain_utils.sh script dies only at the end, while checking whether the files are readable. In that case you have to restart the process. On Thu, Jun 21, 2018 at 8:28 PM James Q w
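
Since the failure described above only shows up at the end of the run, a small pre-check over the generated files can flag problem fonts earlier. This is a sketch under assumptions: the helper `fonts_missing_box_files` is hypothetical, and the file naming `<lang>.<Font_Name>.exp<N>.box` (spaces in the font name replaced by underscores) is an assumption about the training directory layout that should be checked against the actual script output.

```python
from pathlib import Path

def fonts_missing_box_files(training_dir, lang, fonts, exposure=0):
    """Return the fonts that have no non-empty .box file in training_dir.

    Assumed naming: <lang>.<Font_Name>.exp<N>.box, with spaces in the font
    name replaced by underscores. Adjust if your files are named differently.
    """
    missing = []
    for font in fonts:
        name = f"{lang}.{font.replace(' ', '_')}.exp{exposure}.box"
        box = Path(training_dir) / name
        if not box.is_file() or box.stat().st_size == 0:
            missing.append(font)
    return missing
```

Fonts reported as missing can then be dropped from the font list before restarting, rather than discovering them after a long run.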

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread fadifawzi55
@Shree Thanks for providing the two bash scripts. I want to ask you about tesstrain.sh and tesstrain_utils.sh: is there anything that must be edited before running lstmtrain_finetune_impact.sh? On Wednesday, June 20, 2018 at 11:56:27 PM UTC+3, shree wrote: > > Here are the bash script files: >

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread Shree Devi Kumar
# Make about 150 lines of representative training text for finetuning
finetune_training_text=$langdata_dir/$Lang/$Lang.finetune.training_text
# Make about 150 lines of representative training text for evaluation
eval_training_text=$langdata_dir/$Lang/$Lang.eval.training_text
On Thu, Jun 21, 2
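
One possible way to produce the two text files mentioned above from a larger corpus, as a sketch only. The helper `split_training_text` is hypothetical, and drawing two disjoint random samples is an assumption; the original message only says each file should hold about 150 representative lines.

```python
import random

def split_training_text(lines, n=150, seed=0):
    """Pick two disjoint samples of up to n lines each: (finetune, eval).

    Blank lines and exact duplicates are dropped before sampling.
    """
    unique = list(dict.fromkeys(line.strip() for line in lines if line.strip()))
    random.Random(seed).shuffle(unique)
    return unique[:n], unique[n:2 * n]
```

The first list would be written to the `$Lang.finetune.training_text` path and the second to the `$Lang.eval.training_text` path, one line of text per line of the file.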

Re: [tesseract-ocr] recognising roman with sanskrit diacritics

2018-06-21 Thread yajva
done On Wednesday, June 20, 2018 at 9:05:01 PM UTC+5:30, shree wrote: > > I am attaching the OCRed text. Please correct it so that I can use as > groundtruth for further training and testing. > > On Wed, Jun 20, 2018 at 3:15 PM Shree Devi Kumar > wrote: > >> I had done a training for sanskrit f

Re: [tesseract-ocr] recognising roman with sanskrit diacritics

2018-06-21 Thread yajva
one more correction. On Thursday, June 21, 2018 at 11:34:00 PM UTC+5:30, yajva wrote: > > done > > On Wednesday, June 20, 2018 at 9:05:01 PM UTC+5:30, shree wrote: >> >> I am attaching the OCRed text. Please correct it so that I can use as >> groundtruth for further training and testing. >> >>

[tesseract-ocr] How to upgrade to Tesseract 4.0 with C++ in Visual Studio.

2018-06-21 Thread Chris
Following these steps: https://github.com/tesseract-ocr/tesseract/wiki/Compiling#windows on the official project's "Compiling" page, I was able to get tesseract 3.05.01 and the other required packages installed and start using #include in Visual Studio. However, tesseract 3.05.01 isn't

[tesseract-ocr] Getting error while creating .lstm files

2018-06-21 Thread Harathi Surya
Hi, I am trying to create .lstm files to fine-tune Tesseract 4.0.0 for new characters, such as ±. What I tried: I added text containing the plus-or-minus symbol to the eng.training_text in langdata. Then I tried to run the following com

Re: [tesseract-ocr] Getting error while creating .lstm files

2018-06-21 Thread Shree Devi Kumar
Look at src/training/language_specific.sh. The list of default fonts for English is picked up from there, and you probably don't have them installed. Use fonts that are available. On Fri, Jun 22, 2018 at 9:20 AM Harathi Surya wrote: > Hi, > > I am trying to create .lstm files to finetune t