Re: [tesseract-ocr] Re: Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread Shree Devi Kumar
See tesstrain_utils.sh On Thu, 20 Jun 2019, 10:55 hrishikesh kaulwar, wrote: > > Hey shree could you tell me what line in tesstrain.sh takes care of user > provided tiff box pairs. Like what is the line which creates lstmf files > from those pairs and then puts the name of lstmf files in trainin
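A minimal Python sketch of what that step amounts to, assuming every page image is NAME.tif with a matching NAME.box in one directory: tesseract is run on each pair with the stock `lstm.train` config, which writes NAME.lstmf, and the resulting .lstmf paths are then listed one per line in a training-files list. This only approximates the LSTM feature-extraction step in tesstrain_utils.sh; the script's exact flags and file names may differ, and `lstmf_commands` / `write_training_list` are hypothetical helpers.

```python
from pathlib import Path


def lstmf_commands(train_dir):
    """Build one tesseract command per NAME.tif that has a matching NAME.box.

    Assumed layout: all tif/box pairs live in train_dir. With the stock
    lstm.train config, tesseract uses the box file and writes NAME.lstmf.
    """
    cmds = []
    for tif in sorted(Path(train_dir).glob("*.tif")):
        if not tif.with_suffix(".box").exists():
            continue  # no box file for this image, so nothing to train on
        out_base = tif.with_suffix("")  # "eng.Arial.exp0.tif" -> "eng.Arial.exp0"
        cmds.append(["tesseract", str(tif), str(out_base), "lstm.train"])
    return cmds


def write_training_list(train_dir, list_path):
    """Write every .lstmf path found in train_dir to list_path, one per line."""
    files = sorted(str(p) for p in Path(train_dir).glob("*.lstmf"))
    Path(list_path).write_text("".join(f + "\n" for f in files))
    return files
```

Each returned command can be run with `subprocess.run(cmd, check=True)` before `write_training_list` is called; any .txt file written next to an .lstmf is not used by the later training steps.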

[tesseract-ocr] Re: Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread hrishikesh kaulwar
Hey Shree, could you tell me which line in tesstrain.sh handles user-provided tiff/box pairs? That is, which line creates lstmf files from those pairs and then puts the names of the lstmf files in training_list? Thanks in advance. On Tuesday, June 18, 2019 at 2:54:09 PM UTC+5:30, hrishik

[tesseract-ocr] Re: Suggest a method to improve tesseract results

2019-06-19 Thread hrishikesh kaulwar
Yes, there are a few cases like this in my data due to partially skewed scanning. Can anyone tell me how to train Tesseract or how to proceed with OCR in such cases? A dictionary is a nice idea, I think, since I and l have nearly indistinguishable shapes in this font. Thanks for helping. On Thursday, June

[tesseract-ocr] Re: Suggest a method to improve tesseract results

2019-06-19 Thread hrishikesh kaulwar
Yes, there are a few cases like this in my data due to partially skewed scanning. Can anyone tell me how to train Tesseract or how to proceed with OCR in such cases? On Wednesday, June 19, 2019 at 5:07:18 PM UTC+5:30, hrishikesh kaulwar wrote: > > Dear all, > In the above image tesseract

[tesseract-ocr] Re: Suggest a method to improve tesseract results

2019-06-19 Thread hrishikesh kaulwar
There are some cases like this in my data due to partially skewed scanning. On Wednesday, June 19, 2019 at 5:07:18 PM UTC+5:30, hrishikesh kaulwar wrote: > > Dear all, > In the above image tesseract could not detect the first letter S > which is important for my purpose. Also there are fe

[tesseract-ocr] Re: Suggest a method to improve tesseract results

2019-06-19 Thread ElGato ElMago
Does it have to be distorted like that? It's amazing that a human being can read it as an S. Is a neural network ever capable of doing the same thing? If I and l do not take the same shape, I'd think of a dictionary or post-processing to switch them around. On Wednesday, June 19, 2019 at 20:37:18 UTC+9, hrishikesh ka
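One way to act on the dictionary/post-processing idea, sketched in Python: a word that is not in a known vocabulary has every I/l combination tried, and the first combination found in the vocabulary replaces it. This is a hypothetical heuristic applied to the OCR output, not a Tesseract feature; `fix_i_l`, the vocabulary and the cap on ambiguous letters are all assumptions.

```python
import itertools
import re

AMBIGUOUS = "Il"  # capital i and lowercase L, which look alike in some fonts


def fix_i_l(text, vocabulary, max_ambiguous=6):
    """Swap I/l inside words so they match a known vocabulary (hypothetical)."""

    def fix_word(match):
        word = match.group(0)
        if word in vocabulary:
            return word  # already a known word, leave it alone
        positions = [i for i, ch in enumerate(word) if ch in AMBIGUOUS]
        if not positions or len(positions) > max_ambiguous:
            return word  # nothing to swap, or too many combinations
        chars = list(word)
        for combo in itertools.product(AMBIGUOUS, repeat=len(positions)):
            for pos, ch in zip(positions, combo):
                chars[pos] = ch
            candidate = "".join(chars)
            if candidate in vocabulary:
                return candidate
        return word  # no combination is a known word

    return re.sub(r"[A-Za-z]+", fix_word, text)
```

For example, `fix_i_l("l like IoIIipop", {"I", "like", "lollipop"})` returns `"I like lollipop"`. The cap exists because the number of candidates doubles with every ambiguous letter.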

[tesseract-ocr] Re: What do we inherit from tessdata_best when doing fine tuning?

2019-06-19 Thread Jingjing Lin
I'm fine-tuning for chi_sim, not eng, which seems to be more complicated. On Wednesday, June 19, 2019 at 4:22:29 PM UTC-4, Jingjing Lin wrote: > > We know that we do fine tuning for tesseract based on tessdata_best, but > what do we inherit from tessdata_best? Is it just the weights of the neural > nets? > > From wh

[tesseract-ocr] What do we inherit from tessdata_best when doing fine tuning?

2019-06-19 Thread Jingjing Lin
We know that we do fine-tuning for Tesseract based on tessdata_best, but what do we inherit from tessdata_best? Is it just the weights of the neural nets? From what I have, it looks like the new .unicharset only contains those characters in the .training_text I created. I guess this means the

Re: [tesseract-ocr] table ocr with tesseract(tess4j)

2019-06-19 Thread Timothy Snyder
Would you be able to provide an example of said table? On Wed, Jun 19, 2019 at 8:40 AM Momene Vigal wrote: > Hello, I'm a beginner with Tesseract, currently using it with Java. > Can anyone please help me with how to OCR a table with > Tesseract > in Python or Java? > > -- > You rec

Re: [tesseract-ocr] Re: how to check .unicharset in a .traineddata file

2019-06-19 Thread Jingjing Lin
Thanks for your comments. So do you mean we cannot use the method for adding a special character to eng to add a special character to chi_sim, and that we'll have to retrain the top layer to achieve this? Another question: when we use a smaller .training_text, the .unicharset only contains a limited a

Re: [tesseract-ocr] Re: how to check .unicharset in a .traineddata file

2019-06-19 Thread Shree Devi Kumar
Old thread https://groups.google.com/forum/#!searchin/tesseract-ocr/layer$20chi_sim%7Csort:date/tesseract-ocr/iFMg7Gjczq4/f7_XRop2BAAJ On Wed, Jun 19, 2019 at 9:13 PM Shree Devi Kumar wrote: > Update: > > 1. When using a smaller training_text for chi_sim for plus training, the > unicharset gets

Re: [tesseract-ocr] Re: how to check .unicharset in a .traineddata file

2019-06-19 Thread Shree Devi Kumar
Update: 1. When using a smaller training_text for chi_sim for plus training, the unicharset gets restricted, so merge the lstm-unicharset with it. 2. The unicharset for chi_sim built from langdata is different from the one extracted from tessdata_best, so using training_text from langdata will add mo
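To see how far a restricted unicharset falls short, a small Python sketch can diff the characters of two unicharset files, e.g. the one generated from a small training_text against the lstm-unicharset unpacked from tessdata_best (for instance with `combine_tessdata -u chi_sim.traineddata chi_sim.`). It assumes the plain-text unicharset layout: an entry count on the first line, then one entry per line starting with the unichar, with the space character written as `NULL`. The actual merge is better left to the `merge_unicharsets` training tool; `read_unichars` and `missing_from` are hypothetical helpers.

```python
from pathlib import Path


def read_unichars(path):
    """Return the set of unichars listed in a plain-text unicharset file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    count = int(lines[0])  # first line: number of entries that follow
    chars = set()
    for line in lines[1:1 + count]:
        token = line.split(" ", 1)[0]  # first field of each entry is the unichar
        chars.add(" " if token == "NULL" else token)  # space is stored as NULL
    return chars


def missing_from(restricted_path, full_path):
    """Unichars present in full_path but absent from restricted_path."""
    return sorted(read_unichars(full_path) - read_unichars(restricted_path))
```

Any characters reported by `missing_from` would be dropped from the fine-tuned model unless the two unicharsets are merged.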

[tesseract-ocr] table ocr with tesseract(tess4j)

2019-06-19 Thread Momene Vigal
Hello, I'm a beginner with Tesseract, currently using it with Java. Can anyone please help me with how to OCR a table with Tesseract in Python or Java?
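A common starting point, sketched here in Python, is to take Tesseract's word boxes rather than plain text and rebuild rows and cells from their positions. The tesseract CLI writes them with the `tsv` config (e.g. `tesseract table.png out --psm 6 tsv`), and pytesseract's `image_to_data` returns the same TSV; from Java the same grouping can be applied to that file. The sketch assumes each table row comes out as one text line (more likely with `--psm 6`, not guaranteed) and that a horizontal gap wider than `gap` pixels separates cells; `tsv_to_rows` and the threshold are hypothetical.

```python
import csv
import io
from collections import defaultdict


def tsv_to_rows(tsv_text, gap=40):
    """Group Tesseract TSV word boxes into rows (one per text line) and cells."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = defaultdict(list)
    for rec in reader:
        if rec["level"] != "5" or not (rec["text"] or "").strip():
            continue  # keep only non-empty word-level entries
        key = (int(rec["block_num"]), int(rec["par_num"]), int(rec["line_num"]))
        left = int(rec["left"])
        lines[key].append((left, left + int(rec["width"]), rec["text"]))

    rows = []
    for key in sorted(lines):  # block/paragraph/line order is reading order
        cells, current, prev_right = [], [], None
        for left, right, text in sorted(lines[key]):  # left to right
            if prev_right is not None and left - prev_right > gap:
                cells.append(" ".join(current))  # wide gap: start a new cell
                current = []
            current.append(text)
            prev_right = right
        cells.append(" ".join(current))
        rows.append(cells)
    return rows
```

The gap threshold depends on image resolution and column spacing, so it usually needs tuning per document; tables with ruled lines may be easier to split by detecting the lines first.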

Re: [tesseract-ocr] OCR pipeline with OpenCV

2019-06-19 Thread Mox Betex
> @Mox Betex >> Did you train Tesseract? Yes, I have.

Re: [tesseract-ocr] what mean updatesubtrainer?

2019-06-19 Thread Pndaza
During the lstmtraining process, it says Can't encode transcription: 'ကင်းသည် ဖြစ်ရာ၏၊ မင်းမြတ် *ခြင်္သေ့*၏ ရှေးဦးစွာသောX ဤအင်္ဂါကို ယူအပ်၏။' in language ' when there is a kinzi in the string. On Wednesday, 19 June 2019 11:28:00 UTC+6:30, Pndaza wrote: > > I wrongly gave old traineddata (mya-layer.trained

[tesseract-ocr] Suggest a method to improve tesseract results

2019-06-19 Thread hrishikesh kaulwar
Dear all, In the above image Tesseract could not detect the first letter S, which is important for my purpose. Also, there are a few cases where I (capital i) and l (small L) are detected wrongly. What training or method can I use to improve Tesseract results in such cases? Thanks in

Re: [tesseract-ocr] OCR pipeline with OpenCV

2019-06-19 Thread Nicolas Colomer
Thanks all for your answers! @Mox Betex > Did you train Tesseract? > @ElGato ElMago > Those images and fonts obviously are not for OCR. Need to improve images > and train fonts. No, I use tesseract vanilla, only binary tuning parameters. I'd like to avoid training my own model at first, but I

Re: [tesseract-ocr] Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread hrishikesh kaulwar
Okay, I will ignore it. I just wanted to know what the generation of a text file signifies in the lstm.train step, since it's unusual. Is it some decoding/encoding error? Does it indicate incomplete LSTM training? I have attached a sample text file; you can check it out. Tell me if you know what is w

Re: [tesseract-ocr] Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread hrishikesh kaulwar
Okay. On Wednesday, June 19, 2019 at 3:18:12 PM UTC+5:30, shree wrote: > > >Also one more doubt is when I use lstm.train command a text file also > gets generated with lstmf file > You can ignore that txt file. Only lstmf is used for further processing. > > On Wed, Jun 19, 2019 at 2:44 PM hrishi

Re: [tesseract-ocr] OCR pipeline with OpenCV

2019-06-19 Thread Lorenzo Bolzani
Hi Nicolas, I think what you did is good; you just need to experiment more with pre-processing. I usually process the images with GIMP until I get good results, then try to do the same processing with OpenCV/PIL. You do not strictly need to threshold the image; a very, very strong contrast is en
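The two options mentioned here, a strong contrast stretch versus a hard threshold, can be sketched in pure Python on a grayscale image stored as a list of rows of 0 to 255 values, so nothing beyond the standard library is needed. In OpenCV the rough equivalents would be `cv2.normalize` and `cv2.threshold` with `cv2.THRESH_BINARY + cv2.THRESH_OTSU`. The percentile cut-offs and the function names are assumptions for illustration; Otsu's method is one standard way to pick a global threshold, not necessarily what anyone in this thread used.

```python
def stretch_contrast(img, low_pct=0.05, high_pct=0.95):
    """Linearly map the low/high percentiles of the gray values to 0 and 255."""
    values = sorted(v for row in img for v in row)
    lo = values[int(low_pct * (len(values) - 1))]
    hi = values[int(high_pct * (len(values) - 1))]
    if hi <= lo:
        return [row[:] for row in img]  # flat image: nothing to stretch
    scale = 255.0 / (hi - lo)
    return [[min(255, max(0, round((v - lo) * scale))) for v in row] for row in img]


def otsu_threshold(img):
    """Return t maximizing between-class variance (pixels <= t form class 0)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue  # no pixels in class 0 yet
        w1 = total - w0
        if w1 == 0:
            break  # class 1 is empty from here on
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img, t):
    """Pixels above t become white (255), the rest black (0)."""
    return [[255 if v > t else 0 for v in row] for row in img]
```

Trying `stretch_contrast` alone first and only falling back to `binarize(img, otsu_threshold(img))` mirrors the advice that a strong contrast can be enough.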

Re: [tesseract-ocr] Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread Shree Devi Kumar
>Also one more doubt is when I use lstm.train command a text file also gets generated with lstmf file You can ignore that txt file. Only lstmf is used for further processing. On Wed, Jun 19, 2019 at 2:44 PM hrishikesh kaulwar wrote: > Hello shree, > I tried again with .tif and lstm.train co

Re: [tesseract-ocr] Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread hrishikesh kaulwar
Hello Shree, I tried again with .tif, and the lstm.train command generated a .txt file again along with the lstmf file. I don't think that's the error. Thanks for helping. On Wednesday, June 19, 2019 at 2:02:54 PM UTC+5:30, shree wrote: > > > eng.Arial_Regular.exp0.png > > The script expects tif file

Re: [tesseract-ocr] Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread Shree Devi Kumar
> eng.Arial_Regular.exp0.png The script expects tif files not png. On Wed, Jun 19, 2019 at 1:42 PM hrishikesh kaulwar wrote: > Thank you for your help. I have checked it many times. Could you tell me > where I am doing wrong? It takes my 3 tiff box pairs for example and copies > it into train d
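Since the script expects .tif images, a quick Python check of the training directory before running tesstrain.sh can catch PNG images and unpaired files. It assumes the pairing convention NAME.tif plus NAME.box in one directory; `check_training_pairs` is a hypothetical helper, and the messages it returns are illustrative.

```python
from pathlib import Path


def check_training_pairs(train_dir):
    """List likely problems with image/box pairs in train_dir."""
    d = Path(train_dir)
    problems = []
    for img in sorted(d.glob("*.png")):
        problems.append(f"{img.name}: PNG image, convert to .tif")
    for box in sorted(d.glob("*.box")):
        if not box.with_suffix(".tif").exists():
            problems.append(f"{box.name}: no matching .tif")
    for tif in sorted(d.glob("*.tif")):
        if not tif.with_suffix(".box").exists():
            problems.append(f"{tif.name}: no matching .box")
    return problems
```

An empty list means every .box has a .tif partner and vice versa; it says nothing about whether the box coordinates themselves are correct.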

Re: [tesseract-ocr] Custom Tiff/Box pairs support in tesstrain.sh

2019-06-19 Thread hrishikesh kaulwar
Thank you for your help. I have checked it many times. Could you tell me where I am going wrong? It takes my 3 tiff/box pairs, for example, and copies them into the train directory. Then it overwrites the exp0.tif file with randomly generated text using the text2image tool. Although 3 tiff box pairs are accepted

[tesseract-ocr] Re: Multiline tiff/txt

2019-06-19 Thread Mox Betex
I mean to train with OCR-D.