Re: [tesseract-ocr] Re: Same image and commonad giving different results

2019-02-04 Thread santhosh
Didn't solve the issue. On Monday, February 4, 2019 at 6:34:25 PM UTC+5:30, shree wrote: > https://github.com/tesseract-ocr/tessdata_best > https://github.com/tesseract-ocr/tessdata > On Mon, Feb 4, 2019 at 6:29 PM wrote: >> Where can I find tessdata_best or tessdata? >> Still I am

[tesseract-ocr] Coordinates of the Text on the Mobile screen.

2019-02-04 Thread Rakesh Kumar
Hi, I recently had success using Tesseract-ocr to convert a PNG file into text. Scenario: I am taking a screenshot (PNG) of the mobile app and using Tesseract to convert the PNG file into text. Question: When I convert the PNG file into text, can I also get the coordinates (X, Y) of the

Re: [tesseract-ocr] Ocr-d train - Tesseract 4.0 Training

2019-02-04 Thread Lorenzo Bolzani
To use ocrd you need to prepare image files and txt files with the same name but different extension. For example: sample1.png sample1.gt.txt The gt.txt is a simple text file containing the correct text, 145, for example. The images must be cropped with no border or just a couple of pixels. Text
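The naming convention described above (an image plus a .gt.txt file with the same base name) can be checked mechanically before training. The following is only a minimal sketch, assuming all files sit in one folder and the images are .png; the function name find_unpaired is hypothetical and is not part of ocrd-train.

```python
from pathlib import Path


def find_unpaired(folder, image_ext=".png"):
    """Return (images lacking a .gt.txt, .gt.txt files lacking an image).

    Hypothetical helper: it only compares base names, e.g. sample1.png
    against sample1.gt.txt. It does not inspect the file contents.
    """
    folder = Path(folder)
    # Strip the extension so "sample1.png" and "sample1.gt.txt" both map to "sample1".
    images = {p.name[: -len(image_ext)] for p in folder.glob(f"*{image_ext}")}
    truths = {p.name[: -len(".gt.txt")] for p in folder.glob("*.gt.txt")}
    return sorted(images - truths), sorted(truths - images)
```

Calling find_unpaired on the training folder returns two lists; both should be empty when every image has its ground-truth text file and vice versa.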

[tesseract-ocr] how to train custom objects?

2019-02-04 Thread Shailesh Barve
Can someone help me out with creating custom training data? How is it done in Tesseract? Any tutorial or step-by-step guide would be helpful. Thank you.

[tesseract-ocr] Re: Question about train_listfile and eval_listfile

2019-02-04 Thread Kristóf Horváth
Oh boy, where to start! So first of all, you are not alone in not finding any information. Currently I am a week ahead of you, so I'm going to share what I found out. Let's start with training_files.txt. What's inside? /home/kh/tesstutorial/engtrain/eng.Arial.exp0.lstmf /home/kh/tesstutorial/engtrain/eng.
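A list file like the one quoted above can be generated rather than typed by hand. This is a minimal sketch, assuming the file holds one .lstmf path per line (the paths in the snippet appear run together only because the archive dropped the line breaks); the function name write_listfile and its arguments are hypothetical.

```python
from pathlib import Path


def write_listfile(lstmf_dir, listfile):
    """Write every .lstmf path found in lstmf_dir to listfile, one per line.

    Hypothetical helper: returns the absolute paths it wrote, sorted by name.
    """
    paths = sorted(str(p.resolve()) for p in Path(lstmf_dir).glob("*.lstmf"))
    Path(listfile).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return paths
```

The returned list makes it easy to confirm how many .lstmf files were picked up before the file is used for training.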

[tesseract-ocr] Re: Same image and commonad giving different results

2019-02-04 Thread santhosh
Where can I find tessdata_best or tessdata? Still I am not able to get the result if I remove --oem 2 or use --oem 1. On Monday, February 4, 2019 at 4:45:04 PM UTC+5:30, sant...@artivatic.ai wrote: > I am using 'tesseract' command line to extract the information in this image. > Tess

Re: [tesseract-ocr] Same image and commonad giving different results

2019-02-04 Thread santhosh
Where can I find tessdata_best or tessdata? Still I am not able to get the result if I remove --oem 2 or use --oem 1. On Monday, February 4, 2019 at 5:52:59 PM UTC+5:30, shree wrote: > ubuntu@tesseract-ocr:~/TEST$ tesseract tmpy6s8p6m1.jpg stdout --psm 6 --tessdata-dir ../tessdata_fast

Re: [tesseract-ocr] Re: Same image and commonad giving different results

2019-02-04 Thread Shree Devi Kumar
https://github.com/tesseract-ocr/tessdata_best https://github.com/tesseract-ocr/tessdata On Mon, Feb 4, 2019 at 6:29 PM wrote: > Where can I find tessdata_best or tessdata? > Still I am not able to get the result if I remove --oem 2 or use --oem 1 > On Monday, February 4, 2019 at 4:45:0

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 249

2019-02-04 Thread Kristóf Horváth
Thanks. See, this could be in the documentation; it would be super awesome. But don't worry, you don't have to do anything: just answer my upcoming questions and I will write it. I will also need a review of my final draft, just to make sure my wording and the facts I managed to dig up are correct. 2019.

Re: [tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 249

2019-02-04 Thread Shree Devi Kumar
> kh@DSAD-6 /usr/share/tessdata $ combine_tessdata -e ./eng.traineddata ~/tesstutorial/engoutput/eng.lstm Extracting tessdata components from ./eng.traineddata Wrote /home/kh/tesstutorial/engoutput/eng.lstm You need the traineddata from tessdata_best repo for use with training. On Mon, Feb 4

[tesseract-ocr] Assert failed:in file weightmatrix.cpp, line 249

2019-02-04 Thread Kristóf Horváth
I'm using Cygwin (64-bit, on Win10) to compile Tesseract. I ran the following commands and got the following error: kh@DSAD-6 /usr/share/tessdata $ tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Arial" "Impact Condensed" --lang eng --linedata_only --noextract_font_properties

Re: [tesseract-ocr] Same image and commonad giving different results

2019-02-04 Thread Shree Devi Kumar
Try your commands with --oem 1 or with default. It works fine TESSDATA_PREFIX=/home/ubuntu/tessdata_best $ tesseract -v tesseract 4.0.0-272-g005f leptonica-1.76.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0 $

Re: [tesseract-ocr] Same image and commonad giving different results

2019-02-04 Thread Shree Devi Kumar
ubuntu@tesseract-ocr:~/TEST$ tesseract tmpy6s8p6m1.jpg stdout --psm 6 --tessdata-dir ../tessdata_fast Warning: Invalid resolution 0 dpi. Using 70 instead. 1 GAAXCS8821M1Z8 ubuntu@tesseract-ocr:~/TEST$ ubuntu@tesseract-ocr:~/TEST$ tesseract tmpy6s8p6m1.jpg stdout --psm 6 --tessdata-dir ../tessdata_b

[tesseract-ocr] Re: Ocr-d train - Tesseract 4.0 Training

2019-02-04 Thread sarathgis93
Really appreciate your help! I will try to work out what you have sent. Please send me your contact (email). Thanks again! On Monday, February 4, 2019 at 1:12:36 PM UTC+5:30, Kristóf Horváth wrote: > So I have the same issue as you, no clue how Tesseract works because of bad documentation, but

[tesseract-ocr] Re: tesseract 4 box files format

2019-02-04 Thread thebigwasp
Also, what if there are huge gaps between words in a line in the picture? If I set a bounding box for the whole line, can Tesseract learn from that?

[tesseract-ocr] Same image and commonad giving different results

2019-02-04 Thread santhosh
I am using the 'tesseract' command line to extract the information in this image. Tesseract 4.0.0-115-ge3a3, command used: tesseract tmpy6s8p6m1.jpg stdout --oem 2 --psm 6, result: 19AAXCS8821M1Z8. Tesseract 4.0.0-274-gc999, command used: tesseract tmpy6s8p6m1.jpg stdout --oem 2 --psm 6, result

Re: [tesseract-ocr] Ocr-d train - Tesseract 4.0 Training

2019-02-04 Thread sarathgis93
I checked that too. I am not able to understand how I should give input to Tesseract, because it is not a book. I'm trying to do OCR for survey plans. If possible, please send your working OCRD folder, so that I can have a look and modify it. Please accept my invitation, so that I can a

[tesseract-ocr] tesseract 4 box files format

2019-02-04 Thread thebigwasp
Hello, after reading the article about training Tesseract 4 ( https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00) I found it very confusing. My goal is to train an existing model with new tiff/box pairs. After hours of googling how to generate box files, I found all this: 1) https://

[tesseract-ocr] Question about train_listfile and eval_listfile

2019-02-04 Thread Krzysztof Kanafa
Hello, I'm completely new to Tesseract; the first version I'm using is 4.0.0. Sorry for the noob question, but I really didn't find an answer despite quite a long search. It's about these two options: eval_listfile and train_listfile. What exactly should be in these files? Is there in train_listfile.txt