Re: [tesseract-ocr] Read Local Charter (Hindi , Tamil, Sinhala)

2018-02-21 Thread ShreeDevi Kumar
What operating system are you on? Which version of tesseract are you currently using? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Wed, Feb 21, 2018 at 10:09 AM, Aruna Gamage wrote: > Dear Sir, > > I need to read l

Re: [tesseract-ocr] Tesseract is giving column data on the last line of file

2018-02-22 Thread ShreeDevi Kumar
What --psm are you using? Tesseract might be treating the last portion as a different column. Try PSM 4 or 6. On 22-Feb-2018 3:48 PM, wrote: > > > > >

Re: [tesseract-ocr] Creating wordlist from high confidence words

2018-02-22 Thread ShreeDevi Kumar
Take a look at --user-words and the commands combine_tessdata, dawg2wordlist and wordlist2dawg. You can change the wordlist and it may improve the chances of a word being recognised, but I don't think recognition is limited to the list. It also depends on the version of tesseract that you are using. On 22-
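As a rough illustration of this thread's topic, the sketch below collects high-confidence words into a plain-text list, one word per line, which is the layout a file passed via --user-words is generally expected to have. The function names, the threshold of 90 and the sample (word, confidence) pairs are assumptions made for illustration; none of them come from the thread.

```python
def build_wordlist(scored_words, min_conf=90.0):
    """Return unique words with confidence >= min_conf, in first-seen order."""
    seen = set()
    kept = []
    for word, conf in scored_words:
        word = word.strip()
        # Skip empty tokens, low-confidence words and repeats.
        if not word or conf < min_conf or word in seen:
            continue
        seen.add(word)
        kept.append(word)
    return kept


def format_wordlist(words):
    """Render the words one per line, each terminated by a newline."""
    return "".join(word + "\n" for word in words)


# Hypothetical sample data: (recognised word, confidence in percent).
sample = [("invoice", 96.1), ("lnvoice", 41.0), ("total", 93.5), ("invoice", 97.0)]
wordlist_text = format_wordlist(build_wordlist(sample))
```

The resulting text could be written to a file and tried with --user-words; as the reply notes, whether it changes recognition depends on the tesseract version.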

Re: [tesseract-ocr] Error when doing the set_unicharset_properties command on Windows

2018-02-23 Thread ShreeDevi Kumar
Please open this as an issue in github repo - https://github.com/tesseract-ocr/tesseract/issues > the "/" is added without taking care if the command is used on Windows or Linux. Found a couple of places in that file where this is the case. // Load the unicharset for the script if available

Re: [tesseract-ocr] Tesseract is giving column data on the last line of file

2018-02-23 Thread ShreeDevi Kumar
Probably FF. Tesseract adds a page break (normally form feed) by default. It is still possible to suppress page breaks by setting an empty page_separator. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Feb 23,
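A minimal sketch of the two options implied here, assuming the stray character is the form feed that tesseract appends as its default page separator. The helper names are hypothetical, and passing an empty value as `page_separator=` is an assumption about how the `-c` option is parsed, so check it against your version.

```python
def args_without_page_break(image, outputbase):
    """Build an argument list that sets page_separator to an empty string."""
    # Assumption: "-c page_separator=" (nothing after "=") yields an empty separator.
    return ["tesseract", image, outputbase, "-c", "page_separator="]


def strip_form_feeds(text):
    """Remove form feed characters from text that has already been produced."""
    return text.replace("\f", "")
```

The first helper changes how tesseract is invoked; the second cleans up existing output files without rerunning OCR.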

Re: [tesseract-ocr] Error when doing the set_unicharset_properties command on Windows

2018-02-23 Thread ShreeDevi Kumar
I use mobaxterm and WSL (bash under windows) on Windows 10. If you are training for legacy tesseract engine (not LSTM) you can use Jtessboxeditor for training. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Feb 2

Re: [tesseract-ocr] Error when doing the set_unicharset_properties command on Windows

2018-02-23 Thread ShreeDevi Kumar
I have used git bash for running tesseract. Not tried for training. You can use the ppa from the link below, rather than trying to build it. https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr/+packages -- You received this message because you are subscribed to the Google Groups "tesse

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-02-25 Thread ShreeDevi Kumar
which version of tesseract are you using? See attached results with Tesseract 4 and eng from tessdata_fast ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Feb 25, 2018 at 8:16 PM, Zdenko Podobny wrote: > https

Re: [tesseract-ocr] Tesseract is giving column data on the last line of file

2018-02-26 Thread ShreeDevi Kumar
Try -c page_separator="\n" or the code for CRLF.

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-02-26 Thread ShreeDevi Kumar
You can download latest version of tesseract-ocr and appropriate traineddata from https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr I ran tesseract via command line with default values. You may need to remove the existing old version, before installing new. On 27-Feb-2018 1:14 AM, "D

Re: [tesseract-ocr] Warning. Invalid resolution 0 dpi. Using 70 instead

2018-02-27 Thread ShreeDevi Kumar
Which version of tesseract are you using? On 27-Feb-2018 7:52 PM, "Terry Bryant" wrote: > Hello everyone. I'm facing this above problem when my input image is the > attached file. > > My os: ubuntu14.04 > My input image: in attached file(which is a .png file) > My command:

Re: [tesseract-ocr] Re: Read Local Charter (Hindi , Tamil, Sinhala)

2018-02-27 Thread ShreeDevi Kumar
Yes, it is possible to use tesseract for Sinhala. Please mention the type of computer operating system you use and its version so that I can send appropriate links for you to use. On 27-Feb-2018 4:07 PM, "Aruna Gamage" wrote: > Dear sir, > > Mainly I need sinhala language(Sri Lanka). > > Thank

Re: [tesseract-ocr] Tesseract for Android for Hindi language

2018-02-27 Thread ShreeDevi Kumar
> Hindi OCR using tess-two on Android Studio. Probably uses old version of tesseract and traineddata. For Hindi, you will get best result with tesseract (version 4.00alpha) and traineddata files from tessdata_fast ShreeDevi भजन - कीर्

Re: [tesseract-ocr] Warning. Invalid resolution 0 dpi. Using 70 instead

2018-02-27 Thread ShreeDevi Kumar
Try with --psm 6, 7 or 8; I get correct results with them. 6 = Assume a single uniform block of text. 7 = Treat the image as a single text line. 8 = Treat the image as a single word. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans

Re: [tesseract-ocr] Warning. Invalid resolution 0 dpi. Using 70 instead

2018-02-27 Thread ShreeDevi Kumar
Which traineddata file are you using? I am using the ones from tessdata_fast. I will have to recheck the commands that I had used. Usually it will be on the lines of tesseract input.png outputbase --psm 6 --oem 1 -l langcode On 28-Feb-2018 9:03 AM, "Terry Bryant" wrote: > Could you please wr
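The command pattern in this reply can be sketched as a small argument builder. The function name and defaults are assumptions for illustration; the flags themselves (--psm, --oem, -l) are the ones quoted in the message, and the 0 to 13 range check reflects the page segmentation modes listed by tesseract 4.

```python
def build_tesseract_command(image, outputbase, psm=6, oem=1, lang="eng"):
    """Return an argv list of the form: tesseract IMAGE OUTPUTBASE --psm N --oem N -l LANG."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    return [
        "tesseract", image, outputbase,
        "--psm", str(psm),   # page segmentation mode, e.g. 6 = single uniform block
        "--oem", str(oem),   # 1 selects the LSTM engine
        "-l", lang,          # language code, e.g. "eng" or "hin"
    ]


command = build_tesseract_command("input.png", "outputbase", psm=6, oem=1, lang="eng")
```

The list can be handed to `subprocess.run(command, check=True)` on a machine where tesseract and the matching traineddata are installed.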

Re: [tesseract-ocr] I have a Question about Creating Traing Data

2018-02-27 Thread ShreeDevi Kumar
Please use the tesstrain.sh script to create training data. It will create the required box/tif files using the training text and list of fonts. The process uses the box/tif files for creating the lstmf files which are used for LSTM training. Since these box/tif files are used in the intermediate st

Re: [tesseract-ocr] Tesseract for Android for Hindi language

2018-02-27 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/875#issuecomment-369143904 You may be able to build tesseract4 to use with tess-two using the suggestion in that thread. On 28-Feb-2018 12:34 PM, "Harshit Dohare" wrote: > Thanks for the reply. > > Since I am working to build android a

Re: [tesseract-ocr] I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-02-27 Thread ShreeDevi Kumar
training/tesstrain.sh \ --fonts_dir /usr/share/fonts \ --lang eng \ --linedata_only \ --noextract_font_properties \ --langdata_dir ../langdata \ --tessdata_dir ./tessdata \ --output_dir ~/tesstutorial/engtrain You should try to follow the above tutorial for training eng. You nee

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-02-28 Thread ShreeDevi Kumar
Try with the following - make sure that you change all variables with dir to match your setup tesstrain.sh \ --lang kor \ --noextract_font_properties \ --linedata_only \ --langdata_dir ../langdata \ --tessdata_dir ../tessdata \ --fonts_dir /mnt/c/Windows/Fonts \ --fontlist \ "Arial

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-02-28 Thread ShreeDevi Kumar
On Thu, Mar 1, 2018 at 9:21 AM, 이경준 wrote: > Thank U reply my question. > > But my system is operated by Ubuntu 16.04. 03 LTS > > I think that that path is not working ? Am I false? > > > 2018년 2월 28일 수요일 오후 6시 18분 41초 UTC+9, shree 님의 말: >> >> Try with following - make sure that you change all v

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-02-28 Thread ShreeDevi Kumar
> my system is operated by Ubuntu 16.04. 03 LTS > Yes .I tried tessdata - kor.trainnedata /// But it is not good enough. sorry .ㅜㅜ i can not use tesseract 4.0 tessdata-kor.trainnedata. in bussiness .. I will suggest that you uninstall your old tesseract version.(3.0x) sudo apt-get remove tesser

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-02-28 Thread ShreeDevi Kumar
>we don't understand each otehr saying. Sorry about that. Please give the following commands and let me know the result. tesseract -v tesseract --list-langs combine_tessdata -u kor.traineddata I do not know Korean, but feedback from other users has been that tesseract4 and the latest traineda
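When reporting the result of `tesseract --list-langs`, it can help to reduce the output to a plain list of language codes. The sketch below assumes the output consists of a header line beginning with "List of available languages" followed by one code per line; the sample text is hypothetical, not output from this thread.

```python
def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the assumed header line.
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


# Hypothetical sample output.
sample_output = "List of available languages (3):\neng\nkor\nosd\n"
installed = parse_list_langs(sample_output)
```

Checking whether `"kor"` is in the returned list is a quick way to confirm that the traineddata file is where tesseract expects it.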

Re: [tesseract-ocr] Hindi language version not working. VietOCR.NET-4.5_64

2018-03-01 Thread ShreeDevi Kumar
That document is for an old version of tesseract. Please use vietocr version which supports tesseract 4.00alpha. Download traineddata files for 4.00alpha from tessdata_fast You can try OCR with both hin and Devanagari traineddata files. On 01-Mar-2018 3:23 PM, "Sohan Shekhawat" wrote: > Hello

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread ShreeDevi Kumar
> combine_tessdata -u kor.traineddata What is that meaning ? Could you explain for me ? That command will show and unpack the components of your traineddata file. e.g. from tessdata_fast combine_tessdata -u ./tessdata_fast/kor.traineddata ./tessdata_fast/kor. Extracting tessdata components from ./

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread ShreeDevi Kumar
at version of tesseract program you are using. I have already sent you the bash script that you can modify for training. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Mar 1, 2018 at 6:36 PM, ShreeDevi

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread ShreeDevi Kumar
n tessdata and tessdata_best and >> tessdata_fast are NOT compatible. So, it depends on what version of >> tesseract program you are using. >> >> I have already sent you the bash script that you can modify for >> training. >> >> ShreeDevi >> ______

Re: [tesseract-ocr] What is difference between "unicharset file" and "lstm-unicharset file"

2018-03-01 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc#components ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Mar 2, 2018 at 6:22 AM, 이경준 wrote: > > Hi . Thank you for

Re: [tesseract-ocr] Re: Read Local Charter (Hindi , Tamil, Sinhala)

2018-03-02 Thread ShreeDevi Kumar
Please post issue in the appropriate repository ie https://github.com/rmtheis/android-ocr Sinhala language can be recognized using latest version of tesseract and traineddata from tessdata_fast repo. Please close this issue. ShreeDevi __

Re: [tesseract-ocr] tesseract data files

2018-03-02 Thread ShreeDevi Kumar
Hi Simon, If you are planning to package using 4.00alpha from master branch, please use traineddata files from tessdata_fast. These are the files that have been shipped for Ubuntu 18.04 and included in Debian. See https://github.com/tesseract-ocr/tesseract/wiki for some links. You can update the

Re: [tesseract-ocr] tesseract data files

2018-03-02 Thread ShreeDevi Kumar
evi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 3, 2018 at 9:42 AM, ShreeDevi Kumar wrote: > Hi Simon, > > If you are planning to package using 4.00alpha from master branch, please > use traineddata files fro

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-03-03 Thread ShreeDevi Kumar
No, I had not pre-processed the image. I used tessdata_fast NOT tessdata_best. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 3, 2018 at 3:59 PM, Dusayanta Prasad wrote: > Please tell me one more thing. Bef

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-03-03 Thread ShreeDevi Kumar
ls -l /home/dusayanta/tesseract/tessdata/eng.traineddata combine_tessdata -d /home/dusayanta/tesseract/tessdata/eng.traineddata ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 3, 2018 at 5:57 PM, ShreeDevi

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-03-03 Thread ShreeDevi Kumar
Also check tesseract --list-langs ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Mar 3, 2018 at 6:22 PM, ShreeDevi Kumar wrote: > ls -l /home/dusayanta/tesseract/tessdata/eng.traineddata > > combine

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-03-03 Thread ShreeDevi Kumar
- आरती @ http://bhajans.ramparivar.com On Sat, Mar 3, 2018 at 6:24 PM, ShreeDevi Kumar wrote: > Also check > > tesseract --list-langs > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > >

Re: [tesseract-ocr] Tesseract convert image to gibberish

2018-03-03 Thread ShreeDevi Kumar
_ >> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >> >> On Sat, Mar 3, 2018 at 6:24 PM, ShreeDevi Kumar >> wrote: >> >>> Also check >>> >>> tesseract --list-langs >>> >>> ShreeDevi >>>

Re: [tesseract-ocr] @Shree //// I have a question about Making a Traineddata which is finely tunned.

2018-03-04 Thread ShreeDevi Kumar
> > . > > *#section 1. (plus) I have a quesiton about a bash script you gave me* > > > *In the bash scripts* > > > 1. what is the criterion about extracting 100-120 lines ??? I have no idea. > Only 3 pages are processed by tesstrain.sh for making box/tiff files, so it will be about 120 lines of t

Re: [tesseract-ocr] @Shree //// I have a question about Making a Traineddata which is finely tunned.

2018-03-04 Thread ShreeDevi Kumar
> > Once you make a small training text and choose the fonts to use and modify >> the bash script to point to correct directory in your setup, it will >> perform all the training steps for finetuning. > >

Re: [tesseract-ocr] tesseract data files

2018-03-04 Thread ShreeDevi Kumar
just > needs me to repackage the resulting stuff. > > so tessdata_best isn't like the wiki says for better accuracy? > > greetings, > Simon > > Am 03.03.2018 um 05:12 schrieb ShreeDevi Kumar: > > Hi Simon, > > > > If you are planning to package usi

Re: [tesseract-ocr] How to use tesseract-ocr with Node.js + windows machines?

2018-03-05 Thread ShreeDevi Kumar
Take a look at https://github.com/nguyenq/VietOCR3 On Mon 5 Mar, 2018, 2:05 PM Prabhakar Manne, wrote: > How to use tesseract-ocr with Node.js + windows machines? Could you please > provide step by step guide so that I can install on machine to evaluate > tesseract for OCR. > > -- > You receive

Re: [tesseract-ocr] Image format

2018-03-05 Thread ShreeDevi Kumar
No. It can handle a lot of image formats, via leptonica. tesseract -v will show which image libs have been used while building leptonica. On Mon 5 Mar, 2018, 8:39 PM Dusayanta Prasad, wrote: > Is it necessary for Tesseract that the input should always be in .tif > format ? > > -- > You received t

Re: [tesseract-ocr] LSTM + Tesseract is better than LSTM Best

2018-03-06 Thread ShreeDevi Kumar
oem 2 is unsupported for traineddata files from tessdata_fast and tessdata_best. It should still work with traineddata files from the tessdata repo. There is an issue tracking scenarios where the 'legacy' tesseract is better than the new LSTM. You can add more details there, if you like. ShreeDevi __

Re: [tesseract-ocr] LSTM + Tesseract is better than LSTM Best

2018-03-06 Thread ShreeDevi Kumar
This is the issue on github - https://github.com/tesseract-ocr/tesseract/issues/707 Removing the legacy OCR Engine #707 Open amitdo opened this issue on Feb 7, 2017 · 72 comments ShreeDevi भजन - कीर्तन - आर

Re: [tesseract-ocr] @shree / Fianlly I made the customzied (fine tuned) traineddata

2018-03-08 Thread ShreeDevi Kumar
Please look at the kor.config file in langdata. Maybe it is loading chi_tra. The langdata files are from 3.04. On Thu 8 Mar, 2018, 2:27 PM 이경준, wrote: > Hi > > Fianlly I made the customzied (fine tuned) traineddata - korean > > > But, Run tesseract > > I have a problem. > > *Please make sure the TE

Re: [tesseract-ocr] tesseract (4.0) criterion

2018-03-09 Thread ShreeDevi Kumar
From the wiki, home page: Various types of training data can be found on GitHub. Unpack and copy the .traineddata file into a 'tessdata' directory. The exact directory will depend both on the type of training data, and your Linux distribution. Possibilities are /

Re: [tesseract-ocr] Bad results on simple code image

2018-03-09 Thread ShreeDevi Kumar
Try adding a small white border around the image and see if that gives better results. Which version of tesseract, which traineddata file, which OS? On Sat 10 Mar, 2018, 1:59 AM Benno Fünfstück, wrote: > Hi, > > I've tried to get tesseract to recognize a (in my opinion simple) image of > a

Re: [tesseract-ocr] I do not include 'chi_tra' in my tessdata folder . What is it ? I have seen language-specific.sh

2018-03-09 Thread ShreeDevi Kumar
I hope someone who knows Korean can answer your questions. On Sat 10 Mar, 2018, 12:48 PM 이경준, wrote: > Hi i'm sorry to question oftenly. and lots of questions. > > But, I must use tesseract 4.0 for my business . > > plz understand my situations. I have lots of family to raise. > > > ealier you

Re: [tesseract-ocr] Re: I do not include 'chi_tra' in my tessdata folder . What is it ? I have seen language-specific.sh

2018-03-10 Thread ShreeDevi Kumar
Lang1+lang2 should work. If it does not, please open an issue with an example image. If lang2 is English, you may want to try the script level traineddata, which includes English with the other languages . Please take a look at the readme file in tessdata_fast which explains about script level fi

Re: [tesseract-ocr] Tesseract tsv output not working

2018-03-11 Thread ShreeDevi Kumar
1. Please check that your tessdata/configs folder has a file called tsv. 2. Try giving a different output file name (NOT out). 3. Do hocr and pdf outputs work for you? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On
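Once TSV output is working, the file can be read with a few lines of Python. This sketch assumes the usual tesseract TSV header (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), with word rows at level 5; the sample rows are hypothetical.

```python
import csv
import io


def words_from_tsv(tsv_text):
    """Return (text, confidence) pairs for word-level rows of tesseract TSV output."""
    # QUOTE_NONE keeps quote characters inside recognised text from confusing the parser.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows describe individual words; higher levels carry no text.
        if row.get("level") == "5" and text:
            words.append((text, float(row["conf"])))
    return words


# Hypothetical sample: one page-level row followed by two word rows.
sample_tsv = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t36\t92\t60\t24\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t109\t92\t75\t24\t91\tworld\n"
)
words = words_from_tsv(sample_tsv)
```

The confidence column makes it straightforward to keep or discard words by threshold in later processing.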

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-03-12 Thread ShreeDevi Kumar
Please try tesseract 4.0.0beta.1 with languages such as *enm* (English, Middle (1100-1500)) and Fraktur script Also, look at the following project from a few years back http://emop.tamu.edu/outcomes/Franken-Plus ShreeDevi भजन - की

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-03-12 Thread ShreeDevi Kumar
files in it have not been updated for 4.0.0 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Mar 12, 2018 at 2:00 PM, ShreeDevi Kumar wrote: > Please try tesseract 4.0.0beta.1 with languages such as >

Re: [tesseract-ocr] Re: tesseract 4.00 beta is released ? I saw the who use the tesseract 4.00 beta

2018-03-12 Thread ShreeDevi Kumar
Master branch in github repo at commit 40f4311 has been tagged as tesseract4.0.0beta.1 - Please see https://github.com/tesseract-ocr/tesseract/releases/tag/4.0.0-beta.1 That commit is the one which has be

Re: [tesseract-ocr] Training tesseract 4.0 with large training text

2018-03-12 Thread ShreeDevi Kumar
Please look at tesstrain.sh It is setting max-pages to 3 for text2image invocation. You can change it there. On Tue 13 Mar, 2018, 6:54 AM , wrote: > Dear all, > > I'm trying to train lstm using a large training text, different fonts, > colors etc. I'm trying to use text2image to generate my tif

Re: [tesseract-ocr] Training tesseract 4.0 with large training text

2018-03-13 Thread ShreeDevi Kumar
You have to look in the file called by it tesstrain_utils.sh On Tue 13 Mar, 2018, 12:22 PM 이경준, wrote: > Hi Shree . I saw the tesstrain.sh file. > > But I cannot point to max-pages to 3 ??? where ??? > > Could you tell me about it more details > > 2018년 3월 13일 화요일 오전 10시 57분 29초 UTC+9, shree 님

Re: [tesseract-ocr] How to replace top LSTM top layer ?

2018-03-13 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/tesseract/issues/1009 Link works ok On Tue 13 Mar, 2018, 12:37 PM 이경준, wrote: > Shreeshrii commented on 29 Jun 2017 > > • > edited > > I think this h

Re: [tesseract-ocr] How to replace top LSTM top layer ?

2018-03-13 Thread ShreeDevi Kumar
That info is given in the training wiki page. On Tue 13 Mar, 2018, 12:53 PM 이경준, wrote: > There is no way about replacing top layer ... ㅜㅜ > > 2018년 3월 13일 화요일 오후 4시 22분 8초 UTC+9, shree 님의 말: >> >> https://github.com/tesseract-ocr/tesseract/issues/1009 >> >> Link works ok >> >> On Tue 13 Mar, 20

Re: [tesseract-ocr] How to replace top LSTM top layer ?

2018-03-13 Thread ShreeDevi Kumar
That command applies to an older version of the source code. Now you need a starter traineddata. Please see the wiki page at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-just-a-few-layers ShreeDevi भज

Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
remove these two lines and try --fonts_dir $fonts_dir \ --fontlist $fonts_for_training \ this overrides what is given in language-specific.sh ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Mar 13, 2018 at

Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
Give the following command - after changing directories to match your setup text2image --find_fonts \ --fonts_dir /usr/share/fonts \ --text ../langdata/kor/kor.training_text \ --min_coverage .9 \ --render_per_font false \ --outputbase ../langdata/kor/kor \ |& grep raw | sed -e 's/ :.*/" \\/g' |

Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
Did you use the fonts_dir where they are installed??? On Tue 13 Mar, 2018, 9:32 PM 이경준, wrote: > Thank U . I have a fontslist file > > but vim fontlist.txt > > There are no fonts ?? > > It means that I cannot use korena fonts?? > > 2018년 3월 13일 화요일 오후 9시 9분 45초 UTC+9, shree 님의 말: >> >> Give the

Re: [tesseract-ocr] Re: pango library doesn't recognize my font .

2018-03-13 Thread ShreeDevi Kumar
change double quote to single quote " to ' ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Mar 13, 2018 at 10:05 PM, 이경준 wrote: > >

Re: [tesseract-ocr] message from runnig tesseract from my tuned traineddata(korean)

2018-03-13 Thread ShreeDevi Kumar
> > > 2) I'm using my korean tuned fine tuned traineddata but, always give > message like that " Error opening data file /chi_tra.trainddata " > please make sure the TESSDATA_PREFIX environment varialbe > > Is it OK? > > shree you teach me ///refer to kor.config > > and I saw the kor.confi

Re: [tesseract-ocr] Different output by tesseract for same image

2018-03-13 Thread ShreeDevi Kumar
Please send the sample image for testing. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Mar 13, 2018 at 5:13 PM, Preeti Pandey wrote: > Hi all, > Using tesserect-OCR, different outputs are getting generat

Re: [tesseract-ocr] Depending on OS, tesseract (4.0) performance is different?

2018-03-15 Thread ShreeDevi Kumar
> tesseract 4.0 Alpha on Ubuntu 16.04.03 LTS Please use the latest version, beta.1, or build from source on GitHub. > They are operated by Windows . I Think. No, they are not operated by Windows. They run on 'bash under Windows', which provides Ubuntu 14.04. It can use fonts installed under Windows.

Re: [tesseract-ocr] Depending on OS, tesseract (4.0) performance is different?

2018-03-15 Thread ShreeDevi Kumar
> 1) how to replace tesseract 4.00 alpha with tesseract 4.00 Beta ? How did you install tesseract 4.00alpha?

Re: [tesseract-ocr] Depending on OS, tesseract (4.0) performance is different?

2018-03-15 Thread ShreeDevi Kumar
sudo apt-get purge packagename, or sudo apt-get remove --purge packagename will remove about *everything* regarding the package packagename, [...] Particularly useful when you want to 'start all over' with an application sudo apt-get autoremove ShreeDevi ___

Re: [tesseract-ocr] Depending on OS, tesseract (4.0) performance is different?

2018-03-15 Thread ShreeDevi Kumar
No. You can use Alex's PPA and install for your version of Ubuntu. On Thu 15 Mar, 2018, 9:16 PM 이경준, wrote: > Now Im installing ubuntu 18.04 for tesseract4.00 beta.1 > > Is it right? > > -- > You received this message because you are subscribed to the Google Groups > "tesseract-ocr" group. >

Re: [tesseract-ocr] Train for a limited set of words

2018-03-17 Thread ShreeDevi Kumar
Look at the config file called digits Also look up User-words and user-patterns For your requirement tesseract 4 may not be a good fit since it is line based. There are other releases on 3.0x branch, maybe others can suggest if a newer release will be more helpful. On Sat 17 Mar, 2018, 6:36 PM

Re: [tesseract-ocr] OCR using Google Cloud

2018-03-22 Thread ShreeDevi Kumar
Tesseract does not use a cloud API. You may want to check out https://m.wikisource.org/wiki/Wikisource:Google_OCR That uses the cloud service. On Thu 22 Mar, 2018, 3:37 PM vedant srivastava, wrote: > Hi, > I am trying to do OCR using Google Cloud Platform but I am unable to do > so. I am using

Re: [tesseract-ocr] figure in JPEG goes undetected

2018-03-24 Thread ShreeDevi Kumar
Tesseract will only extract text. On Sat 24 Mar, 2018, 3:59 PM , wrote: > Hi everyone, > > I have attached two files one of which (input.png) was input to tesseract. > In output file output.txt (screenshot attached: output.jpeg) all the text > was recognised successfully > but figure in input fi

Re: [tesseract-ocr] fractions undetected by teserract

2018-03-24 Thread ShreeDevi Kumar
You can try opencv to choose regions and then OCR them. On Sat 24 Mar, 2018, 3:35 PM , wrote: > Hi everyone, > > I have used tesseract for OCR of JPEG images. > In attached files tesseract is unable to detect the fractions. > one image (ques_divide.jpeg) shows the input to tesseract whereas anot

Re: [tesseract-ocr] Unable to use tesseract api installed with a nuget pkg

2018-03-25 Thread ShreeDevi Kumar
Did you build using cppan and cmake? On Mon 26 Mar, 2018, 1:50 AM sonu sainju, wrote: > Hi, > > I followed instruction in > https://github.com/tesseract-ocr/tesseract/wiki/Compiling#windows > to > build tesseract and use it in

Re: [tesseract-ocr] How to merge 2 traineddata into 1 traineddata

2018-03-26 Thread ShreeDevi Kumar
Try the script level traineddata files from tessdata_fast/script Han probably has eng+chi* ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Mar 26, 2018 at 12:01 PM, wrote: > Hi I'm newbie. I'm interested in tess

Re: [tesseract-ocr] How to merge 2 traineddata into 1 traineddata

2018-03-26 Thread ShreeDevi Kumar
Please look at https://github.com/tesseract-ocr/tessdata_fast/tree/master/script Look at all Han* files maybe Hangul is the one you need. See https://github.com/tesseract-ocr/tessdata_fast/blob/master/README.md for more details ShreeDevi _

Re: [tesseract-ocr] Unable to use tesseract api installed with a nuget pkg

2018-03-27 Thread ShreeDevi Kumar
I don't use visual studio. However I know that we support vs installation via cppan cmake. Please follow those directions. On Tue 27 Mar, 2018, 9:24 PM sonu sainju, wrote: > Hey Shree, Thanks for replying. No I didn't build using cppan and cmake. I > used vcpkg install command. Isn't vcpkg suppo

Re: [tesseract-ocr] Any suggestions for more accurate Text conversion?

2018-03-27 Thread ShreeDevi Kumar
Version mismatch. That traineddata is for 4.0. Wiki has pages for training. Look for one appropriate for your version of tesseract. On Wed 28 Mar, 2018, 1:23 AM , wrote: > Hi Shree, > > I just tried using the training data file you provided but it seems that > there is some problem with Tessera

Re: [tesseract-ocr] Extracting pristine rasterized text

2018-03-30 Thread ShreeDevi Kumar
Please check GitHub/issues for similar reports and suggestions. Also specify, Which version/commit of tesseract 4 Which traineddata file, from which repo Which o/s tesseract -v On Fri 30 Mar, 2018, 2:19 PM Patrick Ramsey, wrote: > Hi! > > So, I am running tesseract4 on clean, 1-bit images

Re: [tesseract-ocr] Extracting pristine rasterized text

2018-03-30 Thread ShreeDevi Kumar
Please also note that -enable-debug by itself will make it slower. On Fri 30 Mar, 2018, 2:29 PM ShreeDevi Kumar, wrote: > Please check GitHub/issues for similar reports and suggestions. > > Also specify, > Which version/commit of tesseract 4 > Which traineddata file, from which

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-01 Thread ShreeDevi Kumar
Use --psm, as -psm is deprecated. On Sun 1 Apr, 2018, 7:25 PM JP T, wrote: > Hi > > I just updated from version 3.04.01 but now tesseract fails with above > message if I give the -psm option. > input files are PNG. > > any idea? > > thanks > > -- > You received this message because you are subs
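For scripts written against 3.0x, a small shim can rewrite the old single-dash option before the command is run. The function name is hypothetical and the sketch only touches the option discussed in this thread.

```python
def modernize_args(args):
    """Replace the deprecated -psm option with --psm, leaving other arguments alone."""
    return ["--psm" if arg == "-psm" else arg for arg in args]


old_command = ["tesseract", "page.png", "out", "-psm", "6"]
new_command = modernize_args(old_command)
```

Arguments that already use the double-dash form pass through unchanged, so the shim is safe to apply to every invocation.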

Re: [tesseract-ocr] Extracting pristine rasterized text

2018-04-02 Thread ShreeDevi Kumar
Thank you for the detailed info. My suggestion is to try recognition with eng.traineddata from the tessdata_fast repository with --oem 1. On Tue 3 Apr, 2018, 3:13 AM Patrick Ramsey, wrote: > Answers below inline. And thank you very much for your help :) > > |PTR > > On Friday, March 30, 2018 a

Re: [tesseract-ocr] does it make sense to train existing languages? how to fix repeatedly wrong letters?

2018-04-02 Thread ShreeDevi Kumar
My suggestion would be to do post processing of the OCR output. On Mon 2 Apr, 2018, 6:09 PM JP T, wrote: > Hi > > I don't really got an understanding of the consequences of training. > > My problem: > I've got tons of pages with a special format. ("one place study" about the > historic inhabitan
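One simple form of post-processing is a table of known misreadings that is applied to the OCR text as whole-word replacements. The sketch below is an assumption about what such a step could look like, not a method described in the thread; the function name and the sample corrections are hypothetical.

```python
import re


def apply_corrections(text, corrections):
    """Replace whole words found in `corrections` (wrong -> right) within text."""
    if not corrections:
        return text
    # Longest keys first, so a longer misreading wins over a shorter one it contains.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: corrections[match.group(1)], text)


# Hypothetical corrections for misreadings that recur in one collection of pages.
corrections = {"Miiller": "Müller", "1ived": "lived"}
fixed = apply_corrections("Hans Miiller 1ived here", corrections)
```

Restricting replacements to whole words keeps the table from altering correct text that merely contains one of the misread fragments.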

Re: [tesseract-ocr] Checkbox Extraction as text after Fine tuning for new characters .

2018-04-03 Thread ShreeDevi Kumar
Try to train with a large number of fonts and see if that improves the result. On Tue 3 Apr, 2018, 2:29 PM Apoorv Khanna, wrote: > Hi all, > > I am able to extract few check boxes after fine tuning the English model > but tesseract is not able to extract all the check boxes . > > Thanks in advan

Re: [tesseract-ocr] Error at training 4.0

2018-04-04 Thread ShreeDevi Kumar
Training tesseract 4.0.0 is different from process for 3.0x. Training using images is not supported for tesseract 4.0.0. See https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 On Thu 5 Apr, 2018, 1:36 AM Fanatico, wrote: > Hi, I'm new to tesseract and ocr in general, and n

Re: [tesseract-ocr] Traineed non unicode font with tesseract

2018-04-04 Thread ShreeDevi Kumar
Training tesseract is only supported using unicode fonts. On Thu 5 Apr, 2018, 12:25 AM gopal bhalala, wrote: > Hi I am new in tesseract-ocr. I want trainned non unicode font using > tesseract, I tried with to trained it with jTextboxeditor to trained that > data but did not get any sucess. > > -

Re: [tesseract-ocr] Traineed non unicode font with tesseract

2018-04-05 Thread ShreeDevi Kumar
ny way to train non unicode font > PDF AND IMAGE? > i have non unicode pdf file and image for ocr shall i box it and assing > the uniode font charcter is it right way to do non unicode pdf or image to > OCR. > > On 05-Apr-2018 7:25 AM, "ShreeDevi Kumar" wrote: > >&

Re: [tesseract-ocr] Traineed non unicode font with tesseract

2018-04-06 Thread ShreeDevi Kumar
o do that? > > Best Regards & Thanking you, > Gopal Dhanjibhai Bhalala > > On Fri, Apr 6, 2018 at 1:20 AM, ShreeDevi Kumar > wrote: > >> Are you trying to recognize the text from a pdf or image with non unicode >> font? >> >> That is possible to do.

Re: [tesseract-ocr] ERROR: exp0.box does not exist or is not readable

2018-04-06 Thread ShreeDevi Kumar
Is your langdata in --langdata_dir ../../langdata On Sat 7 Apr, 2018, 4:51 AM Fanatico, wrote: > I'm trying to execute the training from the 4.o tutorial, but I'm getting > an error, can someone help with this? > > Platform: MAC OS X 10.13.3 > Tesseract: 4.0.0-beta.1 > leptonica: 1.75.3 > libj

Re: [tesseract-ocr] ERROR: exp0.box does not exist or is not readable

2018-04-07 Thread ShreeDevi Kumar
Look in your tmp directory, in the subfolders referred to in the console output. Check the log file and other files there. On Sat 7 Apr, 2018, 11:00 AM Fanatico, wrote: > Yes the location is correct, I tried to put the full path to the folder > and go the same error. > > Im just cloned the https://

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-07 Thread ShreeDevi Kumar
Just a word list is not enough for training text. For tesseract 4.0.0 it needs to be representative of the text to be recognized. On Sat 7 Apr, 2018, 2:50 PM Romil Mehla, wrote: > Is there any program to generate it ? i see ambiguous_words.cpp > generating dictionary words and ambiguous words

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-07 Thread ShreeDevi Kumar
see https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03%E2%80%933.05 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Apr 7, 2018 at 4:02 PM, Romil Mehla wrote: > Thanks for your reply , i have

Re: [tesseract-ocr] Failed to build ScrollView.jar on MAC OSX

2018-04-07 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/blob/master/Makefile.am From which dir did you try make ScrollView.jar? ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sat, Apr 7, 2018 at 7:42 PM, Fanatico wrot

Re: [tesseract-ocr] Failed to build ScrollView.jar on MAC OSX

2018-04-07 Thread ShreeDevi Kumar
Please try from the main tesseract folder. On Sat 7 Apr, 2018, 11:50 PM Fanatico, wrote: > from the java folder "cd ~/projects/tesseract/java" in my case > > On Saturday, 7 April 2018 12:40:29 UTC-3, shree wrote: >> >> Please see >> https://github.com/tesseract-ocr/tesseract/blob/master/Makefi

Re: [tesseract-ocr] Install and run tesseract 4.0 on MAC OSX step by step

2018-04-08 Thread ShreeDevi Kumar
Thank you. On Sun 8 Apr, 2018, 3:20 PM Fanatico, wrote: > I just posted at the repo issues a step to step that I needed to do so I > could use tessercat 4.0 from my MAC, so I'm just sharing the link in case > someone has the same problems I got. > Obs.: It can save a few days of your life > > ht

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-08 Thread ShreeDevi Kumar
Which traineddata are you using? Use combine_tessdata and extract the config file to see if Chinese is included as a sub language. Also look at the lstm-unicharset to see if the Chinese characters are included in it. On Mon 9 Apr, 2018, 11:09 AM Fanatico, wrote: > I'm running tesseract with the
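The unicharset check suggested in the reply above can be sketched in Python. This is only an illustration under assumptions: it presumes the traineddata components have already been unpacked (for example with `combine_tessdata -u`), that the unicharset is a text file whose first line is the entry count and whose remaining lines each start with the character, and that "Chinese characters" means the CJK Unified Ideographs block U+4E00 to U+9FFF. The function name `han_unichars` and the sample text are hypothetical, and the property columns in the sample are simplified placeholders.

```python
def han_unichars(unicharset_text):
    """Return unicharset entries whose first character is a CJK ideograph."""
    found = []
    lines = unicharset_text.splitlines()
    # The first line of a unicharset file holds the entry count; skip it.
    for line in lines[1:]:
        if not line.strip():
            continue
        # Each entry starts with the character(s), followed by properties.
        unichar = line.split(" ", 1)[0]
        if unichar and 0x4E00 <= ord(unichar[0]) <= 0x9FFF:
            found.append(unichar)
    return found


# Hypothetical, simplified sample: real entries carry more property columns.
sample = "4\nNULL 0 Common 0\n한 1 Hangul 1\n中 1 Han 2\na 3 Latin 3\n"
print(han_unichars(sample))
```

Hangul syllables sit in a separate Unicode block, so a Korean-only unicharset should yield an empty list here.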

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-08 Thread ShreeDevi Kumar
Please remove the sub language line from the config file, and use combine_tessdata to overwrite it. Right now it seems to be using chi_tra also. On Mon 9 Apr, 2018, 11:48 AM Fanatico, wrote: > I used one traineddata that I created on removing the top layer from the > kor.traineddata from "tessdata_
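A minimal sketch of the config edit described above, assuming the extracted config is a plain-text file with one parameter per line and that "the sub language line" is the one setting `tessedit_load_sublangs` (the parameter name is an assumption here; check your own config file). The function name `drop_sublangs` and the sample config are hypothetical.

```python
def drop_sublangs(config_text):
    """Return config text without any tessedit_load_sublangs line."""
    kept = [
        line
        for line in config_text.splitlines()
        # Assumption: the sub language is loaded via this parameter.
        if not line.strip().startswith("tessedit_load_sublangs")
    ]
    return "\n".join(kept) + "\n"


# Hypothetical config content, for illustration only.
sample_config = "tessedit_load_sublangs chi_tra\npreserve_interword_spaces 1\n"
print(drop_sublangs(sample_config), end="")
```

The cleaned text would then be written back to the config file before combine_tessdata is used to put it back into the traineddata.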

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread ShreeDevi Kumar
Leftover from 3.04, my guess. On Mon 9 Apr, 2018, 12:52 PM Fanatico, wrote: > It worked, thanks. > > Any reason for this chi_tra there? > > > On Monday, 9 April 2018 03:24:44 UTC-3, shree wrote: >> >> Please remove the sub language line from config file, and use combine >> tessdata to overwrite

Re: [tesseract-ocr] Tessercat 4.0 korean detecting chinese

2018-04-09 Thread ShreeDevi Kumar
://bhajans.ramparivar.com On Mon, Apr 9, 2018 at 1:45 PM, ShreeDevi Kumar wrote: > Leftover from 3.04, my guess. > > On Mon 9 Apr, 2018, 12:52 PM Fanatico, wrote: > >> It worked, thanks. >> >> Any reason for this chi_tra there? >> >> >> On

Re: [tesseract-ocr] How to created training text as provided in langdata for any new language if i have just just have a wordlist.

2018-04-09 Thread ShreeDevi Kumar
act developer answer my question. Please tell me > the way > > Thanks again for your timely reply and help . > > > > > On Sat, Apr 7, 2018 at 6:21 PM, ShreeDevi Kumar > wrote: > >> see https://github.com/tesseract-ocr/tesseract/wiki/Trainin >> g-Tesseract-3.03

Re: [tesseract-ocr] Doubt on "--eval_listfile"

2018-04-10 Thread ShreeDevi Kumar
To make sure that the model is not overfitted to training data, your eval set should be different. You can use a different text file, different fonts from the training set to check that the model performs well on text and fonts it has not seen earlier. On Tue 10 Apr, 2018, 8:16 PM Fanatico, wrot
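One way to keep the eval set separate, as the reply above recommends, is to hold out whole fonts so that no held-out font appears in training. The Python sketch below is an illustration only: the `(font, path)` pairing, the function name `split_by_font`, and the file names are hypothetical, and it covers only the font side of the advice, not the use of a different text file.

```python
def split_by_font(samples, eval_fonts):
    """Split (font, path) pairs into training and eval path lists.

    Every sample rendered in one of eval_fonts goes to the eval list,
    so the model is evaluated on fonts it never saw during training.
    """
    held_out = set(eval_fonts)
    train = [path for font, path in samples if font not in held_out]
    evals = [path for font, path in samples if font in held_out]
    return train, evals


# Hypothetical file names, for illustration only.
samples = [
    ("FontA", "kor.FontA.exp0.lstmf"),
    ("FontB", "kor.FontB.exp0.lstmf"),
    ("FontC", "kor.FontC.exp0.lstmf"),
]
train_files, eval_files = split_by_font(samples, ["FontC"])
print(train_files, eval_files)
```

The two lists could then be written to separate list files, with the eval one passed via --eval_listfile.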

Re: [tesseract-ocr] Re: Doubt on "--eval_listfile"

2018-04-10 Thread ShreeDevi Kumar
Yes, and you can use different text files for training and eval. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, Apr 10, 2018 at 10:01 PM, Fanatico wrote: > wen I asked about passing the ".training_text" as a p

Re: [tesseract-ocr] How to train for multiple languages?

2018-04-10 Thread ShreeDevi Kumar
Ray has not given instructions for multi language or script type training. You can try to concatenate the two training texts and word lists, merge the unicharsets (merge_unicharsets command), and then do replace-a-layer training with your primary language as the base. Also, unpack the Han and Hangul scr
