[tesseract-ocr] Some spaces are not recognized

2018-05-18 Thread Sumedhe Dissanayake
Sometimes spaces between words are ignored when tesseract is used to recognize Sinhala text. - The traineddata from tesseract does not have a spacing problem, even though there were changes in tesseract since it was uploaded. - The spacing problem occurs regardless of whether I start the trainin

Re: [tesseract-ocr] Some spaces are not recognized

2018-05-23 Thread Sumedhe Dissanayake
> > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Fri, May 18, 2018 at 5:39 PM, Sumedhe Dissanayake < > sumedhedi...@gmail.com > wrote: > >> Sometimes spaces between words are ignored when tesseract is used to >>

Re: [tesseract-ocr] Some spaces are not recognized

2018-05-29 Thread Sumedhe Dissanayake
On Friday, May 18, 2018 at 6:32:44 PM UTC+5:30, shree wrote: > > image is not visible. > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Fri, May 18, 2018 at 5:39 PM, Sumedhe D

[tesseract-ocr] Where to put langdata when training with tesstrain scripts

2022-11-28 Thread Sumedhe Dissanayake
Hi all, I'm trying to train with the tesstrain scripts and it is failing with the following error. Warning: properties incomplete for index 77 = ඣ Config file is optional, continuing... Failed to read data from: /tesstrainer/tmp/dataset/oscar/langda

[tesseract-ocr] Can't encode transcription error with Sinhala language

2018-01-13 Thread Sumedhe Dissanayake
I tried lstmtraining with the Sinhala language, but I always get this error. Command: lstmtraining --traineddata ~/tesstutorial/sintrain/sin/sin.traineddata \ --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c155]' \ --debug_interval 0 --max_iterations 50 --max_image_MB 6

[tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-17 Thread Sumedhe Dissanayake
But now lstmtraining says *--traineddata* flag is not available. <https://lh3.googleusercontent.com/-o3O6cUXp9HY/Wl_L6Sln-9I/CPs/p04w2YfIXgUxhKkywfw_ArA_8og2MUyRwCLcBGAs/s1600/Screenshot%2Bfrom%2B2018-01-18%2B03-48-49.png> On Sunday, January 14, 2018 at 12:31:17 PM UTC+5:30, S

[tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread Sumedhe Dissanayake
I am using the latest version (from GitHub). <https://lh3.googleusercontent.com/-Ne9c4xgkQLQ/WmCBgmqvKGI/CQo/9ew6gf62RMcdNX_-YpG4K0qt0J26U4fMgCLcBGAs/s1600/Screenshot%2Bfrom%2B2018-01-18%2B16-41-37.png> On Sunday, January 14, 2018 at 12:31:17 PM UTC+5:30, Sumedhe Dissanayake

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread Sumedhe Dissanayake
I am using the latest version (from GitHub). On Thursday, January 18, 2018 at 12:14:12 PM UTC+5:30, shree wrote: > > What vers

Re: [tesseract-ocr] Re: Can't encode transcription error with Sinhala language

2018-01-18 Thread Sumedhe Dissanayake
I am using the latest version of tesseract (from GitHub). On Thursday, January 18, 2018 at 12:14:12 PM UTC+5:30, shree wrot