[tesseract-ocr] Reading handwritten text from printed form

2018-03-12 Thread shantanu oak
I have an image: https://s3.amazonaws.com/todel162/harshad_college_card.jpg The Tesseract OCR fails to read it correctly. As seen in this text file, the name, standard, and date of birth are missing (those are the most important fields): https://s3.amazonaws.com/todel162/college_card_reading.txt Is there any pac

[tesseract-ocr] Tesseract 4 for old languages

2018-03-12 Thread Guillaume Desforges
Hi, I want to try using Tesseract 4 for old manuscript languages ("The Song of Roland" and such). I have looked at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 but the steps are very unclear. I have an image and a text file with the line content for each line of manu

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-03-12 Thread ShreeDevi Kumar
Please try Tesseract 4.0.0-beta.1 with languages such as *enm* (English, Middle (1100-1500)) and Fraktur script. Also, look at the following project from a few years back: http://emop.tamu.edu/outcomes/Franken-Plus ShreeDevi भजन - की

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-03-12 Thread ShreeDevi Kumar
> I have an image and a text file with the line content for each line of manuscript text. The doc says what to do, but not how. > I first thought I'd need img/box file pairs, but it seems it was for Tesseract 3 and is now irrelevant... Tesseract 4.0.0-beta.1 does not officially support LSTM training

[tesseract-ocr] Re: tesseract 4.00 beta is released ? I saw the who use the tesseract 4.00 beta

2018-03-12 Thread adarsh shukla
There is no official release of tesseract 4.0 Beta. There might be some unofficial release, not found anything as such in Google. On Monday, March 12, 2018 at 10:17:35 AM UTC+5:30, 이경준 wrote: > > tesseract 4.00 beta is released ? I saw the who use the tesseract 4.00 > beta (in the github issue)

Re: [tesseract-ocr] Re: tesseract 4.00 beta is released ? I saw the who use the tesseract 4.00 beta

2018-03-12 Thread ShreeDevi Kumar
The master branch in the GitHub repo at commit 40f4311 has been tagged as 4.0.0-beta.1. Please see https://github.com/tesseract-ocr/tesseract/releases/tag/4.0.0-beta.1 That commit is the one which has be

Re: [tesseract-ocr] Re: tesseract 4.00 beta is released ? I saw the who use the tesseract 4.00 beta

2018-03-12 Thread Zdenko Podobny
It is official: https://github.com/tesseract-ocr/tesseract/releases Zdenko 2018-03-12 10:09 GMT+01:00 adarsh shukla : > There is no official release of tesseract 4.0 Beta. There might be some > unofficial release, not found anything as such in Google. > > On Monday, March 12, 2018 at 10:17:35 A

[tesseract-ocr] Training tesseract 4.0 with large training text

2018-03-12 Thread john . debye . 7
Dear all, I'm trying to train lstm using a large training text, different fonts, colors etc. I'm trying to use text2image to generate my tif / box file combinations, however text2image appears to be limited to 3 pages and thus truncates my training text. How should I solve this? Call text2image

Re: [tesseract-ocr] Training tesseract 4.0 with large training text

2018-03-12 Thread ShreeDevi Kumar
Please look at tesstrain.sh. It is setting max-pages to 3 for the text2image invocation. You can change it there. On Tue 13 Mar, 2018, 6:54 AM , wrote: > Dear all, > > I'm trying to train lstm using a large training text, different fonts, > colors etc. I'm trying to use text2image to generate my tif
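
If editing the script is not an option, one possible workaround for a per-run page cap is to split the large training text into smaller files and render each one in its own run. The following is only a minimal sketch of that splitting step in Python; the helper name `split_training_text`, the default of 100 lines per chunk, and the output file naming are all hypothetical and are not part of Tesseract or its training scripts.

```python
from pathlib import Path


def split_training_text(src, out_dir, lines_per_chunk=100):
    """Split src into numbered chunk files and return the chunk paths.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Keep line endings so the chunks concatenate back to the original text.
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)

    chunks = []
    for start in range(0, len(lines), lines_per_chunk):
        index = start // lines_per_chunk
        # Assumed naming scheme: <stem>.<4-digit index>.txt
        chunk_path = out_dir / f"{src.stem}.{index:04d}.txt"
        chunk_path.write_text(
            "".join(lines[start:start + lines_per_chunk]), encoding="utf-8"
        )
        chunks.append(chunk_path)
    return chunks
```

Each returned file could then be rendered separately. How many lines fit within a given page limit depends on the font and page settings, so the chunk size would need to be tuned by trial.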

Re: [tesseract-ocr] Training tesseract 4.0 with large training text

2018-03-12 Thread 이경준
Hi Shree. I looked at the tesstrain.sh file, but I cannot find where max-pages is set to 3. Where is it? Could you give me more details? On Tuesday, March 13, 2018 at 10:57:29 AM UTC+9, shree wrote: > > Please look at tesstrain.sh > > It is setting max-pages to 3 for text2image invocation. You can change it >