Re: [tesseract-ocr] tesseract not able to detect handwritten text even after improving image quality

2018-08-07 Thread tri9rahul
Thanks shree. On Tuesday, August 7, 2018 at 6:01:48 PM UTC+5:30, shree wrote: > > see FAQ > > https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-use-tesseract-for-handwriting-recognition > > Recently a lot of people have tried to train 4.0 using handwriting fonts, > however, there has been

Re: [tesseract-ocr] Re: OCR-d failed at Unicharset line -Help!

2018-08-07 Thread Shree Devi Kumar
Re: fine-tuning, see https://github.com/tesseract-ocr/tesseract/issues/1782#issuecomment-411018986 Have you tried providing each word separately (e.g. using OpenCV) for recognition?

[tesseract-ocr] Re: increase the quality of image so that it extracts proper text from it.

2018-08-07 Thread May
Could you share the image that you processed? On Tuesday, July 31, 2018 at 11:33:41 PM UTC-7, Mahima Goyal wrote: > > I want to increase the quality of the image so that proper text is > extracted. Right now I am using Tesseract but I am not able to extract a few > things in the image > > In

[tesseract-ocr] Re: error while converting pdf file to tiff using command

2018-08-07 Thread May
Try checking this out: https://github.com/ImageMagick/ImageMagick/issues/396 On Monday, August 6, 2018 at 12:37:33 AM UTC-7, thiyam...@gmail.com wrote: > > Hello everyone, to test Tesseract I convert the PDF file to a TIFF file, > and after 10 files (each containing 7000-8000 characters), there is

Re: [tesseract-ocr] Re: OCR-d failed at Unicharset line -Help!

2018-08-07 Thread May
I'm trying to extract data from scanned PDF forms that contain geotechnical data like these. But Tesseract is not recognizing them accurately: some numbers and characters are misinterpreted, especially some of the keywords like 'N1' and the numbers that I am looking for. I have tried pre-

Re: [tesseract-ocr] What is the state of the C and Python APIs?

2018-08-07 Thread Nick White
Hi Luke, On Mon, Aug 06, 2018 at 02:12:38PM -0700, Luke Brandl wrote: > I've been working to understand Tesseract and looking through the C and Python > API code and documentation. It looks like some of the code and documentation > are up to date, while the rest refers to 3.0.2 at least in the com

Re: [tesseract-ocr] tesseract not able to detect handwritten text even after improving image quality

2018-08-07 Thread Shree Devi Kumar
see FAQ https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-use-tesseract-for-handwriting-recognition Recently a lot of people have tried to train 4.0 using handwriting fonts, however, there has been no report as to the level of success they have had doing it. On Tue, Aug 7, 2018 at 3:28

[tesseract-ocr] tesseract not able to detect handwritten text even after improving image quality

2018-08-07 Thread tri9rahul
Hi, I'm trying to extract handwritten data from an image, but even after improving the image quality Tesseract is not able to detect the handwritten text. Can you please suggest the steps to detect handwriting in a given image? Reference link - https://github.com/tesseract-ocr/tesseract/wiki/I

Re: [tesseract-ocr] Re: OCR-d failed at Unicharset line -Help!

2018-08-07 Thread Shree Devi Kumar
Question: why are you trying to do training? There are hundreds of languages already supported by Tesseract. Have you tried them? If none of them work, then you need to define what is required, e.g. Is a particular typeface required? Is the traineddata missing some required characters? Is the la

Re: [tesseract-ocr] Re: OCR-d failed at Unicharset line -Help!

2018-08-07 Thread Shree Devi Kumar
LSTM training can take weeks, days, or hours, depending on the options chosen. You have given a complete network spec, so that is training from scratch. Please see the following training wiki page for training-related info: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 On Tue

Re: [tesseract-ocr] Re: OCR-d failed at Unicharset line -Help!

2018-08-07 Thread May
Oh, the training started by itself after a long while and is still processing. Does it normally take that long to train on 6 images? On Monday, August 6, 2018 at 11:42:

[tesseract-ocr] What is the state of the C and Python APIs?

2018-08-07 Thread Luke Brandl
I've been working to understand Tesseract and looking through the C and Python API code and documentation. It looks like some of the code and documentation are up to date, while the rest refers to 3.0.2 at least in the comments. Does anyone know the state of the API? Are any particular function