[tesseract-ocr] Re: Where to find the LSTM network architecture used in Tesseract?

2018-01-11 Thread Alexander Nadeau
The specific network structure is particular to a given traineddata file. I have no idea how the specification gets turned into the entire network's architecture in Tesseract 4, but you can get a particular file's specification with combine_tessdata:
$ ./combine_tessdata.exe -d tess4traineddata

[tesseract-ocr] Where to find the LSTM network architecture used in Tesseract?

2018-01-11 Thread sujith vemisetty
I have tried hard to find the network architecture of the LSTMs used in Tesseract 4.00 alpha, but I wasn't able to find it. I can only find how to train the new neural network implementation. I would like to understand the architecture first. Can anyone point me to any documentation which details

Re: [tesseract-ocr] Empty result with images taken as marginally low resolution - Nepali

2018-01-11 Thread ShreeDevi Kumar
Works fine for me. What traineddata and options did you use? Attaching the output from the following. I did not change the DPI of the image.
#!/bin/bash
img_files=$(ls ./nepali*.png)
for img_file in ${img_files}; do
    echo "" ${img_file} oem 1"*

[tesseract-ocr] Re: I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2018-01-11 Thread THintz
> > See https://github.com/charlesw/tesseract/wiki/Error-2 >> > The Tesseract.dll goes in the folder with your binary and the other two dlls go in either an x64 or an x86 folder below that.

[tesseract-ocr] Re: Variables having no effect on C# Tesseract.net 4.0.0.6 wrapper

2018-01-11 Thread James Q
Is anyone else using Tesseract 4.0 alpha from C#?
On Wednesday, January 10, 2018 at 1:07:28 PM UTC, James Q wrote:
> Here is my code:
> string text = "";
>
> string tessDataPath = ConfigurationManager.AppSettings["TessPath"];
> using (var engine = new TessBaseAPI(@tessDataPath, @"eng"))
> {

[tesseract-ocr] Re: Very inaccurate output - need help

2018-01-11 Thread James Q
This image is quite skewed. I suggest you straighten it and binarize it before passing it to Tesseract.
On Thursday, January 11, 2018 at 10:14:13 AM UTC, Zaheer Javi wrote:
> Hi,
>
> I'm trying to apply tesseract with the attached file, however I get back
> extremely low accuracy. Have tried i
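
The binarization step suggested in this reply could be sketched as follows. The reply does not name a method, so Otsu's global threshold is used here as an assumed stand-in. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit gray values. Straightening (deskewing) is not covered.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold for a sequence of 8-bit gray values (0-255)."""
    # Build the gray-level histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of gray values at or below the candidate threshold
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to 255 (white), the rest to 0 (black)."""
    return [255 if p > threshold else 0 for p in pixels]
```

A real image would first have to be loaded and converted to grayscale with an imaging library of your choice. A single global threshold may also not suit unevenly lit scans.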

Re: [tesseract-ocr] Re: Too few characters. Skipping this page

2018-01-11 Thread Hakan Usakli
Hello Zdenko, Thank you for that tip. Yes, I am extremely interested in using Leptonica functions directly, especially if they are expected to run faster. But I am almost illiterate in C. I have the precompiled Leptonica DLLs; they are called *liblept-5.dll (7969kb)* or *pvt.cppan.demo.danbloom

[tesseract-ocr] Empty result with images taken as marginally low resolution - Nepali

2018-01-11 Thread Nirajan Pant
Tesseract 4.0 is not working with the image provided here. This is a page from a Nepali novel. The resolution is slightly low, but not by much. The OCR result contains only a few words, and for other pages it is empty.

Re: [tesseract-ocr] Re: Too few characters. Skipping this page

2018-01-11 Thread Zdenko Podobný
If you only need to detect orientation, it should be faster to use only Leptonica functions. See
https://tpgit.github.io/Leptonica/flipdetect_8c_source.html
http://tpgit.github.io/Leptonica/skew_8c_source.html
Zdenko
On Thu, Jan 11, 2018 at 12:36 PM, Hakan Usakli wrote:
> In case it helps someo

[tesseract-ocr] Re: Too few characters. Skipping this page

2018-01-11 Thread Hakan Usakli
In case it helps someone: yes, there is a way to change the behaviour of 'minimum number of characters'. I struggled with the same problem for a while as well. In https://github.com/tesseract-ocr/tesseract/blob/master/ccmain/osdetect.cpp change the value of this constant to somet

[tesseract-ocr] Can Tesseract OCR Detect lines and rectangles?

2018-01-11 Thread Subhanshu Gupta
Hi All, I am new to Tesseract and am trying to understand what its capabilities are. I need Tesseract to read forms with different sections and dump the data into a database. I am not able to find any references that tell me whether Tesseract can identify lines and rectangles automatically and return m

Re: [tesseract-ocr] Re: Need Help with extracting info from Invoice

2018-01-11 Thread saumitra mallick
Hello Shree, Thanks for the API example. I'm facing an issue with the base API. I get perfect output on my Ubuntu terminal when I run $ tesseract Row0_0.tif Row0_0_out But when I try to read the same file with BaseAPI code, e.g. tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI(); I'm run

Re: [tesseract-ocr] Re: How can I do the training using my own image in Tesseract 4.0

2018-01-11 Thread ShreeDevi Kumar
Currently, Ray/Google has NOT released info on how to train Tesseract 4 (LSTM) with real-life images. The only supported option is to use synthetic training data created by the tesstrain.sh script from training text and Unicode fonts. Training an LSTM model from scratch requires a large amount of tra