Hello Mr. Rao,
Can you please send the traineddata file to me, as I am stuck?
I am also stuck with a problem where I have tesseract reading the text,
but the output the engine writes to the text file is some special
characters. I am unable to tell the tesseract engine to write the output
I think I've solved it using the tess-two version of tesseract, which contains
four new native methods:
- TessBaseAPI::GetRegions()
- TessBaseAPI::GetTextlines()
- TessBaseAPI::GetWords()
- TessBaseAPI::GetCharacters()
For further info, visit: https://github.com/rmtheis/tess-two
Il g
Hi you all,
I'm searching for a way to get the bounding box coordinates of the
characters in the tesseract-android-tool version. In the native API
interface (TessBaseAPI.java) there isn't a native method for
static int TesseractExtractResult(char** string, int** lengths, float**
costs, i
On May 22, 3:53 pm, sri1683 wrote:
> Hi Taha,
>
> Thanks for the suggestion.
> I have used 6 tif images for training.
> That's what drove me to think that the traineddata file should be
> bigger.
>
> On May 22, 3:35 pm, Taha Alasli wrote:
>
> > I think that size of the traineddata
On 27 May 2012 15:34, Zdenko Podobný wrote:
>> Made with scanTailor, jbigenc, pdfbeads and Tess3.01.
You can do this in one step with gscan2pdf[1] - which uses Tesseract
for the OCR.
Regards
Jeff
[1] http://gscan2pdf.sourceforge.net/
--
You received this message because you are subscribed to
Well, thanks should go to David, who fixed the code, and Galt, who
reported and tested it.
My problem (apart from lack of time ;-) ) is that there is no working hOCR
validation tool. hocr-tools[1] has something, but it seems to have a problem
with recent Python PyXML[2] (I just did a quick test). I saw some attempts
that rep
Maybe you could write a blog post (and then post a link to the forum ;-) )
about the workflow (changes needed, time spent on each step, etc.).
This could also be useful for non-tesseract communities.
--
Zdenko
On 26.05.2012 09:01, Galt wrote:
> Here's my pdf if anyone is interested:
>
> http://folkpla
Just a small correction: tesseract-ocr 3.0x does not use libtiff directly, but
via leptonica.
--
Zdenko
On Sun, May 27, 2012 at 12:25 PM, Stane wrote:
> 1.
> Once libtiff is properly installed you shouldn't get any problems later
> on.
> An alternative to the multipage things is to have each page
In 3.0x you can set the page segmentation mode (search for SetPageSegMode or
the variable "tessedit_pageseg_mode"). I think the proper mode should help you.
If I remember correctly, someone reported here on the forum how to compile
the current tesseract for Android.
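
For the command-line tool, the page segmentation mode mentioned above can be passed with the -psm option. Below is a minimal Python sketch that only builds such a command; it does not call the Android API. The function name build_tesseract_command and the file names are made up for illustration, and the "-psm" spelling is what the 3.0x usage text shows (later releases spell it "--psm"), so treat the flag as an assumption for your version.

```python
# Page segmentation modes listed in the tesseract 3.0x usage text.
PSM_AUTO = 3          # fully automatic page segmentation (default)
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text
PSM_SINGLE_LINE = 7   # treat the image as a single text line


def build_tesseract_command(image_path, output_base,
                            psm=PSM_SINGLE_LINE, lang="eng",
                            psm_flag="-psm"):
    """Return the argument list for a tesseract run with a given mode.

    Hypothetical helper: it only assembles the arguments, it does not
    run anything. psm_flag is "-psm" for 3.0x; pass "--psm" if your
    build expects the newer spelling.
    """
    return ["tesseract", image_path, output_base,
            "-l", lang, psm_flag, str(psm)]


if __name__ == "__main__":
    # Print the command for reading one text line from line.tif.
    print(" ".join(build_tesseract_command("line.tif", "out")))
```

The resulting list can be handed to subprocess.run() as is. Mode 7 is the one to try when each image holds exactly one line of text.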
--
Zdenko
On Sun, May 27, 2012 at 12:06 PM, Joe
1.
Once libtiff is properly installed you shouldn't get any problems later
on.
An alternative to the multipage files is to have each page as a
single tiff file, numbered consecutively,
like: [lang].[fontname].exp[num]
2.
Not sure how important the fitting of the bounding box around each
character is,
bu
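
The one-file-per-page naming from point 1 can be sketched in Python as below. The helper name training_page_names, the ".tif" extension and the numbering starting at 0 are assumptions for illustration; only the [lang].[fontname].exp[num] pattern comes from the message above.

```python
def training_page_names(lang, fontname, page_count, ext=".tif", start=0):
    """Build one file name per page following [lang].[fontname].exp[num].

    Hypothetical helper: ext and start are assumptions, adjust them to
    match whatever your training setup expects.
    """
    return [
        "{}.{}.exp{}{}".format(lang, fontname, num, ext)
        for num in range(start, start + page_count)
    ]


if __name__ == "__main__":
    # Names for a three-page training set in language "eng", font "arial".
    for name in training_page_names("eng", "arial", 3):
        print(name)
```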
Tesseract doesn't have any real postprocessing yet,
which means you will get the words in the same order as you input them.
You have to handle changing the word order yourself.
Maybe if the output of tesseract is good enough, with all the commas in
the right place, you could try to use them as a sepa
Thanks Broke. Unfortunately I must use the Android tesseract version, so I
need to find a programmatic solution to this problem.
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroups
Hi EricD,
Can you please share your source code with me? I hope it is related to C#
and VS2010.
My goal is to develop an OCR application for the "Sinhala" language, which is
used in my country.
I have been trying with the general guides but have had no success.
Thanks in advance.
Ruwanthaka
Joe, I got over my problem, though I don't remember how.
I think I updated to the latest svn version, and no longer had the problem.
On Sunday, 27 May 2012, Joe Aspara wrote:
> I have the same problem reported by Brock. Does anyone have a solution to
> force tesseract to read one line at a time, ignoring