We are hiring software engineers with OCR and document image processing experience

2009-04-17 Thread joe
We, a Fortune-100 high-tech company, are hiring software engineers with OCR and document image processing experience. The job location is in the Pacific Northwest. Compensation includes: competitive salary, stock grant, signing cash bonus, 401k, medical insurance. Please contact joeche...@gmail.com

set tessedit_write_ratings

2011-02-21 Thread Joe
..255) which are quite a good measure of correctness. The thing is, I don't know how to set this variable when executing from the shell. I tried to set it in one of the config files in the tessdata folder, without result... I would be happy for any hints. Thanks & best regards, Joe -- You received thi
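A minimal sketch of one way to approach the question above, assuming the Tesseract build in use reads config files as plain text with one `name value` pair per line and accepts a config file name after the output base on the command line. The helper names `render_config` and `build_command` and the file names are hypothetical, and whether `tessedit_write_ratings` has any effect depends on the Tesseract version.

```python
def render_config(variables):
    """Render a Tesseract-style config file body: one 'name value' pair per line."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


def build_command(image, output_base, config_name):
    """Build the argument list for a shell call; the config name goes last."""
    return ["tesseract", image, output_base, config_name]


if __name__ == "__main__":
    # Hypothetical file names; save the rendered text where Tesseract looks for configs.
    print(render_config({"tessedit_write_ratings": 1}), end="")
    print(" ".join(build_command("page.tif", "page", "ratings")))
```

The argument list can be handed to `subprocess.run` once the config file has been saved under the name passed as `config_name`.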

Using other languages - installation goes wrong!!! Need help!

2011-11-06 Thread Joe
\Programme\Tesseract-OCR\doc Output folder: C:\Programme\Tesseract-OCR\doc Extract: AUTHORS Extract: COPYING Extract: eurotext.tif Extract: phototest.tif Extract: README Extract: ReleaseNotes Created uninstaller: C:\Programme\Tesseract-OCR\Uninstall.exe Create folder: C:\Dokumente und Einstel

[tesseract-ocr] Re: How to upgrade to Tesseract 4.0 with C++ in Visual Studio.

2018-06-22 Thread Joe
Add the *--head* flag to the command: vcpkg install tesseract:x64-windows --head On Friday, June 22, 2018 at 00:50:14 UTC-3, Chris wrote: > > Following these steps: > https://github.com/tesseract-ocr/tesseract/wiki/Compiling#windows on the > offic

[tesseract-ocr] OCR-D training process - High error rate [Tess 4]

2018-07-04 Thread Joe
(about 60%). That new training process with LSTM is driving me crazy! I would appreciate it if anyone with experience could take a look at my data set. Joe.

[tesseract-ocr] Re: OCR-D training process - High error rate [Tess 4]

2018-07-04 Thread Joe
character box value is different, while in the *.box files created by OCR-D they all have the same values. Is that a problem? On Wednesday, July 4, 2018 at 11:50:54 UTC-3, Joe wrote: > > Hi everybody! > > I'm trying this tool https://github.com/OCR-D/ocrd-train/ but with

Re: [tesseract-ocr] Re: OCR-D training process - High error rate [Tess 4]

2018-07-04 Thread Joe
have a look at this thread too: > > https://groups.google.com/forum/#!topic/tesseract-ocr/be4-rjvY2tQ > > > Bye > > Lorenzo > > > 2018-07-04 17:03 GMT+02:00 Joe: > >> I forgot to mention: >> The *.box files created by OCR-D are not in the same format as

Re: [tesseract-ocr] Re: OCR-D training process - High error rate [Tess 4]

2018-07-07 Thread Joe
ll share it here. Have a nice weekend! Joe. On Wednesday, July 4, 2018 at 13:39:41 UTC-3, Lorenzo Blz wrote: > > > I suspect 1800 lines may not be enough data for training from scratch and > you are simply overfitting. I think 5% refers to the evaluation set, with a > d

Re: Need some recommendations

2009-07-02 Thread Joe K
Hi zhi, it's hard to tell without the insurance card, but if the insurance card is in a certain font or only contains a certain number of characters, you can train it using that font and those characters to try to increase the accuracy, and that would be your "language". And if you have certain se

Problem creating .traineddata file in tesseract 3.0

2009-09-11 Thread Joe K
ndering if anyone has had a similar problem, or knows what I did wrong and could point me in the right direction. Thanks in advance, Joe K

Re: Problem creating .traineddata file in tesseract 3.0

2009-09-12 Thread Joe K
still have the original tiff images for my language so I will train in 3.0 and give it another shot! Thanks again, Joe Karlovich On Sep 11, 1:30 pm, SteveP wrote: > For some of the training information for 3.0, there has not been > clarification from Ray Smith. I do not know if the training

Re: word review

2010-03-08 Thread Joe K
Hey Thilanka, I ran into a similar problem when I only needed it to look at hexadecimal values. What I ended up doing was creating a separate "language" that only contained the specified characters. So you could create a language of numbers and a language with letters and use tesseract to read eac
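The approach described above can be sketched as one Tesseract run per custom language, selected with the `-l` option. This is only an illustration under assumptions: the language names `digits` and `letters`, the helper `build_commands`, and the output naming scheme are hypothetical, and each language must already exist as trained data in the tessdata folder.

```python
def build_commands(image, languages):
    """Return one tesseract argument list per language, each with its own output base."""
    return [
        ["tesseract", image, f"{image}.{lang}", "-l", lang]
        for lang in languages
    ]


if __name__ == "__main__":
    # Hypothetical custom languages trained on restricted character sets.
    for command in build_commands("card.tif", ["digits", "letters"]):
        print(" ".join(command))
```

Each argument list can be passed to `subprocess.run`, and the per-language outputs compared or merged afterwards.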

Which revision of tesseract 3.0 for win7 64bit

2010-08-19 Thread Joe Degenhardt
6) and the current one, considering that I need OCR for a language consisting mostly of English and a focus on a few (but not exclusively those few) fonts? Best Regards, Joe Degenhardt

Re: Page layout analysis - don't split columns.

2012-05-26 Thread Joe Aspara
I have the same problem reported by Brock. Does anyone have a solution to force tesseract to read one line at a time, ignoring the multi-column layout? (I guess this was the standard behavior in the 1.xx and 2.xx versions) On Saturday, September 24, 2011 at 02:04:23 UTC+2, Brock wrote: > > Hi, > >
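One option that may help with the question above, not taken from the thread itself: the command-line page segmentation mode. Mode 6 is documented as "assume a single uniform block of text", which asks Tesseract to skip its column analysis. The sketch below assumes the 3.x spelling of the option (`-psm`; later versions spell it `--psm`), and the helper name `build_single_block_command` is hypothetical. Whether this gives the desired line order on a particular page is not guaranteed.

```python
SINGLE_UNIFORM_BLOCK = 6  # page segmentation mode: treat the page as one block of text


def build_single_block_command(image, output_base, psm_flag="-psm"):
    """Build a tesseract argument list that requests single-block segmentation."""
    return ["tesseract", image, output_base, psm_flag, str(SINGLE_UNIFORM_BLOCK)]


if __name__ == "__main__":
    print(" ".join(build_single_block_command("page.tif", "page")))
```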

Re: Page layout analysis - don't split columns.

2012-05-27 Thread Joe Aspara
Thanks Brock, unfortunately I must use the Android tesseract version, so I need to find a programmatic solution to this problem.

Android version: is possible to get the bounding boxes of the recognized words

2012-05-27 Thread Joe Aspara
Hi all, I'm searching for a way to get the bounding box coordinates of the characters in the tesseract-android-tool version. In the native API interface (TessBaseAPI.java) there is no native method for static int TesseractExtractResult(char** string, int** lengths, float** costs, i

Re: Android version: is possible to get the bounding boxes of the recognized words

2012-05-27 Thread Joe Aspara
On Sunday, May 27, 2012 at 18:54:42 UTC+2, Joe Aspara wrote: > > Hi all, > I'm searching for a way to get the bounding box coordinates of the > characters in the tesseract-android-tool version. In the native API > interface (TessBaseAPI.java) there is

Training Text Best Practices

2012-11-29 Thread Joe Carter
Hello, I'm trying to train Tesseract to recognize a script with over 200 letters. Is it OK to train Tesseract with gibberish text? Or does the training method rely on a probable distribution of characters, i.e., actual writing? I'd like to train it with a random distribution of characters where e

Training Text Best Practices

2012-11-29 Thread Joe Carter
Hello, I'm trying to train Tesseract to recognize a script with over 200 letters. Due to the large number of letters, I'm trying to see if I can come up with a text that is easy to generate and is optimal for training. I'd like to train it with a random distribution of characters where each chara
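A rough sketch of how such a text could be generated, assuming the goal is that every character appears the same number of times in random order. That reading is an assumption, since the message is cut off. The function names, word length, and line length are hypothetical, and the alphabet must not contain whitespace. The sketch says nothing about whether Tesseract trains well on such text.

```python
import random
from collections import Counter


def balanced_training_text(alphabet, repeats, word_len=5, words_per_line=8, seed=0):
    """Return text in which each character of `alphabet` occurs exactly `repeats` times."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    pool = [ch for ch in alphabet for _ in range(repeats)]
    rng.shuffle(pool)
    # Group the shuffled characters into pseudo-words, then into lines.
    words = ["".join(pool[i:i + word_len]) for i in range(0, len(pool), word_len)]
    lines = [" ".join(words[i:i + words_per_line])
             for i in range(0, len(words), words_per_line)]
    return "\n".join(lines)


def char_counts(text):
    """Count the non-whitespace characters in `text`."""
    return Counter(ch for ch in text if not ch.isspace())


if __name__ == "__main__":
    sample = balanced_training_text("abcdef", repeats=10)
    print(sample)
    print(char_counts(sample))
```

Checking `char_counts` on the result confirms the distribution is flat before the text is rendered into training images.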

How to OCR the attached image

2013-01-29 Thread Joe Chan
The attached image is seemingly simple to OCR, but I failed to do it. Any pointers? Joe

[tesseract-ocr] Tesseract 3.02 Orientation Script Detection

2014-05-11 Thread Joe Aspara
I'm struggling with the OSD function of Tesseract 3.02. I tried the standalone version via the command line and the Tess4J version too, but I always get an error with different input types. I downloaded the osd.traineddata for version 3.01 (I guess no such file exists yet for v3.02) from here h

[tesseract-ocr] Re: Tesseract 3.02 Orientation Script Detection

2014-05-14 Thread Joe Aspara
n, > and Textline Order. Check Tess4J unit tests for usage of OSD. > > On Sunday, May 11, 2014 5:48:39 AM UTC-5, Joe Aspara wrote: >> >> I'm struggling with the OSD function of Tesseract 3.02. >> I tried the standalone version via command line and the Tess4J

[tesseract-ocr] OCR - does anyone know how to recognize shapes within a document?

2016-01-19 Thread Joe King
Does anyone know how to recognize shapes within a document? I am looking for software that can recognize a square, circle and triangle in multiple scanned PDF documents and place a highlight on top of them.