I am not sure about the cTAKES side of this. Recently, though, I have been
using Google's text recognition services, and the results on printed text
are really good. Maybe you could try that.

Melvin

On Wed, Oct 18, 2017 at 10:19 PM, <abilash.mat...@cognizant.com> wrote:

> Sean,
>
> What accuracy do you get from OCR? We are at 60-70% accuracy.
> Most of the documents are 200 DPI ones. Also, are you using any other
> software, such as Matlab, for OCR pre- or post-processing?
>
> Thanks,
> Abilash Mathew
>
> -----Original Message-----
> From: Mathew, Abilash (Cognizant)
> Sent: Monday, October 16, 2017 8:37 PM
> To: dev@ctakes.apache.org
> Subject: RE: OCR engine used [EXTERNAL]
>
> Thanks, Sean, for the quick reply and for providing the valuable information.
>
> Regards,
> Abilash Mathew
>
> -----Original Message-----
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Monday, October 16, 2017 8:17 PM
> To: dev@ctakes.apache.org
> Subject: RE: OCR engine used [EXTERNAL]
>
> Hi Abilash Mathew,
>
> I have only used Tesseract. Unfortunately, no OCR is perfect.
> I am by no means an expert on Tesseract, but perhaps I can help to get you
> started ...
>
> There are tricks that you can use to get it to work better with medical
> notes (besides training on fonts). Possibly the most effective is to
> whitelist the desired characters by setting tessedit_char_whitelist to a
> series of characters that doesn't include things like hash, dollar, bar ...
> Another is to add a wordlist that contains words pertinent to your domain.
> See:
> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#dictionaries-word-lists-and-patterns
> https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#config-files-and-augmenting-with-user-data
> https://stackoverflow.com/questions/9568165/custom-dictionary-for-tesseract
> https://www.mail-archive.com/tesseract-ocr@googlegroups.com/msg10100.html
>
> Good luck,
> Sean
>
> -----Original Message-----
> From: abilash.mat...@cognizant.com [mailto:abilash.mat...@cognizant.com]
> Sent: Monday, October 16, 2017 10:13 AM
> To: dev@ctakes.apache.org
> Subject: OCR engine used [EXTERNAL]
>
> Hi All,
>
> Can you suggest some OCR engines used for extracting medical record text
> from images? I am currently using Tesseract and seeing some text
> extraction quality issues.
>
> Thanks,
> Abilash Mathew
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
>
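
For anyone who wants to try Sean's two suggestions from the thread above (a character whitelist and a domain word list), here is a minimal Python sketch that builds a Tesseract command line. It is an illustration under assumptions, not something posted in the thread: the whitelist contents, the helper names build_tesseract_command and run_ocr, and the file names are all hypothetical, and it assumes a Tesseract build on the PATH that accepts --user-words and honors tessedit_char_whitelist.

```python
import string
import subprocess

# Assumed whitelist: letters, digits, and a little punctuation.
# Hash, dollar, and bar are deliberately left out; tune this to your notes.
WHITELIST = string.ascii_letters + string.digits + ".,:;/()%+-"


def build_tesseract_command(image_path, output_base, user_words_path=None):
    """Return the argv list for one Tesseract run (nothing is executed here)."""
    command = ["tesseract", image_path, output_base]
    if user_words_path is not None:
        # Plain-text file with one domain word per line (hypothetical file).
        command += ["--user-words", user_words_path]
    # -c sets a single config variable for this run.
    command += ["-c", "tessedit_char_whitelist=" + WHITELIST]
    return command


def run_ocr(image_path, output_base, user_words_path=None):
    """Run Tesseract; it writes the recognized text to <output_base>.txt."""
    command = build_tesseract_command(image_path, output_base, user_words_path)
    subprocess.run(command, check=True)
```

Calling run_ocr("note.png", "note", "medical_words.txt") would write note.txt if Tesseract is installed. Keeping the command construction separate from execution makes it easy to print and inspect the exact options before running them on a batch of scans.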
