Thanks, Sean, for the quick reply and for providing the valuable information.

Regards,
Abilash Mathew
-----Original Message-----
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
Sent: Monday, October 16, 2017 8:17 PM
To: dev@ctakes.apache.org
Subject: RE: OCR engine used [EXTERNAL]

Hi Abilash Mathew,

I have only used Tesseract. Unfortunately, no ocr is perfect. I am by no means an expert on Tesseract, but perhaps I can help to get you started ...

There are tricks that you can use to get it to work better with medical notes (besides training on fonts). Possibly the most effective is using a whitelist of desired characters using tessedit_char_whitelist and a series of characters that doesn't include things like hash, dollar, bar ...

Another is to add a wordlist that contains words pertinent to your domain.

See:
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#dictionaries-word-lists-and-patterns
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#config-files-and-augmenting-with-user-data
https://stackoverflow.com/questions/9568165/custom-dictionary-for-tesseract
https://www.mail-archive.com/tesseract-ocr@googlegroups.com/msg10100.html

Good luck,
Sean

-----Original Message-----
From: abilash.mat...@cognizant.com [mailto:abilash.mat...@cognizant.com]
Sent: Monday, October 16, 2017 10:13 AM
To: dev@ctakes.apache.org
Subject: OCR engine used [EXTERNAL]

Hi All,

Can you guys give some of the OCR engines used for Medical record text extraction from images? I am currently using tesseract and seeing some text extraction quality issues.

Thanks,
Abilash Mathew

This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
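
As a rough illustration of the two suggestions in Sean's reply (a character whitelist and a domain word list), the sketch below builds a Tesseract command line from Python. The whitelist contents, the file names note.png and medical_words.txt, and the helper names are assumptions made for illustration; none of them come from the thread. Support for tessedit_char_whitelist has varied across Tesseract versions and engine modes, so it is worth checking the documentation for the installed version.

```python
import shutil
import string
import subprocess

# Hypothetical whitelist: letters, digits, and a little punctuation that is
# common in clinical notes. It deliberately leaves out characters such as
# '#', '$' and '|', as suggested in the reply above.
WHITELIST = string.ascii_letters + string.digits + ".,:;/()%+-"


def build_tesseract_command(image_path, user_words_path=None):
    """Return the argument list for a tesseract run that prints text to stdout."""
    cmd = ["tesseract", image_path, "stdout"]
    if user_words_path is not None:
        # --user-words points at a plain-text file with one domain word per line.
        cmd += ["--user-words", user_words_path]
    # -c VAR=VALUE sets a config variable for this run only.
    cmd += ["-c", "tessedit_char_whitelist=" + WHITELIST]
    return cmd


def run_ocr(image_path, user_words_path=None):
    """Run tesseract and return the recognized text (needs tesseract on PATH)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, user_words_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_ocr("note.png", "medical_words.txt") would then return the text Tesseract recognizes under those settings. Passing the arguments as a list avoids shell quoting problems with the punctuation in the whitelist.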