Hello,

I am currently working on the same problem with OCR. Has anyone had any 
experience detecting redactions using OCR?

Regards,
Fred.
On Thursday, December 18, 2014 at 5:41:20 AM UTC+8 Patrick Durusau wrote:

> Greetings!
>
> I recently had wonderful success with tesseract-ocr on grand jury 
> transcripts but now have a harder problem.
>
> Can tesseract be trained to recognize censoring blocks in text? For 
> example:
>
> Assume this sentence has XXXXXXXXXXXXX a censoring block that obscures all 
> the text it covers. (Represented here by the X's; in the actual text it is 
> a solid black line.)
>
> What I want to do, in addition to recognizing the surrounding text, is to 
> train tesseract to substitute "(redaction - N)" for the black mark, where N 
> is the length of the redaction.
>
> There aren't that many different sizes of redaction, probably from a little 
> over one character space up to an entire line, so producing examples of all 
> the blackouts would be tedious but not difficult.
>
> Is that pushing tesseract in a direction it is not meant to go? 
>
> If so, any suggestions on software that might be better suited to the task?
>
> Thanks!
>
> Patrick
>
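
A possible starting point for the "(redaction - N)" idea quoted above, offered only as a rough sketch: rather than training tesseract on the bars, look for them in a separate pass over a binarized page image and pass the surrounding text to OCR as usual. Everything here is an assumption made for illustration: the function names, the thresholds, and the bitmap format (a list of rows, 1 = black pixel, 0 = white). It is pure Python and does not use any tesseract API. It also assumes the bars are perfectly solid, axis-aligned rectangles; skewed or greyish redactions would need a more tolerant test.

```python
def black_runs(row, min_width):
    """Return (start, end) pairs for runs of black pixels at least min_width long."""
    runs, start = [], None
    for x, pixel in enumerate(list(row) + [0]):  # trailing 0 closes a run at the edge
        if pixel and start is None:
            start = x
        elif not pixel and start is not None:
            if x - start >= min_width:
                runs.append((start, x))
            start = None
    return runs


def find_redactions(bitmap, min_width=20, min_height=5):
    """Find solid black rectangles, returned as (x, y, width, height) tuples.

    A run of black pixels counts as part of a bar only if the identical
    horizontal span repeats on at least min_height consecutive rows.
    Ordinary glyphs rarely produce such long, stable runs.
    """
    open_boxes = {}  # (start, end) -> first row on which this span appeared
    found = []
    for y, row in enumerate(list(bitmap) + [[]]):  # empty last row flushes open spans
        runs = set(black_runs(row, min_width))
        for span in list(open_boxes):
            if span not in runs:
                top = open_boxes.pop(span)
                if y - top >= min_height:
                    found.append((span[0], top, span[1] - span[0], y - top))
        for span in runs:
            open_boxes.setdefault(span, y)
    return sorted(found, key=lambda box: (box[1], box[0]))


def redaction_marker(box, char_width):
    """Format a box as '(redaction - N)', with N estimated from the bar width."""
    n = max(1, round(box[2] / char_width))
    return f"(redaction - {n})"
```

The character width passed to redaction_marker is a guess that would have to come from the page itself, for example the average width of the glyph boxes OCR reports on the same line. The detected bar positions could then be used to splice the markers into the recognized text.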
