Hello, I am currently working on the same problem using OCR. Has anyone had any experience detecting redactions with OCR?
Regards,
Fred.

On Thursday, December 18, 2014 at 5:41:20 AM UTC+8 Patrick Durusau wrote:

> Greetings!
>
> I recently had wonderful success with tesseract-ocr on grand jury
> transcripts but now have a harder problem.
>
> Can tesseract be trained to recognize censoring blocks in text? For
> example:
>
> Assume this sentence has XXXXXXXXXXXXX a censoring block that obscures all
> the text it covers. (here represented by the X's, in the text, it is a
> solid black line)
>
> What I want to do, in addition to recognizing the surrounding text, is to
> train tesseract to substitute for the black mark, (redaction - N) where N
> is the length of the redaction.
>
> There aren't that many different sized redactions, well, probably from one
> character space or a little better up to an entire line so producing
> examples of all the blackouts would be tedious but not difficult.
>
> Is that pushing tesseract in a direction it is not meant to go?
>
> If so, any suggestions on software that might be better suited to the task?
>
> Thanks!
>
> Patrick
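For anyone following the thread, here is a rough sketch of one way the "(redaction - N)" substitution described above might be approached outside of tesseract training. It is not a tesseract feature and not something confirmed in this thread. It assumes the page has already been binarized and cut into single text-line bands (rows of 0/1 pixels, 1 = dark), and that an approximate character width in pixels is known. The names `find_redactions`, `placeholder`, `char_width` and `fill_ratio` are made up for illustration.

```python
from typing import List, Tuple


def find_redactions(band: List[List[int]], char_width: int,
                    fill_ratio: float = 0.9,
                    min_chars: int = 1) -> List[Tuple[int, int, int]]:
    """Return (start_col, end_col_exclusive, n_chars) for each solid dark run.

    band: rows of 0/1 pixels (1 = dark) covering one text line (assumed input).
    char_width: assumed average character width in pixels.
    """
    if not band or not band[0]:
        return []
    height = len(band)
    width = len(band[0])

    # A column counts as "solid" when almost all of its pixels are dark.
    # Ordinary glyphs rarely fill a column this completely for long.
    solid = [sum(row[x] for row in band) / height >= fill_ratio
             for x in range(width)]

    runs = []
    start = None
    # The trailing False is a sentinel that closes a run touching the right edge.
    for x, is_solid in enumerate(solid + [False]):
        if is_solid and start is None:
            start = x
        elif not is_solid and start is not None:
            # Estimate the redaction length in characters from its pixel width.
            n_chars = round((x - start) / char_width)
            if n_chars >= min_chars:  # drops thin strokes such as "l" or "|"
                runs.append((start, x, n_chars))
            start = None
    return runs


def placeholder(n_chars: int) -> str:
    """Format the substitution text in the style asked for in the thread."""
    return f"(redaction - {n_chars})"


if __name__ == "__main__":
    # Synthetic 4-row band, 30 columns wide, with columns 10..25 blacked out.
    band = [[1 if 10 <= x < 26 else 0 for x in range(30)] for _ in range(4)]
    for start, end, n in find_redactions(band, char_width=4):
        print(start, end, placeholder(n))  # prints: 10 26 (redaction - 4)
```

The column ranges it returns could then be used to mask those regions before OCR and to splice the placeholder into the recognized text at the matching position. That splicing step is left out here because it depends on how the OCR output reports word positions.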

