On Wed, 17 Oct 2018, Matus UHLAR - fantomas wrote:
> On 16.10.18 18:42, RW wrote:
>> Bayes might work, but I wouldn't like to see it added to body text
>> because corrupted text could look like obfuscation.
>
> it should be pushed back to body text just for filters like bayes.
> The same could/should be done for attached .doc, .pdf files etc.

...which would be much more reliable than OCR.
If it was a resource-allocation decision for pulling text from doc/pdf vs.
updating OCR, I'd push for the former.
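
To make the doc/pdf idea concrete, here is a rough Python sketch (not SpamAssassin code, which is Perl) of walking a message, picking out attachments by filename suffix, and collecting whatever text an extractor returns so it could be handed to something like Bayes. The function name collect_attachment_text and the extractors mapping are hypothetical; the actual .doc/.pdf text extraction is deliberately left to whatever tool you plug in.

```python
from email import message_from_bytes, policy


def collect_attachment_text(raw_message, extractors):
    """Return text pulled from attachments whose filename suffix has an extractor.

    `extractors` is a hypothetical mapping such as {".pdf": pdf_to_text},
    where each value takes the attachment bytes and returns a str.
    """
    msg = message_from_bytes(raw_message, policy=policy.default)
    chunks = []
    for part in msg.walk():
        # Multipart containers and inline body parts have no filename.
        name = (part.get_filename() or "").lower()
        if not name:
            continue
        for suffix, extract in extractors.items():
            if name.endswith(suffix):
                # decode=True undoes base64/quoted-printable transfer encoding.
                payload = part.get_payload(decode=True) or b""
                chunks.append(extract(payload))
                break
    return "\n".join(chunks)
```

Whether the result goes into the body text proper or only to Bayes is exactly the policy question above; the sketch only covers getting at the attachment bytes.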
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The problem is when people look at Yahoo, slashdot, or groklaw and
jump from obvious and correct observations like "Oh my God, this
place is teeming with utter morons" to incorrect conclusions like
"there's nothing of value here". -- Al Petrofsky, in Y! SCOX
-----------------------------------------------------------------------
566 days since the first commercial re-flight of an orbital booster (SpaceX)