[ https://issues.apache.org/jira/browse/TIKA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2369:
------------------------------
    Fix Version/s:     (was: 2.0.0)
                   2.0.0-BETA

> Define a clean Recogniser interface: for objects from binary data; and for text classification
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2369
>                 URL: https://issues.apache.org/jira/browse/TIKA-2369
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Major
>             Fix For: 1.17, 2.0.0-BETA, 2.0.1
>
>
> As described in TIKA-2360 we should refactor the ObjectRecogniser interface.
> I propose creating:
> 1. TextRecogniser (per [~thammegowda] it takes INPUT:text input and OUTPUT:set of metadata key values)
> 2. ObjectRecogniser (also per Thamme ObjectRecogniser, VideoLabeller, OCR, Caption - INPUT:raw bytes and OUTPUT:set of metadata key values.)
> We should of course rectify this with Tika-DL and how that folds in.
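
The split proposed in the issue (text in, metadata key/values out; raw bytes in, metadata key/values out) could be sketched roughly as below. This is only an illustration under stated assumptions, not the interface that Tika ships: the method name recognise, the use of a plain Map<String, String> in place of Tika's Metadata class, and the two toy implementations (WordCountRecogniser, PngSignatureRecogniser) are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecogniserSketch {

    /** Hypothetical shape: text input, set of metadata key/values as output. */
    interface TextRecogniser {
        Map<String, String> recognise(String text);
    }

    /** Hypothetical shape: raw bytes input, set of metadata key/values as output. */
    interface ObjectRecogniser {
        Map<String, String> recognise(byte[] data);
    }

    /** Toy text recogniser (illustration only): reports a whitespace-delimited word count. */
    static class WordCountRecogniser implements TextRecogniser {
        @Override
        public Map<String, String> recognise(String text) {
            Map<String, String> metadata = new LinkedHashMap<>();
            String trimmed = text.trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            metadata.put("word-count", String.valueOf(words));
            return metadata;
        }
    }

    /** Toy object recogniser (illustration only): labels data by the leading PNG signature bytes. */
    static class PngSignatureRecogniser implements ObjectRecogniser {
        private static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G'};

        @Override
        public Map<String, String> recognise(byte[] data) {
            boolean isPng = data.length >= PNG_MAGIC.length;
            for (int i = 0; isPng && i < PNG_MAGIC.length; i++) {
                isPng = data[i] == PNG_MAGIC[i];
            }
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("object-label", isPng ? "png-image" : "unknown");
            metadata.put("byte-length", String.valueOf(data.length));
            return metadata;
        }
    }

    public static void main(String[] args) {
        TextRecogniser text = new WordCountRecogniser();
        ObjectRecogniser object = new PngSignatureRecogniser();

        System.out.println(text.recognise("Define a clean Recogniser interface"));
        System.out.println(object.recognise(new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A}));
    }
}
```

Keeping the two interfaces separate means a caller picks the recogniser by the kind of input it already has (extracted text versus the original bytes), while both return the same key/value shape so the results can be merged into one metadata set.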