[ 
https://issues.apache.org/jira/browse/TIKA-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-2369:
------------------------------
    Fix Version/s:     (was: 2.0.0)
                   2.0.0-BETA

> Define a clean Recogniser interface: for objects from binary data; and for 
> text classification
> ----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2369
>                 URL: https://issues.apache.org/jira/browse/TIKA-2369
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Major
>             Fix For: 1.17, 2.0.0-BETA, 2.0.1
>
>
> As described in TIKA-2360, we should refactor the ObjectRecogniser interface. 
> I propose creating:
> 1. TextRecogniser (per [~thammegowda]: INPUT: text; OUTPUT: a set of 
> metadata key-value pairs)
> 2. ObjectRecogniser (also per Thamme; covers ObjectRecogniser, VideoLabeller, 
> OCR, Caption: INPUT: raw bytes; OUTPUT: a set of metadata key-value pairs)
> We should, of course, reconcile this with Tika-DL and how that folds in. 
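
To make the proposed split concrete, here is a minimal Java sketch of the two interfaces. It is illustrative only: the method name `recognise`, the `Map<String, String>` return type (standing in for Tika's own metadata type), the `sketch:` key prefix, and both toy implementations are assumptions of this example, not the API that was eventually committed for this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RecogniserSketch {

    /** Hypothetical interface: text in, metadata key-value pairs out. */
    interface TextRecogniser {
        Map<String, String> recognise(String text);
    }

    /** Hypothetical interface: raw bytes in, metadata key-value pairs out. */
    interface ObjectRecogniser {
        Map<String, String> recognise(byte[] data);
    }

    /** Toy text recogniser that reports a whitespace-delimited word count. */
    static final class WordCountRecogniser implements TextRecogniser {
        @Override
        public Map<String, String> recognise(String text) {
            Map<String, String> metadata = new LinkedHashMap<>();
            String trimmed = text.trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            metadata.put("sketch:wordCount", Integer.toString(words));
            return metadata;
        }
    }

    /** Toy object recogniser that checks the first four bytes of the PNG signature. */
    static final class PngSignatureRecogniser implements ObjectRecogniser {
        private static final byte[] PNG_MAGIC = {(byte) 0x89, 0x50, 0x4E, 0x47};

        @Override
        public Map<String, String> recognise(byte[] data) {
            boolean isPng = data.length >= PNG_MAGIC.length;
            for (int i = 0; isPng && i < PNG_MAGIC.length; i++) {
                isPng = data[i] == PNG_MAGIC[i];
            }
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("sketch:isPng", Boolean.toString(isPng));
            return metadata;
        }
    }

    public static void main(String[] args) {
        TextRecogniser text = new WordCountRecogniser();
        ObjectRecogniser object = new PngSignatureRecogniser();
        System.out.println(text.recognise("hello big world"));
        System.out.println(object.recognise(new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47}));
    }
}
```

Keeping the two input types in separate interfaces means a caller that only has extracted text never needs to fabricate a byte stream, and both kinds of recogniser still report results in the same key-value shape.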



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
