Nicholas DiPiazza created TIKA-4243:
---------------------------------------

             Summary: tika configuration overhaul
                 Key: TIKA-4243
                 URL: https://issues.apache.org/jira/browse/TIKA-4243
             Project: Tika
          Issue Type: New Feature
          Components: config
    Affects Versions: 3.0.0
            Reporter: Nicholas DiPiazza


In 3.0.0, Tika would benefit greatly from a typed configuration schema.

In 3.x, can we remove the old way of doing configs and replace it with JSON Schema?

JSON Schema can be converted to POJOs using a Maven plugin:
[https://github.com/joelittlejohn/jsonschema2pojo]

This automatically creates a Java POJO model that we can use for the configs.

This would allow the legacy tika-config XML to be read and converted to the
new POJOs easily using an XML mapper, so that users don't have to use JSON
configurations yet if they do not want to.
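
As a rough illustration of the legacy-XML path, here is a minimal sketch that reads parser entries from a tika-config style XML string into a POJO. The class names ParserConfig and LegacyConfigReader are hypothetical, the POJO is hand-written rather than generated by jsonschema2pojo, and the JDK's built-in DOM parser stands in for a real XML mapper so the example has no dependencies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical POJO; under the proposal this would be generated from a JSON Schema.
class ParserConfig {
    private final String className;
    private final List<String> mimeExcludes;

    ParserConfig(String className, List<String> mimeExcludes) {
        this.className = className;
        this.mimeExcludes = Collections.unmodifiableList(new ArrayList<>(mimeExcludes));
    }

    String getClassName() { return className; }
    List<String> getMimeExcludes() { return mimeExcludes; }
}

// Hypothetical reader; a real implementation would more likely use an XML mapper.
class LegacyConfigReader {
    static List<ParserConfig> readParsers(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted configs cannot trigger XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<ParserConfig> result = new ArrayList<>();
            // Collects every <parser> element in the document; nesting is flattened.
            NodeList parsers = doc.getElementsByTagName("parser");
            for (int i = 0; i < parsers.getLength(); i++) {
                Element parser = (Element) parsers.item(i);
                List<String> excludes = new ArrayList<>();
                NodeList excludeNodes = parser.getElementsByTagName("mime-exclude");
                for (int j = 0; j < excludeNodes.getLength(); j++) {
                    excludes.add(excludeNodes.item(j).getTextContent().trim());
                }
                result.add(new ParserConfig(parser.getAttribute("class"), excludes));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read legacy config", e);
        }
    }
}
```

Calling LegacyConfigReader.readParsers(...) on the contents of an existing tika-config.xml would yield one ParserConfig per parser element, which the rest of the code could consume regardless of the format the user wrote the config in.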

When complete, configurations can be set as XML, JSON, or YAML:

tika-config.xml

tika-config.json

tika-config.yaml
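
One way the loader could pick a format for these three files is by file extension. The sketch below is only an assumption about how that dispatch might look; the ConfigFormat enum and its fromFileName method are hypothetical names, and accepting ".yml" alongside ".yaml" is an extra convenience not mentioned above.

```java
import java.util.Locale;

// Hypothetical enum naming the three proposed config formats.
enum ConfigFormat {
    XML, JSON, YAML;

    // Picks the format from the file extension, ignoring case.
    static ConfigFormat fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".xml")) {
            return XML;
        }
        if (lower.endsWith(".json")) {
            return JSON;
        }
        if (lower.endsWith(".yaml") || lower.endsWith(".yml")) {
            return YAML;
        }
        throw new IllegalArgumentException("Unsupported config file: " + fileName);
    }
}
```

Each format would then be handed to its own mapper, with all three producing the same POJO model.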

Replace all uses of the Tika config annotations that rely on the old syntax
with the POJO model deserialized from the XML/JSON/YAML.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
