[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849103#comment-17849103 ]
Tim Allison edited comment on TIKA-4243 at 5/24/24 1:00 PM:
------------------------------------------------------------

Proposed basic roadmap:

Add parseContext to fetchers and emitters (and pipesReporter?)
Serialize ParseContext as is...
Allow for serialization of current XConfigs, eg. PDFParserConfig, etc.
Add creation of parsers with e.g. new PDFParser(ParseContext context).
Wire config stuff into tika-server, tika-pipes, tika-app
Merge tika-grpc-server with new config options

This would require serialization of classes that users want to be able to configure + serialization. This would allow us to get rid of all of our custom serialization stuff for Tika 4.x.


was (Author: talli...@mitre.org):
Proposed basic roadmap:

Serialize ParseContext as is...
Allow for serialization of current XConfigs, eg. PDFParserConfig, etc.
Add creation of parsers with e.g. new PDFParser(ParseContext context).
Wire config stuff into tika-server, tika-pipes, tika-app
Merge tika-grpc-server with new config options

This would require serialization of classes that users want to be able to configure + serialization. This would allow us to get rid of all of our custom serialization stuff for Tika 4.x.

> tika configuration overhaul
> ---------------------------
>
>                 Key: TIKA-4243
>                 URL: https://issues.apache.org/jira/browse/TIKA-4243
>             Project: Tika
>          Issue Type: New Feature
>          Components: config
>    Affects Versions: 3.0.0
>            Reporter: Nicholas DiPiazza
>            Priority: Major
>
> In 3.0.0 when dealing with Tika, it would greatly help to have a Typed
> Configuration schema.
> In 3.x can we remove the old way of doing configs and replace with Json
> Schema?
> Json Schema can be converted to Pojos using a maven plugin
> [https://github.com/joelittlejohn/jsonschema2pojo]
> This automatically creates a Java Pojo model we can use for the configs.
> This can allow for the legacy tika-config XML to be read and converted to the
> new pojos easily using an XML mapper so that users don't have to use JSON
> configurations yet if they do not want.
> When complete, configurations can be set as XML, JSON or YAML
> tika-config.xml
> tika-config.json
> tika-config.yaml
> Replace all instances of tika config annotations that used the old syntax,
> and replace with the Pojo model serialized from the xml/json/yaml.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
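
To make the roadmap item "Add creation of parsers with e.g. new PDFParser(ParseContext context)" concrete, here is a minimal sketch of what constructing a parser from a typed, class-keyed context could look like. Everything in it is a hypothetical stand-in written for illustration: SketchContext, SketchPdfConfig and SketchPdfParser are not Tika classes, the two config fields are invented, and the sketch does not cover serialization at all. It only shows the lookup-with-default pattern that a context-based constructor would rely on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed config POJO (a stand-in, not Tika's PDFParserConfig).
final class SketchPdfConfig {
    final boolean extractInlineImages; // invented field, for illustration only
    final int maxPages;                // invented field; -1 means "no limit" here

    SketchPdfConfig(boolean extractInlineImages, int maxPages) {
        this.extractInlineImages = extractInlineImages;
        this.maxPages = maxPages;
    }
}

// Hypothetical class-keyed context: each config type is stored under its own Class.
final class SketchContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> SketchContext set(Class<T> key, T value) {
        entries.put(key, value);
        return this;
    }

    <T> T get(Class<T> key, T defaultValue) {
        Object value = entries.get(key);
        return value == null ? defaultValue : key.cast(value);
    }
}

// Hypothetical parser that reads its config from the context at construction time.
final class SketchPdfParser {
    private final SketchPdfConfig config;

    SketchPdfParser(SketchContext context) {
        // Fall back to defaults when the caller supplied no config for this parser.
        this.config = context.get(SketchPdfConfig.class, new SketchPdfConfig(false, -1));
    }

    SketchPdfConfig config() {
        return config;
    }
}

class ConfigSketchDemo {
    public static void main(String[] args) {
        SketchContext context = new SketchContext()
                .set(SketchPdfConfig.class, new SketchPdfConfig(true, 10));
        SketchPdfParser parser = new SketchPdfParser(context);
        System.out.println("extractInlineImages=" + parser.config().extractInlineImages);
        System.out.println("maxPages=" + parser.config().maxPages);
    }
}
```

Under these assumptions, the same context object could be handed to fetchers, emitters and parsers alike, and each component would pull out only the config type it understands. How such a context would be serialized is left open here, as it is in the roadmap.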