Dear Tika Devs,

Tika Pipes in production had a blocking problem for my users: the extensible Fetcher objects we load into the Tika Server and the Tika gRPC Server would run into classpath loading conflicts with other Fetchers. Each Fetcher's classpath needs to be fully independent of the others.
To fix this, I am attempting to introduce pf4j in this pull request: https://github.com/apache/tika/pull/1906

In this PR, the shade plugin is removed entirely in favor of the Maven dependency plugin and the assembly plugin. All Fetchers are now loaded through the pf4j plugin manager, and each Fetcher's classpath is pulled in dynamically with its own classloader, separate from those of the other Fetchers.

Some changes come as a result. The Tika configuration no longer contains <fetcher>. It contains <fetcherConfig> instead, because we don't need a full copy of the Fetcher there anymore. The FetcherConfig is now the only thing stored in the Tika config; the pf4j plugin manager handles loading the correct Fetcher, and you then send that Fetcher the configuration it requires.

I am now going into the Tika XML serialization code, where I need to put the FetcherConfig in place of the Fetcher objects previously stored there. I figured this is a good time to take a step back and share with everyone.

I would like to do a quick Zoom call with Tim and others to review the PR and discuss how to make that change to the Tika serialization code gracefully, so that I don't step on anyone else's intentions.

After this is merged, I'd like to build another RC so I can see whether the issues reported by users are fixed.

-Nicholas
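
P.S. For anyone skimming the thread, here is a rough sketch of the "store only the config, look up the Fetcher by plugin id" idea described above. It is an illustration only: it is not code from the PR and it does not use the pf4j API. A plain Map stands in for the plugin manager, and the FetcherConfig fields, the Fetcher interface, FetcherRegistry, and the plugin id "file-system-fetcher" are all hypothetical names picked for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class FetcherConfigSketch {

    /** Hypothetical: the only fetcher-related data kept in the Tika config. */
    static final class FetcherConfig {
        final String pluginId;  // which plugin should handle this config
        final String basePath;  // example setting passed to the fetcher

        FetcherConfig(String pluginId, String basePath) {
            this.pluginId = pluginId;
            this.basePath = basePath;
        }
    }

    /** Hypothetical fetcher contract; the real Tika interface differs. */
    interface Fetcher {
        String fetch(String key, FetcherConfig config);
    }

    /** Stand-in for the plugin manager: resolves a Fetcher by plugin id. */
    static final class FetcherRegistry {
        private final Map<String, Fetcher> byId = new HashMap<>();

        void register(String pluginId, Fetcher fetcher) {
            byId.put(pluginId, fetcher);
        }

        String fetch(FetcherConfig config, String key) {
            Fetcher fetcher = byId.get(config.pluginId);
            if (fetcher == null) {
                throw new IllegalArgumentException(
                        "No fetcher plugin registered for id: " + config.pluginId);
            }
            // The config is handed to the fetcher at call time,
            // so no Fetcher instance has to live in the config itself.
            return fetcher.fetch(key, config);
        }
    }

    /** Wires one hypothetical fetcher and resolves a key through it. */
    static String demo() {
        FetcherRegistry registry = new FetcherRegistry();
        registry.register("file-system-fetcher",
                (key, cfg) -> cfg.basePath + "/" + key);

        FetcherConfig config = new FetcherConfig("file-system-fetcher", "/data/docs");
        return registry.fetch(config, "report.pdf");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "/data/docs/report.pdf"
    }
}
```

In the actual change, the lookup step would be done by the pf4j plugin manager, with each plugin loaded through its own classloader. The sketch only shows why the config no longer needs to hold a Fetcher instance.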