Dear Tika Devs:

Tika Pipes had a blocking problem in production for my users: the
extensible Fetcher objects we load into the Tika Server and Tika Grpc
Server ran into classpath conflicts with other Fetchers. Each Fetcher's
classpath needs to be fully independent of the others.

To fix this, I am introducing pf4j in this pull request:

https://github.com/apache/tika/pull/1906

In this pull request, the shade plugin is removed entirely in favor of the
Maven dependency plugin and assembly plugin.

All Fetchers are now loaded via the plugin manager, and each one's
classpath is pulled in dynamically with its own classloader, separate from
those of the other Fetchers.

Great.

Some changes come as a result:

Instead of a <fetcher> element in the Tika configuration, there is now a
<fetcherConfig> element, because we no longer need a full copy of the
Fetcher there.

The fetcherConfig is now the only thing stored in the Tika Config. The
pf4j plugin manager handles loading the correct Fetcher, which is then
handed the configuration it requires.
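
To make the flow concrete, here is a rough sketch of the idea in plain
Java. It is not the code in the PR: FetcherConfig, Fetcher and
FetcherRegistry below are simplified, hypothetical stand-ins, and the
registry only plays the role that the pf4j plugin manager plays in the
real change. The plugin id "file-system-fetcher" and the "basePath"
parameter are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FetcherConfigSketch {

    /** Hypothetical stand-in for the configuration kept in the Tika config. */
    static final class FetcherConfig {
        final String pluginId;
        final Map<String, String> params;

        FetcherConfig(String pluginId, Map<String, String> params) {
            this.pluginId = pluginId;
            this.params = params;
        }
    }

    /** Hypothetical minimal fetcher contract; the real Tika interface differs. */
    interface Fetcher {
        String fetch(String key);
    }

    /** Stand-in for the plugin manager: maps a plugin id to a Fetcher factory. */
    static final class FetcherRegistry {
        private final Map<String, Function<FetcherConfig, Fetcher>> factories = new HashMap<>();

        void register(String pluginId, Function<FetcherConfig, Fetcher> factory) {
            factories.put(pluginId, factory);
        }

        /** Looks up the factory named by the config, then hands it that config. */
        Fetcher create(FetcherConfig config) {
            Function<FetcherConfig, Fetcher> factory = factories.get(config.pluginId);
            if (factory == null) {
                throw new IllegalArgumentException(
                        "No fetcher plugin registered for id: " + config.pluginId);
            }
            return factory.apply(config);
        }
    }

    public static String demo() {
        FetcherRegistry registry = new FetcherRegistry();
        // A toy "plugin": it just joins the configured base path and the fetch key.
        registry.register("file-system-fetcher",
                cfg -> key -> cfg.params.get("basePath") + "/" + key);

        // Only the config is stored; the Fetcher is built on demand from it.
        FetcherConfig config =
                new FetcherConfig("file-system-fetcher", Map.of("basePath", "/data"));
        return registry.create(config).fetch("report.pdf");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "/data/report.pdf"
    }
}
```

The point of the sketch is only the direction of the dependency: the
config names a plugin, and the Fetcher instance is created from the config
rather than being serialized itself.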

I am now working through the Tika XML serialization code, where the
FetcherConfig needs to replace the Fetcher objects previously stored there.

I figured this is a good time to take a step back and share this with
everyone. I would like to do a quick Zoom call with Tim and others to
review the PR and discuss how to make that change to the Tika
serialization code gracefully, so that I don't step on anyone else's
intentions.

After this is merged, I'd like to build another RC so I can see if the
issues reported by users are fixed.

-Nicholas
