Awesome. It is a huge lift, and I still need to extract tika-pipes-core out of tika-core. :D
Onwards and let me know how I can help.

On Mon, Dec 16, 2024 at 11:51 AM Nicholas DiPiazza
<nicholas.dipia...@gmail.com> wrote:
>
> I have much work done for this here
> https://github.com/nddipiazza/tika-pipes
>
> I am waiting for some time to be set aside to finish
>
> On Mon, Dec 16, 2024, 10:05 AM Tim Allison <talli...@apache.org> wrote:
>
> > I don't think you took it this way, but I didn't mean to cast shade on
> > Solr!
> >
> > Y, let's see what we can do in the new year to integrate pf4j or
> > otherwise isolate pipes module dependencies!
> >
> > Thank you, Eric!
> >
> > On Mon, Dec 16, 2024 at 10:52 AM Eric Pugh
> > <ep...@opensourceconnections.com> wrote:
> > >
> > > There is some work on getting to Jetty 12…
> > > https://github.com/apache/solr/pull/2876
> > >
> > > Having said that, I do think pf4j offers some really great
> > > capabilities, and I wish that a POC for that for Solr had been
> > > adopted instead of our homegrown approach…
> > >
> > > > On Dec 16, 2024, at 10:50 AM, Tim Allison <talli...@apache.org> wrote:
> > > >
> > > > Tilman found a good example of why we should try to move to pf4j for
> > > > the tika-pipes modules (at least):
> > > > https://github.com/apache/tika/blob/main/tika-parent/pom.xml#L345
> > > > https://github.com/apache/tika/blob/main/tika-parent/pom.xml#L383
> > > >
> > > > We can't upgrade to cxf 4.1.0/jetty 12 because of the Solr
> > > > dependencies in the solr-pipes modules.
> > > >
> > > > On Mon, Aug 26, 2024 at 9:14 AM Eric Pugh
> > > > <ep...@opensourceconnections.com> wrote:
> > > >>
> > > >> Just wanted to put in a +1 for this idea….! Years ago Jan did a
> > > >> spike for this in Solr,
> > > >> https://issues.apache.org/jira/browse/SOLR-10665
> > > >> and I was really excited about it. We ended up deciding to go in a
> > > >> home grown manner, and the results today are, well, in my opinion,
> > > >> kind of what you expect from a home grown solution. We have a cool,
> > > >> but slightly orphaned package manager, and we’ve interacted on how
> > > >> to store plugins at least twice.
> > > >>
> > > >> If I had it all over again, I’d go back to using PF4J. I love how
> > > >> few dependencies it brings, and how it has a strong focus!
> > > >>
> > > >> I will watch this effort with interest and if it succeeds, it might
> > > >> reignite my interest in pushing this for Solr.
> > > >>
> > > >> Eric
> > > >>
> > > >>
> > > >>> On Aug 24, 2024, at 1:09 PM, Nicholas DiPiazza
> > > >>> <nicholas.dipia...@gmail.com> wrote:
> > > >>>
> > > >>> Dear Tika Devs:
> > > >>>
> > > >>> Tika pipes in production had a blocker problem for my peoples in
> > > >>> that the extensible Fetcher objects we have loaded into the Tika
> > > >>> Server and Tika Grpc Server would have classpath loading issues
> > > >>> with other Fetchers. They need to be purely classpath independent
> > > >>> of each other.
> > > >>>
> > > >>> In order to fix this, I am attempting to introduce pf4j in this pull:
> > > >>>
> > > >>> https://github.com/apache/tika/pull/1906
> > > >>>
> > > >>> In this pull, the shade plugin goes completely bye-bye in favor of
> > > >>> Maven dependency plugin and assembly plugin.
> > > >>>
> > > >>> All Fetchers are now loaded via the plugin manager and classpath
> > > >>> pulled in dynamically with a separate classloader than those of
> > > >>> other Fetchers.
> > > >>>
> > > >>> Great.
> > > >>>
> > > >>> Some changes come as a result:
> > > >>>
> > > >>> So now, instead of having <fetcher> in the Tika configuration, it's
> > > >>> actually <fetcherConfig>, because we don't need a full copy of the
> > > >>> Fetcher anymore.
> > > >>>
> > > >>> So now the fetcherConfig is the only thing stored in the Tika
> > > >>> Config and the pf4j plugin manager handles loading the correct
> > > >>> Fetcher, and then you send it the configuration that it requires.
> > > >>>
> > > >>> So now I'm going into the Tika xml serialization stuff I need to
> > > >>> place the FetcherConfig to replace the Fetcher objects previously
> > > >>> stored there.
> > > >>>
> > > >>> I figured this is a good time to take a step back and share with
> > > >>> everyone. I would like to do a quick zoom with Tim and others to
> > > >>> review the PR and discuss how to gracefully make that change to the
> > > >>> Tika serialization stuff so that I don't step on toes of other
> > > >>> intentions.
> > > >>>
> > > >>> After this is merged, I'd like to build another RC so I can see if
> > > >>> the issues reported by users are fixed.
> > > >>>
> > > >>> -Nicholas
> > > >>
> > > >> _______________________
> > > >> Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 |
> > > >> http://www.opensourceconnections.com <http://www.opensourceconnections.com/> |
> > > >> My Free/Busy <http://tinyurl.com/eric-cal>
> > > >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
> > > >> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> > > >> This e-mail and all contents, including attachments, is considered
> > > >> to be Company Confidential unless explicitly stated otherwise,
> > > >> regardless of whether attachments are marked as such.