I have a lot of work done on this here:
https://github.com/nddipiazza/tika-pipes

I am waiting for some time to be set aside to finish it.

On Mon, Dec 16, 2024, 10:05 AM Tim Allison <talli...@apache.org> wrote:

> I don't think you took it this way, but I didn't mean to cast shade on
> Solr!
>
> Yes, let's see what we can do in the new year to integrate pf4j or
> otherwise isolate pipes module dependencies!
>
> Thank you, Eric!
>
> On Mon, Dec 16, 2024 at 10:52 AM Eric Pugh
> <ep...@opensourceconnections.com> wrote:
> >
> > There is some work on getting to Jetty 12….
> https://github.com/apache/solr/pull/2876
> >
> > Having said that, I do think pf4j offers some really great capabilities,
> > and I wish that a POC for that for Solr had been adopted instead of our
> > homegrown approach…
> >
> >
> >
> >
> > > On Dec 16, 2024, at 10:50 AM, Tim Allison <talli...@apache.org> wrote:
> > >
> > > Tilman found a good example of why we should try to move to pf4j for
> > > the tika-pipes modules (at least):
> > > https://github.com/apache/tika/blob/main/tika-parent/pom.xml#L345
> > > https://github.com/apache/tika/blob/main/tika-parent/pom.xml#L383
> > >
> > > We can't upgrade to cxf 4.1.0/jetty 12 because of the Solr
> > > dependencies in the solr-pipes modules.
> > >
> > > On Mon, Aug 26, 2024 at 9:14 AM Eric Pugh
> > > <ep...@opensourceconnections.com> wrote:
> > >>
> > >> Just wanted to put in a +1 for this idea….!  Years ago Jan did a
> spike for this in Solr, https://issues.apache.org/jira/browse/SOLR-10665
> and I was really excited about it.  We ended up deciding to go in a home
> grown manner, and the results today are, well, in my opinion, kind of what
> you expect from a home grown solution.  We have a cool, but slightly
> orphaned package manager, and we’ve iterated on how to store plugins at
> least twice.
> > >>
> > >> If I had it all over again, I’d go back to using PF4J.   I love how
> few dependencies it brings, and how it has a strong focus!
> > >>
> > >> I will watch this effort with interest and if it succeeds, it might
> reignite my interest in pushing this for Solr.
> > >>
> > >> Eric
> > >>
> > >>
> > >>> On Aug 24, 2024, at 1:09 PM, Nicholas DiPiazza <
> nicholas.dipia...@gmail.com> wrote:
> > >>>
> > >>> Dear Tika Devs:
> > >>>
> > > >>> Tika Pipes in production hit a blocking problem for my users: the
> > > >>> extensible Fetcher objects we load into the Tika Server and Tika
> > > >>> gRPC Server had classpath conflicts with other Fetchers. They
> > > >>> need to be fully classpath-independent of each other.
> > >>>
> > >>> In order to fix this, I am attempting to introduce pf4j in this pull:
> > >>>
> > >>> https://github.com/apache/tika/pull/1906
> > >>>
> > >>> In this pull, the shade plugin goes completely bye-bye in favor of
> Maven
> > >>> dependency plugin and assembly plugin.
> > >>>
> > > >>> All Fetchers are now loaded via the plugin manager, and each
> > > >>> Fetcher's classpath is pulled in dynamically with a classloader
> > > >>> separate from those of other Fetchers.
> > >>>
> > >>> Great.
> > >>>
> > >>> Some changes come as a result:
> > >>>
> > > >>> So now, instead of <fetcher> in the Tika configuration, it's
> > > >>> <fetcherConfig>, because we no longer need a full copy of the
> > > >>> Fetcher.
> > >>>
> > >>> So now the fetcherConfig is the only thing stored in the Tika Config
> and
> > >>> the pf4j plugin manager handles loading the correct Fetcher, and
> then you
> > >>> send it the configuration that it requires.
> > >>>
> > > >>> So now I'm going into the Tika XML serialization code, where I
> > > >>> need to store the FetcherConfig in place of the Fetcher objects
> > > >>> previously stored there.
> > >>>
> > > >>> I figured this is a good time to take a step back and share with
> > > >>> everyone. I would like to do a quick Zoom call with Tim and others
> > > >>> to review the PR and discuss how to gracefully make that change to
> > > >>> the Tika serialization code so that I don't step on the toes of
> > > >>> anyone else's intentions.
> > >>>
> > >>> After this is merged, I'd like to build another RC so I can see if
> the
> > >>> issues reported by users are fixed.
> > >>>
> > >>> -Nicholas
> > >>
> > >> _______________________
> > >> Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> > >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
> >
> > >> This e-mail and all contents, including attachments, is considered to
> be Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
> > >>
> >
> >
>
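The fetcherConfig flow described in the original message (the config stores only a FetcherConfig, a plugin manager resolves the matching Fetcher, and the Fetcher is then handed its configuration) could look roughly like the sketch below. Every name in it (FetcherConfigSketch, FetcherConfig, Fetcher, FetcherRegistry, the "file-system-fetcher" id, the "basePath" setting) is hypothetical and chosen only for illustration. It does not use the pf4j or Tika APIs, and the plain map stands in for the plugin manager without doing any classloader isolation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical names throughout; this is NOT the Tika or pf4j API.
class FetcherConfigSketch {

    /** What the config would hold: a plugin id plus settings, no Fetcher instance. */
    static final class FetcherConfig {
        final String pluginId;
        final Map<String, String> settings;

        FetcherConfig(String pluginId, Map<String, String> settings) {
            this.pluginId = pluginId;
            this.settings = settings;
        }
    }

    /** A fetcher receives its configuration at call time instead of being stored pre-built. */
    interface Fetcher {
        String fetch(String key, FetcherConfig config);
    }

    /** Stand-in for a plugin manager: resolves a Fetcher from the plugin id in the config. */
    static final class FetcherRegistry {
        private final Map<String, Supplier<Fetcher>> factories = new HashMap<>();

        void register(String pluginId, Supplier<Fetcher> factory) {
            factories.put(pluginId, factory);
        }

        Fetcher resolve(FetcherConfig config) {
            Supplier<Fetcher> factory = factories.get(config.pluginId);
            if (factory == null) {
                throw new IllegalArgumentException(
                        "No fetcher plugin registered for id: " + config.pluginId);
            }
            return factory.get();
        }
    }

    /** Registers one toy fetcher, resolves it from a config, and runs a fetch. */
    static String demo() {
        FetcherRegistry registry = new FetcherRegistry();
        // Toy fetcher: just builds a path from the configured base path and the key.
        registry.register("file-system-fetcher",
                () -> (key, cfg) -> cfg.settings.get("basePath") + "/" + key);

        Map<String, String> settings = new HashMap<>();
        settings.put("basePath", "/data");
        FetcherConfig config = new FetcherConfig("file-system-fetcher", settings);

        return registry.resolve(config).fetch("report.pdf", config);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the shape is that only the small FetcherConfig needs to be serialized; which Fetcher class handles it, and from which classloader it comes, is left to whatever resolves the plugin id.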
