Awesome. It is a huge lift, and I still need to extract
tika-pipes-core out of tika-core. :D

Onwards and let me know how I can help.

On Mon, Dec 16, 2024 at 11:51 AM Nicholas DiPiazza
<nicholas.dipia...@gmail.com> wrote:
>
> I have much work done for this here
> https://github.com/nddipiazza/tika-pipes
>
> I am waiting until I can set aside some time to finish it.
>
> On Mon, Dec 16, 2024, 10:05 AM Tim Allison <talli...@apache.org> wrote:
>
> > I don't think you took it this way, but I didn't mean to cast shade on
> > Solr!
> >
> > Yes, let's see what we can do in the new year to integrate pf4j or
> > otherwise isolate the pipes module dependencies!
> >
> > Thank you, Eric!
> >
> > On Mon, Dec 16, 2024 at 10:52 AM Eric Pugh
> > <ep...@opensourceconnections.com> wrote:
> > >
> > > There is some work on getting to Jetty 12….
> > https://github.com/apache/solr/pull/2876
> > >
> > > Having said that, I do think pf4j offers some really great capabilities,
> > and I wish that a POC for that for Solr had been adopted instead of our
> > homegrown approach…
> > >
> > >
> > >
> > >
> > > > On Dec 16, 2024, at 10:50 AM, Tim Allison <talli...@apache.org> wrote:
> > > >
> > > > Tilman found a good example of why we should try to move to pf4j for
> > > > the tika-pipes modules (at least):
> > > > https://github.com/apache/tika/blob/main/tika-parent/pom.xml#L345
> > > > https://github.com/apache/tika/blob/main/tika-parent/pom.xml#L383
> > > >
> > > > We can't upgrade to cxf 4.1.0/jetty 12 because of the Solr
> > > > dependencies in the solr-pipes modules.
> > > >
> > > > On Mon, Aug 26, 2024 at 9:14 AM Eric Pugh
> > > > <ep...@opensourceconnections.com> wrote:
> > > >>
> > > >> Just wanted to put in a +1 for this idea….!  Years ago Jan did a
> > spike for this in Solr, https://issues.apache.org/jira/browse/SOLR-10665
> > and I was really excited about it.  We ended up deciding to go in a home
> > grown manner, and the results today are, well, in my opinion, kind of what
> > you expect from a home grown solution.  We have a cool, but slightly
> > orphaned package manager, and we’ve iterated on how to store plugins at
> > least twice.
> > > >>
> > > >> If I had to do it all over again, I’d go back to using PF4J.   I love how
> > few dependencies it brings, and how it has a strong focus!
> > > >>
> > > >> I will watch this effort with interest and if it succeeds, it might
> > reignite my interest in pushing this for Solr.
> > > >>
> > > >> Eric
> > > >>
> > > >>
> > > >>> On Aug 24, 2024, at 1:09 PM, Nicholas DiPiazza <
> > nicholas.dipia...@gmail.com> wrote:
> > > >>>
> > > >>> Dear Tika Devs:
> > > >>>
> > > >>> Tika Pipes in production hit a blocking problem for my people: the
> > > >>> extensible Fetcher objects we load into the Tika Server and Tika
> > > >>> Grpc Server had classpath loading issues with other Fetchers.
> > > >>> They need to be fully classpath-independent of each other.
> > > >>>
> > > >>> In order to fix this, I am attempting to introduce pf4j in this pull:
> > > >>>
> > > >>> https://github.com/apache/tika/pull/1906
> > > >>>
> > > >>> In this pull, the shade plugin goes completely bye-bye in favor of
> > Maven
> > > >>> dependency plugin and assembly plugin.
> > > >>>
> > > >>> All Fetchers are now loaded via the plugin manager, and each one's
> > > >>> classpath is pulled in dynamically with a classloader separate from
> > > >>> those of the other Fetchers.
> > > >>>
> > > >>> Great.
> > > >>>
> > > >>> Some changes come as a result:
> > > >>>
> > > >>> So now, instead of having <fetcher> in the Tika configuration, it's
> > > >>> actually <fetcherConfig>, because we don't need a full copy of the
> > > >>> Fetcher anymore.
> > > >>>
> > > >>> So now the fetcherConfig is the only thing stored in the Tika Config
> > and
> > > >>> the pf4j plugin manager handles loading the correct Fetcher, and
> > then you
> > > >>> send it the configuration that it requires.
> > > >>>
> > > >>> So now I'm going into the Tika XML serialization code, where I need
> > > >>> to put the FetcherConfig in place of the Fetcher objects previously
> > > >>> stored there.
> > > >>>
> > > >>> I figured this is a good time to take a step back and share with
> > everyone.
> > > >>> I would like to do a quick zoom with Tim and others to review the PR
> > and
> > > >>> discuss how to gracefully make that change to the Tika serialization
> > stuff
> > > >>> so that I don't step on the toes of others' intentions.
> > > >>>
> > > >>> After this is merged, I'd like to build another RC so I can see if
> > the
> > > >>> issues reported by users are fixed.
> > > >>>
> > > >>> -Nicholas
> > > >>
> > > >> _______________________
> > > >> Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 |
> > http://www.opensourceconnections.com <
> > http://www.opensourceconnections.com/> | My Free/Busy <
> > http://tinyurl.com/eric-cal>
> > > >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> > https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
> > >
> > > >> This e-mail and all contents, including attachments, is considered to
> > be Company Confidential unless explicitly stated otherwise, regardless of
> > whether attachments are marked as such.
> > > >>
> > >
> >
