[jira] [Resolved] (SOLR-3808) Extraction contrib to utilize Boilerpipe

Eric Pugh (Jira) Mon, 01 Dec 2025 12:35:09 -0800


     [ 
https://issues.apache.org/jira/browse/SOLR-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eric Pugh resolved SOLR-3808.
-----------------------------
    Resolution: Fixed

In Solr 10 we are leveraging either Tika Server (running in its own separate 
server process) or maybe Tika Pipes (again, running in a separate JVM). This 
is through a pluggable interface which could also support Boilerpipe.

 

Please revalidate your issue against Solr 10 with one of those options, and if 
the need is still present, I'm happy to work with you on a fix using the new 
approach for Tika.

> Extraction contrib to utilize Boilerpipe
> ----------------------------------------
>
>                 Key: SOLR-3808
>                 URL: https://issues.apache.org/jira/browse/SOLR-3808
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Markus Jelsma
>            Priority: Minor
>             Fix For: 6.0
>
>         Attachments: SOLR-3808-trunk-1.patch
>
>
> Solr's extraction contrib uses Tika for document parsing and should be able 
> to use Boilerpipe. Tika comes with Boilerpipe, a library capable of removing 
> boilerplate text from HTML pages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
