[ 
https://issues.apache.org/jira/browse/TIKA-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851822#comment-16851822
 ] 

Jonathan Essex commented on TIKA-2882:
--------------------------------------

I built the 2.x branch and deployed tika-core and tika-parser-modules to my own 
Maven repo. Then I added the OOXML parser from tika-parser-modules as a 
dependency of my app and built a couple of test cases.

Unsurprisingly, I had *FAR* fewer dependency issues. In fact, the whole thing 
went remarkably smoothly. The only problems were a failing BundleIT test in 
core related to OSGi (I just commented it out...) and the fact that 
tika-parser-office-module includes slf4j-log4j12 (I just removed it...).
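
For anyone who would rather not patch the module itself, a Maven exclusion in 
the consuming project should keep slf4j-log4j12 off the classpath. This is only 
a sketch: the version below is an assumption for a locally built 2.x branch, so 
substitute whatever was actually deployed to your repo.

```xml
<!-- Sketch only: the version is an assumption for a locally built 2.x branch -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parser-office-module</artifactId>
  <version>2.0.0-SNAPSHOT</version> <!-- assumed; use the version you deployed -->
  <exclusions>
    <!-- keep the SLF4J-to-log4j binding out of the consuming app -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running mvn dependency:tree on the consuming project afterwards is a quick way 
to confirm the binding is no longer pulled in.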

Given how much more useful this makes Tika as a library, I think it would be a 
great shame to let tika-parser-modules languish in a development branch 
any longer than it has to. What can I do to help (...given my limited 
experience of the Tika codebase)?

> Parsers should not include HTTP client code
> -------------------------------------------
>
>                 Key: TIKA-2882
>                 URL: https://issues.apache.org/jira/browse/TIKA-2882
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.21
>            Reporter: Jonathan Essex
>            Priority: Major
>
> Folks, does it really make sense for a parser to have a REST client built in?
> The GROBID and NLTKNERecogniser parsers use the Apache CXF client directly. 
>  
> Since I don't use CXF and my entire app is built on a different JAX-RS stack, 
> this just dropped me straight into dependency hell.
> Surely it would make more sense to keep the parsers... well, parsers... and 
> build support for delegating parsing to other services into some higher level 
> in the stack (such as the server, where the CXF dependency is more benign). 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
