[ https://issues.apache.org/jira/browse/TIKA-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910959#comment-13910959 ]

Ken Krugler commented on TIKA-1247:
-----------------------------------

There was a very lengthy discussion of this a few years ago, some of which was 
captured by TIKA-686.

Unfortunately there was no clear consensus on a way forward (all solutions had 
various problems or concerns) so Jukka resolved TIKA-686 as "Won't Fix".

> Explode monolithic parsers module into smaller ones
> ---------------------------------------------------
>
>                 Key: TIKA-1247
>                 URL: https://issues.apache.org/jira/browse/TIKA-1247
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Cservenak, Tamas
>
> Right now, there is one monolithic parsers module that, if used in Maven, 
> pulls in not only the whole Internet, but beyond. Also, I am not certain that 
> every use case that, for example, uses the HTML parser needs the 
> Microsoft-related parsers, etc. Make it more granular.
> Proposed solution: Explode the parsers module into a smaller set of modules. 
> Let the build tool figure out what the user needs. For example, if a Maven user 
> adds the "chm" parser as a dependency, Maven will figure out the "chm" > "html" > 
> "txt" > "tika-core" dependency chain by itself, and no transitive dependency 
> hunting (for inclusion or exclusion) is needed.
> There is a PR in WIP state with ongoing work:
> https://github.com/apache/tika/pull/5
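The dependency chain described in the proposal ("chm" > "html" > "txt" > "tika-core") amounts to a transitive closure over a module graph. Below is a minimal Java sketch of that idea, assuming hypothetical module names (tika-parser-chm, tika-parser-html, tika-parser-txt) that are illustrative only and not actual Tika artifacts. It shows plain reachability and nothing more. Maven's real resolver also handles version mediation, scopes, and exclusions, none of which is modeled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveDeps {
    // Hypothetical module graph mirroring the chain in the issue:
    // chm -> html -> txt -> tika-core. Names are illustrative assumptions.
    static final Map<String, List<String>> DEPS = Map.of(
        "tika-parser-chm", List.of("tika-parser-html"),
        "tika-parser-html", List.of("tika-parser-txt"),
        "tika-parser-txt", List.of("tika-core"),
        "tika-core", List.of()
    );

    // Collects every module reachable from root (depth-first), in first-visit order.
    // Only reachability is computed; no version conflict handling is attempted.
    static List<String> resolve(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String module = stack.pop();
            if (seen.add(module)) {
                for (String dep : DEPS.getOrDefault(module, List.of())) {
                    stack.push(dep);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Declaring only the "chm" module pulls in the rest of its chain.
        System.out.println(resolve("tika-parser-chm"));
    }
}
```

Under these assumptions, a user who declares only the "chm" module ends up with exactly four modules and none of the unrelated parsers, which is the granularity the issue asks for.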



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
