[ https://issues.apache.org/jira/browse/TIKA-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910959#comment-13910959 ]
Ken Krugler commented on TIKA-1247:
-----------------------------------
There was a very lengthy discussion of this a few years ago, some of which was
captured in TIKA-686.
Unfortunately there was no clear consensus on a way forward (every proposed
solution had problems or concerns), so Jukka resolved TIKA-686 as "Won't Fix".
> Explode monolithic parsers module into smaller ones
> ---------------------------------------------------
>
> Key: TIKA-1247
> URL: https://issues.apache.org/jira/browse/TIKA-1247
> Project: Tika
> Issue Type: Improvement
> Reporter: Cservenak, Tamas
>
> Right now there is one monolithic parsers module that, if used in Maven,
> pulls in not only the whole Internet but beyond. Also, I am not certain that
> every use case that, for example, uses the HTML parser needs the
> Microsoft-related parsers and the like. Make it more granular.
> Proposed solution: explode the parsers module into a set of smaller modules.
> Let the build tool figure out what the user needs. For example, if a Maven
> user adds the "chm" parser as a dependency, Maven will resolve the "chm" ->
> "html" -> "txt" -> "tika-core" chain by itself, and no hunting through
> transitive dependencies (for inclusion or exclusion) is needed.
> A work-in-progress PR with the ongoing work is open:
> https://github.com/apache/tika/pull/5
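A minimal sketch of the transitive resolution the proposal relies on, assuming a hypothetical module graph in which "chm" depends on "html", "html" on "txt", and "txt" on "tika-core". The module names (tika-parser-chm, tika-parser-microsoft, etc.) are illustrative only, not the artifact IDs used in the PR. Each module lists just its direct dependencies; a depth-first walk collects everything reachable, so declaring the chm parser pulls in html, txt and core but not an unrelated Microsoft parser module.

```python
# Hypothetical per-format module graph (names are illustrative, not real
# Tika artifact IDs). Each entry lists only direct dependencies, the way
# each module's own POM would.
DIRECT_DEPS = {
    "tika-parser-chm": ["tika-parser-html"],
    "tika-parser-html": ["tika-parser-txt"],
    "tika-parser-txt": ["tika-core"],
    "tika-parser-microsoft": ["tika-core"],
    "tika-core": [],
}


def resolve(module, graph=DIRECT_DEPS):
    """Return every module `module` needs, dependencies before dependents."""
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:  # already collected via another path
            return
        seen.add(name)
        for dep in graph[name]:
            visit(dep)
        ordered.append(name)  # added only after all of its dependencies

    visit(module)
    return ordered


if __name__ == "__main__":
    # Declaring only the chm parser brings in html, txt and core,
    # but not the unrelated Microsoft parser module.
    print(resolve("tika-parser-chm"))
```

Maven's real resolver also handles version mediation, scopes and exclusions; this sketch shows only the reachability part that makes per-format modules avoid pulling in unrelated parsers.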
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)