I think excluding more deps would be good, but challenging. The problem is that some of the jars only wind up getting used for edge cases (e.g. you have an encrypted email, so you need Bouncy Castle; something like that has bitten me in the past).
So it’s hard to know what’s really required and what isn’t. Is there a good Java tool for tracing all possible calls from a set of starting points, to see whether it’s even possible to reach a given jar? Though that would need some help for cases where we’re dynamically loading classes (mostly plug-in support?)

-- Ken

> On Aug 24, 2016, at 10:59am, Konstantin Gribov <gros...@gmail.com> wrote:
>
> Hi, folks.
>
> It seems that we have too many dependencies in `tika-parsers` and many of
> them may not actually be used. As Tim found in TIKA-2007 [1],
> `jackson-core` wasn't necessary for `tika-parsers` at all.
>
> When I looked into the current parser deps I found a lot of strange deps like
> `quartz` with `c3p0` (jdbc connection pool impl) and `ehcache-core` via
> `cdm`, lucene parts (via `ctakes-core`), spring framework 3.x (also via
> `ctakes-core`) et cetera. The latter could even break an app if you have another
> spring version in transitive deps.
>
> Also, there seem to be no tests for the ctakes parser at first glance and
> I have no easy way to check what I can exclude from deps without breaking
> things.
>
> What do you think about shrinking some of these deps? With at least minimal
> test coverage to ensure common use cases won't be broken, of course.
>
> [1]:
> https://issues.apache.org/jira/browse/TIKA-2007?focusedCommentId=15435206&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15435206
> --
>
> Best regards,
> Konstantin Gribov

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr