[ 
https://issues.apache.org/jira/browse/SOLR-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438003#comment-17438003
 ] 

David Eric Pugh commented on SOLR-15765:
----------------------------------------

Thanks for reporting this issue. We've seen for many years that Tika 
brings a LOT of dependencies to Solr via SolrCell, which means that for 
the more edge-case content types the level of testing just isn't 
there. SolrCell works great on PDFs and the like, but your use case is a lot 
more niche ;).

If you wanted to look at the dependency issue, I'd be happy to review a PR for 
this.  

However, I think the future of content extraction in Solr is to figure 
out how to separate out the extraction process (and all of its attendant 
dependencies) while still making it VERY simple to do content extraction with 
Solr. I've just started looking at 
[https://cwiki.apache.org/confluence/display/TIKA/tika-pipes#tikapipes-examples]
and I believe that we could evolve our setup so that, instead of running 
content extraction in the Solr process, we send the binary to a 
separate Tika process (or farm of processes!) and then receive back the 
extracted content. These are still very early ruminations.

> Conflicting dependencies on Jackcess in solr-cell 8.10.1
> --------------------------------------------------------
>
>                 Key: SOLR-15765
>                 URL: https://issues.apache.org/jira/browse/SOLR-15765
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 8.10.1
>            Reporter: Markus Günther
>            Priority: Major
>
> I'm currently in the process of migrating from Solr 8.9.0 to 8.10.1 and 
> noticed that extracting content from a Microsoft Access database no 
> longer seems to work. Our test case throws a java.lang.NoClassDefFoundError on 
> com.healthmarketscience.jackcess.crypt.CryptCodecProvider. There seems to be 
> a mismatch between the versions of com.healthmarketscience.jackcess:* that 
> solr-cell 8.10.1 provides and the versions expected by tika-parser 1.27, 
> which is itself a compile-time dependency of solr-cell 8.10.1.
> solr-cell:8.10.1 has a compile-time dependency on:
>  * com.healthmarketscience.jackcess : jackcess : 3.0.1
>  * com.healthmarketscience.jackcess : jackcess-encrypt : 3.0.0
>  * org.apache.tika : tika-parser : 1.27
> tika-parser:1.27 has compile-time dependencies on:
>  * com.healthmarketscience.jackcess : jackcess : 4.0.1
>  * com.healthmarketscience.jackcess : jackcess-encrypt : 4.0.1
> This is in line with the following stack trace, where the missing class is 
> said to be com.healthmarketscience.jackcess.*crypt*.CryptCodecProvider 
> (4.0.1), while the provided one (3.0.1) is 
> com.healthmarketscience.jackcess.CryptCodecProvider.
> {code:java}
> java.lang.NoClassDefFoundError: com/healthmarketscience/jackcess/crypt/CryptCodecProvider
>       at __randomizedtesting.SeedInfo.seed([5D668C26B254282:F7E1705A93B246C]:0)
>       at org.apache.tika.parser.microsoft.JackcessParser.parse(JackcessParser.java:93)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>       at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:229)
>       at org.apache.solr.handler.extraction.ExtractingDocumentLoaderWithWriteLimit.load(ExtractingDocumentLoaderWithWriteLimit.java:36)
>       at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
>       at org.apache.solr.util.TestHarness.queryAndResponse(TestHarness.java:373)
>       at our.own.package.loadLocalFromHandler(SolrTestCaseBase.java:109)
>       at our.own.package.SolrTestCaseBase.loadLocal(SolrTestCaseBase.java:114)
>       at our.own.package.tika.AccessSearchTest.setUp(AccessSearchTest.java:16)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:972)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
>       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>       at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>       at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>       at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>       at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>       at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
>       at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
>       at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
>       at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
>       at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>       at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
>       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>       at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>       at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>       at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
>       at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>       at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>       at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>       at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>       at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
>       at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>       at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>       at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
>       at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>       at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>       at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ClassNotFoundException: com.healthmarketscience.jackcess.crypt.CryptCodecProvider
>       at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
>       at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
>       at java.base/java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:899)
>       at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>       ... 54 more
> {code}
> Any suggestions on this matter? I guess this is a dependency management issue 
> that would best be addressed in a future (possibly bugfix?) release of Solr 
> Cell. Any help on this is highly appreciated! Thanks in advance!
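Before pinning versions, it may help to confirm which Jackcess encryption classes are actually visible. The sketch below probes the two `CryptCodecProvider` locations named in the report. The first is `com.healthmarketscience.jackcess.CryptCodecProvider`, which the report attributes to the 3.x jars. The second is `com.healthmarketscience.jackcess.crypt.CryptCodecProvider`, which the stack trace shows Tika's `JackcessParser` requesting. It reports which jar each one comes from. The class name is hypothetical. It is assumed you run it with the same jars Solr Cell sees (in a Solr 8.x distribution, typically those under `contrib/extraction/lib`). It only checks class visibility and does not parse anything.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical diagnostic: reports where (if anywhere) each CryptCodecProvider variant is loaded from. */
class JackcessClasspathProbe {
    // Location the report attributes to the 3.x Jackcess jars.
    static final String LEGACY = "com.healthmarketscience.jackcess.CryptCodecProvider";
    // Location Tika 1.27's JackcessParser tries to load, per the stack trace.
    static final String CURRENT = "com.healthmarketscience.jackcess.crypt.CryptCodecProvider";

    /** Returns the jar or directory a class came from, or empty if it cannot be loaded. */
    static Optional<String> locate(String className, ClassLoader loader) {
        try {
            // initialize=false: only check visibility, do not run static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = source == null ? null : source.getLocation();
            return Optional.of(location == null ? "(unknown location)" : location.toString());
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a dependency of it (e.g. the core jackcess jar) is missing.
            return Optional.empty();
        }
    }

    /** Probes both variants, preserving the LEGACY-then-CURRENT order. */
    static Map<String, Optional<String>> probe(ClassLoader loader) {
        Map<String, Optional<String>> result = new LinkedHashMap<>();
        for (String name : new String[] {LEGACY, CURRENT}) {
            result.put(name, locate(name, loader));
        }
        return result;
    }

    public static void main(String[] args) {
        probe(JackcessClasspathProbe.class.getClassLoader()).forEach((name, where) ->
                System.out.println(name + " -> " + where.orElse("NOT FOUND")));
    }
}
```

If only the first name resolves, that would be consistent with the 3.x jars shadowing what Tika 1.27 expects, as described above.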



--
This message was sent by Atlassian Jira
(v8.3.4#803005)