janhoy commented on a change in pull request #621: URL: https://github.com/apache/solr/pull/621#discussion_r806642830
##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
`tika-core` should contain everything needed for language identification. It is 718 KB, so duplicating it is not a big deal. If we don't care about Windows users, we could symlink that jar from extraction to langid in the tarball :)

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
Yes, I think Tika 1.x has the original Tika detector in tika-core, while 2.x has moved it out to a separate tika-langdetect. Solr still depends on 1.x.

Solr also has its own choice of detectors (https://github.com/apache/solr/tree/main/solr/modules/langid/src/java/org/apache/solr/update/processor), wrapped in our own abstraction, which overlap with Tika's (https://github.com/apache/tika/tree/main/tika-langdetect). Perhaps it would be better to delegate everything to tika-langdetect and remove our custom ones. We would still duplicate tika-core, but our langid module would use tika-langdetect, which the extraction module would not necessarily need (unless we want to detect language during extraction).
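
To make the first suggestion concrete, here is a rough sketch of what the langid module's build file might look like if it declared `tika-core` itself rather than relying on the extraction module to export it. The file location, the `:solr:core` project path, and the version-less coordinates (which assume versions are pinned centrally) are assumptions for illustration and are not taken from this PR.

```groovy
// Hypothetical solr/modules/langid/build.gradle (sketch, not the PR's actual change)
dependencies {
  // Assumed project path for Solr core
  implementation project(':solr:core')

  // With Tika 1.x the original language detector lives in tika-core,
  // so langid would depend on it directly instead of going through extraction.
  implementation 'org.apache.tika:tika-core'
}
```

With this layout the `tika-core` jar would be packaged in both modules, which is the duplication discussed in the review comments.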