janhoy commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806642830



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
`tika-core` should contain everything needed for language identification.
It is only 718 KB, so duplicating it is not a big deal.
If we don't care about Windows users, we could symlink that jar from
extraction to langid in the tarball :)

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
Yeah, I think Tika 1.x has the original Tika detector in tika-core, while
2.x has moved it out to a separate tika-langdetect. Solr still depends on 1.x.
Solr also has its own choice of detectors
(https://github.com/apache/solr/tree/main/solr/modules/langid/src/java/org/apache/solr/update/processor)
behind our own abstraction, and these overlap with Tika's own
(https://github.com/apache/tika/tree/main/tika-langdetect).
Perhaps it would be better to delegate everything to tika-langdetect and kill
our own custom ones. We'd still duplicate tika-core, but our langid module
would use tika-langdetect, which would not necessarily be needed in the
extraction module (unless we want to detect language during extraction).
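
To make the delegation idea above concrete, here is a rough sketch of what the langid module's dependency block might look like. This is an illustration only, not part of this PR: the `org.apache.tika:tika-langdetect` coordinate is an assumption (the detector artifact is named differently across Tika 1.x and 2.x), and versions are left out to match the style of the diff above.

```groovy
// Hypothetical solr/modules/langid/build.gradle fragment (a sketch, not the actual change).
dependencies {
  // tika-core would still be bundled by both langid and extraction,
  // which is the duplication discussed in this thread.
  implementation 'org.apache.tika:tika-core'

  // Assumed coordinate: only langid would pull in the detector, so the
  // extraction module would not need it unless it detects language itself.
  implementation 'org.apache.tika:tika-langdetect'
}
```

With something like this, the extraction module would no longer need to export Tika for the benefit of langid.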




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


