risdenk commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806070690



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       I checked this and tika-core is not included in the langid lib dir.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       So I looked into this more and it doesn't seem like there is clear 
documentation on how this worked in Solr 8.x. It looks like there is an 
implicit assumption that both `extraction` and `langid` would be on the Solr 
classpath for `langid` to work with Tika. 
   
   Personally it feels way cleaner to me for each `module` to be independent 
and therefore be included as standalone. This would end up with duplicate jars. 
I think this is how packages would work though.
   
   I understand the concern about not having duplicate jars in the distribution 
due to size though. If we are that concerned about distribution size, wouldn't 
it make more sense to have each contrib as a separate download instead of 
trying to play tricks with these specific jars?
   
   I played around with `api` vs `implementation` for this change - and it has 
no effect on the `langid` module lib. The only way to remove the jar from 
`langid` is to switch to `compileOnly` instead of `implementation` in the 
`langid` module. I can make that change if that would be the preferred approach.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Addressed the jar duplication in e65f2f559b5a8a36c909f34574289e2e6b815dd1

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       So I'm on the fence about this with Tika. Since `tika-core` is NOT the 
only thing needed to do language detection as far as I know. It would be a lot 
more duplication of jars just to get language detection working. (I'm not even 
100% sure that all the tika parsers in the extraction module are right for Tika 
to do language detection with the langid module). I think leaving it like this 
is going to be the most expected for now. I do think this needs more thought 
separately though.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       There is an entire `tika-langdetect` dependency - 
https://search.maven.org/artifact/org.apache.tika/tika-langdetect/1.28.1/bundle 
or https://mvnrepository.com/artifact/org.apache.tika/tika-langdetect/1.28.1. I 
played with `langid` years ago and remember having to add jars to get it to 
actually work. I could add `tika-langdetect` as `runtimeOnly` in the `langid` 
module, but that adds more jars.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Created https://issues.apache.org/jira/browse/SOLR-16010 to follow up 
separately on this.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Thanks to Jan for looking into this in SOLR-16010 - I reverted the 
`langid` changes since what is on `main` works and only duplicates `tika-core` 
jar as expected.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to