risdenk commented on a change in pull request #621: URL: https://github.com/apache/solr/pull/621#discussion_r806070690
########## File path: solr/modules/extraction/build.gradle ########## @@ -26,55 +26,18 @@ dependencies { implementation 'org.apache.lucene:lucene-core' implementation 'org.slf4j:slf4j-api' - // We export tika because other modules depend on it (and its submodules) Review comment: I checked this and tika-core is not included in the langid lib dir. ########## File path: solr/modules/extraction/build.gradle ########## @@ -26,55 +26,18 @@ dependencies { implementation 'org.apache.lucene:lucene-core' implementation 'org.slf4j:slf4j-api' - // We export tika because other modules depend on it (and its submodules) Review comment: So I looked into this more and it doesn't seem like there is clear documentation on how this worked in Solr 8.x. It looks like there is an implicit assumption that both `extraction` and `langid` would be on the Solr classpath for `langid` to work with Tika. Personally it feels way cleaner to me for each `module` to be independent and therefore be included as standalone. This would end up with duplicate jars. I think this is how packages would work though. I understand the concern about not having duplicate jars in the distribution due to size though. If we are that concerned about distribution size, wouldn't it make more sense to have each contrib as a separate download instead of trying to play tricks with these specific jars? I played around with `api` vs `implementation` for this change - and it has no effect on the `langid` module lib. The only way to remove the jar from `langid` is to switch to `compileOnly` instead of `implementation` in the `langid` module. I can make that change if that would be the preferred approach. ########## File path: solr/modules/extraction/build.gradle ########## @@ -26,55 +26,18 @@ dependencies { implementation 'org.apache.lucene:lucene-core' implementation 'org.slf4j:slf4j-api' - // We export tika because other modules depend on it (and its submodules) Review comment: Addressed the jar duplication in e65f2f559b5a8a36c909f34574289e2e6b815dd1 ########## File path: solr/modules/extraction/build.gradle ########## @@ -26,55 +26,18 @@ dependencies { implementation 'org.apache.lucene:lucene-core' implementation 'org.slf4j:slf4j-api' - // We export tika because other modules depend on it (and its submodules) Review comment: So I'm on the fence about this with Tika. Since `tika-core` is NOT the only thing needed to do language detection as far as I know. It would be a lot more duplication of jars just to get language detection working. (I'm not even 100% sure that all the tika parsers in the extraction module are right for Tika to do language detection with the langid module). I think leaving it like this is going to be the most expected for now. I do think this needs more thought separately though. ########## File path: solr/modules/extraction/build.gradle ########## @@ -26,55 +26,18 @@ dependencies { implementation 'org.apache.lucene:lucene-core' implementation 'org.slf4j:slf4j-api' - // We export tika because other modules depend on it (and its submodules) Review comment: There is an entire `tika-langdetect` dependency - https://search.maven.org/artifact/org.apache.tika/tika-langdetect/1.28.1/bundle or https://mvnrepository.com/artifact/org.apache.tika/tika-langdetect/1.28.1. I played with `langid` years ago and remember having to add jars to get it to actually work. I could add `tika-langdetect` as `runtimeOnly` in the `langid` module, but that adds more jars. ########## File path: solr/modules/extraction/build.gradle ########## @@ -26,55 +26,18 @@ dependencies { implementation 'org.apache.lucene:lucene-core' implementation 'org.slf4j:slf4j-api' - // We export tika because other modules depend on it (and its submodules) Review comment: Created https://issues.apache.org/jira/browse/SOLR-16010 to follow up separately on this. ########## File path: solr/modules/extraction/build.gradle ########## @@ -26,55 +26,18 @@ dependencies { implementation 'org.apache.lucene:lucene-core' implementation 'org.slf4j:slf4j-api' - // We export tika because other modules depend on it (and its submodules) Review comment: Thanks to Jan for looking into this in SOLR-16010 - I reverted the `langid` changes since what is on `main` works and only duplicates `tika-core` jar as expected. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org