Hi Ayush, thanks for your time investigating!
I followed your recommendation and it seems to work (also for some of our
consumer projects), so thanks a lot for your time!

Regards,
Richard

On Saturday, 13.04.2024 at 03:35 +0530, Ayush Saxena wrote:
> Hi Richard,
> Thanks for sharing the steps to reproduce the issue. I cloned the
> Apache Storm repo and was able to reproduce the issue. The build was
> indeed failing due to missing classes.
>
> I spent some time debugging the issue. I might not be entirely right
> (no experience with Storm), but there are two ways to get this going.
>
> First approach: if we want to use the shaded classes
>
> 1. I think the artifact to be used for the minicluster should be
> `hadoop-client-minicluster`; even Spark uses the same [1]. The one you
> are using is `hadoop-minicluster`, which on its own is empty:
>
> ```
> ayushsaxena@ayushsaxena ~ % jar tf /Users/ayushsaxena/.m2/repository/org/apache/hadoop/hadoop-minicluster/3.3.6/hadoop-minicluster-3.3.6.jar | grep .class
> ayushsaxena@ayushsaxena ~ %
> ```
>
> It just declares the artifacts that `hadoop-client-minicluster` uses,
> and that jar has the shading and related machinery. Using
> `hadoop-minicluster` is like adding the Hadoop dependencies to the pom
> transitively, without any shading, which tends to conflict with the
> `hadoop-client-api` and `hadoop-client-runtime` jars, which use the
> shaded classes.
>
> 2. Once you change `hadoop-minicluster` to `hadoop-client-minicluster`,
> the tests still won't pass, the reason being the `storm-autocreds`
> dependency, which pulls in the Hadoop jars via `hbase-client` and
> `hive-exec`. So we need to exclude them as well.
>
> 3. I reverted your classpath hack, changed the jar, and excluded the
> dependencies from `storm-autocreds`, then ran the storm-hdfs tests.
> All the tests that were initially failing now passed, without any code
> change:
>
> ```
> [INFO] Results:
> [INFO]
> [INFO] Tests run: 57, Failures: 0, Errors: 0, Skipped: 0
> [INFO]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> ```
>
> 4. Putting the code diff here might make this mail unreadable, so I am
> sharing the link to the commit which fixed Storm for me at [2]. Let me
> know if it has any access issues, and I will put the diff in the mail
> itself in text form.
>
> Second approach: if we don't want to use the shaded classes
>
> 1. The `hadoop-client-api` and `hadoop-client-runtime` jars use
> shading, which tends to conflict with your non-shaded
> `hadoop-minicluster`. Rather than using these jars, use the
> `hadoop-client` jar.
>
> 2. I removed your hack and replaced those two jars with the
> `hadoop-client` jar, and the storm-hdfs tests pass.
>
> 3. I am sharing the link to the commit in my fork at [3]. One
> advantage is that you don't have to change your existing jar, nor
> would you need to add those exclusions to the `storm-autocreds`
> dependency.
>
> ++ Adding common-dev, in case any fellow developers with more
> experience using the hadoop-client jars can help, if things still
> don't work or Storm needs something more.
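[The dependency changes behind the two approaches above can be sketched as Maven POM fragments. This is a minimal illustration only, not the actual diff from the linked commits: the wildcard exclusion and the `storm-autocreds` coordinates are assumptions, and a real fix may need to exclude the `hbase-client` / `hive-exec` transitives individually.]

```xml
<!-- First approach: use the shaded minicluster jar alongside
     hadoop-client-api / hadoop-client-runtime. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-minicluster</artifactId>
  <version>${hadoop.version}</version>
  <scope>test</scope>
</dependency>

<!-- Keep storm-autocreds from dragging in unshaded Hadoop jars
     via hbase-client and hive-exec (illustrative wildcard). -->
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-autocreds</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Second approach: drop hadoop-client-api / hadoop-client-runtime
     and depend on the single unshaded client jar instead. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```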
> The downstream projects which I have experience with don't use these
> jars (which they ideally should) :-)
>
> -Ayush
>
> [1] https://github.com/apache/spark/blob/master/pom.xml#L1382
> [2] https://github.com/ayushtkn/storm/commit/e0cd8e21201e01d6d0e1f3ac1bc5ada8354436e6
> [3] https://github.com/apache/storm/commit/fb5acdedd617de65e494c768b6ae4bab9b3f7ac8
>
> On Fri, 12 Apr 2024 at 10:41, Richard Zowalla <r...@apache.org> wrote:
> > Hi,
> >
> > thanks for the fast reply. The PR is here [1].
> >
> > It works if I exclude the client-api and client-runtime from being
> > scanned in Surefire, which is a hacky workaround for the actual
> > issue.
> >
> > The hadoop-common jar is a transitive dependency of the minicluster,
> > which is used for testing.
> >
> > Debugging the situation shows that HttpServer2 is in the same
> > package in hadoop-common as well as in the client-api, but with
> > differences in the methods / classes used, so depending on the
> > classpath order the wrong class is loaded.
> >
> > Stack traces are in the first GH Actions run, here: [1].
> >
> > A reproducer would be to check out Storm, go to storm-hdfs, remove
> > the exclusion in [2], and run the tests in that module. They will
> > fail due to a missing Jetty server class (as the HttpServer2 class
> > is loaded from client-api instead of the minicluster).
> >
> > Regards & thanks
> > Richard
> >
> > [1] https://github.com/apache/storm/pull/3637
> > [2] https://github.com/apache/storm/blob/e44f72767370d10a682446f8f36b75242040f675/external/storm-hdfs/pom.xml#L120
> >
> > On 2024/04/11 21:29:13 Ayush Saxena wrote:
> > > Hi Richard,
> > > I am not able to decode the issue properly here; it would have
> > > been better if you had shared the PR or the failure trace as well.
> > > QQ: Why do you have hadoop-common as an explicit dependency?
> > > That hadoop-common content should be in hadoop-client-api.
> > > I quickly checked the 3.4.0 release and I think it does have it:
> > >
> > > ```
> > > ayushsaxena@ayushsaxena client % jar tf hadoop-client-api-3.4.0.jar | grep org/apache/hadoop/fs/FileSystem.class
> > > org/apache/hadoop/fs/FileSystem.class
> > > ```
> > >
> > > You didn't mention which shaded classes are being reported as
> > > missing... I think Spark uses these client jars; you can use that
> > > as an example and grab pointers from here: [1] & [2]
> > >
> > > -Ayush
> > >
> > > [1] https://github.com/apache/spark/blob/master/pom.xml#L1361
> > > [2] https://issues.apache.org/jira/browse/SPARK-33212
> > >
> > > On Thu, 11 Apr 2024 at 17:09, Richard Zowalla <r...@apache.org> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > we are using "hadoop-minicluster" in Apache Storm to test our
> > > > HDFS integration.
> > > >
> > > > Recently, we were cleaning up our dependencies, and I noticed
> > > > that if I add
> > > >
> > > > <dependency>
> > > >     <groupId>org.apache.hadoop</groupId>
> > > >     <artifactId>hadoop-client-api</artifactId>
> > > >     <version>${hadoop.version}</version>
> > > > </dependency>
> > > > <dependency>
> > > >     <groupId>org.apache.hadoop</groupId>
> > > >     <artifactId>hadoop-client-runtime</artifactId>
> > > >     <version>${hadoop.version}</version>
> > > > </dependency>
> > > >
> > > > and have
> > > >
> > > > <dependency>
> > > >     <groupId>org.apache.hadoop</groupId>
> > > >     <artifactId>hadoop-minicluster</artifactId>
> > > >     <version>${hadoop.version}</version>
> > > >     <scope>test</scope>
> > > > </dependency>
> > > >
> > > > as a test dependency to set up a mini-cluster for testing our
> > > > storm-hdfs integration, it fails weirdly because of missing
> > > > (shaded) classes as well as a class ambiguity with HttpServer2.
> > > >
> > > > It is present as a class inside "hadoop-client-api" as well as
> > > > within "hadoop-common".
> > > >
> > > > Is this setup wrong, or should we try something different here?
> > > >
> > > > Regards
> > > > Richard
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: user-h...@hadoop.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
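
[As a follow-up to the HttpServer2 ambiguity discussed in this thread: one way to catch duplicate classes on the classpath at build time is the `banDuplicateClasses` rule from the Maven Enforcer plugin's extra-enforcer-rules. This is a minimal sketch with illustrative, unverified plugin versions; it is not part of the fix described in the thread.]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.4.1</version>
  <dependencies>
    <!-- banDuplicateClasses lives in extra-enforcer-rules,
         not in the core enforcer plugin. -->
    <dependency>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>extra-enforcer-rules</artifactId>
      <version>1.8.0</version>
    </dependency>
  </dependencies>
  <executions>
    <execution>
      <id>ban-duplicate-classes</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <banDuplicateClasses>
            <!-- Report every duplicate instead of failing on the first. -->
            <findAllDuplicates>true</findAllDuplicates>
          </banDuplicateClasses>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, a build where e.g. `HttpServer2` is provided by both `hadoop-common` and `hadoop-client-api` would fail (or list the duplicates), making classpath-order bugs visible before the tests run.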