Hi all,

(Sorry Kenn, I didn't mean to interrupt the flow of your previous conversation. I was just finishing this email when yours came through...)
I need some opinions from you all regarding Spark and the SLF4J 2.x upgrade. Here is some background.

- Spark used to have a compile dependency on SLF4J 1.x (e.g. https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0). Since 3.4.0 (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.4.0), it has switched to SLF4J 2.x.
- An internal logging module of Spark < 3.4.0 references the class `StaticLoggerBinder` (https://github.com/apache/spark/blob/v3.2.1/core/src/main/scala/org/apache/spark/internal/Logging.scala#L222), which only exists in SLF4J 1.x binding artifacts (e.g. org.slf4j:slf4j-simple, org.slf4j:slf4j-reload4j, etc.). When we upgrade SLF4J and its related artifacts to 2.x, that class no longer exists, which causes errors like "java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder". (A small standalone sketch of this binder lookup is appended after the quoted message below.)

During the upgrade, I have seen test failures in two sub-projects in Beam.

1. SparkReceiverIO (https://github.com/apache/beam/tree/master/sdks/java/io/sparkreceiver/2)
- This IO was built on top of Spark 2.x, so some of its tests fail when upgrading SLF4J to 2.x.
- I am wondering *if there is any objection* to upgrading it to use Spark 3.x. (In fact, I tested out this idea and the tests related to SparkReceiverIO run fine on Spark 3.x.)

2. SparkRunner (https://github.com/apache/beam/tree/master/runners/spark/3)
- As mentioned above, some versions of Spark 3.x do not work well with SLF4J 2.x. Specifically, the version check tests (e.g. runners:spark:3:sparkVersionsTest) failed on 3.2.x.
- In theory, any Spark < 3.4.0 might be impacted, but due to certain transitive dependencies (and perhaps some luck), only the 3.2.x tests failed. See my comment at https://github.com/apache/beam/pull/33574/files#diff-78a108ab469ee9be0d8fae0f18c0c143e04fc24d44f9f78f65b97434fc234890 for more details.
- Do we want to continue to support Spark < 3.4.0, or do we want to drop some of those versions because of this SLF4J upgrade?

Last thing: there is a workaround (more like a hack) to support Spark < 3.4.0 under SLF4J 2.x, by adding an SLF4J 1.x binding that is not under the group `org.slf4j` to the dependencies (https://github.com/apache/beam/pull/33574). An example is `org.apache.logging.log4j:log4j-slf4j-impl`; see the link above for more details. In my opinion, mixing SLF4J 1.x and 2.x could be a problem, but this seems to be a way forward if we want to keep supporting older versions of Spark.

If you have any other ideas, please feel free to share them here.

Thanks,
Shunping

On Tue, Feb 4, 2025 at 3:13 PM Shunping Huang <mark.sphu...@gmail.com> wrote:
> Hi everyone,
>
> I put together a short doc to summarize the existing logging
> infrastructure(dependencies) in Beam Java and outline a plan to improve it.
> Basically, we are on the path towards slf4j 2.x.
>
> https://docs.google.com/document/d/1IkbiM4m8D-aB3NYI1aErFZHt6M7BQ-8eCULh284Davs/edit?usp=sharing
>
> If you are interested in this topic, please take a look and share any
> feedback.
>
> Regards,
>
> Shunping
>
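Appendix (referenced above): a minimal, hypothetical Java sketch, not code from Beam or Spark and with a class name I made up, that probes for the SLF4J 1.x binder class in roughly the way Spark < 3.4.0's Logging.scala does. With an SLF4J 1.x binding on the classpath the lookup succeeds; with only SLF4J 2.x providers (which are discovered via java.util.ServiceLoader instead), the class is absent, and that is the same condition that surfaces as NoClassDefFoundError when Spark references the class directly.

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class Slf4jBindingProbe {
      public static void main(String[] args) {
        // Spark < 3.4.0 references org.slf4j.impl.StaticLoggerBinder directly;
        // under SLF4J 2.x bindings that class no longer exists on the classpath.
        try {
          Class.forName("org.slf4j.impl.StaticLoggerBinder");
          System.out.println("SLF4J 1.x binding detected (StaticLoggerBinder present).");
        } catch (ClassNotFoundException e) {
          System.out.println("No SLF4J 1.x binding found; SLF4J 2.x discovers providers via ServiceLoader.");
        }
        // Regular SLF4J usage works either way, since LoggerFactory hides the binding mechanism.
        Logger log = LoggerFactory.getLogger(Slf4jBindingProbe.class);
        log.info("Logger initialized via {}", log.getClass().getName());
      }
    }

Running this with slf4j-reload4j (1.x) versus log4j-slf4j2-impl or another 2.x provider on the classpath shows the two branches; the mixed 1.x/2.x situation described above is what the workaround in the PR leans on.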