Hi all, I am resending this message in case you missed it. (Bcc'ing contributors who have had recent activity on SparkRunner and SparkReceiverIO.)
Particularly, for users or contributors of *SparkRunner* and *SparkReceiverIO*, please take a look and feel free to share your ideas or concerns. If no objections are received, we will proceed with the upgrade *by the end of this week*. Thanks!

Shunping

On Wed, Feb 5, 2025 at 10:34 AM Shunping Huang <mark.sphu...@gmail.com> wrote:

Hi all,

(Sorry Kenn, I didn't mean to interrupt the flow of your previous conversation. I was just finishing this email when yours came through...)

I need some opinions from you all regarding Spark and the SLF4J 2.x upgrade.

Here is some background.

- Spark used to have a compile dependency on SLF4J 1.x (e.g. https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0), but since 3.4.0 (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.4.0) it has switched to SLF4J 2.x.
- An internal logging module of Spark < 3.4.0 references the class `StaticLoggerBinder` (https://github.com/apache/spark/blob/v3.2.1/core/src/main/scala/org/apache/spark/internal/Logging.scala#L222), which only exists in SLF4J 1.x binding artifacts (e.g. org.slf4j:slf4j-simple, org.slf4j:slf4j-reload4j, etc.). When we upgrade SLF4J and its related artifacts to 2.x, that class no longer exists, which causes errors like "java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder". (A small sketch reproducing this is included further down in this email.)

During the upgrade, I have seen test failures in two sub-projects in Beam.

1. *SparkReceiverIO* (https://github.com/apache/beam/tree/master/sdks/java/io/sparkreceiver/2)
   - This IO was built on top of Spark 2.x, so some of its tests fail when upgrading SLF4J to 2.x.
   - I am wondering *if there is any objection* to upgrading it to use Spark 3.x. (In fact, I tested out this idea, and the tests related to SparkReceiverIO run fine on Spark 3.x.)
2. *SparkRunner* (https://github.com/apache/beam/tree/master/runners/spark/3)
   - As mentioned above, some versions of Spark 3.x do not play well with SLF4J 2.x. Specifically, the version-check tests (e.g. runners:spark:3:sparkVersionsTest) failed on 3.2.x.
   - In theory, any Spark < 3.4.0 might be affected, but due to certain transitive dependencies (and also some luck?), only the 3.2.x tests failed. See my comment at https://github.com/apache/beam/pull/33574/files#diff-78a108ab469ee9be0d8fae0f18c0c143e04fc24d44f9f78f65b97434fc234890 for more details.
   - Do we want to continue to support Spark < 3.4.0, or do we want to drop some of these versions because of the SLF4J upgrade?

Last thing: there is a workaround (more like a hack) to support Spark < 3.4.0 under SLF4J 2.x, by putting an SLF4J 1.x binding that is not under the group `org.slf4j` in the dependencies (https://github.com/apache/beam/pull/33574). An example is `org.apache.logging.log4j:log4j-slf4j-impl` (sketched below). In my opinion, the mixed use of SLF4J 1.x and 2.x could be a problem, but this seems to be a way forward if we want to continue our support of older versions of Spark. If you have any other ideas, please feel free to share them here.
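To make the failure mode from the background section concrete, here is a minimal probe (my own sketch for illustration; the class name is made up, and this is not Beam or Spark code) showing what happens when only SLF4J 2.x artifacts are on the classpath:

    // Sketch only: probes for the SLF4J 1.x binding class that Spark < 3.4.0
    // references in its internal Logging trait.
    public class StaticBinderProbe {
      public static void main(String[] args) {
        try {
          Class<?> binder = Class.forName("org.slf4j.impl.StaticLoggerBinder");
          System.out.println("SLF4J 1.x binding present: " + binder.getName());
        } catch (ClassNotFoundException e) {
          // With only SLF4J 2.x bindings on the classpath, this branch is taken.
          // Code compiled directly against the class, like Spark's Logging,
          // fails harder, with
          //   java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
          System.out.println("No SLF4J 1.x binding found: " + e);
        }
      }
    }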
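And here is roughly what the workaround looks like as a Gradle dependency declaration (a sketch only; the configuration name and Log4j version are illustrative, and the actual change in the PR above may differ):

    dependencies {
      // Sketch: an SLF4J 1.x binding published under org.apache.logging.log4j
      // rather than org.slf4j, so it survives an upgrade that pins org.slf4j
      // artifacts to 2.x, yet still ships org.slf4j.impl.StaticLoggerBinder
      // for Spark < 3.4.0 to find at runtime.
      testRuntimeOnly "org.apache.logging.log4j:log4j-slf4j-impl:2.17.1"  // example version
    }

The tradeoff, as noted above, is that SLF4J 1.x and 2.x binding machinery then coexist on the classpath.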
Thanks,

Shunping

On Tue, Feb 4, 2025 at 3:13 PM Shunping Huang <mark.sphu...@gmail.com> wrote:

Hi everyone,

I put together a short doc to summarize the existing logging infrastructure (dependencies) in Beam Java and outline a plan to improve it. Basically, we are on the path towards SLF4J 2.x.

https://docs.google.com/document/d/1IkbiM4m8D-aB3NYI1aErFZHt6M7BQ-8eCULh284Davs/edit?usp=sharing

If you are interested in this topic, please take a look and share any feedback.

Regards,

Shunping