Hi all,

(Sorry Kenn, I didn't mean to interrupt the flow of your previous
conversation. I was just finishing this email when yours came through...)

I need some opinions from you all regarding Spark and the SLF4J 2.x upgrade.

Here is some background.

   - Spark used to have a compile dependency on SLF4J 1.x (e.g.
   https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0).
   But since 3.4.0 (
   https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.4.0),
   it has switched to SLF4J 2.x.
   - The internal logging module of Spark <3.4.0 references a class
   `StaticLoggerBinder` (
   
https://github.com/apache/spark/blob/v3.2.1/core/src/main/scala/org/apache/spark/internal/Logging.scala#L222)
   which only exists in SLF4J 1.x binding artifacts (e.g.
   org.slf4j:slf4j-simple, org.slf4j:slf4j-reload4j, etc.). When we upgrade
   SLF4J and its related artifacts to 2.x, that class no longer exists, which
   causes an error like "java.lang.NoClassDefFoundError:
   org/slf4j/impl/StaticLoggerBinder". See the sketch below the list.

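To make the failure mode concrete, here is a minimal Java sketch (the class
name is made up for illustration; the actual Spark code is Scala and holds a
compile-time reference to the binder class, so it fails with
NoClassDefFoundError rather than the exception caught here):

    // Probe for the SLF4J 1.x binding class that Spark <3.4.0 relies on.
    public class StaticBinderCheck {
      public static void main(String[] args) {
        try {
          // SLF4J 1.x bindings ship this class; SLF4J 2.x providers do not.
          Class.forName("org.slf4j.impl.StaticLoggerBinder");
          System.out.println("SLF4J 1.x binding present on the classpath");
        } catch (ClassNotFoundException e) {
          // With only SLF4J 2.x artifacts, the class is missing. Spark's
          // hard reference surfaces as java.lang.NoClassDefFoundError instead.
          System.out.println("No SLF4J 1.x binding found: " + e);
        }
      }
    }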

During the upgrade, I have seen test failures in two sub-projects in Beam.

   1. SparkReceiverIO (
   https://github.com/apache/beam/tree/master/sdks/java/io/sparkreceiver/2).
      - This IO was built on top of Spark 2.x, so some of its tests fail
      when SLF4J is upgraded to 2.x.
      - I am wondering *if there is any objection* to upgrading it to use
      Spark 3.x. (In fact, I tried this out and the tests related to
      SparkReceiverIO run fine on Spark 3.x.)
   2. SparkRunner (
   https://github.com/apache/beam/tree/master/runners/spark/3)
      - As I mentioned above, some versions of Spark 3.x do not work well
      with SLF4J 2.x. Specifically, version check tests (e.g.
      runners:spark:3:sparkVersionsTest) failed on 3.2.x.
      - In theory, any Spark < 3.4.0 might be impacted, but due to certain
      transitive dependencies (and also some luck?), only the 3.2.x tests
      failed. See my comment at
      
https://github.com/apache/beam/pull/33574/files#diff-78a108ab469ee9be0d8fae0f18c0c143e04fc24d44f9f78f65b97434fc234890
      for more details.
      - Do we want to continue to support Spark <3.4.0, or do we want to
      drop some of those versions because of this SLF4J upgrade?


Last thing: there is a workaround (more like a hack) to support Spark <
3.4.0 under SLF4J 2.x, which is to add an SLF4J 1.x binding that is not
under the `org.slf4j` group to the dependencies (
https://github.com/apache/beam/pull/33574). An example is
`org.apache.logging.log4j:log4j-slf4j-impl`. Check out my link above for
more details. In my opinion, mixing SLF4J 1.x and 2.x could be a problem,
but this seems to be a way forward if we want to continue supporting
older versions of Spark. If you have any other ideas, please feel free to
share them here.
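
To see what the workaround actually provides at runtime, a quick diagnostic
like the one below can be run on the resulting classpath (the class name is
made up for illustration; it assumes a 1.x binding such as log4j-slf4j-impl
is present, and getCodeSource() can be null for classes not loaded from a
jar):

    import org.slf4j.LoggerFactory;

    // Report which jar supplies the SLF4J 1.x binder class and which logger
    // factory SLF4J 2.x actually selects. With the workaround in place, the
    // binder class should come from log4j-slf4j-impl rather than an
    // org.slf4j artifact.
    public class BindingReport {
      public static void main(String[] args) throws Exception {
        Class<?> binder = Class.forName("org.slf4j.impl.StaticLoggerBinder");
        System.out.println("StaticLoggerBinder loaded from: "
            + binder.getProtectionDomain().getCodeSource().getLocation());
        System.out.println("Active logger factory: "
            + LoggerFactory.getILoggerFactory().getClass().getName());
      }
    }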

Thanks,

Shunping

On Tue, Feb 4, 2025 at 3:13 PM Shunping Huang <mark.sphu...@gmail.com>
wrote:

> Hi everyone,
>
> I put together a short doc to summarize the existing logging
> infrastructure(dependencies) in Beam Java and outline a plan to improve it.
> Basically, we are on the path towards slf4j 2.x.
>
>
> https://docs.google.com/document/d/1IkbiM4m8D-aB3NYI1aErFZHt6M7BQ-8eCULh284Davs/edit?usp=sharing
>
> If you are interested in this topic, please take a look and share any
> feedback.
>
> Regards,
>
> Shunping
>
