I missed mentioning that we are exploring this in Spark 4.0, whether through a configuration change or explicit code changes throughout the codebase. We are keen to adopt whichever approach is recommended and future-proof.
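For context, a minimal sketch of the configuration-only approach we are evaluating, using the standard Spark compression properties (the values and application name below are illustrative, not our final settings):

```shell
# Sketch (illustrative): route Spark's internal compression through LZ4
# via standard properties, either in spark-defaults.conf or via --conf.
#
# spark.io.compression.codec covers shuffle output, spills, and broadcast
# variables; "lzf" is the other value we are considering. Parquet output
# has its own codec property (its default is snappy).
spark-submit \
  --conf spark.io.compression.codec=lz4 \
  --conf spark.rdd.compress=true \
  --conf spark.sql.parquet.compression.codec=lz4 \
  --class com.example.MyApp my-app.jar
```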
Any guidance, insights, or pointers to relevant documentation, JIRAs, or previous discussions on this topic would be immensely helpful.

Thanks,
Balaji

From: Balaji Sudharsanam V
Sent: 02 June 2025 21:02
To: dev@spark.apache.org
Cc: Dongjoon Hyun <dongj...@apache.org>; Steven Jones <s...@us.ibm.com>; NICHOLAS MARION <nmar...@us.ibm.com>; Vishal Kolki <vishal.ko...@ibm.com>; ANTO JOHN <antoj...@in.ibm.com>
Subject: Inquiry: Best Practices for Replacing Snappy with LZ4/LZF Compression Across Spark Codebase (including test cases)

Dear Spark Developer Community,

I hope this email finds you well. My name is Balaji, and I am a Software Engineer working with Apache Spark on IBM Z Systems (z/OS). We are exploring a scenario where we would like to move away from the Snappy compression library within our Spark applications and use either LZ4 or LZF compression exclusively. This includes ensuring that all data persistence, shuffle operations, and internal data representations consistently use the chosen alternative (LZ4 or LZF), including in the test cases.

Balaji Sudharsanam V
Product Owner, IBM Z Platform for Apache Spark
India Systems Development Lab Bangalore, EGL D Block 6th Floor
Mobile: +91 9600778246
Mail: balaji.sudharsa...@ibm.com