One detail I missed mentioning earlier: we are exploring this in Spark 4.0.

Whether the answer is a configuration change or explicit code changes throughout 
the codebase, we are keen to adopt the recommended, future-proof approach.
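
For concreteness, here is a configuration-only sketch of what we have in mind. 
The property names come from the public Spark configuration documentation; the 
choice of lz4 over lzf is purely illustrative, and whether these two properties 
really cover every internal use of Snappy is part of what we are asking:

    import org.apache.spark.sql.SparkSession

    // Configuration-only approach: steer block-level compression (shuffle,
    // broadcast, RDD blocks, spill) and the Parquet write codec away from
    // Snappy. "lz4-everywhere" is just an illustrative application name.
    val spark = SparkSession.builder()
      .appName("lz4-everywhere")
      // Codec for internal block data; accepts lz4, lzf, snappy, zstd
      .config("spark.io.compression.codec", "lz4")
      // Columnar file writes default to snappy and are configured separately
      .config("spark.sql.parquet.compression.codec", "lz4")
      .getOrCreate()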

Any guidance, insights, or pointers to relevant documentation, JIRAs, or 
previous discussions on this topic would be immensely helpful.

Thanks,
Balaji

From: Balaji Sudharsanam V
Sent: 02 June 2025 21:02
To: dev@spark.apache.org
Cc: Dongjoon Hyun <dongj...@apache.org>; Steven Jones <s...@us.ibm.com>; 
NICHOLAS MARION <nmar...@us.ibm.com>; Vishal Kolki <vishal.ko...@ibm.com>; ANTO 
JOHN <antoj...@in.ibm.com>
Subject: Inquiry: Best Practices for Replacing Snappy with LZ4/LZF Compression 
Across Spark Codebase (including test cases)

Dear Spark Developer Community,
I hope this email finds you well.
My name is Balaji, and I am a Software Engineer working with Apache Spark on 
IBM Z Systems (z/OS).
We are exploring a scenario where we would like to move away from the Snappy 
compression library in our Spark applications and rely exclusively on either 
LZ4 or LZF compression. This includes ensuring that all data persistence, 
shuffle operations, and internal data representations consistently use the 
chosen alternative (LZ4 or LZF), test cases included.
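
As a concrete illustration of the test-case aspect, here is a minimal sketch of 
what we would like test suites to do, assuming a plain SparkContext-based test 
(the app name is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    // Pin the codec in the test's own SparkConf so shuffle-heavy assertions
    // exercise LZF instead of whatever codec the environment defaults to.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("CodecPinnedSuite")
      .set("spark.io.compression.codec", "lzf")
    val sc = new SparkContext(conf)
    try {
      // ... run shuffle-producing workloads and assertions here ...
    } finally {
      sc.stop()
    }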

Balaji Sudharsanam V
Product Owner, IBM Z Platform for Apache Spark
India Systems Development Lab
Bangalore, EGL D Block 6th Floor
Mobile: +91 9600778246
Mail: balaji.sudharsa...@ibm.com
