Hello,

Parquet encryption ensures integrity if you use the default encryption
algorithm AES_GCM (not AES_CTR). You don't have to checksum the file
yourself.

Regards

Antoine.


On Tue, 25 Feb 2025 16:19:59 +0700
Jason Sebastian Kusuma <jsjasons...@gmail.com> wrote:
> Hi everyone,
> I want to ask the proper practice for doing checksums on parquet with
> modular encryption using pyarrow. My current process is:
> 1. Create a parquet file (not yet encrypted) and generate checksum.
> 2. Create the encrypted version of the file using the ParquetWriter with
> encryption properties.
> 3. Send the encrypted file and checksum to somewhere.
> 4. Decrypt the file using ParquetFile and write it as a decrypted parquet
> file.
> 5. Compare checksum
>
> I want to do checksum on the original file and the decrypted file to ensure
> data integrity. But, I am aware that there could be metadata difference
> because of different writer version. What is the proper way to do this?
>
> I am also wondering if checksums are not necessary in this case. Is there
> already a mechanism to ensure the integrity in between encrypt, transfer,
> and decrypt process?
>
> Thank you
>
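
For reference, a minimal sketch of selecting the GCM algorithm explicitly in pyarrow, so that the integrity guarantee mentioned above does not depend on the default. As far as I know, pyarrow spells the two algorithms "AES_GCM_V1" (the default) and "AES_GCM_CTR_V1". The helper name and the key IDs and column names passed to it are hypothetical placeholders, and a working KMS client is assumed to exist elsewhere.

```python
# Sketch only: the helper name, key IDs and column names are hypothetical.
# "AES_GCM_V1" is the algorithm that authenticates everything it encrypts.
ENCRYPTION_ALGORITHM = "AES_GCM_V1"


def build_encryption_config(footer_key_id, column_keys):
    """Return a pyarrow EncryptionConfig that requests AES-GCM explicitly.

    footer_key_id: master key ID used for the footer (placeholder value).
    column_keys:   dict mapping a master key ID to a list of column names.
    """
    # Imported here so the module still loads where pyarrow is not installed.
    import pyarrow.parquet.encryption as pe

    return pe.EncryptionConfig(
        footer_key=footer_key_id,
        column_keys=column_keys,
        encryption_algorithm=ENCRYPTION_ALGORITHM,
    )
```

The resulting config would then go to CryptoFactory.file_encryption_properties() together with a KmsConnectionConfig, and the returned properties to ParquetWriter via encryption_properties. With GCM, a file that was corrupted or modified in transit should fail to decrypt on read instead of silently returning wrong data.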