Hello,

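Parquet modular encryption already ensures integrity if you use the
default encryption algorithm, AES_GCM_V1 (not AES_GCM_CTR_V1, which
encrypts data pages with AES-CTR and so does not authenticate them).
GCM is authenticated encryption: corrupted or tampered data fails to
decrypt when the file is read. You don't have to checksum the file
yourself.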
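
If you still want an end-to-end check on the data itself, comparing
file bytes will not work reliably, because a rewritten file can differ
in metadata. One possible approach is to hash the decoded column values
instead. The following is only a minimal sketch using the standard
library: content_digest is a hypothetical helper name, and it assumes
the columns are available as a dict of lists (for example from
pyarrow's Table.to_pydict()).

```python
import hashlib
import json


def content_digest(columns):
    """Return a SHA-256 hex digest of column names and decoded values.

    `columns` maps column name -> list of values (assumption: e.g. the
    result of pyarrow's Table.to_pydict()). Only decoded values are
    hashed, so file-level metadata such as the writer version does not
    affect the result.
    """
    h = hashlib.sha256()
    # Columns are visited in sorted name order, so column order is ignored;
    # row order within each column still matters.
    for name in sorted(columns):
        # default=str is a simplification for values JSON cannot encode
        # natively (dates, bytes, decimals).
        payload = json.dumps([name, columns[name]], sort_keys=True, default=str)
        h.update(payload.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


# Illustrative data standing in for the original and the decrypted tables.
original = {"id": [1, 2, 3], "name": ["a", "b", None]}
roundtrip = {"name": ["a", "b", None], "id": [1, 2, 3]}
tampered = {"id": [1, 2, 4], "name": ["a", "b", None]}

same = content_digest(original) == content_digest(roundtrip)
different = content_digest(original) != content_digest(tampered)
print(same, different)
```

This compares logical content only, so it says nothing about schema
metadata or row-group layout, and stringifying values via default=str
may be too loose for some types.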

Regards

Antoine.


On Tue, 25 Feb 2025 16:19:59 +0700
Jason Sebastian Kusuma <jsjasons...@gmail.com> wrote:
> Hi everyone,
> I want to ask about the proper practice for computing checksums on Parquet
> files with modular encryption using pyarrow. My current process is:
> 1. Create a Parquet file (not yet encrypted) and generate a checksum.
> 2. Create the encrypted version of the file using the ParquetWriter with
> encryption properties.
> 3. Send the encrypted file and checksum to somewhere.
> 4. Decrypt the file using ParquetFile and write it as a decrypted parquet
> file.
> 5. Compare the checksums.
> 
> I want to checksum the original file and the decrypted file to ensure
> data integrity. However, I am aware that the metadata could differ
> because of different writer versions. What is the proper way to do this?
> 
> I am also wondering whether checksums are even necessary in this case. Is
> there already a mechanism that ensures integrity across the encrypt,
> transfer, and decrypt steps?
> 
> Thank you
> 


