Yet another good resource is the Parquet encryption docs [1]. Search for "integrity" to see how AES-GCM is used to ensure it.
[1] https://parquet.apache.org/docs/file-format/data-pages/encryption/

Rok

On Thu, Feb 27, 2025 at 8:22 PM Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:

> Further reading: https://en.wikipedia.org/wiki/Authenticated_encryption
>
> AES-GCM is a form of Authenticated Encryption.
>
> On Thu, Feb 27, 2025 at 3:33 AM Antoine Pitrou <anto...@python.org> wrote:
>
>> Hello,
>>
>> Parquet encryption ensures integrity if you use the default encryption
>> algorithm AES_GCM (not AES_CTR). You don't have to checksum the file
>> yourself.
>>
>> Regards
>>
>> Antoine.
>>
>> On Tue, 25 Feb 2025 16:19:59 +0700
>> Jason Sebastian Kusuma <jsjasons...@gmail.com> wrote:
>> > Hi everyone,
>> > I want to ask about the proper practice for doing checksums on Parquet
>> > files with modular encryption using pyarrow. My current process is:
>> > 1. Create a Parquet file (not yet encrypted) and generate a checksum.
>> > 2. Create the encrypted version of the file using ParquetWriter with
>> >    encryption properties.
>> > 3. Send the encrypted file and the checksum somewhere.
>> > 4. Decrypt the file using ParquetFile and write it out as a decrypted
>> >    Parquet file.
>> > 5. Compare the checksums.
>> >
>> > I want to checksum both the original file and the decrypted file to
>> > ensure data integrity. But I am aware that the metadata could differ
>> > because of different writer versions. What is the proper way to do this?
>> >
>> > I am also wondering whether checksums are unnecessary in this case.
>> > Is there already a mechanism that ensures integrity across the encrypt,
>> > transfer, and decrypt process?
>> >
>> > Thank you