Another good resource is the Parquet encryption docs [1]. Search for
"integrity" to see how AES-GCM is used to ensure it.

[1] https://parquet.apache.org/docs/file-format/data-pages/encryption/
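
If a checksum is still wanted on top of the GCM guarantee, one way to
sidestep the writer-version metadata differences mentioned in the original
question is to hash the logical rows rather than the file bytes. A minimal
sketch, not something taken from the Parquet docs: content_digest is a
hypothetical helper, the rows are plain dicts (with pyarrow they could come
from Table.to_pylist()), and it assumes row order is preserved and that
default=str is an acceptable fallback for values JSON cannot encode.

```python
import hashlib
import json


def content_digest(rows):
    """Hash the logical rows instead of the file bytes (hypothetical helper)."""
    h = hashlib.sha256()
    for row in rows:
        # Canonical form: sorted keys and fixed separators, so the digest
        # does not depend on dict key order or on file-level metadata.
        # default=str is a simplistic fallback for dates, decimals, bytes.
        line = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


# Stand-ins for the rows read before encryption and after decryption.
original = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
roundtrip = [{"name": "a", "id": 1}, {"name": "b", "id": 2}]
tampered = [{"id": 1, "name": "a"}, {"id": 2, "name": "x"}]

print(content_digest(original) == content_digest(roundtrip))  # True
print(content_digest(original) == content_digest(tampered))   # False
```

The digest is order-sensitive, so both sides need to read the rows in the
same order (or sort them on a key first).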

Rok

On Thu, Feb 27, 2025 at 8:22 PM Felipe Oliveira Carvalho <
felipe...@gmail.com> wrote:

> Further reading: https://en.wikipedia.org/wiki/Authenticated_encryption
>
> AES-GCM is a form of Authenticated Encryption.
>
> On Thu, Feb 27, 2025 at 3:33 AM Antoine Pitrou <anto...@python.org> wrote:
>
>>
>> Hello,
>>
>> Parquet encryption ensures integrity if you use the default encryption
>> algorithm AES_GCM (not AES_CTR). You don't have to checksum the file
>> yourself.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On Tue, 25 Feb 2025 16:19:59 +0700
>> Jason Sebastian Kusuma <jsjasons...@gmail.com> wrote:
>> > Hi everyone,
>> > I want to ask about the proper practice for checksumming parquet files
>> > with modular encryption using pyarrow. My current process is:
>> > 1. Create a parquet file (not yet encrypted) and generate checksum.
>> > 2. Create the encrypted version of the file using the ParquetWriter with
>> > encryption properties.
>> > 3. Send the encrypted file and checksum to somewhere.
>> > 4. Decrypt the file using ParquetFile and write it as a decrypted
>> > parquet file.
>> > 5. Compare checksum
>> >
>> > I want to checksum the original file and the decrypted file to ensure
>> > data integrity. However, I am aware that there could be metadata
>> > differences because of different writer versions. What is the proper
>> > way to do this?
>> >
>> > I am also wondering whether checksums are necessary at all in this case.
>> > Is there already a mechanism to ensure integrity across the encrypt,
>> > transfer, and decrypt process?
>> >
>> > Thank you
>> >
>>
>>
>>
>>
