Okay, so you mean to say that Parquet compresses the denormalized data using
Snappy per row group/page, so it won't affect the performance.

Only compressing a whole file with Snappy on its own would affect the
performance.

Am I correct?
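To check my understanding, here is a minimal sketch of why per-block compression stays splittable. I'm using the standard-library zlib as a stand-in for Snappy (the real Snappy codec needs a third-party library), so the codec is an assumption; only the structure matters:

```python
import zlib

# Hypothetical row groups, standing in for Parquet row groups.
row_groups = [b"row group %d data " % i * 100 for i in range(4)]

# Whole-file compression (like a .csv.gz): one opaque stream.
# To read row group 2 you must decompress from the very start,
# so the file cannot be split across tasks.
whole_file = zlib.compress(b"".join(row_groups))

# Parquet-style: each row group (page) compressed independently.
blocks = [zlib.compress(rg) for rg in row_groups]

# Any single block decompresses on its own, without the others --
# which is what lets Spark assign different row groups to different tasks.
assert zlib.decompress(blocks[2]) == row_groups[2]
```

So a `part*.snappy.parquet` file is still split at row-group boundaries, even though each page's bytes are Snappy-compressed.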

On Thu, 15 Sep 2022, 01:08 Amit Joshi, <mailtojoshia...@gmail.com> wrote:

> Hi Sid,
>
> Snappy itself is not splittable. But a container format such as Parquet,
> which divides the actual data into row groups, can be compressed using
> Snappy.
> This works because the blocks (pages) inside a Parquet file are each
> compressed independently with Snappy, so they can still be read in parallel.
>
> Thanks
> Amit
>
> On Wed, Sep 14, 2022 at 8:14 PM Sid <flinkbyhe...@gmail.com> wrote:
>
>> Hello experts,
>>
>> I know that gzip and Snappy files are not splittable, i.e. the data won't
>> be distributed across multiple blocks; instead, it is loaded into a single
>> partition/block.
>>
>> So, my question is: when I write Parquet data via Spark, it gets stored
>> at the destination as something like part*.snappy.parquet.
>>
>> So, when I read this data back, will it affect my performance?
>>
>> Please help me if there is any understanding gap.
>>
>> Thanks,
>> Sid
>>
>