Okay, so you mean to say that Parquet compresses the denormalized data using Snappy, so it won't affect performance.
Only using Snappy on its own would affect performance. Am I correct?

On Thu, 15 Sep 2022, 01:08 Amit Joshi, <mailtojoshia...@gmail.com> wrote:

> Hi Sid,
>
> Snappy itself is not splittable. But a format that contains the actual
> data, like Parquet (which is divided into row groups), can be compressed
> using Snappy. This works because the blocks (pages) inside a Parquet file
> are compressed with Snappy independently.
>
> Thanks
> Amit
>
> On Wed, Sep 14, 2022 at 8:14 PM Sid <flinkbyhe...@gmail.com> wrote:
>
>> Hello experts,
>>
>> I know that Gzip and Snappy files are not splittable, i.e. the data
>> won't be distributed across multiple blocks; instead, it would be loaded
>> into a single partition/block.
>>
>> So, my question is: when I write Parquet data via Spark, it gets stored
>> at the destination as something like *part*.snappy.parquet*
>>
>> So, when I read this data back, will it affect my performance?
>>
>> Please help me if there is any gap in my understanding.
>>
>> Thanks,
>> Sid