Yes, I think he's asking about the motivation for the project. My
understanding is that Snappy is used more often than Gzip with Parquet.

On Wed, Oct 21, 2020 at 8:53 PM Xie, Qi <qi....@intel.com> wrote:
>
> Hi, Antoine
>
> Do you mean performance data for HW-GZIP compared with LZ4/ZSTD?
>
> Thanks,
> XieQi
>
> -----Original Message-----
> From: Antoine Pitrou <anto...@python.org>
> Sent: Tuesday, October 20, 2020 10:38 PM
> To: dev@arrow.apache.org; Xie, Qi <qi....@intel.com>
> Cc: Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin <xin.d...@intel.com>; 
> Zhang, Jie1 <jie1.zh...@intel.com>
> Subject: Re: [Discuss] Provide pluggable APIs to support user customized 
> compression codec
>
>
>
> On 20/10/2020 at 12:09, Xie, Qi wrote:
> > Hi, Wes
> >
> > Yes, currently the purpose of the key-value metadata is just a hint to
> > indicate that the Parquet file was compressed by a plugin, so that the
> > Parquet reader can load the plugin library and use the plugin to decompress
> > the file. There are many optimized GZIP implementations, and they may not be
> > compatible with standard gzip. For example, due to hardware limits, the
> > HW-GZIP history window size may be smaller than that of standard gzip, so
> > HW-GZIP can't decompress a file compressed by standard gzip. Because we
> > still use Compression::GZIP as the Compression::type, we need that metadata
> > to distinguish it from standard gzip.
>
> What does it bring over ZSTD or LZ4 exactly?
>
> Regards
>
> Antoine.
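
For readers following the thread, here is a minimal sketch of the metadata-hint idea described in the quoted message, written in Python with the standard zlib module rather than Arrow's C++ codec API. The key name "compression.plugin", the plugin name "small-window", and the 4 KiB window are all assumptions made up for illustration; none of them come from the proposal. The small-window decoder only stands in for a hardware codec with a limited history window.

```python
import zlib

# Hypothetical metadata key; the thread does not name the actual key.
CODEC_HINT_KEY = "compression.plugin"

# Assumed 4 KiB history window (2**12), standing in for a hardware limit.
SMALL_WBITS = 12


def compress_small_window(data: bytes) -> bytes:
    """Compress with a reduced history window (zlib container format)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, SMALL_WBITS)
    return comp.compress(data) + comp.flush()


def decompress_small_window(data: bytes) -> bytes:
    """Decoder that only supports the reduced window size."""
    return zlib.decompress(data, wbits=SMALL_WBITS)


# Hypothetical plugin registry, keyed by the value stored under the hint key.
PLUGINS = {"small-window": decompress_small_window}


def decompress(data: bytes, metadata: dict) -> bytes:
    """Use the plugin named in the metadata, else the standard decoder."""
    hint = metadata.get(CODEC_HINT_KEY)
    if hint is None:
        # No hint: treat the data as produced by the standard codec.
        return zlib.decompress(data)
    if hint not in PLUGINS:
        raise LookupError(f"no plugin registered for {hint!r}")
    return PLUGINS[hint](data)


if __name__ == "__main__":
    payload = b"arrow parquet " * 1000
    standard = zlib.compress(payload)       # default 32 KiB window
    limited = compress_small_window(payload)

    # The reader picks the decoder from the metadata hint.
    assert decompress(standard, {}) == payload
    assert decompress(limited, {CODEC_HINT_KEY: "small-window"}) == payload

    # The limited decoder rejects a stream that declares a larger window.
    try:
        decompress_small_window(standard)
    except zlib.error as exc:
        print("small-window decoder failed on standard data:", exc)
```

The point of the sketch is only that the codec type alone does not say which decoder can read the data, so some extra hint is needed to route the file to the right implementation.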
