Yes, I think he's asking about the motivation for the project. My understanding is that Snappy is used more often than Gzip with Parquet.

On Wed, Oct 21, 2020 at 8:53 PM Xie, Qi <qi....@intel.com> wrote:
>
> Hi, Antoine
>
> Do you mean the performance data HW-GZIP compared with LZ4/ZSTD?
>
> Thanks,
> XieQi
>
> -----Original Message-----
> From: Antoine Pitrou <anto...@python.org>
> Sent: Tuesday, October 20, 2020 10:38 PM
> To: dev@arrow.apache.org; Xie, Qi <qi....@intel.com>
> Cc: Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin <xin.d...@intel.com>;
> Zhang, Jie1 <jie1.zh...@intel.com>
> Subject: Re: [Discuss] Provide pluggable APIs to support user customized
> compression codec
>
>
>
> Le 20/10/2020 à 12:09, Xie, Qi a écrit :
> > Hi, Wes
> >
> > Yes currently the purpose of the key-value metadata is just a hint to
> > indicate that the parquet file is compressed by plugin so that the parquet
> > reader can load the plugin library and use plugin to decompress the file.
> > There are many optimized GZIP implementations and may not compatible with
> > the standard gzip, for example due to hardware limit, the HW-GZIP history
> > window size maybe smaller than the standard gzip, so that HW-GZIP can't
> > decompress the file compressed by standard gzip and because we are still
> > use the Compression::GZIP as Compression::type, we need that metadata to
> > distinguish it from the standard gzip.
>
> What does it bring over ZSTD or LZ4 exactly?
>
> Regards
>
> Antoine.
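
For anyone following along, here is a rough sketch of the window-size incompatibility described in the quoted message. It is not the HW-GZIP plugin or any Arrow/Parquet code. It uses Python's standard zlib module and the zlib container (rather than gzip) only because the zlib header records the window size, which makes the failure deterministic. The 1 KiB "limited" window and the helper names are assumptions chosen for illustration.

```python
import zlib

def compress_with_window(data: bytes, wbits: int) -> bytes:
    # wbits is log2 of the DEFLATE history window (9..15 for zlib streams).
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return comp.compress(data) + comp.flush()

def try_decompress(blob: bytes, wbits: int):
    # Return the decompressed bytes, or None if this decoder's window
    # is too small for the stream (zlib raises zlib.error in that case).
    try:
        return zlib.decompress(blob, wbits=wbits)
    except zlib.error:
        return None

payload = b"arrow parquet column chunk " * 4096

standard = compress_with_window(payload, 15)  # full 32 KiB window
limited = compress_with_window(payload, 10)   # hypothetical 1 KiB window

# A full-window decoder can read the small-window stream...
full_reader_ok = try_decompress(limited, 15) == payload
# ...but a decoder restricted to the small window rejects the standard stream.
limited_reader_ok = try_decompress(standard, 10) == payload

print(full_reader_ok, limited_reader_ok)
```

The asymmetry is the point: a reader with the smaller window cannot assume that every stream labelled with the same codec is readable, which is why the quoted message proposes an extra metadata hint alongside Compression::GZIP.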