Hi Antoine,

About the API you mentioned, I would like to know what scope it will cover, in particular what the configuration API for overriding the built-in gzip codec would look like.
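
To make the question concrete, below is a rough, hypothetical sketch of such a configuration hook. The CodecRegistry, RegisterOverride, and QatGzipCodec names are invented for illustration and are not existing Arrow APIs; the real Codec interface lives in arrow/util/compression.h, and the stand-in Compression/Codec types below exist only to keep the sketch self-contained.

    // Hypothetical sketch: a factory-override hook that lets a user-supplied
    // codec replace the built-in gzip implementation while files still carry
    // Compression::GZIP in their metadata.
    #include <functional>
    #include <map>
    #include <memory>

    // Stand-ins for arrow::Compression::type and arrow::util::Codec so the
    // sketch compiles on its own; the real types are in arrow/util/compression.h.
    enum class Compression { GZIP, SNAPPY, ZSTD };

    class Codec {
     public:
      virtual ~Codec() = default;
      // The real interface also has Compress/Decompress/MaxCompressedLen, etc.
    };

    using CodecFactory = std::function<std::unique_ptr<Codec>()>;

    // Global override table: if a factory is registered for a compression type,
    // codec creation uses it; otherwise the built-in implementation is used.
    class CodecRegistry {
     public:
      static CodecRegistry& Instance() {
        static CodecRegistry registry;
        return registry;
      }

      void RegisterOverride(Compression type, CodecFactory factory) {
        overrides_[type] = std::move(factory);
      }

      std::unique_ptr<Codec> Create(Compression type) const {
        auto it = overrides_.find(type);
        if (it != overrides_.end()) {
          return it->second();          // user override, e.g. a QAT-backed gzip
        }
        return MakeBuiltinCodec(type);  // built-in codec (zlib for GZIP)
      }

     private:
      std::unique_ptr<Codec> MakeBuiltinCodec(Compression /*type*/) const {
        // In Arrow this would dispatch to the existing built-in codecs
        // (e.g. the zlib-based codec created by MakeGZipCodec).
        return std::make_unique<Codec>();
      }

      std::map<Compression, CodecFactory> overrides_;
    };

    // A QAT-backed gzip codec would derive from Codec and delegate
    // Compress/Decompress to the hardware library, falling back to software
    // when the hardware cannot handle a buffer.
    class QatGzipCodec : public Codec { /* ... */ };

    // Application setup: route Compression::GZIP to the hardware codec.
    void EnableHardwareGzip() {
      CodecRegistry::Instance().RegisterOverride(
          Compression::GZIP, [] { return std::make_unique<QatGzipCodec>(); });
    }

With something along these lines, Parquet and the CSV reader would keep reading and writing Compression::GZIP metadata unchanged; only codec construction would be affected.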
Thanks,
XieQi

-----Original Message-----
From: Antoine Pitrou <anto...@python.org>
Sent: Tuesday, October 27, 2020 11:39 PM
To: Xie, Qi <qi....@intel.com>; dev@arrow.apache.org
Cc: Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin <xin.d...@intel.com>; Zhang, Jie1 <jie1.zh...@intel.com>
Subject: Re: [Discuss] Provide pluggable APIs to support user customized compression codec


Hi,

Le 27/10/2020 à 09:55, Xie, Qi a écrit :
>
> The HW decompressor can't fall back automatically on SW decompression, but
> we can fall back to SW in the HW library.

Yes, that's what I meant :-)

> How about treating HW-Gzip as an enhanced Gzip that still uses
> Compression::GZIP as its Compression::type? The end user could then enable
> HW-Gzip instead of the built-in Gzip in MakeGZipCodec through some
> configuration.

That sounds reasonable. API details will have to be discussed in a PR, but that sounds (IMHO) reasonable on the principle.

Also, note that this can be beneficial for other things than Parquet, for example reading a GZip-compressed CSV file (which right now would be bottlenecked by zlib performance).

I'll let others chime in.

Best regards

Antoine.


>
> Thanks,
> XieQi
>
> -----Original Message-----
> From: Antoine Pitrou <anto...@python.org>
> Sent: Thursday, October 22, 2020 7:20 PM
> To: dev@arrow.apache.org; Xie, Qi <qi....@intel.com>
> Cc: Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin <xin.d...@intel.com>;
> Zhang, Jie1 <jie1.zh...@intel.com>
> Subject: Re: [Discuss] Provide pluggable APIs to support user customized
> compression codec
>
>
> Ok, thank you. Another question: why doesn't the HW decompressor fall back
> automatically on SW decompression when the window size is too large?
>
> That would avoid having to define metadata strings for this.
>
> Regards
>
> Antoine.
>
>
> Le 22/10/2020 à 10:38, Xie, Qi a écrit :
>> Yes, the HW-GZIP is able to work on multiple threads too, but the test
>> program lzbench (https://github.com/inikep/lzbench) seems to work on a
>> single thread, so I can't run it with multiple threads.
>>
>> Thanks,
>> XieQi
>>
>> -----Original Message-----
>> From: Antoine Pitrou <anto...@python.org>
>> Sent: Thursday, October 22, 2020 4:30 PM
>> To: Xie, Qi <qi....@intel.com>; dev <dev@arrow.apache.org>
>> Cc: Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin <xin.d...@intel.com>;
>> Zhang, Jie1 <jie1.zh...@intel.com>
>> Subject: Re: [Discuss] Provide pluggable APIs to support user customized
>> compression codec
>>
>>
>> Le 22/10/2020 à 05:38, Xie, Qi a écrit :
>>> Hi,
>>>
>>> I just tested with Intel QuickAssist Technology, which provides hardware
>>> acceleration for GZIP; details are here:
>>> https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html
>>>
>>> Here is the benchmark result, run on an Intel(R) Xeon(R) Gold 6252 CPU @
>>> 2.10GHz with a single thread:
>>>
>>> lzbench 1.7.2 (64-bit Linux)  Assembled by P.Skibinski
>>> | Compressor name | Compression | Decompress. | Compr. size | Ratio | Filename            |
>>> | memcpy          | 4942 MB/s   | 5688 MB/s   | 3263523     | 1.00  | calgary/calgary.tar |
>>> | qat 1.0.0       | 2312 MB/s   | 3538 MB/s   | 1274379     | 2.56  | calgary/calgary.tar |
>>> | snappy 1.1.4    | 283 MB/s    | 1144 MB/s   | 1686240     | 1.94  | calgary/calgary.tar |
>>> | lz4 1.7.5       | 453 MB/s    | 2514 MB/s   | 1685795     | 1.94  | calgary/calgary.tar |
>>> | zstd 1.3.1 -1   | 279 MB/s    | 723 MB/s    | 1187211     | 2.75  | calgary/calgary.tar |
>>> | zlib 1.2.11 -1  | 79 MB/s     | 261 MB/s    | 1240838     | 2.63  | calgary/calgary.tar |
>>
>> Very nice, thank you. Is it able to work on multiple threads too?
>>
>> Regards
>>
>> Antoine.
>>