What is the performance of, say, HW GZip against SW ZSTD?

Regards

Antoine.


On Thu, 25 Jun 2020 07:06:58 +0000
"Xu, Cheng A" <cheng.a...@intel.com> wrote:
> Thanks Micah and Wes for the reply. W.R.T. the scope, we're working
> together with the Parquet community to refine our proposal:
> https://www.mail-archive.com/dev@parquet.apache.org/msg12463.html
> 
> The proposal here is more general to Arrow (indeed it can be used by native
> Parquet as well). Since Arrow is an in-memory format used mostly for
> intermediate data, I would expect backward compatibility to be less of a
> concern than it is for the on-disk Parquet format. Given that, we can
> discuss the two parts separately: the Parquet part should behave
> consistently with Java Parquet, and the Arrow part should also be
> compatible with the new extensible Parquet compression codec framework. We
> can start with the Parquet part first.
> 
> Thanks
> Cheng Xu
> 
> From: Micah Kornfield <emkornfi...@gmail.com>
> Sent: Tuesday, June 23, 2020 12:11 PM
> To: dev <dev@arrow.apache.org>
> Cc: Xu, Cheng A <cheng.a...@intel.com>; Xie, Qi <qi....@intel.com>
> Subject: Re: Proposal for the plugin API to support user customized 
> compression codec
> 
> It would be good to clarify the exact scope of this.  If it is specific to
> Parquet then we should wait for the discussion on dev@parquet to conclude
> before moving forward.  If it is more general to Arrow, then it would be
> useful to work through scenarios of how this would be used for
> decompression when the Codec can't support generic input (the codec
> library is a singleton across the Arrow codebase).
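> 
> To make that concrete: codecs are resolved today through one static
> factory, so any plugin substitution is necessarily process-wide. A rough
> sketch (Codec::Create is the real API; the RegisterCodecOverride hook and
> MakeQatGzipCodec below are hypothetical, only to illustrate the question):
> 
>     #include <memory>
> 
>     #include "arrow/util/compression.h"
> 
>     // Real API: every component in the codebase resolves codecs
>     // through this one static factory.
>     arrow::Result<std::unique_ptr<arrow::util::Codec>> MakeGzip() {
>       return arrow::util::Codec::Create(arrow::Compression::GZIP);
>     }
> 
>     // Hypothetical override hook: registering a plugin here would
>     // change *all* call sites at once, including decompression paths
>     // whose input the accelerator may not support.
>     // RegisterCodecOverride(arrow::Compression::GZIP, MakeQatGzipCodec);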
> 
> On Mon, Jun 22, 2020 at 4:23 PM Wes McKinney 
> <wesmck...@gmail.com<mailto:wesmck...@gmail.com>> wrote:
> hi XieQi,
> 
> Is the idea that your custom Gzip implementation would automatically
> override any places in the codebase where the built-in one would be
> used (like the Parquet codebase)? I see some things in the design doc
> about serializing the plugin information in the Parquet file metadata
> (assuming you want to speed up decompression of Parquet data pages) -- is
> there a reason to believe that the plugin would be _required_ in order
> to read the file? I recall some messages to the Parquet mailing list
> about user-defined codecs.
> 
> In general, having a plugin API to provide a means to substitute one
> functionally identical implementation for another seems reasonable to me
> (I could envision people customizing kernel execution in the future). We
> should try to create a general enough API that it can be used for
> customizations beyond compression codecs, so we don't have to go
> through a design exercise to support plugin/algorithm overrides for
> something else. This is something we could hash out during code review
> -- I should have some opinions and I'm sure others will as well.
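> 
> As a strawman for "general enough", a keyed registry of factories
> (everything below is hypothetical, not existing Arrow code) could serve
> codecs now and other algorithm overrides later:
> 
>     #include <functional>
>     #include <map>
>     #include <memory>
>     #include <string>
> 
>     // Hypothetical: components look up a factory by name and fall
>     // back to the built-in implementation when no plugin registered.
>     template <typename Interface>
>     class PluginRegistry {
>      public:
>       using Factory = std::function<std::unique_ptr<Interface>()>;
> 
>       void Register(const std::string& name, Factory factory) {
>         factories_[name] = std::move(factory);
>       }
> 
>       // nullptr means "no override"; the caller keeps the default.
>       std::unique_ptr<Interface> Make(const std::string& name) const {
>         auto it = factories_.find(name);
>         return it == factories_.end() ? nullptr : it->second();
>       }
> 
>      private:
>       std::map<std::string, Factory> factories_;
>     };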
> 
> - Wes
> 
> On Fri, Jun 19, 2020 at 10:21 AM Xie, Qi 
> <qi....@intel.com<mailto:qi....@intel.com>> wrote:
> >
> > Hi,
> >
> >
> > To obtain better performance, quite a few end users want to leverage
> > accelerators (e.g. FPGA, Intel QAT) to offload compression. However, the
> > current Arrow compression framework only supports selecting a compression
> > implementation by codec name and can't be customized to leverage
> > accelerators. For example, for the gzip format, we can't call a customized
> > codec to accelerate the compression. We would like to propose a plugin API
> > to support customized compression codecs. We've put the proposal here:
> >
> >
> >
> > https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit
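> >
> > As a very rough illustration of the intended shape (all names below are
> > placeholders, not the actual proposal; the real base class is
> > arrow::util::Codec), a plugin would implement the existing codec
> > interface while keeping the standard on-wire format:
> >
> >     #include <memory>
> >
> >     #include "arrow/util/compression.h"
> >
> >     // Hypothetical: a QAT-backed gzip codec. The output stays
> >     // standard gzip, so files remain readable by the software codec.
> >     class QatGzipCodec : public arrow::util::Codec {
> >       // ... Compress()/Decompress() overrides dispatch to the
> >       // accelerator, falling back to software on failure
> >     };
> >
> >     // Hypothetical registration hook: afterwards, requests for GZIP
> >     // resolve to the accelerated codec instead of the built-in one.
> >     // RegisterCodecPlugin(arrow::Compression::GZIP,
> >     //     [] { return std::make_unique<QatGzipCodec>(); });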
> >
> >
> >
> > Any comment is welcome and please let us know your feedback.
> >
> >
> >
> > Thanks,
> >
> > XieQi
> >
> >
> >  


