hi XieQi,

Is the idea that your custom Gzip implementation would automatically override any places in the codebase where the built-in one would be used (like the Parquet codebase)? I see some things in the design doc about serializing the plugin information in the Parquet file metadata (assuming you want to speed up decompressing Parquet data pages) -- is there a reason to believe that the plugin would be _required_ in order to read the file? I recall some messages to the Parquet mailing list about user-defined codecs.
In general, having a plugin API that provides a means to substitute one functionally identical implementation for another seems reasonable to me (I could envision people customizing kernel execution in the future). We should try to create a general enough API that it can be used for customizations beyond compression codecs, so we don't have to go through another design exercise to support plugin/algorithm overrides for something else. This is something we could hash out during code review -- I should have some opinions and I'm sure others will as well.

- Wes

On Fri, Jun 19, 2020 at 10:21 AM Xie, Qi <qi....@intel.com> wrote:
>
> Hi,
>
> In pursuit of better performance, quite a few end users want to leverage
> accelerators (e.g. FPGA, Intel QAT) to offload compression. However, the
> current Arrow compression framework only supports codec-name-based
> compression implementations and can't be customized to leverage
> accelerators. For example, for the gzip format, we can't call a customized
> codec to accelerate the compression. We would like to propose a plugin API
> to support customized compression codecs. We've put the proposal here:
>
> https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit
>
> Any comments are welcome; please let us know your feedback.
>
> Thanks,
> XieQi
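
For the sake of discussion, here is a minimal sketch of what a name-keyed codec registry could look like in C++. The Codec, CodecRegistry, and QatGzipCodec names are hypothetical illustrations, not Arrow's actual API or the API in the proposal -- the point is just that a plugin registering under an existing name ("gzip") would transparently replace the built-in implementation for callers that look codecs up by name, while the on-disk format stays standard gzip:

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <memory>
    #include <string>

    // Minimal codec interface, analogous in spirit to arrow::util::Codec.
    class Codec {
     public:
      virtual ~Codec() = default;
      virtual std::string name() const = 0;
      virtual int64_t Compress(int64_t in_len, const uint8_t* in,
                               int64_t out_cap, uint8_t* out) = 0;
      virtual int64_t Decompress(int64_t in_len, const uint8_t* in,
                                 int64_t out_cap, uint8_t* out) = 0;
    };

    // Registry mapping a codec name (e.g. "gzip") to a factory.
    // A later registration for the same name overrides the earlier one,
    // which is how an accelerated plugin would shadow the built-in codec.
    class CodecRegistry {
     public:
      using Factory = std::function<std::unique_ptr<Codec>()>;

      static CodecRegistry& Instance() {
        static CodecRegistry registry;
        return registry;
      }

      void Register(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);  // last registration wins
      }

      std::unique_ptr<Codec> Create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
      }

     private:
      std::map<std::string, Factory> factories_;
    };

    // A plugin shared library would call this at load time:
    //   CodecRegistry::Instance().Register("gzip", [] {
    //     return std::make_unique<QatGzipCodec>();  // hypothetical class
    //   });

Under that kind of scheme, nothing plugin-specific needs to be written into the file itself, so any standard gzip decoder could still read the data -- which is why I'm curious about the metadata serialization in the doc.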