Hi XieQi,

Is the idea that your custom Gzip implementation would automatically
override any places in the codebase where the built-in one would be
used (like the Parquet codebase)? I see some things in the design doc
about serializing the plugin information in the Parquet file metadata
(assuming you want to speed up decompressing Parquet data pages) -- is
there a reason to believe that the plugin would be _required_ in order
to read the file? I recall some messages to the Parquet mailing list
about user-defined codecs.

In general, having a plugin API that provides a means to substitute one
functionally identical implementation for another seems reasonable to
me (I could envision people customizing kernel execution in the
future). We should try to create a general enough API that it can be
used for customizations beyond compression codecs, so we don't have to
repeat this design exercise when someone wants plugin/algorithm
overrides for something else. This is something we could hash out
during code review -- I should have some opinions and I'm sure others
will as well.
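
To make that concrete, here is a rough standalone sketch of the kind of
registry I have in mind (all names here are hypothetical -- none of
this is existing Arrow API): a plugin re-registers a factory under an
existing name, and anything that looks an implementation up by that
name picks up the override transparently:

    #include <functional>
    #include <iostream>
    #include <map>
    #include <memory>
    #include <string>

    // Generic "implementation" interface. For codecs this would wrap
    // Compress()/Decompress(); for kernels, an Exec() entry point.
    class Impl {
     public:
      virtual ~Impl() = default;
      virtual std::string Describe() const = 0;
    };

    class Registry {
     public:
      using Factory = std::function<std::unique_ptr<Impl>()>;

      // Plugins call this at load time; re-registering a name
      // replaces the built-in factory.
      void Register(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
      }

      std::unique_ptr<Impl> Create(const std::string& name) const {
        auto it = factories_.find(name);
        return it == factories_.end() ? nullptr : it->second();
      }

     private:
      std::map<std::string, Factory> factories_;
    };

    class BuiltinGzip : public Impl {
     public:
      std::string Describe() const override { return "built-in gzip"; }
    };

    class QatGzip : public Impl {
     public:
      std::string Describe() const override { return "QAT gzip"; }
    };

    int main() {
      Registry registry;
      registry.Register("gzip",
                        [] { return std::make_unique<BuiltinGzip>(); });
      // A plugin overrides the same name with a functionally
      // identical codec; callers looking up "gzip" are unaffected.
      registry.Register("gzip",
                        [] { return std::make_unique<QatGzip>(); });
      std::cout << registry.Create("gzip")->Describe() << std::endl;
      return 0;
    }

The key property is that lookup stays name-based, so a consumer like
the Parquet reader asking for "gzip" doesn't need to know whether it
got the built-in codec or an accelerated one.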

- Wes

On Fri, Jun 19, 2020 at 10:21 AM Xie, Qi <qi....@intel.com> wrote:
>
> Hi,
>
> In pursuit of better performance, quite a few end users want to leverage
> accelerators (e.g. FPGA, Intel QAT) to offload compression. However, the
> current Arrow compression framework only supports selecting a compression
> implementation by codec name and can't be customized to leverage
> accelerators. For example, for the gzip format, we can't call a customized
> codec to accelerate the compression. We would like to propose a plugin API
> to support customized compression codecs. We've put the proposal here:
>
> https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit
>
> Any comment is welcome and please let us know your feedback.
>
> Thanks,
>
> XieQi