Surya Hebbar has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/22718 )

Change subject: IMPALA-13923: Support more compression levels for ZSTD and ZLIB
......................................................................

IMPALA-13923: Support more compression levels for ZSTD and ZLIB

This patch adds support for more compression levels for ZLIB, ZSTD
and BZIP2.

The following additional compression levels are now supported.

For ZSTD,
ZSTD_minCLevel(-ZSTD_TARGETLENGTH_MAX) to ZSTD_maxCLevel(20)

For ZLIB, i.e. ZLIB, GZIP and DEFLATE,
Z_DEFAULT_COMPRESSION(1) to Z_BEST_COMPRESSION(9)

For BZIP2,
BlockSize100k * (1) to BlockSize100k * (9)

Note: Currently, BZIP2 is only used by TmpFileMgr. It is not supported
by Parquet (i.e. for writing tables).

These levels are now supported with the "compression_codec" query option.

This has been implemented by refactoring the compression level into an
optional parameter in CodecInfo.

For ZSTD, negative compression levels are now supported (IMPALA-10630).

Usage of the compression level has been refactored with std::optional in
- exec/parquet/hdfs-parquet-table-writer
- runtime/tmp-file-mgr
- service/query-options
- util/codec
- util/compress

To validate compression levels externally, the following method has
been added:
- Status Codec::ValidateCompressionLevel

Added new tests for:
* Additional compression levels for ZLIB, ZSTD and BZIP2
* Query option "compression_codec" for the newly added formats
  and compression levels

The following tests were executed to verify codecs and compression levels:
- DecompressorTest.ZSTD*
- DecompressorTest.Gzip
- DecompressorTest.Bzip
- QueryOptions.CompressionCodec
- TestComputeStats::test_compute_stats_compression_codec

For the stored Parquet files, the compression codec used for ZSTD and ZLIB
was verified manually.
Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
12 files changed, 240 insertions(+), 91 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/22718/10
--
To view, visit http://gerrit.cloudera.org:8080/22718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Gerrit-Change-Number: 22718
Gerrit-PatchSet: 10
Gerrit-Owner: Surya Hebbar <sheb...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Surya Hebbar <sheb...@cloudera.com>
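
For readers skimming the commit message, the idea of an optional compression level that is range-checked per codec can be illustrated roughly as below. This is only a sketch and not the patch's code: CodecKind, LevelRange, RangeFor and CheckLevel are hypothetical names, the real entry point is Codec::ValidateCompressionLevel (which returns a Status), and the ZSTD lower bound is an assumed stand-in for whatever ZSTD_minCLevel() reports. The ZLIB, BZIP2 and ZSTD upper bounds are the ones quoted in the commit message.

```cpp
#include <optional>
#include <string>

// Hypothetical codec tags for illustration; not Impala's actual enum.
enum class CodecKind { ZSTD, ZLIB, BZIP2 };

// Inclusive range of accepted compression levels.
struct LevelRange {
  int min;
  int max;
};

// Ranges as quoted in the commit message. The ZSTD lower bound is an
// assumption standing in for ZSTD_minCLevel(); real code would ask the
// library instead of hard-coding it.
inline LevelRange RangeFor(CodecKind kind) {
  switch (kind) {
    case CodecKind::ZSTD: return {-(1 << 17), 20};
    case CodecKind::ZLIB: return {1, 9};
    case CodecKind::BZIP2: return {1, 9};
  }
  return {0, 0};  // Not reached for valid enum values.
}

// Returns std::nullopt when the level is acceptable, otherwise an error
// message. An unset level means "use the codec default" and is accepted.
inline std::optional<std::string> CheckLevel(
    CodecKind kind, std::optional<int> level) {
  if (!level.has_value()) return std::nullopt;
  const LevelRange r = RangeFor(kind);
  if (*level < r.min || *level > r.max) {
    return "compression level " + std::to_string(*level) +
        " is outside [" + std::to_string(r.min) + ", " +
        std::to_string(r.max) + "]";
  }
  return std::nullopt;
}
```

With this shape, a caller such as a query-option parser could pass std::nullopt when no level was given and surface the returned message to the user when an explicit level is rejected.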