Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/22718 )
Change subject: IMPALA-13923: Support more compression levels for ZSTD and ZLIB
......................................................................

IMPALA-13923: Support more compression levels for ZSTD and ZLIB

This patch adds support for more compression levels for ZLIB, ZSTD and
BZIP2. The following additional compression levels are now supported:
- ZSTD: ZSTD_minCLevel() (i.e. -ZSTD_TARGETLENGTH_MAX) to ZSTD_maxCLevel() (22)
- ZLIB-based codecs (ZLIB, GZIP and DEFLATE): Z_BEST_SPEED (1) to
  Z_BEST_COMPRESSION (9)
- BZIP2: blockSize100k from 1 to 9 (block sizes of 100k to 900k)

Note: BZIP2 is currently used only by TmpFileMgr. Parquet does not support
it (i.e. it cannot be used for writing tables).

These levels can be set through the "compression_codec" query option. This
is implemented by making the compression level an optional parameter in
CodecInfo. For ZSTD, negative compression levels are now supported
(IMPALA-10630).

Handling of the compression level has been refactored to use std::optional
in:
- exec/parquet/hdfs-parquet-table-writer
- runtime/tmp-file-mgr
- service/query-options
- util/codec
- util/compress

To validate compression levels externally, the following method has been
added:
- Status Codec::ValidateCompressionLevel

Added new tests for:
- Additional compression levels for ZLIB, ZSTD and BZIP2
- The "compression_codec" query option with the newly added codecs and
  compression levels

The following tests were executed to verify codecs and compression levels:
- DecompressorTest.ZSTD*
- DecompressorTest.Gzip
- DecompressorTest.Bzip
- QueryOptions.CompressionCodec
- TestComputeStats::test_compute_stats_compression_codec

For Parquet files written with these settings, the compression codec used
for ZSTD and ZLIB was verified manually.
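As an illustration of the "optional level" idea, here is a minimal sketch of how a `codec[:level]` value such as `ZSTD:-5` might be split into a codec plus a `std::optional<int>` level and range-checked per codec. The names `CodecKind`, `CodecSpec`, `ParseCodec` and `ValidateLevel` are hypothetical and are not Impala's query-option parsing or `Codec::ValidateCompressionLevel`. The `NAME:LEVEL` syntax and the hard-coded bounds are assumptions; the ZSTD bounds are the values zstd's `ZSTD_minCLevel()` and `ZSTD_maxCLevel()` return in current releases, and a real build would call those functions instead.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical codec identifiers for this sketch (not Impala's THdfsCompression).
enum class CodecKind { ZSTD, ZLIB, GZIP, DEFLATE, BZIP2 };

// A parsed "NAME[:LEVEL]" value; std::nullopt means "use the codec's default level".
struct CodecSpec {
  CodecKind kind;
  std::optional<int> level;
};

// Inclusive bounds for an explicit compression level.
struct LevelRange {
  int min;
  int max;
};

// Assumption: hard-coded stand-ins for values a real build would get from the
// libraries, e.g. ZSTD_minCLevel()/ZSTD_maxCLevel() from <zstd.h>.
LevelRange RangeFor(CodecKind kind) {
  switch (kind) {
    case CodecKind::ZSTD:
      return {-(1 << 17), 22};  // -ZSTD_TARGETLENGTH_MAX .. ZSTD_maxCLevel()
    case CodecKind::ZLIB:
    case CodecKind::GZIP:
    case CodecKind::DEFLATE:
      return {1, 9};  // Z_BEST_SPEED .. Z_BEST_COMPRESSION
    case CodecKind::BZIP2:
      return {1, 9};  // blockSize100k: 100k .. 900k blocks
  }
  return {1, 0};  // not reached for valid enum values; an empty range rejects all levels
}

// Returns true if no explicit level was given or the level lies within the codec's
// range; otherwise writes a message to *error (when non-null) and returns false.
bool ValidateLevel(const CodecSpec& spec, std::string* error) {
  if (!spec.level.has_value()) return true;
  const LevelRange r = RangeFor(spec.kind);
  const int level = *spec.level;
  if (level >= r.min && level <= r.max) return true;
  if (error != nullptr) {
    *error = "compression level " + std::to_string(level) + " out of range [" +
             std::to_string(r.min) + ", " + std::to_string(r.max) + "]";
  }
  return false;
}

// Parses "NAME" or "NAME:LEVEL" (NAME is case-insensitive). Returns std::nullopt for
// an unknown codec name or a LEVEL that is not a complete integer. Range checks are
// left to ValidateLevel().
std::optional<CodecSpec> ParseCodec(const std::string& value) {
  const std::string::size_type colon = value.find(':');
  std::string name = value.substr(0, colon);
  for (char& ch : name) ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));

  CodecSpec spec{CodecKind::ZSTD, std::nullopt};
  if (name == "ZSTD") spec.kind = CodecKind::ZSTD;
  else if (name == "ZLIB") spec.kind = CodecKind::ZLIB;
  else if (name == "GZIP") spec.kind = CodecKind::GZIP;
  else if (name == "DEFLATE") spec.kind = CodecKind::DEFLATE;
  else if (name == "BZIP2") spec.kind = CodecKind::BZIP2;
  else return std::nullopt;

  if (colon != std::string::npos) {
    const char* first = value.data() + colon + 1;
    const char* last = value.data() + value.size();
    int level = 0;
    const auto [ptr, ec] = std::from_chars(first, last, level);
    if (ec != std::errc() || ptr != last) return std::nullopt;
    spec.level = level;
  }
  return spec;
}
```

For example, `ParseCodec("zstd:-5")` yields ZSTD with level -5, which `ValidateLevel` accepts, while `GZIP:10` parses but fails validation. Keeping the level as `std::optional<int>` keeps "no level given" distinct from every explicit level, including 0 and negative ZSTD levels, so no sentinel value is needed. The sketch only knows the five codecs named in the commit message.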
Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Reviewed-on: http://gerrit.cloudera.org:8080/22718
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
12 files changed, 240 insertions(+), 91 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Smith: Looks good to me, but someone else must approve
  Joe McDonnell: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/22718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Gerrit-Change-Number: 22718
Gerrit-PatchSet: 12
Gerrit-Owner: Surya Hebbar <sheb...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Surya Hebbar <sheb...@cloudera.com>