Joe McDonnell has submitted this change and it was merged.
( http://gerrit.cloudera.org:8080/22718 )

Change subject: IMPALA-13923: Support more compression levels for ZSTD and ZLIB
......................................................................

IMPALA-13923: Support more compression levels for ZSTD and ZLIB

This patch adds support for more compression levels for ZLIB, ZSTD
and BZIP2.

The following additional compression levels are now supported.

For ZSTD,
  ZSTD_minCLevel(-ZSTD_TARGETLENGTH_MAX) to ZSTD_maxCLevel(20)

For the ZLIB family, i.e. ZLIB, GZIP and DEFLATE,
  Z_DEFAULT_COMPRESSION(1) to Z_BEST_COMPRESSION(9)

For BZIP2,
  BlockSize100k * (1) to BlockSize100k * (9)

Note:
Currently, BZIP2 is only used by TmpFileMgr. It is not supported
by Parquet (i.e. for writing tables).

These are now supported with the "compression_codec" query option.
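
As a rough illustration of how such an option value could be split into a
codec name and an optional level, here is a minimal sketch. It assumes the
value has the form "NAME" or "NAME:LEVEL" (for example "ZSTD:-5"); the
names ParsedCodec and ParseCodecOption are hypothetical and are not the
functions touched by this patch.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical result type for this sketch; not Impala's actual CodecInfo.
struct ParsedCodec {
  std::string name;
  std::optional<int> level;  // empty when no ":LEVEL" suffix was given
};

// Parses "NAME" or "NAME:LEVEL" (assumed format). Returns std::nullopt on
// malformed input. Negative levels such as "ZSTD:-5" are accepted here;
// range checking is a separate step.
std::optional<ParsedCodec> ParseCodecOption(std::string_view value) {
  const std::size_t colon = value.find(':');
  if (colon == std::string_view::npos) {
    if (value.empty()) return std::nullopt;
    return ParsedCodec{std::string(value), std::nullopt};
  }
  const std::string_view name = value.substr(0, colon);
  const std::string_view level_str = value.substr(colon + 1);
  if (name.empty() || level_str.empty()) return std::nullopt;

  int level = 0;
  const char* begin = level_str.data();
  const char* end = begin + level_str.size();
  // std::from_chars accepts a leading '-' for signed integer types.
  const auto [ptr, ec] = std::from_chars(begin, end, level);
  if (ec != std::errc() || ptr != end) return std::nullopt;
  return ParsedCodec{std::string(name), level};
}
```

Keeping the level as std::optional<int> lets the caller tell "no level
given" apart from an explicit level of 0 or a negative level.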

This has been implemented by refactoring compression levels as an
optional parameter in CodecInfo.
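
The idea of an optional level can be sketched as follows. This is only an
illustration under assumptions: CodecInfoSketch and EffectiveLevel are
hypothetical names, and the real CodecInfo in util/codec.h has its own
fields and constructors.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for this sketch; Impala's real CodecInfo differs.
struct CodecInfoSketch {
  std::string format;
  std::optional<int> compression_level;  // empty means "use the codec default"

  explicit CodecInfoSketch(std::string f,
                           std::optional<int> level = std::nullopt)
      : format(std::move(f)), compression_level(level) {}
};

// Returns the explicit level if one was set, otherwise the caller-supplied
// default. The default is passed in because each codec has its own.
int EffectiveLevel(const CodecInfoSketch& info, int codec_default) {
  return info.compression_level.value_or(codec_default);
}
```

With an optional, no integer value has to be reserved as a "not set"
sentinel, which matters once negative levels become valid.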

For ZSTD, negative compression levels are now supported (IMPALA-10630).

Usage of compression level has been refactored with std::optional in
- exec/parquet/hdfs-parquet-table-writer
- runtime/tmp-file-mgr
- service/query-options
- util/codec
- util/compress

To validate compression levels externally, the following method has
been added:
- Status Codec::ValidateCompressionLevel
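
A range check of this kind might look roughly like the sketch below. The
signature of Codec::ValidateCompressionLevel is not shown in this message,
so CheckCompressionLevel and LevelRange are hypothetical, and an error
string stands in for Impala's Status type. The caller supplies the range;
for ZSTD the real bounds would come from the library (ZSTD_minCLevel() and
ZSTD_maxCLevel()).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Inclusive range of accepted levels (hypothetical helper type).
struct LevelRange {
  int min;
  int max;
};

// Returns an empty string when the level is acceptable, otherwise an error
// message. An unset level is always accepted: the codec default applies.
std::string CheckCompressionLevel(const std::string& codec,
                                  std::optional<int> level,
                                  const LevelRange& range) {
  if (!level.has_value()) return "";
  if (*level < range.min || *level > range.max) {
    return "Invalid compression level " + std::to_string(*level) + " for " +
           codec + ": valid range is [" + std::to_string(range.min) + ", " +
           std::to_string(range.max) + "]";
  }
  return "";
}
```

For example, with the 1 to 9 range listed above, level 9 passes for GZIP
and level 10 produces an error message.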

Added new tests for:
  * Additional compression levels for ZLIB, ZSTD and BZIP2
  * Query option - "compression_codec" for the newly added formats
    and compression levels

The following tests were executed to verify codecs and compression levels.
- DecompressorTest.ZSTD*
- DecompressorTest.Gzip
- DecompressorTest.Bzip
- QueryOptions.CompressionCodec
- TestComputeStats::test_compute_stats_compression_codec

For the stored Parquet files, the compression codec used for ZSTD and
ZLIB was verified manually.

Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Reviewed-on: http://gerrit.cloudera.org:8080/22718
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
12 files changed, 240 insertions(+), 91 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Smith: Looks good to me, but someone else must approve
  Joe McDonnell: Looks good to me, approved

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Gerrit-Change-Number: 22718
Gerrit-PatchSet: 12
Gerrit-Owner: Surya Hebbar <sheb...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Surya Hebbar <sheb...@cloudera.com>