Surya Hebbar has uploaded a new patch set (#10).
( http://gerrit.cloudera.org:8080/22718 )

Change subject: IMPALA-13923: Support more compression levels for ZSTD and ZLIB
......................................................................

IMPALA-13923: Support more compression levels for ZSTD and ZLIB

This patch adds support for more compression levels for ZLIB, ZSTD
and BZIP2.

The following additional compression levels are now supported.

For ZSTD,
  ZSTD_minCLevel(-ZSTD_TARGETLENGTH_MAX) to ZSTD_maxCLevel(20)

For ZLIB i.e. ZLIB, GZIP and DEFLATE,
  Z_BEST_SPEED(1) to Z_BEST_COMPRESSION(9)

For BZIP2,
  BlockSize100k * (1) to BlockSize100k * (9)

Note:
Currently, BZIP2 is only used by TmpFileMgr. It is not supported
by Parquet (i.e. for writing tables).

These are now supported with the "compression_codec" query option.
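
As an illustration of how a "CODEC:LEVEL" option value could be split into
a codec name and an optional level, here is a minimal sketch. The names
ParsedCodec and ParseCodecOption are hypothetical and are not taken from
the patch; range checking of the level is deliberately left out.

```cpp
#include <cctype>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical result type, not Impala's actual API.
struct ParsedCodec {
  std::string name;          // upper-cased codec name, e.g. "ZSTD"
  std::optional<int> level;  // empty when no ":LEVEL" suffix was given
};

// Splits "NAME" or "NAME:LEVEL". Returns std::nullopt on malformed input.
std::optional<ParsedCodec> ParseCodecOption(const std::string& value) {
  ParsedCodec out;
  const auto colon = value.find(':');
  out.name = value.substr(0, colon);
  if (out.name.empty()) return std::nullopt;
  for (char& c : out.name) {
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  if (colon == std::string::npos) return out;  // no level given

  const std::string level_str = value.substr(colon + 1);
  if (level_str.empty()) return std::nullopt;
  try {
    std::size_t consumed = 0;
    const int level = std::stoi(level_str, &consumed);
    // Reject trailing characters such as "3x".
    if (consumed != level_str.size()) return std::nullopt;
    out.level = level;
  } catch (const std::exception&) {
    return std::nullopt;  // not a number, or out of int range
  }
  return out;
}
```

With this sketch, "zstd:-5" yields the name "ZSTD" with level -5, while
"gzip" yields the name "GZIP" with no level set.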

This has been implemented by refactoring compression levels as an
optional parameter in CodecInfo.
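
The idea of an optional level can be sketched as follows. This is not the
CodecInfo definition from the patch; CodecInfoSketch, its field names and
EffectiveLevel are assumptions made for illustration only.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for CodecInfo; field names are assumptions.
struct CodecInfoSketch {
  std::string format;                    // e.g. "ZSTD", "GZIP", "BZIP2"
  std::optional<int> compression_level;  // empty means "use codec default"
};

// Picks the level to hand to the compressor. The default is supplied by
// the caller, since each codec has its own.
int EffectiveLevel(const CodecInfoSketch& info, int codec_default) {
  return info.compression_level.value_or(codec_default);
}
```

One likely benefit of std::optional here is that "not set" no longer needs
a sentinel value, which would be ambiguous once zero or negative levels are
valid inputs.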

For ZSTD, negative compression levels are now supported (IMPALA-10630).

Usage of compression level has been refactored with std::optional in
- exec/parquet/hdfs-parquet-table-writer
- runtime/tmp-file-mgr
- service/query-options
- util/codec
- util/compress

To validate compression levels externally, the following method has
been added
- Status Codec::ValidateCompressionLevel
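
A range check of this kind could look roughly like the sketch below. It
does not reproduce the signature or body of the method in the patch:
LevelRange and CheckCompressionLevel are hypothetical, and an optional
error string stands in for a Status return value.

```cpp
#include <optional>
#include <string>

// Inclusive range of accepted levels for one codec.
struct LevelRange {
  int min;
  int max;
};

// Returns an error message when `level` is out of range, or std::nullopt
// when it is valid. A stand-in for a Status return, not the patch's API.
std::optional<std::string> CheckCompressionLevel(
    const std::string& codec, int level, const LevelRange& range) {
  if (level >= range.min && level <= range.max) return std::nullopt;
  return "Invalid " + codec + " compression level " + std::to_string(level) +
      ". Valid values are in [" + std::to_string(range.min) + ", " +
      std::to_string(range.max) + "]";
}

// Ranges as listed in this commit message for the zlib family and BZIP2.
constexpr LevelRange kZlibRange{1, 9};
constexpr LevelRange kBzip2Range{1, 9};
```

For ZSTD, the range would presumably be built at runtime from
ZSTD_minCLevel() and ZSTD_maxCLevel() rather than hard-coded.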

Added new tests for:
  * Additional compression levels for ZLIB, ZSTD and BZIP2
  * Query option - "compression_codec" for the newly added formats
    and compression levels

The following tests were executed to verify codecs and compression levels.
- DecompressorTest.ZSTD*
- DecompressorTest.Gzip
- DecompressorTest.Bzip
- QueryOptions.CompressionCodec
- TestComputeStats::test_compute_stats_compression_codec

For the stored Parquet files, manually verified the compression codec
used for ZSTD and ZLIB.

Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/codec.cc
M be/src/util/codec.h
M be/src/util/compress.cc
M be/src/util/compress.h
M be/src/util/decompress-test.cc
M be/src/util/parse-util.cc
M be/src/util/parse-util.h
12 files changed, 240 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/22718/10
--
To view, visit http://gerrit.cloudera.org:8080/22718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Gerrit-Change-Number: 22718
Gerrit-PatchSet: 10
Gerrit-Owner: Surya Hebbar <sheb...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Surya Hebbar <sheb...@cloudera.com>
