[
https://issues.apache.org/jira/browse/IMPALA-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949772#comment-17949772
]
ASF subversion and git services commented on IMPALA-10630:
----------------------------------------------------------
Commit b9419ee32c98e95b5f1ea378624562673ead35be in impala's branch
refs/heads/master from Surya Hebbar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b9419ee32 ]
IMPALA-13923: Support more compression levels for ZSTD and ZLIB
This patch adds support for more compression levels for ZLIB, ZSTD
and BZIP2.
The following additional compression levels are now supported.
For ZSTD,
ZSTD_minCLevel(-ZSTD_TARGETLENGTH_MAX) to ZSTD_maxCLevel(22)
For ZLIB i.e. ZLIB, GZIP and DEFLATE,
Z_DEFAULT_COMPRESSION(-1) to Z_BEST_COMPRESSION(9)
For BZIP2,
BlockSize100k * (1) to BlockSize100k * (9)
Note:
Currently, BZIP2 is only used by TmpFileMgr. It is not supported
by Parquet (i.e. for writing tables).
These are now supported with the "compression_codec" query option.
This has been implemented by refactoring compression levels as an
optional parameter in CodecInfo.
For ZSTD, negative compression levels are now supported (IMPALA-10630).
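As a rough illustration of what accepting negative levels means for option parsing, the sketch below splits a value such as "ZSTD:-3" into a codec name and an optional level. The "CODEC:LEVEL" format, ParsedCodec and ParseCodecSketch are assumptions made for this example. They are not taken from the patch.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical result type: a codec name plus an optional level.
struct ParsedCodec {
  std::string codec;
  std::optional<int> level;  // empty when no ":LEVEL" suffix was given
};

// Hypothetical parser for "CODEC" or "CODEC:LEVEL" (format is an assumption).
// Returns std::nullopt when the input is malformed.
std::optional<ParsedCodec> ParseCodecSketch(std::string_view value) {
  const std::size_t colon = value.find(':');
  if (colon == std::string_view::npos) {
    if (value.empty()) return std::nullopt;
    return ParsedCodec{std::string(value), std::nullopt};
  }
  const std::string_view name = value.substr(0, colon);
  const std::string_view num = value.substr(colon + 1);
  if (name.empty() || num.empty()) return std::nullopt;
  int level = 0;
  const char* end = num.data() + num.size();
  // std::from_chars accepts a leading '-', so negative levels parse directly.
  const auto [ptr, ec] = std::from_chars(num.data(), end, level);
  if (ec != std::errc() || ptr != end) return std::nullopt;
  return ParsedCodec{std::string(name), level};
}
```

Range checking is deliberately left out of this sketch, since the valid range depends on the codec.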
Usage of compression level has been refactored with std::optional in
- exec/parquet/hdfs-parquet-table-writer
- runtime/tmp-file-mgr
- service/query-options
- util/codec
- util/compress
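A minimal sketch of how an optional level might be consumed, assuming a simplified stand-in for CodecInfo. CodecInfoSketch, its field names and EffectiveLevel are hypothetical and do not mirror the actual Impala types. One plausible benefit of std::optional here is that once negative levels are valid, no integer sentinel is left to mean "not specified".

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for CodecInfo; field names are assumptions.
struct CodecInfoSketch {
  std::string codec;                     // e.g. "ZSTD" or "GZIP"
  std::optional<int> compression_level;  // empty means "use the codec default"
};

// Returns the level to hand to the compressor, falling back to a
// caller-supplied default when the user did not specify one.
int EffectiveLevel(const CodecInfoSketch& info, int codec_default) {
  return info.compression_level.value_or(codec_default);
}
```

With this shape, an explicit level of -1 or 0 stays distinguishable from "no level given".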
To validate compression levels externally, the following method has
been added
- Status Codec::ValidateCompressionLevel
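The following sketch shows the general shape of such a range check. It is not the signature or body of Codec::ValidateCompressionLevel. LevelRange and ValidateLevelSketch are hypothetical, and the caller supplies the range (for ZSTD it would presumably come from ZSTD_minCLevel() and ZSTD_maxCLevel()). The error text loosely follows the message quoted in the issue description.

```cpp
#include <optional>
#include <string>

// Inclusive range of levels a codec accepts.
struct LevelRange {
  int min;
  int max;
};

// Hypothetical helper, not Impala's implementation.
// Returns std::nullopt when the level is valid, otherwise an error message.
std::optional<std::string> ValidateLevelSketch(
    const std::string& codec, int level, LevelRange range) {
  if (level >= range.min && level <= range.max) return std::nullopt;
  return "Invalid " + codec + " compression level '" + std::to_string(level) +
         "'. Valid values are in [" + std::to_string(range.min) + "," +
         std::to_string(range.max) + "]";
}
```

Passing the range as a parameter keeps the sketch independent of any compression library headers.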
Added new tests for -
* Additional compression levels for ZLIB, ZSTD and BZIP2
* Query option - "compression_codec" for the newly added formats
and compression levels
The following tests were executed to verify codecs and compression levels.
- DecompressorTest.ZSTD*
- DecompressorTest.Gzip
- DecompressorTest.Bzip
- QueryOptions.CompressionCodec
- TestComputeStats::test_compute_stats_compression_codec
For the stored Parquet, manually verified the compression codec used for
ZSTD and ZLIB.
Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Reviewed-on: http://gerrit.cloudera.org:8080/22718
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Reviewed-by: Joe McDonnell <[email protected]>
> Allow specifying negative compression_levels for ZSTD
> -----------------------------------------------------
>
> Key: IMPALA-10630
> URL: https://issues.apache.org/jira/browse/IMPALA-10630
> Project: IMPALA
> Issue Type: Improvement
> Affects Versions: Impala 4.0.0
> Reporter: Joe McDonnell
> Priority: Major
>
> ParseCompressionCodec prohibits the compression_level from being below 1:
> {noformat}
> if (status != StringParser::PARSE_SUCCESS || compression_level < 1
> || compression_level > ZSTD_maxCLevel()) {
> return Status(Substitute("Invalid ZSTD compression level '$0'."
> " Valid values are in [1,$1]",
> clevel, ZSTD_maxCLevel()));
> }{noformat}
> [https://github.com/apache/impala/blob/ebbe52b4bed944d3012e3679dc984827ce11d5a8/be/src/util/parse-util.cc#L142-L147]
> ZSTD now supports negative compression levels that speed up compression and
> decompression at the expense of reduced compression ratio. See the "--fast"
> options on [https://github.com/facebook/zstd]
> We should add support for those negative compression levels, as they seem
> very competitive with Snappy.