Surya Hebbar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22718 )

Change subject: IMPALA-13923: Support more compression levels for ZSTD and ZLIB
......................................................................


Patch Set 4:

(10 comments)

>
 > There is some logic in hdfs-parquet-table-writer.cc that you need
 > to look at:
 > https://github.com/apache/impala/blob/master/be/src/exec/parquet/hdfs-parquet-table-writer.cc#L1329-L1331
 > https://github.com/apache/impala/blob/master/be/src/exec/parquet/hdfs-parquet-table-writer.cc#L1362-L1365
 > https://github.com/apache/impala/blob/master/be/src/exec/parquet/hdfs-parquet-table-writer.cc#L1383
 >
 > I think those locations should be preserving the std::optional so
 > that it is not set unless there is an explicit value from the user.

Thank you for the comments, I have corrected it now.


 > If we are supporting negative compression levels, that has some
 > implications for some of our performance testing infrastructure.
 > The compression level gets baked into the database name used for
 > loading data. "-" is not valid in a database name, so the negative
 > levels need some adjustment. See https://gerrit.cloudera.org/#/c/17265/
 > That doesn't necessarily need to be fixed as part of this. It could
 > be addressed in a separate JIRA.
I will make sure to address this in a separate patch.

http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/runtime/tmp-file-mgr.h
File be/src/runtime/tmp-file-mgr.h:

http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/runtime/tmp-file-mgr.h@368
PS3, Line 368:   /// and ignored otherwise.
> Please update the comment to reflect support for negative levels.
Done


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/service/query-options-test.cc
File be/src/service/query-options-test.cc:

http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/service/query-options-test.cc@626
PS3, Line 626:     // Test with compression levels for supported codecs i.e. 
ZLIB and ZSTD
> Is this really "all codecs"? It doesn't cover snappy, lz4, etc.
Done


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/service/query-options-test.cc@652
PS3, Line 652:   // Test minimum negative compression level
> typo: comression -> compression
Done


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/service/query-options-test.cc@657
PS3, Line 657:   // The values could begin from -11000 or even less
> This sentence seems incomplete.
Done


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/service/query-options.cc@310
PS3, Line 310:         std::optional<int> compression_level;
> nit: this is the default value, explicit initialization to nullopt isn't ne
Done


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/codec.cc
File be/src/util/codec.cc:

http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/codec.cc@84
PS3, Line 84: Status Codec::CreateCompressor(MemPool* mem_pool, bool reuse, 
const CodecInfo& codec_info,
            :     scoped_ptr<Codec>* compressor) {
            :   THdfsCompression::type format = codec_info.format_;
            :   switch (format) {
            :     case THdfsCompression::NONE:
            :       compressor->reset(nullptr);
            :       return Status::OK();
            :     case THdfsCompression::GZIP:
            :       compressor->reset(new GzipCompressor(GzipCompressor::GZIP, 
mem_pool, reuse,
            :           
codec_info.compression_level_.value_or(Z_DEFAULT_COMPRESSION)));
            :
> I believe this function is unused.
Done.


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/compress.h
File be/src/util/compress.h:

http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/compress.h@48
PS3, Line 48:       int compression_level = Z_DEFAULT_COMPRESSION);
> I think it would make sense to carry through the std::optional usage throug
As the implementations still need a default value, this seems to be the 
cleanest way. Please let me know, if something else is preferred. I think, 
CodecInfo being optional is sufficient.


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/compress.h@66
PS3, Line 66:   int compression_level_;
> Now that we have more than one compressor that uses levels, it would make s
If we approach this, CodecInfo will only have a single attribute "format", also 
the compression levels are not a part of many codecs like snappy.


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/parse-util.cc
File be/src/util/parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/parse-util.cc@135
PS3, Line 135:     string& clevel = tokens[1];
             :     trim(clevel);
             :     int compression_level = StringParser::StringToInt<int>(
             :         clevel.c_str(), static_cast<int>(clevel.size()), 
&status);
             :
             :     bool invalid_compression_level = false;
             :     if (status == StringParser::PARSE_SUCCESS) {
             :       return Codec::ValidateCompressionLevel(enum_type, 
compression_level);
             :     } else {
             :       return Status(Substitute("Invalid compression level value 
- $0", clevel));
             :     }
             :   }
             :   *type = enum_type;
             :   return Status::OK();
             : }
             :
             : // Return all enum values in a string format, e.g. FOO(1), 
BAR(2), BAZ(3).
             : string GetThriftEnumValues(const map<int, const char*>& 
enum_values_to_names) {
             :   bool first = true;
             :   stringstream ss;
             :   for (const auto& e : enum_values_to_names) {
             :     if (!first) {
             :       ss << ", ";
             :     } else {
             :       first = false;
             :     }
             :     ss << e.second << "(" << e.first << ")";
             :   }
             :   return ss.str();
             : }
             : }
             :
             :
             :
             :
             :
> I think it would be cleaner for a lot of the compression-codec specific log
Done. I have moved the validation into Codec class.


http://gerrit.cloudera.org:8080/#/c/22718/3/be/src/util/parse-util.cc@173
PS3, Line 173:
> I have a slight preference for providing the error message at each site and
Ack. I will add these error messages instead.



--
To view, visit http://gerrit.cloudera.org:8080/22718
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5b98c735246f08e04598a4e752c8cca04e31a88a
Gerrit-Change-Number: 22718
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar <sheb...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ara...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <ale...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Surya Hebbar <sheb...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Apr 2025 11:57:21 +0000
Gerrit-HasComments: Yes

Reply via email to