[
https://issues.apache.org/jira/browse/HADOOP-19167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844925#comment-17844925
]
ASF GitHub Bot commented on HADOOP-19167:
-----------------------------------------
skyskyhu opened a new pull request, #6807:
URL: https://github.com/apache/hadoop/pull/6807
[HADOOP-19167](https://issues.apache.org/jira/browse/HADOOP-19167) Change of Codec configuration does not work
### Description of PR
In one of my projects, I need to adjust the compression level dynamically for
different files.
However, I found that in most cases the new compression level does not take
effect; the old compression level continues to be used.
Here is the relevant code snippet:
```
ZStandardCodec zStandardCodec = new ZStandardCodec();
zStandardCodec.setConf(conf);
conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
writer = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(sequenceFilePath),
    SequenceFile.Writer.keyClass(LongWritable.class),
    SequenceFile.Writer.valueClass(BytesWritable.class),
    SequenceFile.Writer.compression(CompressionType.BLOCK));
```
Take my unit test as another example:
```
// First codec is configured with level TWO; its compressor goes back to the pool.
DefaultCodec codec1 = new DefaultCodec();
Configuration conf = new Configuration();
ZlibFactory.setCompressionLevel(conf, CompressionLevel.TWO);
codec1.setConf(conf);
Compressor comp1 = CodecPool.getCompressor(codec1);
CodecPool.returnCompressor(comp1);

// Second codec is configured with level THREE; the pooled compressor is reused.
DefaultCodec codec2 = new DefaultCodec();
Configuration conf2 = new Configuration();
CompressionLevel newCompressionLevel = CompressionLevel.THREE;
ZlibFactory.setCompressionLevel(conf2, newCompressionLevel);
codec2.setConf(conf2);
Compressor comp2 = CodecPool.getCompressor(codec2);
```
With the current code, the compression level of comp2 is 2 rather than the
intended level of 3.
The reason is that SequenceFile.Writer.init() calls
CodecPool.getCompressor(codec) to obtain a compressor, which in turn calls
CodecPool.getCompressor(codec, null).
If the compressor is a reused instance, the codec's configuration is not
applied because conf is passed as null:
```
public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
  Compressor compressor = borrow(compressorPool, codec.getCompressorType());
  if (compressor == null) {
    compressor = codec.createCompressor();
    LOG.info("Got brand-new compressor ["+codec.getDefaultExtension()+"]");
  } else {
    compressor.reinit(conf); // conf is null here
    ......
```
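To make the failure mode easier to see without a Hadoop dependency, here is a minimal stand-alone model of the control flow quoted above. Every class in it (`ToyConf`, `ToyCompressor`, `ToyCodec`, `ToyPool`, `StaleLevelDemo`) is hypothetical and only imitates the shape of the real classes. It also assumes that `reinit(null)` leaves the previous level untouched.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a Configuration: it only carries a compression level.
final class ToyConf {
    final int level;
    ToyConf(int level) { this.level = level; }
}

// Hypothetical stand-in for a Compressor. Assumption: a null conf in reinit()
// leaves the previously configured level in place.
final class ToyCompressor {
    private int level;
    ToyCompressor(int level) { this.level = level; }
    void reinit(ToyConf conf) {
        if (conf != null) {
            level = conf.level;
        }
    }
    int level() { return level; }
}

// Hypothetical stand-in for a codec that holds its own configuration.
final class ToyCodec {
    private final ToyConf conf;
    ToyCodec(ToyConf conf) { this.conf = conf; }
    ToyCompressor createCompressor() { return new ToyCompressor(conf.level); }
}

// Pool that follows the same branches as the snippet above:
// create a new compressor if none is idle, otherwise reinit the idle one.
final class ToyPool {
    private final Deque<ToyCompressor> idle = new ArrayDeque<>();

    ToyCompressor getCompressor(ToyCodec codec, ToyConf conf) {
        ToyCompressor compressor = idle.poll(); // null when the pool is empty
        if (compressor == null) {
            compressor = codec.createCompressor();
        } else {
            compressor.reinit(conf); // conf is null for the one-argument call path
        }
        return compressor;
    }

    void returnCompressor(ToyCompressor compressor) { idle.push(compressor); }
}

final class StaleLevelDemo {
    // Borrow with level 2, return, then borrow again through a level-3 codec.
    static int run() {
        ToyPool pool = new ToyPool();
        ToyCompressor first = pool.getCompressor(new ToyCodec(new ToyConf(2)), null);
        pool.returnCompressor(first);
        ToyCompressor second = pool.getCompressor(new ToyCodec(new ToyConf(3)), null);
        return second.level();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2: the pooled instance kept the old level
    }
}
```

In this model the second codec's level never reaches the reused compressor, which matches the behavior reported for comp2.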
Please also refer to my unit test to reproduce the bug.
To address this bug, I modified the code to ensure that the configuration is
read back from the codec when a compressor is reused.
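One way to picture "read back from the codec" is a small fallback rule: use the caller's configuration when one is given, otherwise ask the codec for its own. The sketch below is stand-alone and illustrative only, not the patch itself. `SketchConf`, `SketchConfigurable`, `SketchCodec`, and `ConfFallback.effectiveConf` are hypothetical names; in Hadoop the analogous interface is `org.apache.hadoop.conf.Configurable`, which exposes `getConf()`.

```java
// Hypothetical minimal configuration: only a compression level.
final class SketchConf {
    final int level;
    SketchConf(int level) { this.level = level; }
}

// Hypothetical counterpart of a "configurable" codec that remembers its conf.
interface SketchConfigurable {
    SketchConf getConf();
}

final class SketchCodec implements SketchConfigurable {
    private final SketchConf conf;
    SketchCodec(SketchConf conf) { this.conf = conf; }
    @Override
    public SketchConf getConf() { return conf; }
}

final class ConfFallback {
    // Prefer the explicit conf; if it is null and the codec can report its own
    // configuration, use that instead; otherwise there is nothing to apply.
    static SketchConf effectiveConf(Object codec, SketchConf explicit) {
        if (explicit != null) {
            return explicit;
        }
        if (codec instanceof SketchConfigurable) {
            return ((SketchConfigurable) codec).getConf();
        }
        return null;
    }
}
```

A pool would then pass the result of `effectiveConf(codec, conf)` to `reinit` on the reuse branch, so a reused compressor picks up the level of the codec that is borrowing it.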
### How was this patch tested?
unit test
> Change of Codec configuration does not work
> -------------------------------------------
>
> Key: HADOOP-19167
> URL: https://issues.apache.org/jira/browse/HADOOP-19167
> Project: Hadoop Common
> Issue Type: Bug
> Components: compress
> Reporter: Zhikai Hu
> Priority: Minor
> Labels: pull-request-available
>