skyskyhu opened a new pull request, #6798:
URL: https://github.com/apache/hadoop/pull/6798

   HDFS-17510. Change of Codec configuration does not work
   
   ### Description of PR
   In one of my projects, I need to dynamically adjust the compression level for different files.
   However, I found that in most cases a new compression level does not take effect as expected: the old compression level continues to be used.
   
   Here is the relevant code snippet:
   ```java
   ZStandardCodec zStandardCodec = new ZStandardCodec();
   zStandardCodec.setConf(conf);
   conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
   conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
   writer = SequenceFile.createWriter(conf,
       SequenceFile.Writer.file(sequenceFilePath),
       SequenceFile.Writer.keyClass(LongWritable.class),
       SequenceFile.Writer.valueClass(BytesWritable.class),
       SequenceFile.Writer.compression(CompressionType.BLOCK));
   ```
   
   The reason is that the SequenceFile.Writer.init() method calls CodecPool.getCompressor(codec, null) to obtain a compressor.
   If the compressor is a reused instance, the configuration is never applied, because it is passed as null:
   ```java
   public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
     Compressor compressor = borrow(compressorPool, codec.getCompressorType());
     if (compressor == null) {
       compressor = codec.createCompressor();
       LOG.info("Got brand-new compressor ["+codec.getDefaultExtension()+"]");
     } else {
       compressor.reinit(conf);   // conf is null here
       ......
   ```
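   The pitfall can be reproduced without Hadoop using a minimal pool model. The classes below (PooledCompressor, PoolReuseDemo) are hypothetical stand-ins, not Hadoop's real types; they only mirror the shape of CodecPool.getCompressor:

   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;

   // Hypothetical stand-in for a pooled compressor: it only picks up new
   // settings when reinit() receives a non-null configuration.
   class PooledCompressor {
       int level;
       PooledCompressor(int level) { this.level = level; }
       void reinit(Integer conf) {
           if (conf != null) {       // a null conf silently keeps the old level
               this.level = conf;
           }
       }
   }

   public class PoolReuseDemo {
       static final Deque<PooledCompressor> pool = new ArrayDeque<>();

       // Mirrors the shape of CodecPool.getCompressor(codec, conf): brand-new
       // instances read the current level, reused ones only see reinit(conf).
       static PooledCompressor getCompressor(int currentLevel, Integer conf) {
           PooledCompressor c = pool.poll();
           if (c == null) {
               return new PooledCompressor(currentLevel);
           }
           c.reinit(conf);
           return c;
       }

       public static void main(String[] args) {
           PooledCompressor first = getCompressor(3, null); // brand-new, level 3
           pool.push(first);                                // returned to pool

           // The caller raised the level to 5 but, like
           // SequenceFile.Writer.init(), passes conf == null.
           PooledCompressor reused = getCompressor(5, null);
           System.out.println(reused.level); // prints 3, not 5: stale level
       }
   }
   ```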
   Please also refer to my unit test to reproduce the bug.
   To address it, I modified the code so that the configuration is read back from the codec when a compressor is reused.
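   The idea behind the fix can be sketched with the same kind of minimal model (MiniCodec and MiniCompressor are hypothetical stand-ins, not Hadoop's Configurable codec types, and this is not the exact patch): when the caller supplies a null configuration, fall back to the configuration the codec itself already holds.

   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;

   // Hypothetical stand-ins: the codec remembers the configuration it was
   // given, the way a Configurable Hadoop codec keeps its conf.
   class MiniCodec {
       Integer conf;
       MiniCodec(Integer conf) { this.conf = conf; }
   }

   class MiniCompressor {
       Integer conf;
       MiniCompressor(Integer conf) { this.conf = conf; }
       void reinit(Integer conf) { if (conf != null) this.conf = conf; }
   }

   public class CodecPoolFixSketch {
       static final Deque<MiniCompressor> pool = new ArrayDeque<>();

       static MiniCompressor getCompressor(MiniCodec codec, Integer conf) {
           MiniCompressor c = pool.poll();
           if (c == null) {
               return new MiniCompressor(codec.conf);
           }
           // The fix idea: when the caller did not supply a configuration,
           // reinitialize the reused compressor from the codec's own conf.
           c.reinit(conf != null ? conf : codec.conf);
           return c;
       }

       public static void main(String[] args) {
           MiniCodec codec = new MiniCodec(3);
           pool.push(getCompressor(codec, null)); // pool a compressor at level 3

           codec.conf = 5;                        // level changed dynamically
           MiniCompressor reused = getCompressor(codec, null);
           System.out.println(reused.conf);       // prints 5: new level applied
       }
   }
   ```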
   
   ### How was this patch tested?
   Unit test.
   
   ### For code changes:
   
   - [Y] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

