Ruslan Dautkhanov created HADOOP-17231:
------------------------------------------

             Summary: empty getDefaultExtension() is ignored
                 Key: HADOOP-17231
                 URL: https://issues.apache.org/jira/browse/HADOOP-17231
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 3.1.3, 3.2.0
            Reporter: Ruslan Dautkhanov


Use case: source files are gzip-compressed but have no file extension.

I attempted to have them decompressed automatically with the following custom codec:
{code:scala}
package com.my.codec.test

import org.apache.hadoop.io.compress.GzipCodec

// Gzip codec that reports an empty extension, intended to match files with no suffix.
class GZCodec extends GzipCodec {
  override def getDefaultExtension(): String = ""
}
{code}
Note the empty getDefaultExtension. Setting *io.compression.codecs* 
to com.my.codec.test.GZCodec then has no effect.

Similar tests with a one-character extension that matches the last character 
of the file names do work, so only the empty-string getDefaultExtension case 
is broken.

I guess the issue is somewhere in 
[https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CompressionCodecFactory.java#L109]
but it's not obvious.
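
One possible explanation, sketched here as a simplified model rather than as the actual Hadoop code: suppose the factory stores reversed extensions in a sorted map and checks only the single closest key below the reversed file name. Under that assumption, an empty extension matches only when no other registered suffix sorts between it and the file name. The class and method names below (SuffixLookupSketch, register, find) are hypothetical, and the codec names are plain strings standing in for codec instances.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified, hypothetical model of an extension-based codec lookup.
// This is NOT the Hadoop implementation; it only illustrates one way an
// empty extension could be shadowed by other registered suffixes.
class SuffixLookupSketch {
    // Reversed extension -> codec name, kept in sorted order.
    private final TreeMap<String, String> codecsByReversedSuffix = new TreeMap<>();

    void register(String extension, String codecName) {
        codecsByReversedSuffix.put(reverse(extension), codecName);
    }

    // Returns the codec name for the file, or null if the closest suffix does not match.
    String find(String fileName) {
        String reversedName = reverse(fileName);
        // All keys strictly smaller than the reversed file name.
        SortedMap<String, String> candidates = codecsByReversedSuffix.headMap(reversedName);
        if (candidates.isEmpty()) {
            return null;
        }
        // Only the largest such key is tested; smaller keys (including "") are never tried.
        String closest = candidates.lastKey();
        return reversedName.startsWith(closest) ? codecsByReversedSuffix.get(closest) : null;
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        SuffixLookupSketch lookup = new SuffixLookupSketch();
        lookup.register("", "GZCodec");
        lookup.register(".bz2", "BZip2Codec");
        System.out.println(lookup.find("part-00000")); // GZCodec: "" is the only key below "00000-trap"
        System.out.println(lookup.find("data"));       // null: "2zb." sorts below "atad" and hides ""
        System.out.println(lookup.find("data.bz2"));   // BZip2Codec
    }
}
```

If the real lookup behaves like this model, a fix could fall back to the empty-extension codec whenever the closest key fails the prefix check. Whether that applies here would need to be confirmed against the linked source.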

Folks have built some workarounds using custom readers, for example:
 # [https://daynebatten.com/2015/11/override-hadoop-compression-codec-file-extension/]
 # [https://stackoverflow.com/questions/52011697/how-to-read-a-compressed-gzip-file-without-extension-in-spark?rq=1]
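
For comparison, here is a minimal sketch of an extension-independent approach that is not taken from the posts above: check the two-byte gzip magic number (0x1f 0x8b) at the start of the stream and decompress only when it is present. It uses only the JDK (java.util.zip) and needs Java 9+ for readAllBytes. The class and method names (GzipSniffSketch, openMaybeGzip, readText, gzip) are hypothetical, and wiring this into a Hadoop or Spark reader is left out.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: decides by content, not by file name, whether to gunzip.
class GzipSniffSketch {
    // Returns a decompressing stream if the data starts with the gzip magic bytes,
    // otherwise returns the (buffered) original stream unchanged.
    static InputStream openMaybeGzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);               // remember the start so the peeked bytes can be re-read
        int first = in.read();
        int second = in.read();
        in.reset();
        boolean looksLikeGzip = first == 0x1f && second == 0x8b;
        return looksLikeGzip ? new GZIPInputStream(in) : in;
    }

    // Reads the whole input as UTF-8 text, decompressing when needed.
    static String readText(byte[] data) {
        try (InputStream in = openMaybeGzip(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper: gzip-compresses a byte array in memory.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(readText(gzip(plain))); // hello (decompressed)
        System.out.println(readText(plain));       // hello (passed through)
    }
}
```

The trade-off is that content sniffing happens in the reader rather than in the codec factory, so it does not make *io.compression.codecs* honor an empty extension; it only avoids depending on it.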
 

Hopefully supporting an empty getDefaultExtension would be an easy fix?

 



