[Discuss] reading of Snappy files

Mikhail Lipkovich Fri, 01 Sep 2017 13:41:08 -0700

Hi All,

I'm working on adding support for reading Snappy files (
https://issues.apache.org/jira/browse/FLINK-5944) and would like to get
your opinion on one question.


As you can see from the ticket description, the goal is to support both
the Java Snappy (xerial) and Hadoop Snappy codecs, which are incompatible
with each other. The problem is that the codec for an InputFormat is
selected based on the file extension (e.g. '.gzip' or '.snappy'). So the
question is how we can tell whether the Hadoop Snappy codec or the Java
Snappy codec is needed for a given '.snappy' file.

I can propose the following options:
1. Add a new config option to flink-conf.yaml, like fs.hadoop-snappy, and
select the InputStreamFactory based on this option
2. Add a flag parameter to the API method readTextFile indicating whether
the file is Hadoop Snappy
3. Add a separate API method for reading snappy-compressed files
4. Ask users to use the '.snappy' extension for Java Snappy and some other
extension like '.hsnappy' for Hadoop Snappy
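
To make option 1 a bit more concrete, here is a rough sketch of the
selection logic, assuming the fs.hadoop-snappy option is read as a boolean.
The class, enum, and method names are hypothetical and are not existing
Flink APIs; the real change would pick an InputStreamFactory rather than
an enum value.

```java
import java.util.Locale;

public class SnappyCodecChooser {

    /** Hypothetical codec kinds, for illustration only (not Flink classes). */
    enum Codec { NONE, GZIP, XERIAL_SNAPPY, HADOOP_SNAPPY }

    /**
     * Picks a codec from the file extension. For '.snappy' the extension
     * alone is ambiguous, so the choice falls back to a flag that is
     * assumed to come from flink-conf.yaml (proposed key: fs.hadoop-snappy).
     */
    static Codec choose(String path, boolean hadoopSnappy) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gzip")) {
            return Codec.GZIP;
        }
        if (lower.endsWith(".snappy")) {
            return hadoopSnappy ? Codec.HADOOP_SNAPPY : Codec.XERIAL_SNAPPY;
        }
        // Unknown extension: treat the file as uncompressed.
        return Codec.NONE;
    }

    public static void main(String[] args) {
        System.out.println(choose("events.snappy", false));
        System.out.println(choose("events.snappy", true));
    }
}
```

With this shape the flag applies to every '.snappy' file the job reads, so
a job that mixes both formats would still need option 2, 3 or 4.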

From my point of view, option 1 seems the most natural. Please let me know
your opinion.


Thanks,

Mikhail
