[ https://issues.apache.org/jira/browse/FLINK-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178593#comment-16178593 ]
ASF GitHub Bot commented on FLINK-5944:
---------------------------------------

Github user haohui commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4683#discussion_r140696992

--- Diff: flink-core/pom.xml ---
@@ -52,6 +52,12 @@ under the License.
             <artifactId>flink-shaded-asm</artifactId>
         </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-shaded-hadoop2</artifactId>
+            <version>${project.version}</version>
+        </dependency>
--- End diff --

Internally, several of our users try to use Flink to read files generated by Hadoop (e.g. lz4 / gz / snappy), so I think Hadoop support is quite important.

I'm not sure supporting the xerial snappy format is a good idea. The two file formats are actually incompatible, and users would find it quite confusing to discover that Spark / MR / Hive cannot read their files because of a missed configuration option. At a minimum, I suggest making the Hadoop file format the default; alternatively, we could drop the xerial version of the format entirely.

Putting the dependency in `provided` scope sounds fine to me. If we need even tighter control over the dependency, we can start thinking about moving it into a separate module. What do you think?

> Flink should support reading Snappy Files
> -----------------------------------------
>
>                 Key: FLINK-5944
>                 URL: https://issues.apache.org/jira/browse/FLINK-5944
>             Project: Flink
>          Issue Type: New Feature
>          Components: Batch Connectors and Input/Output Formats
>            Reporter: Ilya Ganelin
>            Assignee: Mikhail Lipkovich
>              Labels: features
>
> Snappy is an extremely performant and widely used compression format that offers fast compression and decompression.
> This can be easily implemented by creating a SnappyInflaterInputStreamFactory and updating initDefaultInflateInputStreamFactories in FileInputFormat.
> Flink already includes the Snappy dependency in the project.
> There is a minor gotcha in this.
> If we wish to use this with Hadoop, then we must provide two separate implementations, since Hadoop uses a different version of the snappy format than Snappy Java (which is the xerial/snappy library included in Flink).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
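The incompatibility between Hadoop's snappy files and xerial's snappy files can often be detected from a file's first bytes. The sketch below is a hypothetical helper; `SnappyFormatSniffer` is not part of Flink or any library. It assumes that xerial's `SnappyOutputStream` writes the 8-byte magic header `0x82 "SNAPPY" 0x00`, and that the Snappy framing format begins with the stream-identifier chunk `0xff 0x06 0x00 0x00 "sNaPpY"`. Hadoop's `SnappyCodec` block format starts directly with a length prefix and has no magic header, so the sketch can only report "unknown, possibly Hadoop" for it. It requires Java 11+ for `InputStream.readNBytes(int)`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Flink): guesses which Snappy container
 * format produced a stream by inspecting its first bytes.
 */
public final class SnappyFormatSniffer {

    /** Container formats this sketch can tell apart. */
    public enum Format { XERIAL_STREAM, SNAPPY_FRAMED, UNKNOWN_POSSIBLY_HADOOP }

    // Assumed header written by xerial's SnappyOutputStream: 0x82 "SNAPPY" 0x00.
    private static final byte[] XERIAL_MAGIC =
            {(byte) 0x82, 'S', 'N', 'A', 'P', 'P', 'Y', 0};

    // Stream identifier chunk from the Snappy framing format:
    // chunk type 0xff, 3-byte little-endian length 6, then "sNaPpY".
    private static final byte[] FRAMED_MAGIC =
            {(byte) 0xff, 0x06, 0x00, 0x00, 's', 'N', 'a', 'P', 'p', 'Y'};

    private SnappyFormatSniffer() {}

    /** Reads up to 10 bytes from {@code in} and classifies the container format. */
    public static Format sniff(InputStream in) throws IOException {
        byte[] head = in.readNBytes(FRAMED_MAGIC.length);
        if (startsWith(head, XERIAL_MAGIC)) {
            return Format.XERIAL_STREAM;
        }
        if (startsWith(head, FRAMED_MAGIC)) {
            return Format.SNAPPY_FRAMED;
        }
        // Hadoop's SnappyCodec block format begins with a 4-byte length prefix
        // rather than a magic header, so it cannot be positively identified here.
        return Format.UNKNOWN_POSSIBLY_HADOOP;
    }

    /** True if {@code data} begins with every byte of {@code prefix}. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

A reader could use a check like this to fail fast with a clear error instead of mis-decoding a file. It does not replace choosing a single default format, as the comment above suggests.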