Github user mlipkovich commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4683#discussion_r140652438

    --- Diff: flink-core/pom.xml ---
    @@ -52,6 +52,12 @@ under the License.
     			<artifactId>flink-shaded-asm</artifactId>
     		</dependency>
    +		<dependency>
    +			<groupId>org.apache.flink</groupId>
    +			<artifactId>flink-shaded-hadoop2</artifactId>
    +			<version>${project.version}</version>
    +		</dependency>
    --- End diff --

    What do you think about making this dependency compile-time only?

    Regarding the difference between the codecs: as I understand it, Snappy-compressed files are not splittable, so Hadoop splits the raw file into blocks and compresses each block separately with regular Snappy. If you download a whole Hadoop-Snappy-compressed file, regular Snappy cannot decompress it, because it is not aware of the block boundaries.
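For illustration only, here is a minimal sketch of what a compile-time-only declaration could look like. It assumes that "compile-time only" refers to Maven's `provided` scope; the comment does not name a scope, so this is one possible reading rather than the change that was actually proposed.

```xml
<!-- Sketch only: the <scope> element is an assumption and is not part of the diff above. -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-shaded-hadoop2</artifactId>
	<version>${project.version}</version>
	<!-- provided: on the compile and test classpath, not packaged, not passed on transitively -->
	<scope>provided</scope>
</dependency>
```

With `provided`, the Hadoop classes would have to be supplied by the runtime environment, so any code path that touches them would fail with a class-loading error if they are absent.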
---