Gyula Fora created FLINK-20221:
----------------------------------

             Summary: DelimitedInputFormat does not restore compressed 
filesplits correctly leading to dataloss
                 Key: FLINK-20221
                 URL: https://issues.apache.org/jira/browse/FLINK-20221
             Project: Flink
          Issue Type: Bug
          Components: Connectors / FileSystem
    Affects Versions: 1.11.2, 1.10.2, 1.12.0
            Reporter: Gyula Fora
            Assignee: Gyula Fora


It seems that the delimited input format cannot correctly restore input splits 
that belong to compressed files. When a compressed file split is restored 
mid-read, the format does not read the rest of it, which leads to data loss.

The cause is that for compressed splits, which are read through an inflater 
stream, the split length is set to the magic number -1. The reopen method does 
not account for this value, so the split goes to the `end` state immediately.
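
To illustrate the mechanism, here is a minimal, self-contained sketch. It is 
not the Flink code and not the linked fix: the class name, method names and the 
exact form of the bounds check are assumptions, chosen only to show how a 
length of -1 can make a restored offset look like it is past the end of the 
split.

```java
// Simplified model of the restore check; all names here are hypothetical.
public class ReopenSketch {

    // Assumed marker meaning "read the whole file", as used for compressed splits.
    static final long READ_WHOLE_SPLIT = -1L;

    // Naive check: treats splitLength as a real length, even when it is -1.
    static boolean endsImmediatelyNaive(long splitStart, long splitLength, long restoredOffset) {
        return restoredOffset > splitStart + splitLength;
    }

    // Guarded check: a whole-file split has no known end, so it is never
    // considered finished based on the restored offset alone.
    static boolean endsImmediatelyGuarded(long splitStart, long splitLength, long restoredOffset) {
        if (splitLength == READ_WHOLE_SPLIT) {
            return false;
        }
        return restoredOffset > splitStart + splitLength;
    }

    public static void main(String[] args) {
        // Compressed split: start 0, length -1, restored at offset 100.
        System.out.println(endsImmediatelyNaive(0L, READ_WHOLE_SPLIT, 100L));   // true
        System.out.println(endsImmediatelyGuarded(0L, READ_WHOLE_SPLIT, 100L)); // false
    }
}
```

In the naive variant, any non-negative restored offset is greater than 
`splitStart - 1` when the split starts at 0, so the split is marked as finished 
and the remaining records are never read.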

The problem and the fix are shown in this commit:
[https://github.com/gyfora/flink/commit/4adc8ba8d1989fff2db43881c9cb3799848c6e0d]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
