[ https://issues.apache.org/jira/browse/FLINK-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237850#comment-17237850 ]

Yun Gao commented on FLINK-20295:
---------------------------------

Hi [~sewen], I have submitted the minimal example: [The minimal example|https://github.com/gaoyunhaii/flink/commit/8874b3494dc25bda1859d332a301771ff98238c3].

I also debugged the case and found that the problem is most likely caused by
_DeserializationSchemaAdapter_ always returning the same cached iterator for
different batches, so the data of one batch may be overwritten before it has
been fully emitted. Returning a different iterator for each batch should solve
this issue.
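To illustrate the suspected mechanism, here is a simplified, hypothetical model. It is not Flink's actual _DeserializationSchemaAdapter_ code: BatchReader, BatchIterator and consume are made-up names, and the sketch assumes, as described above, that the next batch can be read before the current one is fully emitted. When the reader refills one shared buffer and hands out the same cached iterator, the unread records of the earlier batch are overwritten; giving each batch its own iterator keeps them intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified model of the suspected bug; not Flink code.
public class CachedIteratorDemo {

    /** Iterator over one batch of records, backed by a buffer it does not own. */
    static final class BatchIterator implements Iterator<String> {
        private final List<String> buffer;
        private int pos;

        BatchIterator(List<String> buffer) {
            this.buffer = buffer;
        }

        /** Rewinds to the start of the (possibly refilled) buffer. */
        void reset() {
            pos = 0;
        }

        @Override
        public boolean hasNext() {
            return pos < buffer.size();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return buffer.get(pos++);
        }
    }

    /** Hypothetical reader that hands out one batch of records per call. */
    static final class BatchReader {
        private final Iterator<List<String>> batches;
        private final boolean reuseIterator;
        private final List<String> sharedBuffer = new ArrayList<>();
        private final BatchIterator cached = new BatchIterator(sharedBuffer);

        BatchReader(List<List<String>> input, boolean reuseIterator) {
            this.batches = input.iterator();
            this.reuseIterator = reuseIterator;
        }

        /** Returns an iterator over the next batch, or null when the input is exhausted. */
        Iterator<String> readBatch() {
            if (!batches.hasNext()) {
                return null;
            }
            List<String> records = batches.next();
            if (reuseIterator) {
                // Buggy variant: overwrite the shared buffer and return the same iterator.
                sharedBuffer.clear();
                sharedBuffer.addAll(records);
                cached.reset();
                return cached;
            }
            // Fixed variant: every batch gets its own buffer and its own iterator.
            return new BatchIterator(new ArrayList<>(records));
        }
    }

    /** Emits records, but reads the next batch before finishing the current one. */
    static List<String> consume(BatchReader reader) {
        List<String> emitted = new ArrayList<>();
        Iterator<String> current = reader.readBatch();
        while (current != null) {
            if (current.hasNext()) {
                emitted.add(current.next()); // emit part of the current batch
            }
            Iterator<String> next = reader.readBatch(); // read ahead
            while (current.hasNext()) {
                emitted.add(current.next()); // finish the current batch
            }
            current = next;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> input = Arrays.asList(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1", "b2"));
        // prints "reused iterator: [a1, b1, b2]" (a2 and a3 are lost)
        System.out.println("reused iterator: " + consume(new BatchReader(input, true)));
        // prints "fresh iterators: [a1, a2, a3, b1, b2]"
        System.out.println("fresh iterators: " + consume(new BatchReader(input, false)));
    }
}
```

In the reused-iterator run, reading batch "b" rewinds the shared iterator and replaces its contents, so the remaining records of batch "a" are never emitted, which matches the symptom of the source reading only part of the records.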

> File Source lost data when reading from directories created by 
> FileSystemTableSink with JSON format
> ---------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-20295
>                 URL: https://issues.apache.org/jira/browse/FLINK-20295
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Table SQL / Ecosystem
>            Reporter: Yun Gao
>            Priority: Critical
>             Fix For: 1.12.0
>
>         Attachments: compaction.tgz
>
>
> When testing the compaction functionality of the FileSystemTableSink, I found
> that when using the JSON format, the produced directories could not be read
> correctly by the file source: only part of the records are read.
> Checking the produced directories shows that they contain the expected number
> of records, so the issue seems to be on the source side.
>  
> The issue only occurs with the JSON format.
> The data is produced by
> [FileCompactionTest|https://github.com/gaoyunhaii/flink1.12test/blob/main/src/main/java/FileCompactionTest.java]
> and read by
> [FileCompactionCheckTest|https://github.com/gaoyunhaii/flink1.12test/blob/main/src/main/java/FileCompactionCheckTest.java].
> A tar file of example directories containing 8000 records is also attached.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
