[ https://issues.apache.org/jira/browse/FLINK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529556#comment-15529556 ]

ASF GitHub Bot commented on FLINK-4329:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2546
  
    Actually, let me take a step back and first understand a few things more deeply.
    Who actually generates the watermarks (in ingestion time)? The operator
    that creates the file splits, or the operator that reads the splits?

    If the configuration is set to IngestionTime, will the operator that
    creates the file splits emit a final Long.MAX_VALUE watermark? Is that one
    passed through by the split-reading operator? Is there a test that tests
    that specific scenario? (I believe that was the originally reported bug.)
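The scenario asked about above could be asserted along these lines. This is a minimal sketch, not Flink's test harness: it assumes the test captures every element the split-reading operator emits, in order, and checks that no record appears after a `Long.MAX_VALUE` watermark. The `FinalWatermarkCheck` and `Element` types are hypothetical.

```java
import java.util.List;

// Hypothetical helper for checking captured operator output.
// It does not use any Flink API.
final class FinalWatermarkCheck {

    /** A captured output element: either a record or a watermark (hypothetical). */
    static final class Element {
        final boolean isWatermark;
        final String value;     // record payload; null for watermarks
        final long watermark;   // only meaningful when isWatermark is true

        private Element(boolean isWatermark, String value, long watermark) {
            this.isWatermark = isWatermark;
            this.value = value;
            this.watermark = watermark;
        }

        static Element ofRecord(String value) {
            return new Element(false, value, 0L);
        }

        static Element ofWatermark(long watermark) {
            return new Element(true, null, watermark);
        }
    }

    /**
     * Returns true if no record appears after a Long.MAX_VALUE watermark,
     * i.e. the final watermark did not overtake any records.
     */
    static boolean finalWatermarkIsLast(List<Element> output) {
        boolean sawFinalWatermark = false;
        for (Element e : output) {
            if (e.isWatermark && e.watermark == Long.MAX_VALUE) {
                sawFinalWatermark = true;
            } else if (!e.isWatermark && sawFinalWatermark) {
                return false; // a record arrived after the final watermark
            }
        }
        return true;
    }
}
```

A sequence like `record("a"), watermark(MAX)` passes; `watermark(MAX), record("a")` fails, which is the ordering the issue reports.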


> Fix Streaming File Source Timestamps/Watermarks Handling
> --------------------------------------------------------
>
>                 Key: FLINK-4329
>                 URL: https://issues.apache.org/jira/browse/FLINK-4329
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 1.1.0
>            Reporter: Aljoscha Krettek
>            Assignee: Kostas Kloudas
>             Fix For: 1.2.0, 1.1.3
>
>
> The {{ContinuousFileReaderOperator}} does not correctly deal with watermarks,
> i.e. they are just passed through. This means that when the
> {{ContinuousFileMonitoringFunction}} closes and emits a {{Long.MAX_VALUE}}
> watermark, that watermark can "overtake" the records that are still to be
> emitted by the {{ContinuousFileReaderOperator}}. Together with the new
> "allowed lateness" setting in the window operator, this can lead to elements
> being dropped as late.
> Also, {{ContinuousFileReaderOperator}} does not correctly assign ingestion
> timestamps, since it is not technically a source but looks like one to the
> user.
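A framework-free sketch of the behavior the issue asks for, under a simplified model rather than Flink's actual operator API: the reader only queues incoming splits, assigns ingestion timestamps itself from an injectable clock, and holds back the final `Long.MAX_VALUE` watermark until every queued split has been read. `SplitReaderSketch`, `Output`, and all method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical model of a split-reading operator.
// This is NOT the ContinuousFileReaderOperator API.
final class SplitReaderSketch {

    /** One emitted stream element: either a record or a watermark. */
    static final class Output {
        final boolean isWatermark;
        final String value;    // record payload; null for watermarks
        final long timestamp;  // ingestion timestamp, or the watermark time

        Output(boolean isWatermark, String value, long timestamp) {
            this.isWatermark = isWatermark;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Deque<List<String>> pendingSplits = new ArrayDeque<>();
    private final List<Output> emitted = new ArrayList<>();
    private final LongSupplier clock; // processing-time clock used for ingestion timestamps
    private boolean finalWatermarkPending = false;

    SplitReaderSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** A split arrives from the monitoring function; it is only queued here. */
    void processSplit(List<String> splitRecords) {
        pendingSplits.add(splitRecords);
    }

    /**
     * A watermark arrives from upstream. Instead of forwarding it immediately,
     * a Long.MAX_VALUE watermark is held back until all queued splits are read.
     * Other upstream watermarks are ignored in this sketch, assuming the reader
     * is responsible for ingestion-time watermarks itself.
     */
    void processWatermark(long watermark) {
        if (watermark == Long.MAX_VALUE) {
            finalWatermarkPending = true;
            emitFinalWatermarkIfDrained();
        }
    }

    /** Reads one queued split, stamping each record with the current clock time. */
    void readNextSplit() {
        List<String> split = pendingSplits.poll();
        if (split != null) {
            for (String record : split) {
                emitted.add(new Output(false, record, clock.getAsLong()));
            }
        }
        emitFinalWatermarkIfDrained();
    }

    /** Emits the held-back final watermark once no splits remain. */
    private void emitFinalWatermarkIfDrained() {
        if (finalWatermarkPending && pendingSplits.isEmpty()) {
            emitted.add(new Output(true, null, Long.MAX_VALUE));
            finalWatermarkPending = false;
        }
    }

    List<Output> emitted() {
        return emitted;
    }
}
```

Holding back only the final watermark keeps the sketch short; an ingestion-time reader would presumably also emit periodic watermarks of its own, which this model does not attempt.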



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
