[ https://issues.apache.org/jira/browse/FLINK-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426400#comment-15426400 ]

ASF GitHub Bot commented on FLINK-4329:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2350#discussion_r75303846
  
    --- Diff: flink-fs-tests/src/test/java/org/apache/flink/hdfstests/ContinuousFileMonitoringTest.java ---
    @@ -106,6 +109,140 @@ public static void destroyHDFS() {
        //                                              TESTS
     
        @Test
    +   public void testFileReadingOperatorWithIngestionTime() throws Exception {
    +           Set<org.apache.hadoop.fs.Path> filesCreated = new HashSet<>();
    +           Map<Integer, String> expectedFileContents = new HashMap<>();
    +           for(int i = 0; i < NO_OF_FILES; i++) {
    +                   Tuple2<org.apache.hadoop.fs.Path, String> file = fillWithData(hdfsURI, "file", i, "This is test line.");
    +                   filesCreated.add(file.f0);
    +                   expectedFileContents.put(i, file.f1);
    +           }
    +
    +           TextInputFormat format = new TextInputFormat(new Path(hdfsURI));
    +           TypeInformation<String> typeInfo = TypeExtractor.getInputFormatTypes(format);
    +
    +           ContinuousFileReaderOperator<String, ?> reader = new ContinuousFileReaderOperator<>(format);
    +
    +           StreamConfig streamConfig = new StreamConfig(new Configuration());
    +           streamConfig.setTimeCharacteristic(TimeCharacteristic.IngestionTime);
    +
    +           ExecutionConfig executionConfig = new ExecutionConfig();
    +           executionConfig.setAutoWatermarkInterval(100);
    +
    +           TestTimeServiceProvider timeServiceProvider = new TestTimeServiceProvider();
    +           OneInputStreamOperatorTestHarness<FileInputSplit, String> tester =
    +                   new OneInputStreamOperatorTestHarness<>(reader, executionConfig, timeServiceProvider, streamConfig);
    +
    +           reader.setOutputType(typeInfo, new ExecutionConfig());
    +           tester.open();
    +
    +           timeServiceProvider.setCurrentTime(0);
    +
    +           long elementTimestamp = 201;
    +           timeServiceProvider.setCurrentTime(elementTimestamp);
    +
    +           // test that a watermark is actually emitted
    +           Assert.assertTrue(tester.getOutput().size() == 1 &&
    +                   tester.getOutput().peek() instanceof Watermark &&
    +                   ((Watermark) tester.getOutput().peek()).getTimestamp() == 200);
    --- End diff --
    
    You don't need to change it, but I think it's a good idea to test the conditions independently. That way, when the test fails, the line number tells you which condition was false.


> Fix Streaming File Source Timestamps/Watermarks Handling
> --------------------------------------------------------
>
>                 Key: FLINK-4329
>                 URL: https://issues.apache.org/jira/browse/FLINK-4329
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming Connectors
>    Affects Versions: 1.1.0
>            Reporter: Aljoscha Krettek
>            Assignee: Kostas Kloudas
>             Fix For: 1.1.1
>
>
> The {{ContinuousFileReaderOperator}} does not correctly deal with watermarks, 
> i.e. they are just passed through. This means that when the 
> {{ContinuousFileMonitoringFunction}} closes and emits a {{Long.MAX_VALUE}} 
> that watermark can "overtake" the records that are to be emitted in the 
> {{ContinuousFileReaderOperator}}. Together with the new "allowed lateness" 
> setting in window operator this can lead to elements being dropped as late.
> Also, {{ContinuousFileReaderOperator}} does not correctly assign ingestion 
> timestamps since it is not technically a source but looks like one to the 
> user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
