GitHub user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/2028#issuecomment-221853220
  
    Thanks for looking into this.
    Making the iteration timeout longer makes the test more stable, but still 
not fully reliable. It also increases the build time, and it does so for every 
build.
    
    I think a good start would be to increase the timeout to 5000 (a bit more 
stability), and then actually improve the test and the iterations themselves:
    
      - The test case could be modified to run the program with a large 
iteration timeout, but start it in a separate thread. The main thread then 
checks for the result periodically (every 50ms or so) and cancels the program 
as soon as the expected result is there. That way, the test returns fast if 
the execution is fast, and is still pretty reliable, because the termination 
has a long timeout.
    
      - We should change the stream iteration behavior from a timeout-based 
termination to a proper "end-of-stream-event"-based termination. That would 
also automatically make the test reliable and predictable. This is a bit more 
involved, but would be the best solution. I'd be happy to write more about that 
in a design doc, if you want to take that on.
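    
    To make the first suggestion concrete, here is a minimal sketch of the 
"run in a separate thread, poll, then cancel" pattern in plain Java. It does 
not use any Flink API: the class PollAndCancelSketch, the helper runUntil, and 
the in-memory sink are hypothetical names for illustration only, and the 
Runnable is a stand-in for the streaming program with a long iteration timeout.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Flink: illustrates the polling idea only.
class PollAndCancelSketch {

    /**
     * Runs the job in a separate thread and polls the condition every
     * pollMillis. Returns true if the expected result was observed before
     * the deadline. The job is cancelled (interrupted) in every case.
     */
    static boolean runUntil(Runnable job, BooleanSupplier expectedResultSeen,
                            long pollMillis, long deadlineMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> running = executor.submit(job);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMillis);
        try {
            while (System.nanoTime() - deadline < 0) {
                if (expectedResultSeen.getAsBoolean()) {
                    return true; // fast path: stop as soon as the result is there
                }
                if (running.isDone()) {
                    // The job ended on its own; check the result one last time.
                    return expectedResultSeen.getAsBoolean();
                }
                Thread.sleep(pollMillis);
            }
            return expectedResultSeen.getAsBoolean();
        } finally {
            running.cancel(true);   // interrupts the job thread if still running
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> sink = new CopyOnWriteArrayList<>();
        // Stand-in for the program: produces its output quickly, then lingers
        // the way a job with a long iteration timeout would.
        Runnable job = () -> {
            for (int i = 1; i <= 3; i++) {
                sink.add(i);
            }
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long start = System.nanoTime();
        boolean seen = runUntil(job, () -> sink.size() == 3, 50, 120_000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("expected result seen: " + seen + ", elapsed ms: " + elapsedMs);
    }
}
```

    In this sketch the long timeout only bounds the worst case; a fast run 
finishes as soon as the polled condition holds. How the real test would cancel 
the running program depends on the Flink test utilities and is not shown here.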
    
    Please let us know if you would be up for helping out with these efforts!


