-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51346/#review146982
-----------------------------------------------------------

samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java (line 537)
<https://reviews.apache.org/r/51346/#comment213925>

    I think it might be cleaner to have two flags: one indicating endOfStream, the other indicating that the task is complete. Then we can set the endOfStream flag directly based on empty ssps, and the other flag can be set from the taskWorker. Let's chat offline.


samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java (line 541)
<https://reviews.apache.org/r/51346/#comment213926>

    I understand what you mean here, but the naming may be a little confusing. sspsToProcess or sspsInProgress? Please come up with a name that has a clear meaning.


samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java (line 612)
<https://reviews.apache.org/r/51346/#comment213927>

    I have an idea for integrating this logic into isReady(), since this is pretty much the condition for endOfStream. Let's chat offline.


- Xinyu Liu


On Aug. 25, 2016, 11:52 p.m., Jagadish Venkatraman wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51346/
> -----------------------------------------------------------
>
> (Updated Aug. 25, 2016, 11:52 p.m.)
>
>
> Review request for samza, Boris Shkolnik, Chris Pettitt, Yi Pan (Data
> Infrastructure), Navina Ramesh, and Xinyu Liu.
>
>
> Repository: samza
>
>
> Description
> -------
>
> Samza currently works with unbounded data sources (Kafka streams). However,
> for bounded data sources like HDFS files, snapshot files which are not
> infinite, we need a notion of 'end-of-stream'.
>
> This is a step towards realizing a 'finite' Samza job that terminates once
> data processing is complete (as opposed to an infinite stream job that keeps
> running).
>
> RB changes:
> - New interface for EndOfStreamListener
> - New 'end-of-stream' state in the state-machine of AsyncStreamTask
>   (Invariant: when end-of-stream is reached there are no buffered messages,
>   no callbacks are in flight, and no window or commit call is in progress)
> - Changes to allow clean shut-downs of the tasks/container/job for
>   end-of-stream.
>
> Design Doc and Implementation Notes:
> https://issues.apache.org/jira/secure/attachment/12825119/ProposalforEndofStreaminSamza.pdf
>
>
> Diffs
> -----
>
>   samza-api/src/main/java/org/apache/samza/system/IncomingMessageEnvelope.java cc860cf7eb4d514736913c1dceaa80534b61d71a
>   samza-api/src/main/java/org/apache/samza/system/SystemStreamPartitionIterator.java a8f858aa7e4f4ce436f450cf439fe1a102983c64
>   samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java PRE-CREATION
>   samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java a510bb0c5914c772438930d27f100b4d360c1296
>   samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 89f6857014489aba2db4129bc2e26dfec5b10652
>   samza-core/src/test/java/org/apache/samza/task/TestAsyncRunLoop.java ca913dea79fecbcecdfd1010dc794318055c5764
>
> Diff: https://reviews.apache.org/r/51346/diff/
>
>
> Testing
> -------
>
> Unit tests to test scenarios for in-order processing, out-of-order processing
> and commit semantics.
>
>
> Thanks,
>
> Jagadish Venkatraman
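
For reference, the two-flag idea from the comment on line 537, combined with the invariant stated in the review request description, could look roughly like the sketch below. This is only an illustration under assumptions: the class EndOfStreamState, its method names, and the use of plain strings to identify ssps are all hypothetical and are not taken from the actual AsyncRunLoop code in the diff.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of per-task end-of-stream bookkeeping with two flags.
 * None of these names come from the real AsyncRunLoop implementation.
 */
class EndOfStreamState {
    // ssps (identified by a plain string here) that have not reached end-of-stream yet.
    private final Set<String> pendingSsps;
    private int bufferedMessages = 0;
    private int callbacksInFlight = 0;
    private boolean windowOrCommitInProgress = false;
    // Flag 2: set only by the worker once the invariant holds.
    private boolean taskComplete = false;

    EndOfStreamState(Set<String> ssps) {
        this.pendingSsps = new HashSet<>(ssps);
    }

    synchronized void onEndOfStream(String ssp) {
        pendingSsps.remove(ssp);
    }

    synchronized void onMessageBuffered() {
        bufferedMessages++;
    }

    // A buffered message is handed to the task; its callback is now in flight.
    synchronized void onMessageDispatched() {
        bufferedMessages--;
        callbacksInFlight++;
    }

    synchronized void onCallbackComplete() {
        callbacksInFlight--;
    }

    synchronized void setWindowOrCommitInProgress(boolean inProgress) {
        windowOrCommitInProgress = inProgress;
    }

    // Flag 1: derived directly from the set of ssps being empty.
    synchronized boolean isEndOfStream() {
        return pendingSsps.isEmpty();
    }

    // Marks the task complete only if every input is at end-of-stream and
    // nothing is buffered, in flight, or inside window/commit.
    synchronized boolean tryMarkTaskComplete() {
        if (isEndOfStream()
                && bufferedMessages == 0
                && callbacksInFlight == 0
                && !windowOrCommitInProgress) {
            taskComplete = true;
        }
        return taskComplete;
    }

    synchronized boolean isTaskComplete() {
        return taskComplete;
    }
}
```

Keeping the two flags separate means end-of-stream can be observed as soon as the last ssp is drained, while task completion still waits for outstanding callbacks and any window or commit call to finish.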