----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50143/#review142580 -----------------------------------------------------------
samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java (line 16) <https://reviews.apache.org/r/50143/#comment208204> Is it always true that user will need to commit before shutting down the task (I don't see a use case that use will not commit in the end)? Do we really need this api? samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala (line 78) <https://reviews.apache.org/r/50143/#comment208202> this is not thread-safe samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala (line 135) <https://reviews.apache.org/r/50143/#comment208203> Please make this thread safe since we might have multiple threads for a task. - Xinyu Liu On July 18, 2016, 4:29 p.m., Jagadish Venkatraman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50143/ > ----------------------------------------------------------- > > (Updated July 18, 2016, 4:29 p.m.) > > > Review request for samza, Boris Shkolnik, Chris Pettitt, Fred Ji, Jake Maes, > Yi Pan (Data Infrastructure), Navina Ramesh, and Xinyu Liu. > > > Repository: samza > > > Description > ------- > > Samza currently works with unbounded data sources. However, for bounded data > sources like HDFS files, snapshot files which are not infinite, we need a > notion of 'end-of-stream'. > The following are the logical tasks: > 1. SystemConsumer will indicate to Samza that the end of stream has been > reached for an SSP. (by constructing an envelope with eof set to true) > 2. Samza will shut down the task if all SSPs in the task are at end of stream. > 3. Samza will provide a callback to the task so that it can perform cleanups/ > commits once tasks are at end of stream. > 4. Samza will shut down the container if all tasks in the container have been > shut down. > 5. Samza will ultimately shut down the job if all containers in the job have > been shut down. > > This is a step towards realizing a 'finite' Samza job that terminates (as > opposed to an infinite stream job that keeps running) once data processing is > complete. > > > === This RB is an RFC for design feedback ==== > > TODO: > 1. Add more unit tests > 2. Verify behavior with multiple containers > > > Diffs > ----- > > > samza-api/src/main/java/org/apache/samza/system/IncomingMessageEnvelope.java > cc860cf7eb4d514736913c1dceaa80534b61d71a > samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java > PRE-CREATION > samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala > d32a92976e43ca24033b48c91851ee706de7de6b > samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala > e280daa9626757cb4d17c0c03eed923277230c3e > > Diff: https://reviews.apache.org/r/50143/diff/ > > > Testing > ------- > > Added an unit test and verified that an End of stream message terminates the > runloop. > > > Thanks, > > Jagadish Venkatraman > >