-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50143/#review142580
-----------------------------------------------------------




samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java 
(line 16)
<https://reviews.apache.org/r/50143/#comment208204>

    Is it always true that user will need to commit before shutting down the 
task (I don't see a use case that use will not commit in the end)? Do we really 
need this api?



samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala (line 
78)
<https://reviews.apache.org/r/50143/#comment208202>

    this is not thread-safe



samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala (line 
135)
<https://reviews.apache.org/r/50143/#comment208203>

    Please make this thread safe since we might have multiple threads for a 
task.


- Xinyu Liu


On July 18, 2016, 4:29 p.m., Jagadish Venkatraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50143/
> -----------------------------------------------------------
> 
> (Updated July 18, 2016, 4:29 p.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Fred Ji, Jake Maes, 
> Yi Pan (Data Infrastructure), Navina Ramesh, and Xinyu Liu.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Samza currently works with unbounded data sources. However, for bounded data 
> sources like HDFS files, snapshot files which are not infinite, we need a 
> notion of 'end-of-stream'.
> The following are the logical tasks:
> 1. SystemConsumer will indicate to Samza that the end of stream has been 
> reached for an SSP. (by constructing an envelope with eof set to true)
> 2. Samza will shut down the task if all SSPs in the task are at end of stream.
> 3. Samza will provide a callback to the task so that it can perform cleanups/ 
> commits once tasks are at end of stream.
> 4. Samza will shut down the container if all tasks in the container have been 
> shut down.
> 5. Samza will ultimately shut down the job if all containers in the job have 
> been shut down.
> 
> This is a step towards realizing a 'finite' Samza job that terminates (as 
> opposed to an infinite stream job that keeps running) once data processing is 
> complete.
> 
> 
> === This RB is an RFC for design feedback ====
> 
> TODO:
> 1. Add more unit tests
> 2. Verify behavior with multiple containers
> 
> 
> Diffs
> -----
> 
>   
> samza-api/src/main/java/org/apache/samza/system/IncomingMessageEnvelope.java 
> cc860cf7eb4d514736913c1dceaa80534b61d71a 
>   samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 
> d32a92976e43ca24033b48c91851ee706de7de6b 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
> e280daa9626757cb4d17c0c03eed923277230c3e 
> 
> Diff: https://reviews.apache.org/r/50143/diff/
> 
> 
> Testing
> -------
> 
> Added an unit test and verified that an End of stream message terminates the 
> runloop.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>

Reply via email to