> On July 18, 2016, 5:12 p.m., Xinyu Liu wrote:
> > samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java, 
> > line 16
> > <https://reviews.apache.org/r/50143/diff/1/?file=1445940#file1445940line16>
> >
> >     Is it always true that the user will need to commit before shutting 
> > down the task? (I don't see a use case where the user would not commit at 
> > the end.) Do we really need this API?

Let's say a stream contains the messages [1, 2, 3, 4, EOF] and that auto-commit 
is turned off. In that scenario, when EOF is delivered, don't you think we 
need to give the user the ability to control whether to commit or not? 
(My understanding is that with auto-commit off, we don't invoke commit.)
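
To make the scenario concrete, here is a minimal, self-contained sketch. The 
Coordinator and EndOfStreamListener interfaces below are hypothetical 
stand-ins written for this illustration; they are not the signatures in the 
diff or in Samza's API. The only point is that, with auto-commit off, the 
commit at EOF happens only if the task asks for it.

```java
// Hypothetical stand-ins for illustration only; not Samza's real interfaces.
class EndOfStreamSketch {

    /** Assumed minimal coordinator: lets a task request a commit. */
    interface Coordinator {
        void commit();
    }

    /** Assumed callback shape: invoked once the task has seen EOF. */
    interface EndOfStreamListener {
        void onEndOfStream(Coordinator coordinator);
    }

    /** Coordinator that only counts how many commits were requested. */
    static class RecordingCoordinator implements Coordinator {
        int commits = 0;

        @Override
        public void commit() {
            commits++;
        }
    }

    /** Task that sums its messages and optionally commits at end of stream. */
    static class SummingTask implements EndOfStreamListener {
        private final boolean commitAtEnd;
        private int sum = 0;

        SummingTask(boolean commitAtEnd) {
            this.commitAtEnd = commitAtEnd;
        }

        void process(int message) {
            sum += message;
        }

        int sum() {
            return sum;
        }

        @Override
        public void onEndOfStream(Coordinator coordinator) {
            // Auto-commit is assumed to be off, so nothing is committed
            // unless the task explicitly requests it here.
            if (commitAtEnd) {
                coordinator.commit();
            }
        }
    }

    /** Feeds [1, 2, 3, 4, EOF] to a task; returns the number of commits. */
    static int run(boolean commitAtEnd) {
        RecordingCoordinator coordinator = new RecordingCoordinator();
        SummingTask task = new SummingTask(commitAtEnd);
        for (int message : new int[] {1, 2, 3, 4}) {
            task.process(message);
        }
        task.onEndOfStream(coordinator); // EOF delivered
        return coordinator.commits;
    }

    public static void main(String[] args) {
        System.out.println("commits when task opts in:  " + run(true));
        System.out.println("commits when task opts out: " + run(false));
    }
}
```

A task that wants its final state checkpointed opts in; a task that treats 
the run as a dry run can skip the commit.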


> On July 18, 2016, 5:12 p.m., Xinyu Liu wrote:
> > samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala, 
> > line 85
> > <https://reviews.apache.org/r/50143/diff/1/?file=1445941#file1445941line85>
> >
> >     this is not thread-safe

Good point. This change does not consider multi-threading; I'll make this a 
concurrent data structure. Thanks for the observation!
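
As a rough idea of what that could look like, here is a sketch in Java that 
assumes the state in question is the set of SSPs that have reached end of 
stream. The EndOfStreamTracker class and the use of plain strings as SSP 
identifiers are assumptions made for this illustration, not code from the 
diff.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; SSPs are modelled as plain strings.
class EndOfStreamTracker {
    // Immutable snapshot of the SSPs assigned to the task.
    private final Set<String> assigned;
    // Concurrent set, safe to update from multiple threads without locking.
    private final Set<String> finished = ConcurrentHashMap.newKeySet();

    EndOfStreamTracker(Set<String> assignedSsps) {
        this.assigned = Collections.unmodifiableSet(new HashSet<>(assignedSsps));
    }

    /** Records that the given SSP has reached end of stream. */
    void markEndOfStream(String ssp) {
        if (!assigned.contains(ssp)) {
            throw new IllegalArgumentException("SSP not assigned to this task: " + ssp);
        }
        finished.add(ssp); // adding the same SSP twice is a no-op
    }

    /** True once every assigned SSP has been marked as finished. */
    boolean allAtEndOfStream() {
        // finished only ever holds assigned SSPs, so comparing sizes suffices.
        return finished.size() == assigned.size();
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> ssps = new HashSet<>();
        for (int i = 0; i < 8; i++) {
            ssps.add("partition-" + i);
        }
        EndOfStreamTracker tracker = new EndOfStreamTracker(ssps);

        // Mark each SSP from its own thread to exercise concurrent updates.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            final String ssp = "partition-" + i;
            threads[i] = new Thread(() -> tracker.markEndOfStream(ssp));
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("all at end of stream: " + tracker.allAtEndOfStream());
    }
}
```

Note that several threads may observe allAtEndOfStream() returning true at 
about the same time, so whatever triggers the shutdown would still need to be 
idempotent or guarded separately.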


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50143/#review142580
-----------------------------------------------------------


On July 18, 2016, 4:29 p.m., Jagadish Venkatraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50143/
> -----------------------------------------------------------
> 
> (Updated July 18, 2016, 4:29 p.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Fred Ji, Jake Maes, 
> Yi Pan (Data Infrastructure), Navina Ramesh, and Xinyu Liu.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Samza currently works with unbounded data sources. However, for bounded data 
> sources such as HDFS files or snapshot files, which are not infinite, we need 
> a notion of 'end-of-stream'.
> The following are the logical tasks:
> 1. SystemConsumer will indicate to Samza that the end of stream has been 
> reached for an SSP. (by constructing an envelope with eof set to true)
> 2. Samza will shut down the task if all SSPs in the task are at end of stream.
> 3. Samza will provide a callback to the task so that it can perform cleanup 
> and commits once the task is at end of stream.
> 4. Samza will shut down the container if all tasks in the container have been 
> shut down.
> 5. Samza will ultimately shut down the job if all containers in the job have 
> been shut down.
> 
> This is a step towards realizing a 'finite' Samza job that terminates (as 
> opposed to an infinite stream job that keeps running) once data processing is 
> complete.
> 
> 
> === This RB is an RFC for design feedback ===
> 
> TODO:
> 1. Add more unit tests
> 2. Verify behavior with multiple containers
> 
> 
> Diffs
> -----
> 
>   samza-api/src/main/java/org/apache/samza/system/IncomingMessageEnvelope.java 
> cc860cf7eb4d514736913c1dceaa80534b61d71a 
>   samza-api/src/main/java/org/apache/samza/task/EndOfStreamListenerTask.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 
> d32a92976e43ca24033b48c91851ee706de7de6b 
>   samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
> e280daa9626757cb4d17c0c03eed923277230c3e 
> 
> Diff: https://reviews.apache.org/r/50143/diff/
> 
> 
> Testing
> -------
> 
> Added a unit test and verified that an end-of-stream message terminates the 
> run loop.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>
