https://issues.apache.org/jira/browse/SAMZA-255 is the JIRA for rewinding a stream without restarting.

On Thu, Mar 3, 2016 at 8:44 AM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:

> You can use the checkpoint tool to publish the desired offset, and restart
> your job. It will pick up the new offset.
> Please look at
> https://samza.apache.org/learn/documentation/0.10/container/checkpointing.html
>
> On Thu, Mar 3, 2016 at 6:28 AM, Jeff Ramin <jeff.ra...@singlewire.com>
> wrote:
>
>> Thanks Jacob.
>>
>> Regarding 2) below - is there a way to reprocess messages from an
>> arbitrary position, instead of from the beginning?
>>
>> On 03/01/2016 06:32 PM, Jacob Maes wrote:
>>
>>> A couple notes that may be helpful:
>>>
>>> 1. When you have a stateful processor that dies, the changelog is the
>>> default means by which the state is restored. Change logging is enabled
>>> with this config:
>>> stores.store-name.changelog
>>>
>>> 2. If, when the job comes back up, it needs to reprocess historical
>>> messages, it sounds like you actually don't want checkpoints, but you
>>> want to rewind to the beginning of the topic. You can achieve this with
>>> the following configs
>>> systems.system-name.streams.stream-name.samza.reset.offset = true
>>> systems.system-name.streams.stream-name.samza.offset.default = oldest
>>> and possibly
>>> systems.system-name.streams.stream-name.samza.bootstrap = true
>>> // read the doc on this one to decide if you need it
>>>
>>> http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
>>>
>>> On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman
>>> <jagadish1...@gmail.com> wrote:
>>>
>>>> Users need not worry about checkpointing. Samza will automatically
>>>> commit offsets every 60s. You can choose to commit more often by either
>>>> 1. Setting task.commit.ms to a smaller value (or)
>>>> 2. Doing manual commit yourself by setting task.commit.ms = -1 and
>>>> calling taskCoordinator.commit();
>>>>
>>>> I'm curious as to why processing from the exact previous offset is
>>>> unacceptable in your use case?
>>>>
>>>> Let's say you process till offset 100, and crash. Should you not want
>>>> to resume from 100?
>>>>
>>>> On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ra...@singlewire.com>
>>>> wrote:
>>>>
>>>>> On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:
>>>>>
>>>>>> You don't have to implement any state checkpoint. Samza automatically
>>>>>> checkpoints state for you. When you recover from a failure/restart you
>>>>>> will resume processing from the previous checkpoint.
>>>>>
>>>>> So, it's merely a configuration issue?
>>>>>
>>>>>> What's your use case?
>>>>>
>>>>> Pretty standard: have a consumer processing messages, which dies. When
>>>>> it comes back up, it needs to process messages not just from when it
>>>>> died, but perhaps 24 hours prior to that time.
>>>>>
>>>>> --
>>>>> Jeff Ramin
>>>>> Software Engineer
>>>>> Singlewire Software
>>>>> 2601 W Beltline Hwy #510
>>>>> Madison, WI 53713
>>>>>
>>>>> Phone Direct - 608.661.1172
>>>>> www.singlewire.com
>>>>
>>>> --
>>>> Jagadish V,
>>>> Graduate Student,
>>>> Department of Computer Science,
>>>> Stanford University
>>
>> --
>> Jeff Ramin
>> Software Engineer
>> Singlewire Software
>> 2601 W Beltline Hwy #510
>> Madison, WI 53713
>>
>> Phone Direct - 608.661.1172
>> www.singlewire.com
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
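
For reference, the rewind-to-the-beginning settings from Jacob's note in the quoted thread might look roughly like this in a job's properties file. This is only a sketch: the system name "kafka", the stream name "my-topic", the store name "my-store" and the changelog topic name are placeholders chosen for illustration, not names taken from the thread.

```
# Placeholder names (assumptions): system "kafka", stream "my-topic",
# store "my-store", changelog topic "my-store-changelog".

# Restore store state from a changelog stream after a failure.
# The value is written as <system>.<stream>.
stores.my-store.changelog=kafka.my-store-changelog

# Ignore the checkpointed offset for this stream at container start...
systems.kafka.streams.my-topic.samza.reset.offset=true
# ...and start reading from the oldest available message instead.
systems.kafka.streams.my-topic.samza.offset.default=oldest

# Optional; read the configuration docs before enabling it.
# systems.kafka.streams.my-topic.samza.bootstrap=true

# Checkpoint interval in milliseconds (60s, as mentioned in the thread).
task.commit.ms=60000
```

As far as I understand the configuration table, while samza.reset.offset stays true the checkpoint is ignored on every container start, so the job rewinds again after each restart. Setting it back to false once the reprocessing run is finished should restore the normal resume-from-checkpoint behaviour. To start from an arbitrary offset rather than the oldest one, the checkpoint tool described in the checkpointing documentation linked above is the route suggested in the thread.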