You can use the checkpoint tool to write the desired offset, and then restart your job. It will pick up the new offset on startup. Please look at https://samza.apache.org/learn/documentation/0.10/container/checkpointing.html
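
As a rough sketch of that workflow, based on my reading of the checkpointing page linked above: the paths below are placeholders, and the exact flags and the key format of the offsets file should be verified against the docs for your Samza version. As I understand it, checkpoints are only read when the job starts, so stop the job before writing the new offsets.

```
# 1. Stop the job.
# 2. Write the new offsets (both paths are placeholders):
bin/checkpoint-tool.sh \
  --config-path=file:///path/to/job/config.properties \
  --new-offsets=file:///path/to/new/offsets.properties
# 3. Start the job again; it should resume from the offsets you wrote.
#
# offsets.properties holds one line per task/partition, roughly of the form
# (angle-bracket parts are placeholders for your own values):
# tasknames.<task-name>.systems.<system-name>.streams.<stream-name>.partitions.<partition>=<offset>
```

Running the tool with only --config-path should print the current checkpoint, which is a convenient way to see the task names and partitions before you edit anything. Note that the tool takes offsets, not timestamps, so to go back roughly 24 hours you would first need to work out which offset corresponds to that point in time.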

On Thu, Mar 3, 2016 at 6:28 AM, Jeff Ramin <jeff.ra...@singlewire.com> wrote:

> Thanks Jacob.
>
> Regarding 2) below - is there a way to reprocess messages from an
> arbitrary position, instead of from the beginning?
>
> On 03/01/2016 06:32 PM, Jacob Maes wrote:
>
>> A couple notes that may be helpful:
>>
>> 1. When you have a stateful processor that dies, the changelog is the
>> default means by which the state is restored. Change logging is enabled
>> with this config:
>> stores.store-name.changelog
>>
>> 2. If, when the job comes back up, it needs to reprocess historical
>> messages, it sounds like you actually don't want checkpoints, but you want
>> to rewind to the beginning of the topic. You can achieve this with the
>> following configs
>> systems.system-name.streams.stream-name.samza.reset.offset = true
>> systems.system-name.streams.stream-name.samza.offset.default = oldest
>> and possibly
>> systems.system-name.streams.stream-name.samza.bootstrap = true // read
>> the doc on this one to decide if you need it
>>
>> http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
>>
>> On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:
>>
>>> Users need not worry about checkpointing. Samza will automatically commit
>>> offsets every 60s. You can choose to commit more often by either
>>> 1. Setting task.commit.ms to a smaller value (or)
>>> 2. Doing manual commit yourself by setting task.commit.ms = -1 and
>>> calling taskCoordinator.commit();
>>>
>>> I'm curious as to why processing from the exact previous offset is
>>> unacceptable in your use case?
>>>
>>> Let's say you process till offset 100, and crash. Should you not want to
>>> resume from 100?
>>>
>>> On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ra...@singlewire.com> wrote:
>>>
>>>> On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:
>>>>
>>>>> You don't have to implement any state checkpoint. Samza automatically
>>>>> checkpoints state for you. When you recover from a failure/restart you
>>>>> will resume processing from the previous checkpoint.
>>>>
>>>> So, it's merely a configuration issue?
>>>>
>>>>> What's your use case?
>>>>
>>>> Pretty standard: have a consumer processing messages, which dies. When
>>>> it comes back up, it needs to process messages not just from when it
>>>> died, but perhaps 24 hours prior to that time.
>>>>
>>>> --
>>>> Jeff Ramin
>>>> Software Engineer
>>>> Singlewire Software
>>>> 2601 W Beltline Hwy #510
>>>> Madison, WI 53713
>>>>
>>>> Phone Direct - 608.661.1172
>>>> www.singlewire.com
>>>
>>> --
>>> Jagadish V,
>>> Graduate Student,
>>> Department of Computer Science,
>>> Stanford University

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University