Thank you!

One more question - for the config provided below:

systems.system-name.streams.stream-name.samza.reset.offset = true
systems.system-name.streams.stream-name.samza.offset.default = oldest

How do I determine what the "stream-name" is? I'm running the hello-samza example,
which consumes wikipedia edits.
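
A note on where the name comes from, as far as I can tell: each entry in the job's task.inputs property has the form system-name.stream-name, split at the first dot, so the stream name is whatever follows the system name there. The sketch below derives the two override lines from a task.inputs value. StreamNames and its methods are hypothetical helpers, not part of Samza, and kafka.wikipedia-raw is only an assumed example value; check the task.inputs line in your own job's properties file.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamNames {
    // Splits one task.inputs entry at the FIRST dot: the text before it is the
    // system name, everything after it (dots included) is the stream name.
    static String[] split(String input) {
        String trimmed = input.trim();
        int dot = trimmed.indexOf('.');
        if (dot <= 0 || dot == trimmed.length() - 1) {
            throw new IllegalArgumentException(
                "expected system-name.stream-name, got: " + input);
        }
        return new String[] { trimmed.substring(0, dot), trimmed.substring(dot + 1) };
    }

    // Builds the reset/rewind override lines for every entry in a
    // comma-separated task.inputs value.
    static List<String> resetKeys(String taskInputs) {
        List<String> keys = new ArrayList<>();
        for (String input : taskInputs.split(",")) {
            String[] parts = split(input);
            String prefix = "systems." + parts[0] + ".streams." + parts[1] + ".samza.";
            keys.add(prefix + "reset.offset=true");
            keys.add(prefix + "offset.default=oldest");
        }
        return keys;
    }

    public static void main(String[] args) {
        // Assumed example value; substitute the task.inputs of your own job.
        resetKeys("kafka.wikipedia-raw").forEach(System.out::println);
    }
}
```

Splitting at the first dot rather than the last matters when the stream name itself contains dots.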


On 03/03/2016 10:44 AM, Jagadish Venkatraman wrote:
You can use the checkpoint tool to publish the desired offset, and restart
your job. It will pick up the new offset.
Please look at:
https://samza.apache.org/learn/documentation/0.10/container/checkpointing.html

On Thu, Mar 3, 2016 at 6:28 AM, Jeff Ramin <jeff.ra...@singlewire.com>
wrote:

Thanks Jacob.

Regarding 2) below - is there a way to reprocess messages from an
arbitrary position, instead of from the beginning?



On 03/01/2016 06:32 PM, Jacob Maes wrote:

A couple notes that may be helpful:

1. When you have a stateful processor that dies, the changelog is the
default means by which the state is restored. Change logging is enabled
with this config:
stores.store-name.changelog

2. If, when the job comes back up, it needs to reprocess historical
messages, it sounds like you actually don't want checkpoints, but you want
to rewind to the beginning of the topic. You can achieve this with the
following configs
systems.system-name.streams.stream-name.samza.reset.offset = true
systems.system-name.streams.stream-name.samza.offset.default = oldest
and possibly (read the doc on this one to decide if you need it)
systems.system-name.streams.stream-name.samza.bootstrap = true


http://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html

On Tue, Mar 1, 2016 at 2:57 PM, Jagadish Venkatraman <jagadish1...@gmail.com>
wrote:

Users need not worry about checkpointing. Samza will automatically commit
offsets every 60s. You can choose to commit more often by either
1. Setting task.commit.ms to a smaller value, or
2. Committing manually: set task.commit.ms = -1 and call
taskCoordinator.commit();

I'm curious why processing from the exact previous offset is
unacceptable in your use case.

Let's say you process up to offset 100 and crash. Wouldn't you want to
resume from 100?
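
To make the resume point concrete: a restarted task continues from the last committed checkpoint, which can be a little behind the last message it processed, so the messages in between may be processed again. The following is a toy simulation only. It assumes commits happen every N messages, whereas task.commit.ms commits on a timer, and CheckpointReplay and resumeOffset are hypothetical names, not Samza API.

```java
final class CheckpointReplay {
    // Returns the offset a restarted task would resume from, given the offset
    // of the last message processed before the crash and how many messages
    // pass between commits (a stand-in for the time-based commit interval).
    static long resumeOffset(long lastProcessed, long commitEvery) {
        long lastCheckpoint = -1; // nothing committed yet
        for (long offset = 0; offset <= lastProcessed; offset++) {
            // The real task would process the message at this offset here.
            if ((offset + 1) % commitEvery == 0) {
                lastCheckpoint = offset; // commit after each full batch
            }
        }
        return lastCheckpoint + 1; // first offset not covered by a checkpoint
    }

    public static void main(String[] args) {
        // Crash after processing offset 100, with a commit every 30 messages.
        long resume = resumeOffset(100, 30);
        System.out.println("resume from offset " + resume); // prints "resume from offset 90"
    }
}
```

In this run offsets 90 through 100 are processed a second time after the restart; a shorter commit interval narrows that window.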







On Tue, Mar 1, 2016 at 1:41 PM, Jeff Ramin <jeff.ra...@singlewire.com>
wrote:


On 03/01/2016 03:10 PM, Jagadish Venkatraman wrote:

    You don't have to implement any state checkpoint. Samza automatically
    checkpoints state for you. When you recover from a failure/restart you
    will resume processing from the previous checkpoint.

So, it's merely a configuration issue?

    What's your use case?

Pretty standard: have a consumer processing messages, which dies. When it
comes back up, it needs to process messages not just from when it died,
but perhaps from 24 hours prior to that time.


--
Jeff Ramin
Software Engineer
Singlewire Software
2601 W Beltline Hwy #510
Madison, WI 53713

Phone Direct - 608.661.1172
www.singlewire.com



--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University


