Hi Yi,

Thanks for the response. We are running Samza 0.9.1, so we do not yet have the 
coordinator stream. But to answer your other questions, the number of task 
instances did not change. Specifically, none of the input topic, the number of 
partitions in that topic, nor the grouper algorithm changed. The 
non-determinism I am referring to can be seen here:

https://github.com/apache/samza/blob/0.9.1/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala#L141

Since we lost the original mapping, there is no previousChangelogeMapping (sic) 
and the code creates a new mapping by simply assigning sequential partition 
numbers from 0 to the number of tasks. But the order in which these are 
assigned seems to be determined by the order of the TaskNames in the map 
returned by the SystemStreamPartitionGrouper (i.e. no meaningful order). So 
there is no guarantee that this code will produce the same changelog mapping 
each time it runs, even if the number of tasks is the same. Does that make 
sense?  The code has changed some since 0.9.1 but seems to have the same issue 
even in 0.10.1.

-Tommy

On 08/11/2016 06:12 PM, Yi Pan wrote:

Hi, Tommy,

Which version of Samza are you using? Since 0.10, the changelog partition
mapping has been moved to the coordinator stream, not in the checkpoint
topic any more.

That said, I want to ask a few more questions to understand what you
referred to as "non-deterministic" behavior. So, between the job restarts,
did the total number of tasks change? As you have observed, the total
number of partitions in a changelog topic is equivalent to the total number
of tasks in a job. And the reasons for the total number of tasks to change
include:
- the input topic partition changed
- the grouper algorithm changed
In both cases, the states are no longer considered valid, since data may
have been shuffled between the Kafka partitions, or between the tasks
already.

Could you clarify whether you saw the "non-determinism" w/ or w/o the total
number of tasks changed?

Thanks!

-Yi

On Thu, Aug 11, 2016 at 11:56 AM, Tommy Becker 
<tobec...@tivo.com><mailto:tobec...@tivo.com> wrote:



We recently had an issue that caused us to lose the contents of one of our
Samza job's checkpoint topics. We were not that concerned about losing the
checkpointed offsets and so we restarted the job. We then started seeing
some very strange results and were able to trace it back to the fact that
changelog paritition mapping changed. We were unaware this data was stored
in the checkpoint topic. Can someone explain why this mapping is necessary?
I was under the impression that the number of changelog partitions is
identical to the number of task instances. If this is so, can't partitions
just be assigned based on the task number? Assuming the mapping is
necessary, it would be nice if it was deterministic. Looking at
JobCoordinator, it seems to be dependent on the order in which things come
back in the map produced by the SystemStreamPartitionGrouper. This
non-determinism seems to have been the cause of our issues. Obviously data
loss is a problem, but it seems like Samza could have recreated the
original mapping. Should I file a bug on this?

--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com><http://www.digitalsmiths.com><http://www.digitalsmiths.com>
tobec...@tivo.com<mailto:tobec...@tivo.com><mailto:tobec...@tivo.com><mailto:tobec...@tivo.com>

________________________________

This email and any attachments may contain confidential and privileged
material for the sole use of the intended recipient. Any review, copying,
or distribution of this email (or any attachments) by others is prohibited.
If you are not the intended recipient, please contact the sender
immediately and permanently delete this email and any attachments. No
employee or agent of TiVo Inc. is authorized to conclude any binding
agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
Inc. may only be made by a signed written agreement.






--
Tommy Becker
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com>
tobec...@tivo.com<mailto:tobec...@tivo.com>

________________________________

This email and any attachments may contain confidential and privileged material 
for the sole use of the intended recipient. Any review, copying, or 
distribution of this email (or any attachments) by others is prohibited. If you 
are not the intended recipient, please contact the sender immediately and 
permanently delete this email and any attachments. No employee or agent of TiVo 
Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by 
email. Binding agreements with TiVo Inc. may only be made by a signed written 
agreement.

Reply via email to