Hi Jun,

Answers inline:

210. Pid snapshots: Is the number of pid snapshot configurable or hardcoded
> with 2? When do we decide to roll a new snapshot? Based on time, byte, or
> offset? Is that configurable too?
>

These are good questions. We haven't fleshed out the policy by which the
snapshots will be generated. I guess there will be some scheduled task that
takes snapshots, and we will retain the two latest ones. These will map to
different points in time, but each will be a complete view of the
PID->Sequence map at the point in time it was created. At start time, the
PID mapping will be built from the latest snapshot unless it is somehow
corrupt, in which case the older one will be used. Otherwise, the older one
will be ignored.

I don't think there is a good reason to keep more than one, to be honest.
If the snapshot is corrupt, we can always rebuild the map from the log
itself. I have updated the doc to state that there will be exactly one
snapshot file.

With one snapshot, we don't need additional configs. Do you agree?
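For what it's worth, the startup path with a single snapshot could look roughly like this (just a sketch; `load_pid_map`, `snapshot_crc`, and the CRC layout are made up for illustration, not from the actual code):

```python
import zlib

def snapshot_crc(entries):
    # Checksum over a canonical encoding of the map entries.
    return zlib.crc32(repr(sorted(entries.items())).encode())

def load_pid_map(snapshot, rebuild_from_log):
    # `snapshot` is a (crc, entries) pair or None; `rebuild_from_log` is a
    # callable that replays the log to reconstruct the PID->Sequence map.
    if snapshot is not None:
        crc, entries = snapshot
        if snapshot_crc(entries) == crc:  # snapshot intact: use it directly
            return dict(entries)
    # Snapshot missing or corrupt: rebuild the map from the log itself.
    return rebuild_from_log()
```

So a corrupt or missing snapshot just degrades into a (slower) log scan, which is why a single file seems sufficient.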


>
> 211. I am wondering if we should store ExpirationTime in the producer
> transactionalId mapping message as we do in the producer transaction status
> message. If a producer only calls initTransactions(), but never publishes
> any data, we still want to be able to expire and remove the producer
> transactionalId mapping message.
>
>
Actually, the document was inaccurate. The transactionalId will be expired
only if there is no active transaction and the last transaction with that
transactionalId is older than the transactionalId expiration time. With
these semantics, storing the expiration time in the transactionalId mapping
message won't be useful, since the expiration time is a moving target based
on transaction activity.

I have updated the doc with a clarification.
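To make the semantics concrete, the expiration check would boil down to a predicate like the following (function and parameter names are mine, for illustration only):

```python
def should_expire_transactional_id(has_active_txn, last_txn_timestamp_ms,
                                   now_ms, expiration_ms):
    # Expire only when there is no active transaction and the last completed
    # transaction for this transactionalId has aged past the expiration time.
    return (not has_active_txn
            and now_ms - last_txn_timestamp_ms > expiration_ms)
```

Since `last_txn_timestamp_ms` advances with every transaction, a stored expiration time would go stale immediately, which is the point above.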



> 212. The doc says "The LSO is always equal to one less than the minimum of
> the initial offsets across all active transactions". This implies that LSO
> is inclusive. However, currently, both high watermark and log end offsets
> are exclusive. For consistency, it seems that we should make LSO exclusive
> as well.
>

Sounds good. Doc updated.
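With exclusive semantics, the "one less than" disappears and the LSO is just the minimum of the first offsets of the open transactions, falling back to the high watermark when there are none. A sketch (names are illustrative):

```python
def last_stable_offset(open_txn_first_offsets, high_watermark):
    # Exclusive, like the high watermark and log end offset: the LSO is the
    # minimum first offset across all open transactions, or the high
    # watermark when no transaction is open.
    if not open_txn_first_offsets:
        return high_watermark
    return min(open_txn_first_offsets)
```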


>
> 213. The doc says "If the topic is configured for compaction and deletion,
> we will use the topic’s own retention limit. Otherwise, we will use the
> default topic retention limit. Once the last message set produced by a
> given PID has aged beyond the retention time, the PID will be expired." For
> topics configured with just compaction, it seems it's more intuitive to
> expire PID based on transactional.id.expiration.ms?
>

This could work. The problem is that all idempotent producers get a PID,
but only transactional producers have a transactionalId. Since they are
separate concepts, it seems better not to conflate PID expiration with the
settings for transactionalId expiration.

Another way of putting it: it seems more natural for the retention of the
PID to be based on the retention of the messages in a topic. The
transactionalId expiration is based on the transaction activity of a
producer, which is quite a different thing.

But I see your point: keeping these separate can put us in a situation
where the PID is expired even though the transactionalId is not. To solve
this, we should set the default transactionalId expiration time to be less
than the default topic retention time, so that the transactionalId can
expire before the PID. Does that seem reasonable?
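If we go that route, it might be worth enforcing the ordering with a startup check, something like this hypothetical validation (not an existing config check, just the invariant spelled out):

```python
def validate_expiration_configs(txn_id_expiration_ms, default_retention_ms):
    # The transactionalId should always be expirable before the PID, so its
    # expiration time must stay below the default topic retention limit.
    if txn_id_expiration_ms >= default_retention_ms:
        raise ValueError(
            "transactionalId expiration must be less than topic retention")
    return True
```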


>
> 214. In the Coordinator-Broker request handling section, the doc says "If
> the broker has a corresponding PID, verify that the received epoch is
> greater than or equal to the current epoch. " Is the epoch the coordinator
> epoch or the producer epoch?
>

This is the producer epoch, as known by the coordinator. I have clarified
this in the document. However, there may be producers with the same PID on
a different epoch, and those will be fenced out on future calls.
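The broker-side check is essentially the following (a sketch of the fencing rule, with made-up names; the real path would return an error code rather than raise):

```python
def check_producer_epoch(current_epoch, received_epoch):
    # Accept the request only when the received producer epoch is at least
    # the one currently known for this PID; a smaller epoch means the sender
    # is a zombie producer that has since been fenced.
    if received_epoch < current_epoch:
        raise RuntimeError("fenced: stale producer epoch %d < %d"
                           % (received_epoch, current_epoch))
    return received_epoch  # adopt the (possibly newer) epoch
```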


> 215. The doc says "Append to the offset topic, but skip updating the offset
> cache in the delayed produce callback, until a WriteTxnMarkerRequest from
> the transaction coordinator is received including the offset topic
> partitions." How do we do this efficiently? Do we need to cache pending
> offsets per pid?
>

I think we would need to cache the consumerGroup/partition -> offset
entries for the written but uncommitted offsets. Once the transaction
commits, these entries can be moved to the main cache.

However, I think that the overhead for these duplicate temporary entries
should be marginal because it would be proportional to the number of input
topics per transaction.
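Concretely, I am picturing a two-level cache along these lines (a sketch of the idea, not the broker's actual data structures):

```python
class OffsetCache:
    def __init__(self):
        self.committed = {}  # (group, topic_partition) -> offset
        self.pending = {}    # pid -> {(group, topic_partition): offset}

    def stage(self, pid, group, topic_partition, offset):
        # Offsets from an open transaction are parked per PID.
        self.pending.setdefault(pid, {})[(group, topic_partition)] = offset

    def complete_txn(self, pid, committed):
        # Triggered by the WriteTxnMarkerRequest for the offsets topic.
        entries = self.pending.pop(pid, {})
        if committed:
            self.committed.update(entries)  # promote on commit
        # On abort the staged entries are simply discarded.
```

The `pending` map only ever holds one entry per input partition per open transaction, which is why the overhead should be marginal.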


>
> 216. Do we need to add any new JMX metrics? For example, on the broker and
> transaction coordinator side, it would be useful to know the number of live
> pids.
>

Yes, I think we will definitely need to introduce new metrics. Some would
be:

1. Number of live PIDs (a proxy for the size of the PID->Sequence map).
2. Current LSO per partition (useful for detecting stuck consumers and lost
commit/abort markers).
3. Number of active transactionalIds (a proxy for the memory consumed by
the transaction coordinator).

I have added these to the docs, though we will probably come up with more
as the work progresses.
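For the sake of discussion, these would be simple gauges; something like the following (the registry stand-in and metric names are made up; the real thing would go through Kafka's metrics/JMX machinery):

```python
class GaugeRegistry:
    # Minimal stand-in for a metrics registry.
    def __init__(self):
        self.gauges = {}

    def gauge(self, name, fn):
        self.gauges[name] = fn

def register_txn_metrics(registry, pid_map, lso_by_partition, txn_ids):
    # Gauges mirroring the three metrics listed above.
    registry.gauge("live-pid-count", lambda: len(pid_map))
    registry.gauge("active-transactional-id-count", lambda: len(txn_ids))
    for tp in lso_by_partition:
        registry.gauge("last-stable-offset." + tp,
                       lambda tp=tp: lso_by_partition[tp])
```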
