Hi, Apurva, Thanks for the reply. A couple of comment below.
On Wed, Feb 15, 2017 at 9:45 PM, Apurva Mehta <apu...@confluent.io> wrote: > Hi Jun, > > Answers inline: > > 210. Pid snapshots: Is the number of pid snapshot configurable or hardcoded > > with 2? When do we decide to roll a new snapshot? Based on time, byte, or > > offset? Is that configurable too? > > > > These are good questions. We haven't fleshed out the policy by which the > snapshots will be generated. I guess there will be some scheduled task that > takes snapshots, and we will retain the two latest ones. These will be map > to different points in time, but each will be the complete view of the > PID->Sequence map at the point it time it was created. At start time, the > Pid mapping will be built from the latest snapshot unless it is somehow > corrupt, in which case the older one will be used. Otherwise, the older one > will be ignored. > > I don't think there are good reason to keep more than one, to be honest. If > the snapshot is corrupt, we can always rebuild the map from the log itself. > I have updated the doc to state that there will be exactly one snapshot > file. > > With one snapshot, we don't need additional configs. Do you agree? > > > When a replica becomes a follower, we do a bit log truncation. Having an older snapshot allows us to recover the PID->sequence mapping much quicker than rescanning the whole log. > > > > 211. I am wondering if we should store ExpirationTime in the producer > > transactionalId mapping message as we do in the producer transaction > status > > message. If a producer only calls initTransactions(), but never publishes > > any data, we still want to be able to expire and remove the producer > > transactionalId mapping message. > > > > > Actually, the document was inaccurate. The transactionalId will be expired > only if there is no active transaction, and the age of the last transaction > with that transactionalId is older than the transactioanlId expiration > time. With these semantics, storing the expiration time in the > transactionalId mapping message won't be useful, since the expiration time > is a moving target based on transaction activity. > > I have updated the doc with a clarification. > > > Currently, the producer transactionalId mapping message doesn't carry ExpirationTime, but the producer transaction status message does. It would be useful if they are consistent. > > > 212. The doc says "The LSO is always equal to one less than the minimum > of > > the initial offsets across all active transactions". This implies that > LSO > > is inclusive. However, currently, both high watermark and log end offsets > > are exclusive. For consistency, it seems that we should make LSO > exclusive > > as well. > > > > Sounds good. Doc updated. > > > > > > 213. The doc says "If the topic is configured for compaction and > deletion, > > we will use the topic’s own retention limit. Otherwise, we will use the > > default topic retention limit. Once the last message set produced by a > > given PID has aged beyond the retention time, the PID will be expired." > For > > topics configured with just compaction, it seems it's more intuitive to > > expire PID based on transactional.id.expiration.ms? > > > > This could work. The problem is that all idempotent producers get a PID, > but only transactional producers have a transactionalId. Since they are > separate concepts, it seems better not to conflate PID expiration with the > settings for transactionalId expiration. > > Another way of putting it: it seems more natural for the retention of the > PID to be based on the retention of the messages in a topic. The > transactionalId expiration is based on the transaction activity of a > producer, which is quite a different thing. > > But I see your point: having these separate can put us in a situation where > the PID gets expired even if the transactionalId is not. To solve this we > should set the default transactionalId expiration time to be less than the > default topic retention time, so that the transactionalId can be expired > before the PID. Does that seem reasonable? > > > > > > 214. In the Coordinator-Broker request handling section, the doc says "If > > the broker has a corresponding PID, verify that the received epoch is > > greater than or equal to the current epoch. " Is the epoch the > coordinator > > epoch or the producer epoch? > > > > This is the producer epoch, as known by the coordinator. I have clarified > this in the document. However, there may be producers with the same PID on > a different epoch, which will be fenced out future calls. > > > > 215. The doc says "Append to the offset topic, but skip updating the > offset > > cache in the delayed produce callback, until a WriteTxnMarkerRequest from > > the transaction coordinator is received including the offset topic > > partitions." How do we do this efficiently? Do we need to cache pending > > offsets per pid? > > > > I think we would need to cache the consumerGroup/partition-> offset for the > written but uncommitted offsets. Once the transaction commits, these > entries can be moved to the main cache. > > However, I think that the overhead for these duplicate temporary entries > should be marginal because it would be proportional to the number of input > topics per transaction. > > > > > > 216. Do we need to add any new JMX metrics? For example, on the broker > and > > transaction coordinator side, it would be useful to know the number of > live > > pids. > > > > Yes, I think we will definitely need to introduce new metrics. Some would > be: > > 1. Number of live PIDs (a proxy for the size of the PID->Sequence map) > 2. Current LSO per partition (useful to detect stuck consumers and lost > commit/abort markers). > 3. Number of active transactionalIds (proxy for the memory consumed by the > transaction coordinator). > > I have added these to the docs, though I think we will probably think of > more as the work progresses. > Thanks, Jun