A few questions:

1. If we update the producers to only support V1, doesn't that mean people get stuck on the current version of the producer until they can be sure all their consumers have been upgraded? Is that going to be a problem for anyone, and does it potentially hold back important fixes/enhancements (e.g., the upcoming client-side network request timeouts) because they have to wait for all the consumers to upgrade first?
2. Why minute granularity specifically? That seems fairly specific to LI's example workload (which, by the way, is a very useful reference point to include in the KIP -- thanks for that). If someone has the memory and can support it, why not let them go all the way to per-record timestamps, by making the interval configurable so they can set it really small (or to 0)? It seems to me that some people might be willing to pay that cost for the application-side simplicity if they want to consume from specific timestamps. With the proposal as is, seeking to a timestamp still requires client-side logic to filter out messages that might have earlier timestamps due to the granularity chosen. The obvious tradeoff here is yet another config option -- I'm loath to add yet more configs, although this is one that we can choose a reasonable default for (like 1 minute) and people can easily change with no impact if they need to adjust it.

3. Why not respect the timestamp passed in, as long as it's not some sentinel value that indicates the broker should fill it in? When you do know they will be ordered, as they should be with MirrorMaker (which is specifically mentioned), this seems really useful. (More about this in the questions below...)

4. I think one obvious answer to (3) is that accepting clients' timestamps could break seeking by timestamp since they may not be properly ordered. However, I think this can break anyway under normal operations -- any imperfection in clock synchronization could result in older timestamps being applied to new messages on a new leader broker compared to messages stored previously by the old leader. I think it's all too common to assume that NTP will take care of it and then run into weird bugs because NTP isn't always perfect, sometimes does unexpected things, and, of course, does what you tell it to and so is subject to user error.
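To make the filtering point in (2) concrete, here's a minimal self-contained sketch (plain Java; `seekOffset`, `consumeFrom`, and the data layout are all illustrative, not the real Kafka API) of why seeking by timestamp against a coarse-grained index still needs client-side filtering:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative sketch only -- not the actual Kafka client or broker code.
// With a time index kept at coarse (e.g. minute) granularity, seeking to a
// timestamp lands at the start of the enclosing interval, so the consumer
// still has to skip records whose timestamps precede the target.
public class TimestampSeek {

    // timeIndex maps an interval-start timestamp to the first offset in that
    // interval. floorKey finds the latest indexed interval at or before the
    // target; records at that offset may still predate targetTs.
    static long seekOffset(TreeMap<Long, Long> timeIndex, long targetTs) {
        Long key = timeIndex.floorKey(targetTs);
        return key == null ? 0L : timeIndex.get(key);
    }

    // Client-side filtering: from startOffset onward, drop records older
    // than the target timestamp. Each record is modeled as {offset, timestamp}.
    static List<long[]> consumeFrom(long[][] log, long startOffset, long targetTs) {
        List<long[]> out = new ArrayList<>();
        for (long[] rec : log) {
            if (rec[0] >= startOffset && rec[1] >= targetTs) {
                out.add(rec);
            }
        }
        return out;
    }
}
```

With a one-minute index, seeking to t=70s lands at the offset indexed for the 60s interval, whose first records can still be older than 70s; the application has to filter those out itself, which is the extra client-side logic a configurable (smaller) interval would reduce.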
It would be reasonable to argue that the leader change issue is much less likely to be a problem than respecting timestamps from producers, where, if applications actually filled them in, you'd receive a jumbled mess of timestamps and trying to do the binary search in the index wouldn't necessarily give you correct results. However, we could allow clients to fill that info in but discourage it where it might cause issues (i.e., normal applications), and it seems like a significant win for MirrorMaker. I actually think not accepting timestamps is probably the better choice, but a) it seems awkward in the protocol spec because we have to include the field in the produce requests since we don't want to have to fully decode them on the broker, and b) losing that info during mirroring seems like it breaks the goal of fixing log retention (at least for mirrors) as well as the goal of improving searching by timestamp (at least for mirrors).

5. You never actually specified the granularity (or format, but I assume Unix epoch) of the timestamp. Milliseconds? Microseconds? Nanoseconds? This definitely needs to eventually make it into the protocol docs.

6. Re: the rejected alternative. Are there any other options for changing the config format that might make it a bit lighter weight? For example, do we need a full int64? Could we do something relative instead that wouldn't require as many bytes? Without (5) being specified, it's actually difficult to evaluate some of these options.

7. Any plan to expose this in the client APIs? This is related to (4). If the timestamps are not exposed anywhere in the API as being associated with messages, then we can reasonably treat them as an internal implementation detail and be very clear about what looking up an offset for a timestamp means. If they are exposed, then they're more like message metadata that applications are probably going to want preserved, and thus will want the broker to respect the timestamp.
For example, if I saw that consumers could get the timestamp, I'd want to be able to assign it at the producer so that even if I have significant batching I still get an accurate timestamp -- basically, I might replace some cases where I use a timestamp embedded in the message with the timestamp provided by Kafka. (That said, we should consider the impact that including it in the protocol can have; non-Java clients are likely to expose it if it is available, whether it's actually a good idea to or not.)

-Ewen

On Thu, Sep 3, 2015 at 3:47 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

> Hi Guozhang,
>
> I checked the code again. Actually the CRC check probably won't fail. The newly
> added timestamp field might be treated as keyLength instead, so we are
> likely to receive an IllegalArgumentException when we try to read the key.
> I'll update the KIP.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Sep 3, 2015 at 12:48 PM, Jiangjie Qin <j...@linkedin.com> wrote:
>
> > Hi, Guozhang,
> >
> > Thanks for reading the KIP. By "old consumer", I meant the
> > ZookeeperConsumerConnector in trunk now, i.e. without this bug fixed. If we
> > fix the ZookeeperConsumerConnector then it will throw an exception
> > complaining about the unsupported version when it sees message format V1.
> > What I was trying to say is that if we have some ZookeeperConsumerConnector
> > running without the fix, the consumer will complain about a CRC mismatch
> > instead of an unsupported version.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Thu, Sep 3, 2015 at 12:15 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> >> Thanks for the write-up Jiangjie.
> >>
> >> One comment about the migration plan: "For old consumers, if they see the
> >> new protocol the CRC check will fail"...
> >>
> >> Do you mean this bug in the old consumer cannot be fixed in a
> >> backward-compatible way?
> >>
> >> Guozhang
> >>
> >>
> >> On Thu, Sep 3, 2015 at 8:35 AM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
> >>
> >> > Hi,
> >> >
> >> > We just created KIP-31 to propose a message format change in Kafka.
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-31+-+Message+format+change+proposal
> >> >
> >> > As a summary, the motivations are:
> >> > 1. Avoid server side message re-compression
> >> > 2. Honor time-based log roll and retention
> >> > 3. Enable offset search by timestamp at a finer granularity.
> >> >
> >> > Feedback and comments are welcome!
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>

--
Thanks,
Ewen