Heh, I guess in addition to my wall of text of questions, I should also say that I think this provides useful functionality and fixes issues that we've seen a bunch of questions and complaints about, so I'm in favor of a fix and this looks like a pretty good approach :)
It might also be useful to say which release you're hoping to get this into.

-Ewen

On Thu, Sep 3, 2015 at 9:35 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> A few questions:
>
> 1. If we update the producers to only support V1, doesn't that mean people get stuck on the current version of the producer until they can be sure all their consumers have been upgraded? Is that going to be a problem for anyone, and does it potentially keep important fixes/enhancements (e.g. the upcoming client-side network request timeouts) from them because they have to wait for all the consumers to upgrade first?
>
> 2. Why minute granularity specifically? That seems fairly specific to LI's example workload (which, by the way, is a very useful reference point to include in the KIP, thanks for that). If someone has the memory and can support it, why not let them go all the way to per-record timestamps (by making the interval configurable and letting them set it really small/to 0)? It seems to me that some people might be willing to pay that cost for the application-side simplicity if they want to consume from specific timestamps. With the proposal as is, seeking to a timestamp still requires client-side logic to filter out messages that might have earlier timestamps due to the granularity chosen (a rough sketch of that filtering follows below). The obvious tradeoff here is yet another config option -- I'm loath to add yet more configs, although this is one that we can choose a reasonable default for (like 1 minute) and people can easily change with no impact if they need to adjust it.
>
> 3. Why not respect the timestamp passed in, as long as it's not some sentinel value that indicates the broker should fill it in? When you do know they will be ordered, as they should be with mirror maker (which is specifically mentioned), this seems really useful. (More about this in the questions below...)
>
> 4. I think one obvious answer to (3) is that accepting clients' timestamps could break seeking by timestamp, since they may not be properly ordered. However, I think this can break anyway under normal operations -- any imperfections in clock synchronization could result in older timestamps being applied to new messages on a new leader broker compared to messages stored previously by the old leader. I think it's probably too common to just assume that NTP will take care of it and then run into weird bugs, because NTP isn't always perfect, sometimes does some unexpected things, and, of course, does what you tell it to and so is subject to user error.
>
> It would be reasonable to argue that the leader-change issue is much less likely to be a problem than respecting timestamps from producers, where, if applications actually filled the field in, you'd receive a jumbled mess of timestamps and trying to do the binary search in the index wouldn't necessarily give you correct results. On the other hand, we could allow clients to fill that info in but discourage it where it might cause issues (i.e. normal applications), and it seems like a significant win for mirror maker.
>
> I actually think not accepting timestamps is probably the better choice, but a) it seems awkward in the protocol spec because we have to include the field in the produce requests since we don't want to have to fully decode them on the broker, and b) losing that info during mirroring seems like it breaks the goal of fixing log retention (at least for mirrors) as well as the goal of improving searching by timestamp (at least for mirrors).
>
> 5. You never actually specified the granularity (or format, but I assume unix epoch) of the timestamp. Milliseconds? Microseconds? Nanoseconds? This definitely needs to eventually make it into the protocol docs.
>
> 6. Re: the rejected alternative. Are there any other options in changing the config format that might make it a bit lighter weight? For example, do we need a full int64? Could we do something relative instead that wouldn't require as many bytes? (See the sketch at the end of this thread for one possibility.) Without (5) being specified, it's actually difficult to evaluate some of these options.
>
> 7. Any plan to expose this in the client APIs? This is related to (4). If the timestamps are not exposed anywhere in the API as being associated with messages, then we can reasonably treat them as an internal implementation detail and be very clear about what looking up an offset for a timestamp means. If they are exposed, then they're more like message metadata that applications are probably going to want preserved, and thus they will want the broker to respect the timestamp. For example, if I saw that consumers could get the timestamp, I'd want to be able to assign it at the producer so that even with significant batching I still get an accurate timestamp -- basically, I might replace some cases where I use a timestamp embedded in the message with the timestamp provided by Kafka. (That said, we should consider the impact that including it in the protocol can have; non-Java clients are likely to expose it if it is available, whether it's actually a good idea to or not.)
>
> -Ewen
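To make the filtering in (2) concrete, here is a rough Java sketch of what a consumer would have to do when seeking to a timestamp with a minute-granularity index. Both offsetForTimestamp() and timestampOf() are placeholders for whatever lookup and accessor the KIP ends up exposing -- neither exists in the current clients:

    import java.util.Collections;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SeekByTimestampSketch {

        // Placeholder for the KIP's timestamp->offset lookup; with minute
        // granularity it can return an offset up to a minute *before* the
        // target time.
        static long offsetForTimestamp(TopicPartition tp, long targetTs) {
            throw new UnsupportedOperationException("hypothetical lookup API");
        }

        // Placeholder for a per-record timestamp accessor (does not exist yet).
        static long timestampOf(ConsumerRecord<byte[], byte[]> record) {
            throw new UnsupportedOperationException("hypothetical accessor");
        }

        static void consumeFrom(KafkaConsumer<byte[], byte[]> consumer,
                                TopicPartition tp, long targetTs) {
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, offsetForTimestamp(tp, targetTs));
            while (true) {  // sketch: poll forever from the seeked position
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // The client-side filtering in question: drop records that
                    // fall inside the index interval but before the target time.
                    if (timestampOf(record) < targetTs)
                        continue;
                    // ... application logic ...
                }
            }
        }
    }

With per-record timestamps (interval 0) the filter becomes a no-op, which is the application-side simplicity being traded off against index memory.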
> On Thu, Sep 3, 2015 at 3:47 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
>> Hi Guozhang,
>>
>> I checked the code again. Actually, the CRC check probably won't fail. The newly added timestamp field might be treated as the keyLength instead, so we are likely to get an IllegalArgumentException when trying to read the key. I'll update the KIP.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
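To illustrate what Becket is describing: a decoder that assumes the V0 layout would read the first four bytes of V1's new 8-byte timestamp as the key length. A simplified sketch -- the field layout here is condensed (CRC and log overhead omitted) and is not the exact wire format from the KIP:

    import java.nio.ByteBuffer;

    public class V0MisparseSketch {
        public static void main(String[] args) {
            // Simplified V1-style body: magic, attributes, timestamp, key, value.
            ByteBuffer v1 = ByteBuffer.allocate(64);
            v1.put((byte) 1);                // magic = 1
            v1.put((byte) 0);                // attributes
            v1.putLong(1441324800000L);      // timestamp (ms), the field V1 adds
            v1.putInt(-1);                   // key length (-1 = null key)
            v1.putInt(5);                    // value length
            v1.put("hello".getBytes());
            v1.flip();

            // A decoder assuming the V0 layout (magic, attributes, keyLength,
            // ...) reads the high four bytes of the timestamp as the key length:
            v1.get();                        // magic (ignored by the old path)
            v1.get();                        // attributes
            int bogusKeyLength = v1.getInt();
            System.out.println("old decoder sees keyLength = " + bogusKeyLength);
            // Prints 335 here; attempting to slice a 335-byte key out of this
            // buffer then blows up, which is the IllegalArgumentException
            // (rather than a CRC mismatch) described above.
        }
    }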
>> On Thu, Sep 3, 2015 at 12:48 PM, Jiangjie Qin <j...@linkedin.com> wrote:
>> > Hi, Guozhang,
>> >
>> > Thanks for reading the KIP. By "old consumer", I meant the ZookeeperConsumerConnector in trunk now, i.e. without this bug fixed. If we fix the ZookeeperConsumerConnector, it will throw an exception complaining about the unsupported version when it sees message format V1. What I was trying to say is that if we have some ZookeeperConsumerConnector running without the fix, the consumer will complain about a CRC mismatch instead of an unsupported version.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Thu, Sep 3, 2015 at 12:15 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>> >> Thanks for the write-up Jiangjie.
>> >>
>> >> One comment about the migration plan: "For old consumers, if they see the new protocol the CRC check will fail"...
>> >>
>> >> Do you mean this bug in the old consumer cannot be fixed in a backward-compatible way?
>> >>
>> >> Guozhang
>> >>
>> >> On Thu, Sep 3, 2015 at 8:35 AM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
>> >> > Hi,
>> >> >
>> >> > We just created KIP-31 to propose a message format change in Kafka.
>> >> >
>> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-31+-+Message+format+change+proposal
>> >> >
>> >> > As a summary, the motivations are:
>> >> > 1. Avoid server-side message re-compression
>> >> > 2. Honor time-based log roll and retention
>> >> > 3. Enable offset search by timestamp at a finer granularity.
>> >> >
>> >> > Feedback and comments are welcome!
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jiangjie (Becket) Qin
>> >>
>> >> --
>> >> -- Guozhang
>
> --
> Thanks,
> Ewen

--
Thanks,
Ewen
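One possibility for the relative encoding floated in (6) above -- purely an illustration, not something KIP-31 proposes -- is a zig-zag varint delta against a shared base timestamp (per batch or per segment). A minimal sketch in Java:

    import java.nio.ByteBuffer;

    public class DeltaTimestampSketch {

        // Zig-zag varint encode the delta between a record timestamp and a
        // shared base timestamp; small (possibly negative) deltas take 1-3
        // bytes instead of a full 8.
        static void writeDelta(ByteBuffer buf, long baseTs, long recordTs) {
            long v = recordTs - baseTs;
            long zz = (v << 1) ^ (v >> 63);   // zig-zag: small negatives stay small
            while ((zz & ~0x7FL) != 0) {
                buf.put((byte) ((zz & 0x7F) | 0x80));
                zz >>>= 7;
            }
            buf.put((byte) zz);
        }

        static long readDelta(ByteBuffer buf, long baseTs) {
            long zz = 0;
            int shift = 0;
            byte b;
            do {
                b = buf.get();
                zz |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            return baseTs + ((zz >>> 1) ^ -(zz & 1));   // undo zig-zag
        }

        public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.allocate(16);
            long base = 1441324800000L;
            writeDelta(buf, base, base + 42_000);            // 42s after the base
            buf.flip();
            System.out.println(readDelta(buf, base) - base); // prints 42000
        }
    }

For mostly in-order timestamps the delta fits in a byte or two; the worst case degrades to ten bytes, so a real implementation would need a cap or fallback. It also interacts with (5): the base timestamp's granularity fixes the granularity of every delta.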