Ted, I am still considering changing the KIP to include it right away -- if not, I'll create a JIRA. I need to think it through in more detail first.
(Same for other open questions like interface names -- I collect feedback and update the KIP after we reach consensus :))

-Matthias

On 3/9/18 3:35 PM, Ted Yu wrote:
> Thanks for the details, Matthias.
>
> bq. change the metadata protocol only in a future release, encoding both used and supported version might be an advantage
>
> Looks like encoding both versions wouldn't be implemented in this KIP.
>
> Please consider logging a JIRA with the encoding proposal.
>
> Cheers
>
> On Fri, Mar 9, 2018 at 2:27 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>
>> @Bill: I think a filter predicate should be part of user code. And even if we want to add something like this, I would prefer to do it in a separate KIP.
>>
>> @James: I would love to avoid a second rolling bounce, but from my understanding it would not be possible.
>>
>> The purpose of the second rolling bounce is indeed to switch from version 2 to 3. It also has a second purpose: to switch from the old store to the new store (this happens after the last instance bounces the second time).
>>
>> The problem with a single round of rolling bounces is that it's unclear when to switch from version 2 to version 3. The StreamsPartitionAssignor is stateless by design, and thus the information about which version it should use must be passed in externally -- and we want to use the StreamsConfig to pass in this information.
>>
>> During the upgrade, newly bounced instances have no information about the progress of the upgrade (ie, how many other instances got upgraded already). Therefore, it's not safe for them to send a version 3 subscription. The leader has the same limited view of the world and can only send version 2 assignments back.
>>
>> Thus, for the 1.2 upgrade, I don't think we can simplify the upgrade.
>>
>> We did consider changing the metadata to make later upgrades (ie, from 1.2 to 1.x) simpler though (for the case we change the metadata or storage format again -- as long as we don't change it, a single rolling bounce is sufficient), by encoding "used version" and "supported version". This would allow the leader to switch to the new version earlier and without a second rebalance: the leader would receive "used version == old" and "supported version = old/new" -- as long as at least one instance sends "supported version = old", the leader sends the old version assignment back. Encoding both versions would allow the leader to send a new version assignment back right after the first round of rebalances has finished (ie, once all instances send "supported version = new"). However, there are still two issues with this:
>>
>> 1) If we switch to the new format right after the last instance bounced, the new stores might not be ready to be used -- this could lead to "downtime", as a store must be restored before processing can resume.
>>
>> 2) Assume an instance fails and is restarted. At this point, the instance will still have "upgrade mode" enabled and thus sends the old protocol data. However, it would be desirable to never fall back to the old protocol after the switch to the new protocol.
>>
>> The second issue is minor, and I guess it could be avoided if users set up the instance properly. However, the first issue would prevent "zero downtime" upgrades.
>>
>> Having said this, if we consider that we might change the metadata protocol only in a future release, encoding both used and supported version might be an advantage in the future, and we could consider adding this information in the 1.2 release to prepare for this.
>>
>> Btw: monitoring the log is also only required to give the instances enough time to prepare the stores in the new format. If you did the second rolling bounce before this, it would still work -- however, you might see app "downtime" as the new store must be fully restored before processing can resume.
>>
>> Does this make sense?
>>
>> -Matthias
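To make the "used version" / "supported version" idea discussed above more concrete, here is a minimal sketch of how both versions could be encoded in the subscription metadata and how the leader could decide which assignment version to send. All class, field, and method names, as well as the byte layout, are made up for illustration and are not part of the KIP.

    import java.nio.ByteBuffer;
    import java.util.List;

    // Illustrative only: encode the version an instance currently uses plus the
    // highest version it could use into the subscription userData.
    public class SubscriptionVersions {
        public final int usedVersion;       // version of the metadata this instance sends
        public final int supportedVersion;  // highest version this instance understands

        public SubscriptionVersions(final int usedVersion, final int supportedVersion) {
            this.usedVersion = usedVersion;
            this.supportedVersion = supportedVersion;
        }

        public ByteBuffer encode() {
            final ByteBuffer buf = ByteBuffer.allocate(8);
            buf.putInt(usedVersion);
            buf.putInt(supportedVersion);
            buf.flip();
            return buf;
        }

        public static SubscriptionVersions decode(final ByteBuffer buf) {
            return new SubscriptionVersions(buf.getInt(), buf.getInt());
        }

        // Leader-side sketch: stay on the old version as long as any instance does
        // not yet support the new one; switch as soon as everybody supports it.
        public static int chooseAssignmentVersion(final List<SubscriptionVersions> subscriptions,
                                                  final int oldVersion,
                                                  final int newVersion) {
            for (final SubscriptionVersions s : subscriptions) {
                if (s.supportedVersion < newVersion) {
                    return oldVersion;
                }
            }
            return newVersion;
        }
    }

As noted in the discussion above, even with such an encoding the leader would still need to wait until the new stores are ready before handing out the new assignment version; otherwise the required store restore could cause downtime.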
>>
>> On 3/9/18 11:36 AM, James Cheng wrote:
>>> Matthias,
>>>
>>> For all the upgrade paths, is it possible to get rid of the 2nd rolling bounce?
>>>
>>> For the in-place upgrade, it seems like the primary difference between the 1st rolling bounce and the 2nd rolling bounce is to decide whether to send Subscription Version 2 or Subscription Version 3. (Actually, there is another difference mentioned, in that the KIP says that the 2nd rolling bounce should happen after all new state stores are created by the background thread. However, within the 2nd rolling bounce, we say that there is still a background thread, so it seems like there is no actual requirement to wait for the new state stores to be created.)
>>>
>>> The 2nd rolling bounce already knows how to deal with mixed mode (having both Version 2 and Version 3 in the same consumer group). It seems like we could get rid of the 2nd bounce if we added logic (somehow/somewhere) such that:
>>> * Instances send Subscription Version 2 until all instances are running the new code.
>>> * Once all the instances are running the new code, then one at a time, the instances start sending Subscription V3. The leader still hands out Assignment Version 2 until all new state stores are ready.
>>> * Once all instances report that the new stores are ready, the leader sends out Assignment Version 3.
>>> * Once an instance receives an Assignment Version 3, it can delete the old state store.
>>>
>>> Doing it that way seems like it would reduce a lot of operator/deployment overhead. No need to do 2 rolling restarts. No need to monitor logs for the state store rebuild. You just deploy it, and the instances update themselves.
>>>
>>> What do you think?
>>>
>>> The thing that made me think of this is that the "2 rolling bounces" approach is similar to what Kafka brokers have to do for changes in inter.broker.protocol.version and log.message.format.version. And in the broker case, it seems like it would be possible (with some work, of course) to modify Kafka to allow us to do similar auto-detection of broker capabilities and automatically switch over from the old to the new version.
>>>
>>> -James
>>>
>>>> On Mar 9, 2018, at 10:38 AM, Bill Bejeck <bbej...@gmail.com> wrote:
>>>>
>>>> Matthias,
>>>>
>>>> Thanks for the KIP, it's a +1 from me.
>>>>
>>>> I do have one question regarding the retrieval methods on the new interfaces.
>>>>
>>>> Would you want to consider adding a method with a Predicate that would allow for filtering records by the timestamp stored with the record? Or is this better left for users to implement themselves once the data has been retrieved?
>>>>
>>>> Thanks,
>>>> Bill
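If the filtering by stored timestamp that Bill asks about stays in user code, as Matthias suggests earlier in the thread, it could look roughly like the following sketch. The ValueWithTimestamp wrapper and all names here are made up for illustration and are not part of the KIP.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiPredicate;

    // Illustration of user-side filtering by the stored record timestamp,
    // applied after the data has been retrieved from the store.
    public class TimestampFilterExample {

        // Hypothetical wrapper pairing a value with the timestamp stored alongside it.
        public static class ValueWithTimestamp<V> {
            public final V value;
            public final long timestamp;

            public ValueWithTimestamp(final V value, final long timestamp) {
                this.value = value;
                this.timestamp = timestamp;
            }
        }

        // Keep only entries whose stored timestamp satisfies the predicate,
        // e.g. (value, ts) -> ts >= someCutoff.
        public static <V> List<V> filterByTimestamp(final List<ValueWithTimestamp<V>> retrieved,
                                                    final BiPredicate<V, Long> predicate) {
            final List<V> result = new ArrayList<>();
            for (final ValueWithTimestamp<V> entry : retrieved) {
                if (predicate.test(entry.value, entry.timestamp)) {
                    result.add(entry.value);
                }
            }
            return result;
        }
    }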
>>>>
>>>> On Thu, Mar 8, 2018 at 7:14 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> Matthias:
>>>>> For my point #1, I don't have a preference as to which separator is chosen. Given the background you mentioned, the current choice is good.
>>>>>
>>>>> For #2, I think my proposal is better since it is closer to English grammar.
>>>>>
>>>>> Would be good to listen to what other people think.
>>>>>
>>>>> On Thu, Mar 8, 2018 at 4:02 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>>>>>
>>>>>> Thanks for the comments!
>>>>>>
>>>>>> @Guozhang:
>>>>>>
>>>>>> So far, there is one PR for the rebalance metadata upgrade fix (addressing the mentioned https://issues.apache.org/jira/browse/KAFKA-6054). It gives a first impression of how the metadata upgrade works, including a system test: https://github.com/apache/kafka/pull/4636
>>>>>>
>>>>>> I can share other PRs as soon as they are ready. I agree that the KIP is complex, and I am ok with putting out more code to give better discussion context.
>>>>>>
>>>>>> @Ted:
>>>>>>
>>>>>> I picked `_` instead of `-` to align with the `processing.guarantee` parameter that accepts `at_least_once` and `exactly_once` as values. Personally, I don't care about underscore vs dash, but I prefer consistency. If you feel strongly about it, we can also change it to `-`.
>>>>>>
>>>>>> About the interface name: I am fine either way -- I stripped the `With` to keep the name a little shorter. Would be good to get feedback from others and pick the name the majority prefers.
>>>>>>
>>>>>> @John:
>>>>>>
>>>>>> We can certainly change it. I agree that it would not make a difference. I'll dig into the code to see if either of the two variants might introduce undesired complexity, and I'll update the KIP if I don't hit an issue with putting the `-v2` on the store directory instead of using `rocksdb-v2`.
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 3/8/18 2:44 PM, John Roesler wrote:
>>>>>>> Hey Matthias,
>>>>>>>
>>>>>>> The KIP looks good to me. I had several questions queued up, but they were all in the "rejected alternatives" section... oh, well.
>>>>>>>
>>>>>>> One very minor thought re changing the state directory from "/<state.dir>/<application.id>/<task.id>/rocksdb/storeName/" to "/<state.dir>/<application.id>/<task.id>/rocksdb-v2/storeName/": if you put the "v2" marker on the storeName part of the path (i.e., "/<state.dir>/<application.id>/<task.id>/rocksdb/storeName-v2/"), then you get the same benefits without altering the high-level directory structure.
>>>>>>>
>>>>>>> It may not matter, but I could imagine people running scripts to monitor rocksdb disk usage for each task, or other such use cases.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>>
>>>>>>> On Thu, Mar 8, 2018 at 2:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Matthias:
>>>>>>>> Nicely written KIP.
>>>>>>>>
>>>>>>>> "in_place": can this be "in-place"? Underscore may sometimes be mistyped (as '-'). I think using '-' is more user friendly.
>>>>>>>>
>>>>>>>> public interface ReadOnlyKeyValueTimestampStore<K, V> {
>>>>>>>>
>>>>>>>> Is ReadOnlyKeyValueStoreWithTimestamp a better name for the interface?
>>>>>>>>
>>>>>>>> Thanks
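For reference, the two interface names under discussion would differ only in their suffix. The sketch below shows just the naming alternatives; the method shown is hypothetical and not taken from the KIP, which defines the actual read-only API.

    // Name proposed in the KIP (hypothetical method set):
    public interface ReadOnlyKeyValueTimestampStore<K, V> {
        V get(K key);               // value stored for the key
        long timestampOf(K key);    // timestamp stored together with the value
    }

    // Alternative name suggested in the discussion, same shape:
    public interface ReadOnlyKeyValueStoreWithTimestamp<K, V> {
        V get(K key);
        long timestampOf(K key);
    }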
>>>>>>>>
>>>>>>>> On Thu, Mar 8, 2018 at 1:29 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello Matthias, thanks for the KIP.
>>>>>>>>>
>>>>>>>>> I've read through the upgrade path section and it looks good to me. If you already have a WIP PR for it, could you also share it here so that people can take a look?
>>>>>>>>>
>>>>>>>>> I'm +1 on the KIP itself. But for large KIPs like this, there is always some devil hidden in the details, so I think it is better to have the implementation in parallel along with the design discussion :)
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>> On Wed, Mar 7, 2018 at 2:12 PM, Matthias J. Sax <matth...@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I want to propose KIP-258 for the Streams API to allow storing timestamps in RocksDB. This feature is the basis to resolve multiple tickets (issues and feature requests).
>>>>>>>>>>
>>>>>>>>>> Looking forward to your comments about this!
>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-258%3A+Allow+to+Store+Record+Timestamps+in+RocksDB
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Guozhang
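For a concrete picture of the KIP's core idea -- storing the record timestamp together with the value in RocksDB -- one possible layout is an 8-byte timestamp prefix followed by the serialized value bytes. This is only an illustrative sketch; the actual on-disk format is defined by the KIP and may differ.

    import java.nio.ByteBuffer;

    // Sketch: pack a record timestamp and the serialized value into a single
    // byte[] so both can be stored under one RocksDB key, and unpack them on read.
    // The layout (8-byte timestamp prefix + value bytes) is illustrative only.
    public class TimestampedValueSerde {

        public static byte[] pack(final long timestamp, final byte[] valueBytes) {
            final ByteBuffer buf = ByteBuffer.allocate(8 + valueBytes.length);
            buf.putLong(timestamp);
            buf.put(valueBytes);
            return buf.array();
        }

        public static long unpackTimestamp(final byte[] packed) {
            return ByteBuffer.wrap(packed).getLong();
        }

        public static byte[] unpackValue(final byte[] packed) {
            final byte[] value = new byte[packed.length - 8];
            System.arraycopy(packed, 8, value, 0, value.length);
            return value;
        }
    }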