That's a good summary of the situation in my mind. Definitely +1 to highlighting each of these specifically in the KIP
On Fri, Apr 23, 2021 at 6:19 PM Ismael Juma <ism...@juma.me.uk> wrote: > Thanks for the comments. Is the following fair? > > 1. It's good that upgrades are not affected under normal circumstances and > it mitigates the compatibility impact greatly. > 2. We're saying that the previous behavior of defaulting to 1 is dangerous > and we don't want to fallback to that. > 3. The main motivation for the previous default was to make the quickstart > easy and we can achieve that with the new default. > 4. This change raises the minimum Kafka version required by Kafka Streams > with the default configuration. We should highlight that and update the > docs accordingly. > 5. Kafka 2.3.x clusters are not that uncommon, but we're ok with users > having to set this config explicitly when deploying new applications in > such clusters. > > If so, I'm fine with the KIP, but I think we should update it to cover the > points above (some may already be covered). > > Ismael > > On Tue, Apr 20, 2021 at 12:59 PM Sophie Blee-Goldman > <sop...@confluent.io.invalid> wrote: > > > FWIW if someone is deploying a new Streams application to production and > > they don't > > set this config, either because they are not reading the docs or they > just > > forgot, then imo > > it's actually better for them to hit this exception and fail-fast if the > > brokers don't support it. > > > > Currently, the default value is 1, which means if they forget to set this > > config then they're > > running a fragile application in production with potential for data loss. > > Setting it to -1 will > > mean in most cases the appropriate value for that environment is chosen. > > And in the subset > > of applications where the user has forgotten to set this config and are > > running against older > > brokers, better to fail-fast and alert the user to set this config then > to > > silently downgrade and > > pick a value that might be completely wrong for that environment. > > > > I don't necessarily agree that 2.3 is so old that we can assume most > > clusters will understand > > this value, my experience has very much been otherwise. However I do > think > > it's likely to be > > quite rare for this to impact any users on upgrade. For that to happen, > > they would need to be > > on older brokers and running an application with the default replication > > value in their production > > environment, AND they would need to have deleted their internal topics > for > > some reason prior > > to the upgrade. Even if all of those things are true, then (1) if the > > internal topics were deleted > > then they have essentially reset their app, which means this case is > really > > more like starting up > > a new app than upgrading a live one, and (2) they probably do not mean to > > run with this setting > > in production in the first place, and they have an opportunity to fix it > > when they recreate the topics > > > > Just my 2 cents > > > > > > On Mon, Apr 19, 2021 at 8:54 PM Matthias J. Sax <mj...@apache.org> > wrote: > > > > > Thanks Ismael, > > > > > > just to clarify, any app that is running and is regularly upgraded is > > > not affected: As pointed out in the KIP, if there is a running app that > > > does not overwrite the default, and the app is upgraded, it won't be > > > affected because the repartition/changelog topics already exist. > > > > > > Second, as the default value is currently 1, for production application > > > I would assume that most people change the default anyway and thus > won't > > > be affected either. Thus, if one deploys a new production app, or > resets > > > a production app with reasonable config overwrites, they won't be > > affected. > > > > > > Hence, the impact on production deployment should basically be > > > non-existent, and for those rare cases were user would be affected it > > > should be just a small hick-up because they would either deploy a new > > > application (maybe annoying but no harm done; and actually, they should > > > detect in staging) or would have reset an existing app for data > > > reprocessing, ie, they did some manual cleanup and might hit this > corner > > > case only when re-deploying an previously stopped app, ie, an app that > > > is currently offline anyway. Again, no harm done, just some delay to > > > redeploy the app. > > > > > > Also, we point the change out in the upgrade notes. Obviously, not > > > everybody might read them, but in the end there are multiple guards and > > > if a user really has issues, it sounds like a very rare corner case. > > > > > > Last, while it might be possible to do so, the question is really, > which > > > value should we pick? > > > > > > Thus, overall I personally don't see any need to cover this case > > > automatically. It might not be too hard to write code and handle this > > > case, but I would rather keep it simple as it should really not impact > > > users in any critical way. > > > > > > > > > Thoughts? > > > > > > > > > -Matthias > > > > > > On 4/18/21 8:00 PM, Ismael Juma wrote: > > > > Is the following accurate? Do we have data that not many users would > be > > > > affected? > > > > > > > > "We also believe that 2.3.0 broker a sufficiently old such that not > too > > > > many user may be affected" > > > > > > > > I wonder if we should fallback to an actual value if the brokers are > > > 2.3.x > > > > or older. > > > > > > > > Ismael > > > > > > > > > >