Jay, I totally agree that we should pay more attention to compatibility across versions. Incompatibility is indeed a big cause of customers' woes. Human checks and stringent reviews will help, but I think having compatibility tests will be more effective. +INT_MAX for compatibility tests.
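
As a rough sketch of what such a compatibility test could look like: the idea is to serialize a request in every known version and confirm the current decoder still reads each one. Everything below is an assumption made for illustration. `CommitRequest`, its two-version layout, and `CrossVersionCheck` are hypothetical names and are not Kafka's actual wire format or classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical request type for illustration only; not Kafka's real protocol.
final class CommitRequest {
    static final long NO_TIMESTAMP = -1L;

    final String group;
    final long offset;
    final long timestamp; // assumed to be a field added in version 1

    CommitRequest(String group, long offset, long timestamp) {
        this.group = group;
        this.offset = offset;
        this.timestamp = timestamp;
    }

    // Serialize using the layout of the requested version (0 or 1).
    static ByteBuffer encode(CommitRequest r, short version) {
        byte[] g = r.group.getBytes(StandardCharsets.UTF_8);
        int size = 2 + 2 + g.length + 8 + (version >= 1 ? 8 : 0);
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putShort(version);
        buf.putShort((short) g.length);
        buf.put(g);
        buf.putLong(r.offset);
        if (version >= 1) {
            buf.putLong(r.timestamp);
        }
        buf.flip();
        return buf;
    }

    // The "current" decoder: it must keep accepting every older version.
    static CommitRequest decode(ByteBuffer buf) {
        short version = buf.getShort();
        if (version < 0 || version > 1) {
            throw new IllegalArgumentException("unsupported version " + version);
        }
        byte[] g = new byte[buf.getShort()];
        buf.get(g);
        long offset = buf.getLong();
        // Older payloads lack the timestamp, so fall back to a default.
        long ts = version >= 1 ? buf.getLong() : NO_TIMESTAMP;
        return new CommitRequest(new String(g, StandardCharsets.UTF_8), offset, ts);
    }
}

final class CrossVersionCheck {
    // Returns true if payloads written in every known version decode correctly.
    static boolean allVersionsReadable() {
        CommitRequest original = new CommitRequest("g1", 42L, 1000L);
        for (short v = 0; v <= 1; v++) {
            CommitRequest read = CommitRequest.decode(CommitRequest.encode(original, v));
            long expectedTs = v >= 1 ? original.timestamp : CommitRequest.NO_TIMESTAMP;
            if (!read.group.equals(original.group)
                    || read.offset != original.offset
                    || read.timestamp != expectedTs) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all versions readable: " + allVersionsReadable());
    }
}
```

A real suite would presumably run old client binaries against a new broker and vice versa, but even a round-trip check like this one would flag a change that drops support for an older layout.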

- Ashish

On Friday, January 9, 2015, Jay Kreps <j...@confluent.io> wrote:

> Hey guys,
>
> We had a bit of a compatibility slip-up in 0.8.2 with the offset commit
> stuff. We caught this one before the final release so it's not too bad. But
> I do think it kind of points to an area we could do better.
>
> One piece of feedback we have gotten from going out and talking to users is
> that compatibility is really, really important to them. Kafka is getting
> deployed in big environments where the clients are embedded in lots of
> applications and any kind of incompatibility is a huge pain for people
> using it and generally makes upgrade difficult or impossible.
>
> In practice what I think this means for development is a lot more pressure
> to really think about the public interfaces we are making and try our best
> to get them right. This can be hard sometimes as changes come in patches
> and it is hard to follow every single rb with enough diligence to know.
>
> Compatibility really means a couple things:
> 1. Protocol changes
> 2. Binary data format changes
> 3. Changes in public apis in the clients
> 4. Configs
> 5. Metric names
> 6. Command line tools
>
> I think 1-2 are critical. 3 is very important. And 4, 5 and 6 are pretty
> important but not critical.
>
> One thing this implies is that we are really going to have to do a good job
> of thinking about apis and use cases. You can definitely see a number of
> places in the old clients and in a couple of the protocols where enough
> care was not given to thinking things through. Some of those were from long
> long ago, but we should really try to avoid adding to that set because
> increasingly we will have to carry around these mistakes for a long time.
>
> Here are a few things I thought we could do that might help us get better
> in this area:
>
> 1. Technically we are just in a really bad place with the protocol because
> it is defined twice--once in the old Scala request objects, and once in the
> new protocol format for the clients. This makes changes massively painful.
> The good news is that the new request definition DSL was intended to make
> adding new protocol versions a lot easier and clearer. It will also make it
> a lot more obvious when the protocol is changed since you will be checking
> in or reviewing a change to Protocol.java. Getting the server moved over to
> the new request objects and protocol definition will be a bit of a slog but
> it will really help here I think.
>
> 2. We need to get some testing in place on cross-version compatibility.
> This is work and no tests here will be perfect, but I suspect with some
> effort we could catch a lot of things.
>
> 3. I was also thinking it might be worth it to get a little bit more formal
> about the review and discussion process for things which will have impact
> to these public areas to ensure we end up with something we are happy with.
> Python has a PEP process (https://www.python.org/dev/peps/pep-0257/) by
> which major changes are made, and it might be worth it for us to do a
> similar thing. We have essentially been doing this already--major changes
> almost always have an associated wiki, but I think just getting a little
> more rigorous might be good. The idea would be to just call out these wikis
> as official proposals and do a full Apache discuss/vote thread for these
> important changes. We would use these for big features (security, log
> compaction, etc) as well as for small changes that introduce or change a
> public api/config/etc. This is a little heavier weight, but I think it is
> really just critical that we get these things right and this would be a way
> to call out this kind of change so that everyone would take the time to
> look at them.
>
> Thoughts?
>
> -Jay
>

--
Regards,
Ashish