The idea is more about isolating the intra-cluster traffic from the normal clients as much as possible. There are a couple of situations we've seen where this would be useful that I can think of immediately:
1) Normal operation - Just having the intra-cluster traffic on a separate network interface would keep it from getting overwhelmed by something like a bootstrapping client that is saturating the network interface. We see this fairly often, where replication falls behind because of heavy traffic from one application. We can always adjust the network threads, but segregating the traffic is the first step.

2) Isolation in case of an error - We have had situations, more than once, where we needed to rebuild a cluster after a catastrophic problem and the clients were causing that process to take too long, or were causing additional failures. This has mostly come into play with file descriptor limits in the past, but it's certainly not the only situation. Constantly reconnecting clients kept causing the brokers to fall over while we were trying to recover a down cluster. The only solution was to firewall off all the clients temporarily. That is a great deal more complicated if the brokers and the clients are all operating over the same port.

Now, that said, quotas can be a partial solution to this. I don't want to jump the gun on that discussion (because it's going to come up separately and in more detail), but it is possible to structure quotas in a way that will allow intra-cluster replication to continue to function in the case of high load. That would partially address case 1, but it does nothing for case 2. Additionally, I think it is desirable to segregate the traffic even with quotas, so that regardless of the client load, the cluster itself can stay healthy.

-Todd

On Thu, Feb 12, 2015 at 11:38 AM, Jun Rao <j...@confluent.io> wrote:

> Todd,
>
> Could you elaborate on the benefit of having a separate endpoint for
> intra-cluster communication? Is it mainly for giving intra-cluster
> requests a high priority?
> At this moment, having a separate endpoint just means that the socket
> connection for the intra-cluster communication is handled by a separate
> acceptor thread. The processing of the requests from the network and the
> handling of the requests are still shared by a single thread pool. So,
> if that thread pool is exhausted, the intra-cluster requests will still
> be delayed. We can potentially change this model, but that requires
> more work.
>
> An alternative is to just rely on quotas. Intra-cluster requests would
> be exempt from any kind of throttling.
>
> Gwen,
>
> I agree that defaulting wire.protocol.version to the current version is
> probably better. It just means that we need to document the migration
> path for previous versions.
>
> Thanks,
>
> Jun
>
> On Wed, Feb 11, 2015 at 6:33 PM, Todd Palino <tpal...@gmail.com> wrote:
>
> > Thanks, Gwen. This looks good to me as far as the wire protocol
> > versioning goes. I agree with you on defaulting to the new wire
> > protocol version for new installs. I think it will also need to be
> > very clear (to the general installer of Kafka, not just developers) in
> > the documentation when the wire protocol version changes moving
> > forward, and what the risk/benefit of changing to the new version is.
> >
> > Since a rolling upgrade of the intra-cluster protocol is supported,
> > will a rolling downgrade work as well? Should a flaw (bug, security,
> > or otherwise) be discovered after an upgrade, is it possible to change
> > wire.protocol.version back to 0.8.2 and do a rolling bounce?
> >
> > On the host/port/protocol specification, specifically the ZK config
> > format, is it possible to have an un-advertised endpoint? I would see
> > this as potentially useful if you wanted to have an endpoint that you
> > are reserving for intra-cluster communication, and you would prefer to
> > not have it advertised at all. Perhaps it is blocked by a firewall
> > rule or other authentication method.
> > This could also allow you to duplicate a security protocol type but
> > segregate it on a different port or interface (if it is unadvertised,
> > there is no ambiguity for the clients as to which endpoint should be
> > selected). I believe I asked about that previously, but I didn't
> > track what the final outcome was, or even whether it was discussed
> > further.
> >
> > -Todd
> >
> > On Wed, Feb 11, 2015 at 4:38 PM, Gwen Shapira <gshap...@cloudera.com>
> > wrote:
> >
> > > Added Jun's notes to the KIP (thanks for explaining so clearly,
> > > Jun; I was clearly struggling with this...) and removed the
> > > reference to use.new.wire.protocol.
> > >
> > > On Wed, Feb 11, 2015 at 4:19 PM, Joel Koshy <jjkosh...@gmail.com>
> > > wrote:
> > >
> > > > The description that Jun gave for (2) was the detail I was
> > > > looking for - Gwen, can you update the KIP with that for
> > > > completeness/clarity?
> > > >
> > > > I'm +1 as well overall. However, I think it would be good if we
> > > > also get an ack from someone who is more experienced on the
> > > > operations side (say, Todd) to review especially the upgrade
> > > > plan.
> > > >
> > > > On Wed, Feb 11, 2015 at 09:40:50AM -0800, Jun Rao wrote:
> > > > > +1 for the proposed changes in 1 and 2.
> > > > >
> > > > > 1. The impact is that if someone uses SimpleConsumer and
> > > > > references Broker explicitly, the application needs a code
> > > > > change to compile against 0.8.3. Since SimpleConsumer is not
> > > > > widely used, breaking the API in SimpleConsumer but maintaining
> > > > > overall code cleanness seems to be a better tradeoff.
> > > > >
> > > > > 2. For clarification, the issue is the following. In 0.8.3, we
> > > > > will be evolving the wire protocol of UpdateMetadataRequest (to
> > > > > send info about endpoints for different security protocols).
> > > > > Since this is used in intra-cluster communication, we need to
> > > > > do the upgrade in two steps.
> > > > > The idea is that in 0.8.3, we will default
> > > > > wire.protocol.version to 0.8.2. When upgrading to 0.8.3, in
> > > > > step 1, we do a rolling upgrade to 0.8.3. After step 1, all
> > > > > brokers will be capable of processing the new protocol in
> > > > > 0.8.3, but without actually using it. In step 2, we configure
> > > > > wire.protocol.version to 0.8.3 on each broker and do another
> > > > > rolling restart. After step 2, all brokers will start using the
> > > > > new protocol in 0.8.3. Let's say that in the next release, 0.9,
> > > > > we change the intra-cluster wire protocol again. We will do the
> > > > > same thing: default wire.protocol.version to 0.8.3 in 0.9 so
> > > > > that people can upgrade from 0.8.3 to 0.9 in two steps. People
> > > > > who want to upgrade from 0.8.2 to 0.9 directly will have to
> > > > > configure wire.protocol.version to 0.8.2 first and then do the
> > > > > two-step upgrade to 0.9.
> > > > >
> > > > > Gwen,
> > > > >
> > > > > In KIP-2, there is still a reference to use.new.protocol. This
> > > > > needs to be removed. Also, would it be better to use
> > > > > intra.cluster.wire.protocol.version, since this only applies to
> > > > > the wire protocol among brokers?
> > > > >
> > > > > Others,
> > > > >
> > > > > The patch in KAFKA-1809 is almost ready. It would be good to
> > > > > wrap up the discussion on KIP-2 soon. So, if you haven't looked
> > > > > at this KIP, please take a look and send your comments.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Jan 26, 2015 at 8:02 PM, Gwen Shapira
> > > > > <gshap...@cloudera.com> wrote:
> > > > >
> > > > > > Hi Kafka Devs,
> > > > > >
> > > > > > While reviewing the patch for KAFKA-1809, we came across two
> > > > > > questions that we are interested in hearing the community out
> > > > > > on.
> > > > > >
> > > > > > 1.
> > > > > > This patch changes the Broker class and adds a new class,
> > > > > > BrokerEndPoint, that behaves like the previous Broker.
> > > > > >
> > > > > > While technically kafka.cluster.Broker is not part of the
> > > > > > public API, it is returned by javaapi and used with the
> > > > > > SimpleConsumer.
> > > > > >
> > > > > > Getting replicas from PartitionMetadata will now return
> > > > > > BrokerEndPoint instead of Broker. All method calls remain the
> > > > > > same, but since we return a new type, we break the API.
> > > > > >
> > > > > > Note that this breakage does not prevent upgrades - existing
> > > > > > SimpleConsumers will continue working (because we are
> > > > > > wire-compatible). The only thing that won't work is building
> > > > > > SimpleConsumers with a dependency on Kafka versions higher
> > > > > > than 0.8.2. Arguably, we don't want anyone to do that
> > > > > > anyway :)
> > > > > >
> > > > > > So: do we state that the highest release on which
> > > > > > SimpleConsumers can depend is 0.8.2? Or shall we keep Broker
> > > > > > as is and create an UberBroker which will contain multiple
> > > > > > brokers as its endpoints?
> > > > > >
> > > > > > 2.
> > > > > > The KIP suggests a "use.new.wire.protocol" configuration to
> > > > > > decide which protocol the brokers will use to talk to each
> > > > > > other. The problem is that after the next upgrade, the wire
> > > > > > protocol is no longer new, so we'll have to reset it to false
> > > > > > for the following upgrade, then change it to true again...
> > > > > > and upgrading across more than a single version will be
> > > > > > impossible. Bad idea :)
> > > > > >
> > > > > > As an alternative, we can have a property for each version
> > > > > > and set one of them to true. Or (simpler, I think) have a
> > > > > > "wire.protocol.version" property and accept version numbers
> > > > > > (0.8.2, 0.8.3, 0.9) as values.
> > > > > > Please share your thoughts :)
> > > > > >
> > > > > > Gwen
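[Editor's note: for readers following the thread, the two-step rolling upgrade Jun describes can be sketched in server.properties terms. The wire.protocol.version property name and version-string values are the proposal under discussion in this thread, not a finalized config; Jun also floats intra.cluster.wire.protocol.version as an alternative name.]

```properties
# Step 1: upgrade each broker's binaries to 0.8.3, one at a time, while
# all brokers keep speaking the old intra-cluster protocol. After this
# rolling restart, every broker can *process* the new protocol but none
# *sends* it yet.
wire.protocol.version=0.8.2

# Step 2: once all brokers are on 0.8.3 binaries, flip the version and
# do a second rolling restart; brokers now use the new protocol.
wire.protocol.version=0.8.3
```

Skipping a release (e.g. 0.8.2 straight to 0.9) would, per Jun's note, mean explicitly setting wire.protocol.version=0.8.2 first and then doing the same two-step dance up to 0.9.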
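[Editor's note: Todd's question earlier in the thread about an un-advertised endpoint reserved for intra-cluster traffic can also be illustrated. The snippet below is purely hypothetical - it uses the host/port/protocol endpoint format discussed for KIP-2, and the property names are illustrative, not ones this thread settles on.]

```properties
# Endpoint registered in ZooKeeper, i.e. the only one clients discover.
advertised.listeners=PLAINTEXT://broker1.example.com:9092

# Endpoints the broker actually binds. The second duplicates the
# security protocol type but sits on a separate interface/port, is
# never advertised, and is reserved for replication and controller
# traffic (and can be firewalled off from clients entirely).
listeners=PLAINTEXT://broker1.example.com:9092,PLAINTEXT://broker1-internal.example.com:9093
```

Because the internal endpoint is never advertised, there is no ambiguity for clients about which endpoint to select, which is the point Todd raises about duplicating a protocol type.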