I'll be happy to give the initial design a go, but will probably only get to it after Strata.
So either wait a bit (there are enough KIPs to review ;) or someone else can get started.

Gwen

On Thu, Feb 12, 2015 at 6:55 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> +1 on investigating it further as a separate feature that will improve
> ops significantly (especially since an expert on the operations side
> has described use cases from actual experience).
>
> On Thu, Feb 12, 2015 at 05:47:50PM -0800, Gwen Shapira wrote:
> > I REALLY like the idea of supporting a separate network for inter-broker
> > communication (and probably Zookeeper too). I think it's actually a
> > pretty typical configuration in clusters, so I'm surprised we didn't
> > think of it before :) Servers arrive with multiple cards specifically
> > for "admin nic" vs. "clients nic" vs. "storage nic".
> >
> > That said, I'd like to handle it in a separate patch. First because
> > KAFKA-1809 is big enough already, and second because this really
> > deserves its own requirement-gathering and design.
> >
> > Does that make sense?
> >
> > Gwen
> >
> > On Thu, Feb 12, 2015 at 12:34 PM, Todd Palino <tpal...@gmail.com> wrote:
> > > The idea is more about isolating the intra-cluster traffic from the
> > > normal clients as much as possible. There are a couple of situations
> > > we've seen where this would be useful that I can think of immediately:
> > >
> > > 1) Normal operation - just having the intra-cluster traffic on a
> > > separate network interface would allow it to not get overwhelmed by
> > > something like a bootstrapping client that is saturating the network
> > > interface. We see this fairly often, where the replication falls
> > > behind because of heavy traffic from one application. We can always
> > > adjust the network threads, but segregating the traffic is the first
> > > step.
> > >
> > > 2) Isolation in case of an error - We have had situations, more than
> > > once, where we need to rebuild a cluster after a catastrophic problem
> > > and the clients are causing that process to take too long, or are
> > > causing additional failures. This has mostly come into play with file
> > > descriptor limits in the past, but it's certainly not the only
> > > situation. Constantly reconnecting clients continue to cause the
> > > brokers to fall over while we are trying to recover a down cluster.
> > > The only solution was to firewall off all the clients temporarily.
> > > This is a great deal more complicated if the brokers and the clients
> > > are all operating over the same port.
> > >
> > > Now, that said, quotas can be a partial solution to this. I don't want
> > > to jump the gun on that discussion (because it's going to come up
> > > separately and in more detail), but it is possible to structure quotas
> > > in a way that will allow the intra-cluster replication to continue to
> > > function in the case of high load. That would partially address case
> > > 1, but it does nothing for case 2. Additionally, I think it is also
> > > desirable to segregate the traffic even with quotas, so that
> > > regardless of the client load, the cluster itself can stay healthy.
> > >
> > > -Todd
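
(For concreteness, one way the separation Todd describes could look under the
multi-endpoint scheme KIP-2 proposes. This is only a hypothetical sketch: the
endpoint syntax follows the KIP draft, the addresses are made up, and
security.inter.broker.protocol is a knob from the follow-on security work,
shown here just to illustrate how a broker might pick the internal endpoint
for replication/controller traffic.)

    # hypothetical server.properties sketch, not a committed interface
    # client traffic on the application-facing NIC, inter-broker traffic
    # on a separate internal NIC:
    listeners=PLAINTEXT://192.0.2.10:9092,SSL://10.10.0.10:9093
    # brokers would use the endpoint matching this protocol for replication
    # and controller traffic:
    security.inter.broker.protocol=SSL

Note that in this shape the split is keyed off the security protocol, which
connects to Todd's question below about duplicating a security protocol type
on a second port or interface.
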
> > > On Thu, Feb 12, 2015 at 11:38 AM, Jun Rao <j...@confluent.io> wrote:
> > > > Todd,
> > > >
> > > > Could you elaborate on the benefit of having a separate endpoint for
> > > > intra-cluster communication? Is it mainly for giving intra-cluster
> > > > requests a higher priority? At the moment, having a separate endpoint
> > > > just means that the socket connection for the intra-cluster
> > > > communication is handled by a separate acceptor thread. The
> > > > processing of the requests from the network and the handling of the
> > > > requests are each still shared by a single thread pool. So, if either
> > > > of those shared pools is exhausted, the intra-cluster requests will
> > > > still be delayed. We can potentially change this model, but this
> > > > requires more work.
> > > >
> > > > An alternative is to just rely on quotas. Intra-cluster requests will
> > > > be exempt from any kind of throttling.
> > > >
> > > > Gwen,
> > > >
> > > > I agree that defaulting wire.protocol.version to the current version
> > > > is probably better. It just means that we need to document the
> > > > migration path for previous versions.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
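
(To make Jun's point concrete: the shared pools in question are the existing
network-processor and request-handler pools, sized today by the settings
below; a second endpoint would get its own acceptor, but its connections
would still feed these same pools.)

    # existing 0.8.x broker settings (defaults shown)
    # network processor threads, shared by connections from every endpoint:
    num.network.threads=3
    # request handler threads, also shared:
    num.io.threads=8
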
> > > > On Wed, Feb 11, 2015 at 6:33 PM, Todd Palino <tpal...@gmail.com> wrote:
> > > > > Thanks, Gwen. This looks good to me as far as the wire protocol
> > > > > versioning goes. I agree with you on defaulting to the new wire
> > > > > protocol version for new installs. I think it will also need to be
> > > > > very clear (to the general installer of Kafka, not just developers)
> > > > > in the documentation when the wire protocol version changes moving
> > > > > forward, and what the risk/benefit of changing to the new version
> > > > > is.
> > > > >
> > > > > Since a rolling upgrade of the intra-cluster protocol is supported,
> > > > > will a rolling downgrade work as well? Should a flaw (bug,
> > > > > security, or otherwise) be discovered after an upgrade, is it
> > > > > possible to change the wire.protocol.version back to 0.8.2 and do a
> > > > > rolling bounce?
> > > > >
> > > > > On the host/port/protocol specification, specifically the ZK config
> > > > > format, is it possible to have an un-advertised endpoint? I would
> > > > > see this as potentially useful if you wanted to have an endpoint
> > > > > that you are reserving for intra-cluster communication, and you
> > > > > would prefer not to have it advertised at all. Perhaps it is
> > > > > blocked by a firewall rule or other authentication method. This
> > > > > could also allow you to duplicate a security protocol type but
> > > > > segregate it on a different port or interface (if it is
> > > > > unadvertised, there is no ambiguity to the clients as to which
> > > > > endpoint should be selected). I believe I asked about that
> > > > > previously, and I didn't track what the final outcome was or even
> > > > > if it was discussed further.
> > > > >
> > > > > -Todd
> > > > >
> > > > > On Wed, Feb 11, 2015 at 4:38 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > > > > > Added Jun's notes to the KIP (thanks for explaining so clearly,
> > > > > > Jun. I was clearly struggling with this...) and removed the
> > > > > > reference to use.new.wire.protocol.
> > > > > >
> > > > > > On Wed, Feb 11, 2015 at 4:19 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > > > > > > The description that Jun gave for (2) was the detail I was
> > > > > > > looking for - Gwen, can you update the KIP with that for
> > > > > > > completeness/clarity?
> > > > > > >
> > > > > > > I'm +1 as well overall. However, I think it would be good if we
> > > > > > > also get an ack from someone who is more experienced on the
> > > > > > > operations side (say, Todd) to review especially the upgrade
> > > > > > > plan.
> > > > > > >
> > > > > > > On Wed, Feb 11, 2015 at 09:40:50AM -0800, Jun Rao wrote:
> > > > > > > > +1 for the proposed changes in 1 and 2.
> > > > > > > >
> > > > > > > > 1. The impact is that if someone uses SimpleConsumer and
> > > > > > > > references Broker explicitly, the application needs a code
> > > > > > > > change to compile with 0.8.3. Since SimpleConsumer is not
> > > > > > > > widely used, breaking the API in SimpleConsumer but
> > > > > > > > maintaining overall code cleanliness seems to be a better
> > > > > > > > tradeoff.
> > > > > > > >
> > > > > > > > 2. For clarification, the issue is the following. In 0.8.3,
> > > > > > > > we will be evolving the wire protocol of UpdateMetadataRequest
> > > > > > > > (to send info about endpoints for different security
> > > > > > > > protocols). Since this is used in intra-cluster
> > > > > > > > communication, we need to do the upgrade in two steps. The
> > > > > > > > idea is that in 0.8.3, we will default wire.protocol.version
> > > > > > > > to 0.8.2. When upgrading to 0.8.3, in step 1, we do a rolling
> > > > > > > > upgrade to 0.8.3. After step 1, all brokers will be capable
> > > > > > > > of processing the new protocol in 0.8.3, but without actually
> > > > > > > > using it. In step 2, we configure wire.protocol.version to
> > > > > > > > 0.8.3 in each broker and do another rolling restart. After
> > > > > > > > step 2, all brokers will start using the new protocol in
> > > > > > > > 0.8.3. Let's say that in the next release, 0.9, we change the
> > > > > > > > intra-cluster wire protocol again. We will do a similar
> > > > > > > > thing: defaulting wire.protocol.version to 0.8.3 in 0.9 so
> > > > > > > > that people can upgrade from 0.8.3 to 0.9 in two steps. For
> > > > > > > > people who want to upgrade from 0.8.2 to 0.9 directly, they
> > > > > > > > will have to configure wire.protocol.version to 0.8.2 first
> > > > > > > > and then do the two-step upgrade to 0.9.
> > > > > > > >
> > > > > > > > Gwen,
> > > > > > > >
> > > > > > > > In KIP-2, there is still a reference to use.new.protocol.
> > > > > > > > This needs to be removed. Also, would it be better to use
> > > > > > > > intra.cluster.wire.protocol.version, since this only applies
> > > > > > > > to the wire protocol among brokers?
> > > > > > > >
> > > > > > > > Others,
> > > > > > > >
> > > > > > > > The patch in KAFKA-1809 is almost ready. It would be good to
> > > > > > > > wrap up the discussion on KIP-2 soon. So, if you haven't
> > > > > > > > looked at this KIP, please take a look and send your
> > > > > > > > comments.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
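
(Spelled out as a config sketch, using the property name as proposed in this
thread; the final name may differ, e.g. Jun's suggested
intra.cluster.wire.protocol.version.)

    # Step 1: rolling-bounce every broker onto the 0.8.3 binaries while
    # still speaking the old inter-broker protocol:
    wire.protocol.version=0.8.2
    # Step 2: once every broker runs 0.8.3, flip the setting and do a
    # second rolling bounce; brokers then switch to the new protocol:
    wire.protocol.version=0.8.3
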
> > > > > > > > On Mon, Jan 26, 2015 at 8:02 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > > > > > > > > Hi Kafka Devs,
> > > > > > > > >
> > > > > > > > > While reviewing the patch for KAFKA-1809, we came across two
> > > > > > > > > questions that we are interested in hearing the community
> > > > > > > > > out on.
> > > > > > > > >
> > > > > > > > > 1. This patch changes the Broker class and adds a new
> > > > > > > > > class, BrokerEndPoint, that behaves like the previous
> > > > > > > > > Broker.
> > > > > > > > >
> > > > > > > > > While technically kafka.cluster.Broker is not part of the
> > > > > > > > > public API, it is returned by the javaapi used with the
> > > > > > > > > SimpleConsumer.
> > > > > > > > >
> > > > > > > > > Getting replicas from PartitionMetadata will now return
> > > > > > > > > BrokerEndPoint instead of Broker. All method calls remain
> > > > > > > > > the same, but since we return a new type, we break the API.
> > > > > > > > >
> > > > > > > > > Note that this breakage does not prevent upgrades -
> > > > > > > > > existing SimpleConsumers will continue working (because we
> > > > > > > > > are wire-compatible). The only thing that won't work is
> > > > > > > > > building SimpleConsumers with a dependency on Kafka
> > > > > > > > > versions higher than 0.8.2. Arguably, we don't want anyone
> > > > > > > > > to do that anyway :)
> > > > > > > > >
> > > > > > > > > So: do we state that the highest release on which
> > > > > > > > > SimpleConsumers can depend is 0.8.2? Or shall we keep
> > > > > > > > > Broker as is and create an UberBroker which will contain
> > > > > > > > > multiple brokers as its endpoints?
> > > > > > > > >
> > > > > > > > > 2. The KIP suggests a "use.new.wire.protocol" configuration
> > > > > > > > > to decide which protocol the brokers will use to talk to
> > > > > > > > > each other. The problem is that after the next upgrade, the
> > > > > > > > > wire protocol is no longer new, so we'll have to reset it to
> > > > > > > > > false for the following upgrade, then change it to true
> > > > > > > > > again... and upgrading more than a single version will be
> > > > > > > > > impossible. Bad idea :)
> > > > > > > > >
> > > > > > > > > As an alternative, we can have a property for each version
> > > > > > > > > and set one of them to true. Or (simpler, I think) have a
> > > > > > > > > "wire.protocol.version" property and accept version numbers
> > > > > > > > > (0.8.2, 0.8.3, 0.9) as values.
> > > > > > > > >
> > > > > > > > > Please share your thoughts :)
> > > > > > > > >
> > > > > > > > > Gwen
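
(To make question 1 concrete: a hypothetical caller of the javaapi
SimpleConsumer that compiles against 0.8.2 but would need a source change
under the proposed 0.8.3, since leader()/replicas() would return
BrokerEndPoint rather than Broker. Sketch only; the broker host, topic name,
and timeouts are made up.)

    // Hypothetical application code written against 0.8.2.
    import java.util.Collections;
    import kafka.cluster.Broker;
    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class LeaderLookup {
        public static void main(String[] args) {
            SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "leaderLookup");
            try {
                TopicMetadataRequest request =
                    new TopicMetadataRequest(Collections.singletonList("my-topic"));
                TopicMetadataResponse response = consumer.send(request);
                for (TopicMetadata tm : response.topicsMetadata()) {
                    for (PartitionMetadata pm : tm.partitionsMetadata()) {
                        // 0.8.2: leader() returns kafka.cluster.Broker. Under the
                        // proposed change it returns BrokerEndPoint, so this
                        // declaration stops compiling even though host()/port()
                        // behave the same and the wire format is unchanged.
                        Broker leader = pm.leader();
                        if (leader != null) {
                            System.out.println(pm.partitionId() + " -> "
                                + leader.host() + ":" + leader.port());
                        }
                    }
                }
            } finally {
                consumer.close();
            }
        }
    }
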