Side question: why run Kafka on Docker for AWS? Is the Docker config being used for configuration management? Are there more systems running on the instance other than Kafka?
Sent by Outlook <http://taps.io/outlookmobile> for Android

On Sun, Mar 1, 2015 at 1:10 PM -0800, "Ewen Cheslack-Postava" <e...@confluent.io> wrote:

On Fri, Feb 27, 2015 at 8:09 PM, Jeff Schroeder <jeffschroe...@computer.org> wrote:

> Kafka on dedicated hosts running in docker under marathon under Mesos. It
> was a real bear to get working, but is really beautiful once I did manage
> to get it working. I simply run with a unique hostname constraint and
> number of instances = replication factor. If a broker dies and it isn't a
> hardware or network issue, marathon restarts it.
>
> The hardest part was that Kafka was registering to ZK with the internal (to
> docker) port. My workaround was that you have to use the same port inside
> and outside docker or it will register to ZK with whatever the port is
> inside the container.
>

You should be able to use advertised.host.name and advertised.port to control this, so you aren't required to use the same port inside and outside Docker.

>
> FYI this is an on premise dedicated Mesos cluster running on bare metal :)
>
> On Friday, February 27, 2015, James Cheng <jch...@tivo.com> wrote:
>
> > Hi,
> >
> > I know that Netflix might be talking about "Kafka on AWS" at the March
> > meetup, but I wanted to bring up the topic anyway.
> >
> > I'm sure that some people are running Kafka in AWS. Is anyone running
> > Kafka within docker in production? How does that work?
> >
> > For both of these, how do you persist data? If on AWS, do you use EBS? Do
> > you use ephemeral storage and then rely on replication? And if using
> > docker, do you persist data outside the docker container and on the host
> > machine?
>

On AWS, your choice will depend on a tradeoff of tolerance for data loss, performance, and price sensitivity.
You might be able to get better/more predictable performance out of the ephemeral instance storage, but since you are presumably running all instances in the same AZ you leave yourself open to significant data loss if there's a coordinated outage. It's pretty rare, but it does happen. With EBS you may have to do more work or spread across more volumes to get the same throughput. Relevant quote from the docs on provisioned IOPS: "Additionally, you can stripe multiple volumes together to achieve up to 48,000 IOPS or 800MBps when attached to larger EC2 instances". (Note MBps not Mbps.)

Other considerations: AWS has been moving most of its instance storage to SSDs, so getting enough instance storage space can be relatively pricey, and you can also potentially go with a hybrid setup to get a balance of the two, but you'll need to be very careful about partition assignment then to ensure at least one copy of every partition ends up on an EBS-backed node.

For Docker, you probably want the data to be stored on a volume. If possible, it would be better if non-hardware errors could be resolved just by restarting the broker. You'll avoid a lot of needless copying of data. Storing data in a volume would let you simply restart a new container and have it pick up where the last one left off. The example of Postgres given for a volume container in https://docs.docker.com/userguide/dockervolumes/ isn't too far from Kafka if you were to assume Postgres was replicating to a slave -- you'd prefer to reuse the existing data on the existing node (which a volume container enables), but could still handle bringing up a new node if necessary.

> >
> > And related, how do you deal with broker failure? Do you simply replace
> > it, and repopulate a new broker via replication? Or do you bring back up
> > the broker with the persisted files?
> >
> > Trying to learn about what people are doing, beyond "on premises and
> > dedicated hardware".
> >
> > Thanks,
> > -James
> >
> >
>
> --
> Text by Jeff, typos by iPhone
>

--
Thanks,
Ewen
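
For the hybrid setup mentioned above (some brokers on instance storage, some on EBS), the condition "at least one copy of every partition ends up on an EBS-backed node" can be checked mechanically. The following is only a minimal sketch: it assumes you have already collected the replica assignment into a plain dict by whatever means you prefer, and the function name, topic name, and broker ids are all hypothetical and not part of Kafka.

```python
def partitions_missing_durable_replica(assignment, ebs_brokers):
    """Return the partitions that have no replica on an EBS-backed broker.

    assignment:  dict mapping (topic, partition) -> list of broker ids
                 holding a replica of that partition (assumed input shape)
    ebs_brokers: iterable of broker ids whose log directories live on EBS
    """
    ebs = set(ebs_brokers)
    # A partition is at risk when none of its replicas is on an EBS broker.
    return sorted(
        tp for tp, replicas in assignment.items() if ebs.isdisjoint(replicas)
    )


# Hypothetical cluster: brokers 1 and 2 use instance storage, broker 3 uses EBS.
assignment = {
    ("events", 0): [1, 3],
    ("events", 1): [1, 2],
    ("events", 2): [2, 3],
}

at_risk = partitions_missing_durable_replica(assignment, {3})
print(at_risk)
```

Any partition this reports would be lost if all instance-storage brokers went down together, so it would need a replica moved to an EBS-backed broker before relying on the hybrid layout.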