In production, you probably want to avoid stacking up the applications like this. There are a number of reasons:

1) Kafka performs significantly better when other applications are not polluting the OS page cache
2) Zookeeper has specific performance requirements - among them is a dedicated disk for transaction logs that it can write to sequentially
3) Mirror maker chews up a lot of CPU and memory with decompression and recompression of messages
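To put rough numbers on the page cache point, here is a minimal Python sketch using the figures from the original message quoted further down (18105 MB total from the free -m output, and heaps of 2, 12, 1 and 1 GB with Xms=Xmx). The function name os_headroom_mb and the dictionary names are made up for illustration, and the 4 GB Kafka heap in the second case is only an example value. Heap size is also just a lower bound on what a JVM really uses, so the true headroom would be smaller than what this prints.

```python
def os_headroom_mb(total_mb, heaps_mb):
    """Return the RAM (in MB) left for the OS once every JVM heap is fully committed.

    Assumes Xms=Xmx, so each heap is counted at its full size. Off-heap JVM
    memory (metaspace, thread stacks, direct buffers) is NOT included, so the
    result is an optimistic upper bound on what the OS can use for page cache.
    """
    return total_mb - sum(heaps_mb.values())


# "Mem: total" column from the free -m output in the original message.
TOTAL_MB = 18105

# Heaps as described in the original message (GB converted to MB).
current = {
    "zookeeper": 2 * 1024,
    "kafka": 12 * 1024,
    "mirror_maker_dca": 1 * 1024,
    "mirror_maker_dcb": 1 * 1024,
}

# Illustrative alternative: same layout, but with a 4 GB Kafka heap.
smaller_kafka_heap = dict(current, kafka=4 * 1024)

print(os_headroom_mb(TOTAL_MB, current))             # 1721
print(os_headroom_mb(TOTAL_MB, smaller_kafka_heap))  # 9913
```

With the heaps as posted, less than 2 GB is left for everything else on the box, including buffers and cache. Shrinking the Kafka heap alone frees roughly 8 GB on paper, although it does not remove the contention between the applications listed above.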
Particular sizing of your systems is going to depend on the amount of data you are moving around, but at the very least I would recommend that your Kafka brokers, Zookeeper ensemble, and mirror makers be on separate systems (stacking up the mirror makers on a common system is fine, however). The Kafka brokers will need CPU and memory, and of course decent storage to meet your retention and performance requirements. ZK needs a bit of memory and very good disk for the transaction logs, but its CPU requirements are pretty light. Mirror maker needs CPU and memory, but it has no real need of disk performance at all.

Sizing the brokers, you can probably get away with 3 or 4 GB of heap (this is based on my experience running really large clusters at LinkedIn - even at that heap size we were good for a long time), using G1 garbage collection. The guidelines in the Kafka documentation for this are the ones that I have developed over the last few years here. Reserve the rest of the memory for the OS to manage - buffers and cache are your friends.

-Todd

On Mon, Aug 7, 2017 at 11:06 AM, Gabriel Machado <gmachado....@gmail.com> wrote:

> Thanks Todd, I will set swappiness to 1.
>
> These machines will be the future production cluster for our main datacenter. We have 2 remote datacenters.
> Kafka will buffer logs and Elasticsearch will index them.
>
> Is it a bad practice to have all these JVMs on the same virtual machine?
> What do you recommend (number of machines, quantity of GB, CPU...)? For the moment, each node has 4 vCPUs.
>
> Gabriel.
>
> 2017-08-07 15:45 GMT+02:00 Todd Palino <tpal...@gmail.com>:
>
> > To avoid swap you should set swappiness to 1, not 0. 1 is a request (don't swap if avoidable) whereas 0 is a demand (processes will be killed as OOM instead of swapping).
> >
> > However, I'm wondering why you are running such large heaps. Most of the ZK heap is used for storage of the data in memory, and it's obvious from your setup that this is a development instance. So if ZK is only being used for that Kafka cluster you are testing, you can go with a smaller heap.
> >
> > Also, for what reason are you running a 12 GB heap for Kafka? Even our largest production clusters at LinkedIn are using a heap size of 6 GB right now. You want to leave memory open for the OS to use for buffers and cache in order to get better performance from consumers. You can see from that output that it's trying to.
> >
> > It really looks like you're just overloading your system. In which case swapping is to be expected.
> >
> > -Todd
> >
> > On Aug 7, 2017 8:34 AM, "Gabriel Machado" <gmachado....@gmail.com> wrote:
> >
> > Hi,
> >
> > I have a 3-node cluster with 18 GB RAM and 2 GB swap.
> > Each node has the following JVMs (Xms=Xmx):
> > - Zookeeper 2 GB
> > - Kafka 12 GB
> > - Kafka mirror-maker DCa 1 GB
> > - Kafka mirror-maker DCb 1 GB
> >
> > All the JVMs consume 16 GB. That leaves 2 GB for the OS (Debian jessie 64 bits).
> > Why do I have no free swap on these virtual machines?
> >
> > #free -m
> >              total       used       free     shared    buffers     cached
> > Mem:         18105      17940        164          0         38       6666
> > -/+ buffers/cache:      11235       6869
> > Swap:         2047       2045          2
> >
> > I've read I should avoid JVM swapping.
> > What is the best way to do that?
> > - modify the swappiness threshold
> > - unmount all swap partitions
> > - force the JVM to stay in memory with mlockall (https://github.com/LucidWorks/mlockall-agent)
> > - another solution
> >
> > Gabriel.

--
*Todd Palino*
Senior Staff Engineer, Site Reliability
Data Infrastructure Streaming
linkedin.com/in/toddpalino