Allen,

1. The design docs state that most OSes make good use of memory and keep recently written files in the page cache (http://kafka.apache.org/documentation.html#maximizingefficiency). Since everything was just written, it should still be fresh in the OS cache.

2. I cannot answer your virtualization question. But to answer the second part: throughput depends on message size and content (is it highly compressible?). In most of the benchmarks I have seen, Kafka can saturate the NIC on most machines if your clients have a good number of partitions.

Kafka is horizontally scalable, provided that the number of partitions >> the number of brokers and that each partition sees somewhat uniform levels of traffic.

-Erik
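As a sketch of the settings involved (values and topic name here are illustrative, not recommendations): by default Kafka leaves flushing to the OS page cache, and the explicit flush settings stay unset; for scalability you would create topics with many more partitions than brokers.

```
# server.properties -- illustrative, not a recommendation.
# Kafka's default is to rely on the OS page cache and background flush.
# These settings (left commented out, i.e. at their defaults) would
# force explicit fsyncs if set:
#log.flush.interval.messages=10000
#log.flush.interval.ms=1000

# Creating a topic with partitions >> brokers (3 brokers here), so load
# spreads evenly across the cluster ("logs" is a hypothetical topic name):
# bin/kafka-topics.sh --create --zookeeper localhost:2181 \
#     --replication-factor 3 --partitions 12 --topic logs
```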
On 8/31/15, 1:09 PM, "allen chan" <allen.michael.c...@gmail.com> wrote:
>I am currently using the Elasticsearch (ELK stack) and Redis is the
>current choice as broker.
>
>I want to move to a distributed broker to make that layer more HA.
>Currently exploring kafka as a replacement.
>
>I have a few questions:
>1. I read that kafka is designed to write contents to disk and this cannot
>be turned off. If everything is working properly on the elasticsearch
>side, the logs should be pulled off right away. Is there a setting i can
>use to hold the logs in page cache before writing to disk?
>
>2. Does kafka work well in virtualized vmware environment? Does anyone has
>specs to be used for sustained 80k messages per second. I am thinking of
>using three kafka nodes to begin with.
>
>Sorry for the questions. I cannot find a really good book right now.
>
>
>--
>Allen Michael Chan