Which version of Kafka are you using? Cluster expansion should be reliable in 0.8.1.1.
As for using HDFS, we thought about it when implementing replication in Kafka. The short answer is that using HDFS in Kafka is not easy. You can see the discussion in https://issues.apache.org/jira/browse/KAFKA-50

Thanks,

Jun

On Mon, May 19, 2014 at 2:28 AM, Hangjun Ye <yehang...@gmail.com> wrote:

> Hi there,
>
> I recently started to use Kafka for our data analysis pipeline and it works
> very well.
>
> One problem to us so far is expanding our cluster when we need more storage
> space.
> Kafka provides some scripts for helping do this but the process wasn't
> smooth.
>
> To make it work perfectly, seems Kafka needs to do some jobs that a
> distributed file system has already done.
> So just wondering if any thoughts to make Kafka work on top of HDFS? Maybe
> make the Kafka storage engine pluggable and HDFS is one option?
>
> The pros might be that HDFS has already handled storage management
> (replication, corrupted disk/machine, migration, load balance, etc.) very
> well and it frees Kafka and the users from the burden, and the cons might
> be performance degradation.
> As Kafka does very well on performance, possibly even with some degree of
> degradation, it's still competitive for the most situations.
>
> Best,
> --
> Hangjun Ye
>