Hangjun,

Would running Kafka on YARN be a big architectural change from where it is now? From what I have seen, in most typical setups you want machines optimized for Kafka, rather than just running it on top of HDFS.

-Steve

On Tue, May 20, 2014 at 8:37 PM, Hangjun Ye <yehang...@gmail.com> wrote:

> Thanks Jun and Francois.
>
> We used Kafka 0.8.0 previously. We got some weird errors when expanding the cluster, and the expansion couldn't finish.
> Now we use 0.8.1.1; I will try cluster expansion again sometime.
>
> I read the discussion on that jira issue and I agree with the points raised there.
> HDFS has also improved a lot since then and many issues have been resolved (e.g. SPOF).
>
> We have a team that builds and provides the storage/computing platform for our company, and we already provide a Hadoop cluster.
> If Kafka had an option to store data on HDFS, we would just need to allocate some space quota for it on our cluster (and increase it on demand), and it might reduce our operational cost a lot.
>
> Another (and maybe more aggressive) thought is about deployment. Jun has a good point: "HDFS only provides data redundancy, but not computational redundancy". If Kafka could be deployed on YARN, it could offload some computational resource management to YARN and we wouldn't have to allocate machines physically. Kafka would still need to take care of load balancing and partition assignment among brokers by itself.
> Many computational frameworks like spark/samza have such an option and it's a big attraction for us.
>
> Best,
> Hangjun
>
>
> 2014-05-20 21:00 GMT+08:00 François Langelier <f.langel...@gmail.com>:
>
> > Take a look at Camus <https://github.com/linkedin/camus/>
> >
> > François Langelier
> > Software Engineering Student - École de Technologie Supérieure <http://www.etsmtl.ca/>
> > Captain, Club Capra <http://capra.etsmtl.ca/>
> > VP-Communication - CS Games <http://csgames.org> 2014
> > Jeux de Génie <http://www.jdgets.com/> 2011 to 2014
> > Treasurer, Fraternité du Piranha <http://fraternitedupiranha.com/> 2012-2014
> > Organizing Committee, Olympiades ÉTS 2012
> > Compétition Québécoise d'Ingénierie 2012 - Senior Competition
> >
> >
> > On 19 May 2014 05:28, Hangjun Ye <yehang...@gmail.com> wrote:
> >
> > > Hi there,
> > >
> > > I recently started to use Kafka for our data analysis pipeline and it works very well.
> > >
> > > One problem for us so far is expanding our cluster when we need more storage space.
> > > Kafka provides some scripts to help with this, but the process wasn't smooth.
> > >
> > > To make it work perfectly, it seems Kafka needs to do some jobs that a distributed file system already does.
> > > So I'm wondering if there are any thoughts on making Kafka work on top of HDFS? Maybe make the Kafka storage engine pluggable, with HDFS as one option?
> > >
> > > The pros might be that HDFS already handles storage management (replication, corrupted disks/machines, migration, load balancing, etc.) very well, which frees Kafka and its users from that burden; the cons might be performance degradation.
> > > As Kafka does very well on performance, even with some degree of degradation it would possibly still be competitive in most situations.
> > >
> > > Best,
> > > --
> > > Hangjun Ye
> > >
> >
>
>
> --
> Hangjun Ye
>