Re: Make kafka storage engine pluggable and provide a HDFS plugin?

Kam Kasravi Wed, 21 May 2014 07:55:02 -0700

Hi Hangjun,

I've explored deploying Kafka on YARN, and the current YARN does not support
long-running services with locality constraints. Deploying Kafka producers /
consumers (not brokers) on YARN is supported by the Apache Incubator Samza
project. Background on the YARN limitations can be found in YARN-371, YARN-1040,
YARN-1404, YARN-1412 and YARN-2027. Support for long-running services within
YARN will likely change with the work that Carlo Curino and his team are doing
(Rayon), which is described in YARN-1051. Background and technical details are
in that JIRA.


Thanks
Kam
On Tuesday, May 20, 2014 10:40 PM, Hangjun Ye <yehang...@gmail.com> wrote:
 


Hi Steve,

Yes, what I want is (as an option) for Kafka not to have to care about
physical machines.

Best,
Hangjun


2014-05-21 11:46 GMT+08:00 Steve Morin <st...@stevemorin.com>:

> Hangjun,
>   Would having Kafka in YARN be a big architectural change from where
> it is now? From what I have seen, in most typical setups you want machines
> optimized for Kafka, not just Kafka running on top of HDFS.
> -Steve
>
>
> On Tue, May 20, 2014 at 8:37 PM, Hangjun Ye <yehang...@gmail.com> wrote:
>
> > Thanks Jun and Francois.
> >
> > We used Kafka 0.8.0 previously. We hit a weird error when expanding the
> > cluster, and the expansion couldn't be completed.
> > Now that we use 0.8.1.1, I will try cluster expansion again at some point.
> >
> > I read the discussion on that JIRA issue and I agree with the points raised
> > there.
> > HDFS has also improved a lot since then, and many issues have been
> > resolved (e.g. the SPOF).
> >
> > We have a team that builds and provides the storage/computing platform for
> > our company, and we already provide a Hadoop cluster.
> > If Kafka had an option to store data on HDFS, we would just need to allocate
> > some space quota for it on our cluster (and increase it on demand), which
> > might reduce our operational cost a lot.
> >
> > Another (and maybe more aggressive) thought is about deployment. Jun
> > makes a good point: "HDFS only provides data redundancy, but not
> > computational redundancy". If Kafka could be deployed on YARN, it could
> > offload some computational resource management to YARN, and we wouldn't
> > have to allocate machines physically. Kafka would still need to handle load
> > balancing and partition assignment among brokers by itself.
> > Many computational frameworks like Spark/Samza have such an option, and
> > it's a big attraction for us.
> >
> > Best,
> > Hangjun
> >
> >
> > 2014-05-20 21:00 GMT+08:00 François Langelier <f.langel...@gmail.com>:
> >
> > > Take a look at Camus <https://github.com/linkedin/camus/>
> > >
> > >
> > >
> > > François Langelier
> > > Software engineering student - École de Technologie
> > > Supérieure <http://www.etsmtl.ca/>
> > > Captain, Club Capra <http://capra.etsmtl.ca/>
> > > VP Communications - CS Games <http://csgames.org> 2014
> > > Jeux de Génie <http://www.jdgets.com/> 2011 to 2014
> > > Treasurer, Fraternité du Piranha <http://fraternitedupiranha.com/>
> > > 2012-2014
> > > Organizing Committee, Olympiades ÉTS 2012
> > > Compétition Québécoise d'Ingénierie 2012 - Senior Competition
> > >
> > >
> > > On 19 May 2014 05:28, Hangjun Ye <yehang...@gmail.com> wrote:
> > >
> > > > Hi there,
> > > >
> > > > I recently started to use Kafka for our data analysis pipeline, and it
> > > > works very well.
> > > >
> > > > One problem for us so far is expanding our cluster when we need more
> > > > storage space.
> > > > Kafka provides some scripts to help with this, but the process wasn't
> > > > smooth.
> > > >
> > > > To make this work well, it seems Kafka needs to do some jobs that a
> > > > distributed file system already does.
> > > > So I'm wondering whether there are any thoughts on making Kafka work on
> > > > top of HDFS. Maybe make the Kafka storage engine pluggable, with HDFS as
> > > > one option?
> > > >
> > > > The pros might be that HDFS already handles storage management
> > > > (replication, corrupted disks/machines, migration, load balancing, etc.)
> > > > very well, which frees Kafka and its users from that burden; the cons
> > > > might be performance degradation.
> > > > Since Kafka performs very well, even with some degree of degradation it
> > > > would possibly still be competitive in most situations.
> > > >
> > > > Best,
> > > > --
> > > > Hangjun Ye
> > > >
> > >
> >
> >
> >
> > --
> > Hangjun Ye
> >
>



-- 
Hangjun Ye
