Re: One or multiple instances of MM to aggregate kafka data to one hadoop

Daniel Compton Wed, 28 Jan 2015 15:56:41 -0800

Hi Mingjie

I would recommend the first option: running one MirrorMaker instance that
pulls from multiple DCs.

A single MM instance can make more efficient use of the machine's
resources in two ways:
1. You only have to run one process, which can be allocated the full
amount of resources.
2. Within the process, if you run enough consumer threads, I think that
idle threads should be able to rebalance and pick up load when they have
nothing else to do. I'm not 100% sure on this, but point 1 still holds.

A single MM instance should handle connectivity issues with one DC without
affecting the consumer threads for the other DCs.

You would gain process isolation by running one MM per DC, but this would
raise the operational burden and resource requirements. I'm not sure what
benefit you'd actually get from process isolation, so I'd recommend against
it. However, I'd be interested to hear if others do things differently.
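
To make the comparison concrete, here is a rough sketch of how the two
layouts differ on the command line. It only builds the command strings and
does not start anything. The option names (--consumer.config,
--producer.config, --num.streams, --whitelist) are as I remember them from
the 0.8-era MirrorMaker tool, so please check them against your Kafka
version. The properties file names, the stream count and the helper function
names are made up for illustration.

```python
import shlex


def mirrormaker_command(consumer_configs, producer_config, num_streams, whitelist=".*"):
    """Build the argv list for one MirrorMaker process.

    Option names are assumed from the 0.8-era tool; verify for your version.
    """
    cmd = ["bin/kafka-run-class.sh", "kafka.tools.MirrorMaker"]
    # One --consumer.config per source cluster this process reads from.
    for path in consumer_configs:
        cmd += ["--consumer.config", path]
    cmd += [
        "--producer.config", producer_config,
        "--num.streams", str(num_streams),
        "--whitelist", whitelist,
    ]
    return cmd


def single_instance(dc_configs, producer_config, num_streams):
    """Option 1: one process that is handed every DC's consumer config."""
    return [mirrormaker_command(dc_configs, producer_config, num_streams)]


def instance_per_dc(dc_configs, producer_config, num_streams):
    """Option 2: one process per source DC, each with a single consumer config."""
    return [
        mirrormaker_command([path], producer_config, num_streams)
        for path in dc_configs
    ]


if __name__ == "__main__":
    # Hypothetical file names, not taken from any real deployment.
    dcs = ["dc1-consumer.properties", "dc2-consumer.properties"]
    producer = "hadoop-producer.properties"

    print("Option 1, single instance:")
    for cmd in single_instance(dcs, producer, num_streams=4):
        print("  " + " ".join(shlex.quote(arg) for arg in cmd))

    print("Option 2, one instance per DC:")
    for cmd in instance_per_dc(dcs, producer, num_streams=4):
        print("  " + " ".join(shlex.quote(arg) for arg in cmd))
```

With option 2 you end up with one command (and one JVM to monitor and size)
per source DC, which is the extra operational burden I mentioned.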

Daniel.

On Thu Jan 29 2015 at 11:14:29 AM Mingjie Lai <m...@apache.org> wrote:

> Hi.
>
> We have a pretty typical data ingestion use case: we run MirrorMaker at
> one Hadoop data center to mirror Kafka data from multiple remote
> application data centers. I know MirrorMaker can consume Kafka data from
> multiple Kafka sources with one instance on one physical node. That is,
> we can give one MM instance multiple consumer config files, so it can
> consume data from multiple places.
>
> Another option is to run multiple MirrorMaker instances on one node, with
> each MM instance dedicated to pulling data from a single source data center.
> There would of course be multiple MM nodes to balance the load.
>
> The second option looks better since it provides some isolation between
> the different data centers.
>
> Any recommendations for this kind of data aggregation case?
>
> I'm still new to Kafka and MirrorMaker, so any information is welcome.
>
> Thanks,
> Mingjie
>
