I really appreciate your recommendations, everyone.

On Thu, Jan 29, 2015 at 9:22 AM, Jon Bringhurst <jbringhu...@linkedin.com.invalid> wrote:
> Hey Mingjie,
>
> Here's how we have our mirror makers configured. For some context, let me
> try to describe this using the example datacenter layout as described in:
>
> https://engineering.linkedin.com/samza/operating-apache-samza-scale
>
> In that example, there are four data centers (A, B, C, and D). However, we
> only need Datacenters A and B to describe this.
>
> Datacenter A mirrors data from local(A) to aggregate(A) as well as
> local(B) to aggregate(A).
>
> Datacenter B mirrors data from local(B) to aggregate(B) as well as
> local(A) to aggregate(B).
>
> The diagram in the article should make this easy to visualize. Note that
> the mirror makers are running in the destination datacenter and pull the
> traffic in.
>
> Let's say we have two physical machines in each datacenter dedicated to
> running mirror makers (let's call them servers 1 and 2 in datacenter A;
> servers 3 and 4 in datacenter B). This is what the layout of mirror maker
> processes would look like:
>
> * Datacenter A MirrorMaker Cluster
>   * Server 1
>     * local(A) to aggregate(A) MirrorMaker Instance
>     * local(B) to aggregate(A) MirrorMaker Instance
>   * Server 2
>     * local(A) to aggregate(A) MirrorMaker Instance
>     * local(B) to aggregate(A) MirrorMaker Instance
>
> * Datacenter B MirrorMaker Cluster
>   * Server 3
>     * local(B) to aggregate(B) MirrorMaker Instance
>     * local(A) to aggregate(B) MirrorMaker Instance
>   * Server 4
>     * local(B) to aggregate(B) MirrorMaker Instance
>     * local(A) to aggregate(B) MirrorMaker Instance
>
> The benefit of this layout is that if the load becomes too high, we can
> add another server to each cluster that looks exactly like the others in
> the cluster (easy to provision). If you get really huge, you can start
> creating multiple mirror maker clusters that each handle a specific flow
> (but still have homogeneous processes within each cluster).
>
> Of course, YMMV, but this is what works well for us. :)
>
> -Jon
>
> On Jan 28, 2015, at 3:54 PM, Daniel Compton <daniel.compton.li...@gmail.com> wrote:
>
> > Hi Mingjie
> >
> > I would recommend the first option of running one mirrormaker instance
> > pulling from multiple DCs.
> >
> > A single MM instance will be able to make more efficient use of the
> > machine resources in two ways:
> > 1. You will only have to run one process, which can be allocated the
> >    full amount of resources.
> > 2. Within the process, if you run enough consumer threads, I think that
> >    they should be able to rebalance and pick up the load if they don't
> >    have anything to do. I'm not 100% sure on this, but 1 still holds.
> >
> > A single MM instance should handle connectivity issues with one DC
> > without affecting the rest of the consumer threads for other DCs.
> >
> > You would gain process isolation running a MM per DC, but this would
> > raise the operational burden and resource requirements. I'm not sure
> > what benefit you'd actually get from process isolation, so I'd recommend
> > against it. However, I'd be interested to hear if others do things
> > differently.
> >
> > Daniel.
> >
> > On Thu Jan 29 2015 at 11:14:29 AM Mingjie Lai <m...@apache.org> wrote:
> >
> >> Hi.
> >>
> >> We have a pretty typical data ingestion use case: we use mirrormaker at
> >> one hadoop data center to mirror kafka data from multiple remote
> >> application data centers. I know mirrormaker supports consuming kafka
> >> data from multiple kafka sources with one instance on one physical
> >> node. With this, we can give one instance of mm multiple consumer
> >> config files, so it can consume data from multiple places.
> >>
> >> Another option is to have multiple mirrormaker instances on one node,
> >> with each mm instance dedicated to grabbing data from one single source
> >> data center. Certainly there will be multiple mm nodes to balance the
> >> load.
> >>
> >> The second option looks better since it provides some isolation between
> >> different data centers.
> >>
> >> Any recommendations for this kind of data aggregation case?
> >>
> >> Still new to kafka and mirrormaker. Any information is welcome.
> >>
> >> Thanks,
> >> Mingjie
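
For reference, the two options discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not anyone's actual tooling: the helper names (slug, mirror_maker_command, plan_cluster), the .properties file names, the server names, and the stream count are all hypothetical, and the flag names are assumed to match the 0.8-era kafka.tools.MirrorMaker tool, so check them against your Kafka version. Passing several sources to mirror_maker_command gives the single-instance option; plan_cluster gives the per-flow layout Jon describes, where every server in a destination cluster runs the same set of instances.

```python
from typing import Dict, List


def slug(cluster: str) -> str:
    """Turn a cluster label such as 'local(A)' into a file-name-friendly 'local-A'."""
    return cluster.replace("(", "-").replace(")", "")


def mirror_maker_command(sources: List[str], destination: str, streams: int = 4) -> str:
    """Build one MirrorMaker command line pulling from `sources` into `destination`.

    Assumption: flag names follow the 0.8-era kafka.tools.MirrorMaker tool.
    The .properties file names and the stream count are hypothetical.
    """
    consumers = " ".join(
        f"--consumer.config {slug(src)}-consumer.properties" for src in sources
    )
    return (
        "bin/kafka-run-class.sh kafka.tools.MirrorMaker "
        f"{consumers} "
        f"--producer.config {slug(destination)}-producer.properties "
        f"--num.streams {streams} --whitelist '.*'"
    )


def plan_cluster(dest_dc: str, all_dcs: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Per-flow layout: every server runs one instance per source datacenter."""
    # List the destination's own local cluster first, then the remote ones.
    sources = [dest_dc] + [dc for dc in all_dcs if dc != dest_dc]
    commands = [
        mirror_maker_command([f"local({src})"], f"aggregate({dest_dc})")
        for src in sources
    ]
    # Every server gets an identical list, so scaling out means adding
    # one more server that looks exactly like the others.
    return {server: list(commands) for server in servers}


if __name__ == "__main__":
    # Per-flow layout for datacenter A (hypothetical server names).
    for server, commands in plan_cluster("A", ["A", "B"], ["server1", "server2"]).items():
        print(server)
        for command in commands:
            print("  " + command)

    # Single-instance alternative: one process with one consumer config per source.
    print(mirror_maker_command(["local(A)", "local(B)"], "aggregate(A)"))
```

Adding a third server to the list passed to plan_cluster yields one more identical entry, which is the "easy to provision" property mentioned above.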