Re: One or multiple instances of MM to aggregate kafka data to one hadoop

2015-01-30 Thread Mingjie Lai

Really appreciate you guys' recommendations.

On Thu, Jan 29, 2015 at 9:22 AM, Jon Bringhurst <jbringhu...@linkedin.com.invalid> wrote:
> Hey Mingjie,
>
> Here's how we have our mirror makers configured. For some context, let me
> try to describe this using the example datacenter layout as descri

Re: One or multiple instances of MM to aggregate kafka data to one hadoop

2015-01-29 Thread Jon Bringhurst

Hey Mingjie,

Here's how we have our mirror makers configured. For some context, let me try to describe this using the example datacenter layout as described in:

https://engineering.linkedin.com/samza/operating-apache-samza-scale

In that example, there are four data centers (A, B, C, and D). How

Re: One or multiple instances of MM to aggregate kafka data to one hadoop

2015-01-28 Thread Daniel Compton

Hi Mingjie,

I would recommend the first option of running one mirrormaker instance pulling from multiple DCs.

A single MM instance will be able to make more efficient use of the machine resources in two ways:

1. You will only have to run one process which will be able to be allocated the full amo