Ah, one thing to be aware of is that the effectiveness of compression is directly related to the producer batch size: more batching, more compression. So even if you use compression on both clusters, the mirror may be much smaller if MirrorMaker's producer batches more than the producers writing to the source cluster.
-jay

On Friday, August 23, 2013, Rajasekar Elango wrote:

> Thanks Guozhang, Jun,
>
> Yes, we are doing gzip compression, and that should be the reason for
> the difference in disk usage. I had a typo: the size is actually 91G in
> the source cluster, so the 25G/91G ratio makes sense for compression.
>
> Thanks,
> Raja.
>
> On Thu, Aug 22, 2013 at 7:00 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > When you state the numbers, are they the same across instances in the
> > cluster, meaning that Topic-0 would have 910*5 GB in the source cluster
> > and 25*5 GB in the target cluster?
> >
> > Another possibility is that MirrorMaker uses compression on the producer
> > side, but I would be surprised if the compression ratio could be 25/910.
> >
> > Guozhang
> >
> > On Thu, Aug 22, 2013 at 3:48 PM, Rajasekar Elango <rela...@salesforce.com> wrote:
> >
> > > Yes, both the source and target clusters have 5 brokers.
> > >
> > > On Aug 22, 2013, at 6:11 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Hello Rajasekar,
> > > >
> > > > Are the source cluster and target cluster the same size?
> > > >
> > > > Guozhang
> > > >
> > > > On Thu, Aug 22, 2013 at 2:14 PM, Rajasekar Elango <rela...@salesforce.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We are using MirrorMaker to replicate data between two Kafka
> > > > > clusters. I am seeing a huge difference in the size of the log in
> > > > > the data dir between a broker in the source cluster and a broker in
> > > > > the destination cluster.
> > > > >
> > > > > For example, the size of ~/data/Topic-0/ is about 910G on the source
> > > > > broker, but only 25G on the destination broker. I see segment log
> > > > > files (~500M) created about every 2 or 3 minutes on the source
> > > > > brokers, but only about every 25 minutes on the destination broker.
> > > > >
> > > > > I verified that MirrorMaker is doing fine using the consumer offset
> > > > > checker: not much lag, and offsets are incrementing. I also verified
> > > > > that topics/partitions are not under-replicated in either the source
> > > > > or the target cluster. What is the reason for this difference in
> > > > > disk usage?
> > > > >
> > > > > Thanks,
> > > > > Raja.
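Jay's point at the top of this reply, that more batching means more compression, can be illustrated with a small sketch. It assumes, as his point implies, that each producer batch is compressed as one unit, and it uses only Python's standard `gzip` module rather than any Kafka client. The helper names `make_messages` and `compressed_size` and the synthetic JSON records are hypothetical, and Kafka's real message-set framing and headers are ignored.

```python
import gzip
import json
import random


def make_messages(n, seed=0):
    """Build n small, similar JSON records (hypothetical sample data)."""
    rng = random.Random(seed)
    hosts = ["web-01", "web-02", "web-03"]
    levels = ["INFO", "WARN", "ERROR"]
    return [
        json.dumps(
            {
                "host": rng.choice(hosts),
                "level": rng.choice(levels),
                "user_id": rng.randint(1, 10_000),
                "msg": "request served",
            }
        ).encode("utf-8")
        for _ in range(n)
    ]


def compressed_size(messages, batch_size):
    """Total gzip output size when messages are compressed batch_size at a time.

    Each batch is concatenated and gzipped as a single unit, loosely mimicking
    a producer that compresses a whole batch together. Kafka's actual
    message-set headers and framing are not modeled.
    """
    total = 0
    for start in range(0, len(messages), batch_size):
        batch = b"".join(messages[start:start + batch_size])
        total += len(gzip.compress(batch))
    return total


if __name__ == "__main__":
    msgs = make_messages(2000)
    raw = sum(len(m) for m in msgs)
    for batch_size in (1, 10, 200, 2000):
        size = compressed_size(msgs, batch_size)
        print(f"batch_size={batch_size:5d}  compressed={size:7d} bytes  "
              f"ratio={size / raw:.3f}")
```

Running it should show the ratio column dropping as `batch_size` grows. With `batch_size=1`, per-message compression can even exceed the raw size, because every record pays gzip's fixed header and trailer and gives the compressor almost no repeated data to exploit.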