When you state the numbers, are they the same across instances in the cluster, meaning that Topic-0 would have 910*5 GB in the source cluster and 25*5 GB in the target cluster?
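(As an aside, one way to answer this is to total the topic's partition directories on each broker. Below is a minimal sketch, assuming the ~/data log dir layout mentioned in the thread and that partition directories are named <topic>-<partition>; it is a hypothetical helper script, not a Kafka tool:

    import os
    import glob

    # Sum the on-disk size of every partition directory for one topic under
    # a broker's log dir (e.g. ~/data/Topic-0, ~/data/Topic-1, ...).
    def topic_size_bytes(log_dir, topic):
        total = 0
        for part_dir in glob.glob(os.path.join(log_dir, topic + "-*")):
            for root, _dirs, files in os.walk(part_dir):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
        return total

    if __name__ == "__main__":
        gb = topic_size_bytes(os.path.expanduser("~/data"), "Topic") / (1024.0 ** 3)
        print("Per-broker size for topic: %.1f GB" % gb)

Running this on each broker and summing the results gives the cluster-wide total, which is what the 910*5 vs 25*5 question above is getting at.)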
Another possibility is that MirrorMaker uses compression on the producer side, but I would be surprised if the compression ratio could be 25/910 (roughly 36:1).

Guozhang

On Thu, Aug 22, 2013 at 3:48 PM, Rajasekar Elango <rela...@salesforce.com> wrote:

> Yes, both the source and target clusters have 5 brokers.
>
> Sent from my iPhone
>
> On Aug 22, 2013, at 6:11 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Rajasekar,
> >
> > Are the sizes of the source cluster and target cluster the same?
> >
> > Guozhang
> >
> >
> > On Thu, Aug 22, 2013 at 2:14 PM, Rajasekar Elango <rela...@salesforce.com> wrote:
> >
> >> Hi,
> >>
> >> We are using MirrorMaker to replicate data between two Kafka clusters. I am
> >> seeing a huge difference in the size of the log data dir between a broker in
> >> the source cluster and a broker in the destination cluster:
> >>
> >> For example, ~/data/Topic-0/ is about 910 G on a source broker, but only
> >> 25 G on a destination broker. I see a segmented log file (~500 M) created
> >> roughly every 2 or 3 minutes on the source brokers, but only about every
> >> 25 minutes on the destination broker.
> >>
> >> I verified with the consumer offset checker that MirrorMaker is doing fine:
> >> not much lag, and offsets are incrementing. I also verified that
> >> topics/partitions are not under-replicated in either the source or target
> >> cluster. What is the reason for this difference in disk usage?
> >>
> >>
> >> --
> >> Thanks,
> >> Raja.
> >
> >
> > --
> > -- Guozhang

--
-- Guozhang
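For reference, the compression question above comes down to what the producer config handed to MirrorMaker contains. A minimal sketch of such a producer.properties file (the one passed via --producer.config), assuming the 0.8-era producer; the broker hostnames are placeholders and the property names may differ in other Kafka versions:

    # producer.properties given to MirrorMaker via --producer.config
    # Placeholder target-cluster brokers:
    metadata.broker.list=target-broker1:9092,target-broker2:9092
    # "none" (the default), "gzip", or "snappy". Anything other than none would
    # shrink the target cluster's on-disk size, though nowhere near 910/25.
    compression.codec=none

If compression.codec is left at none here, producer-side compression would not explain the gap.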