Hey there,

I'm trying to build a data ingestion pipeline on Azure.

My resources:

1 * Producer (with ZooKeeper installed alongside it) - 16 GB RAM, 4 cores
2 * Brokers - 16 GB RAM, 4 cores each

I'm transferring log files of variable size, ranging from 0.1 MB to
0.7 MB. There are currently two types of logs coming in, and both are
passed through the brokers on a single topic with two partitions.

I've got dummy data for both log types, and with it I get an average
throughput of 14-16 files/sec for one type and 12-14 files/sec for the
other.

The producer client I'm currently using is kafka-python by dpkp.
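
Roughly, the producing side looks like the sketch below (simplified;
the broker addresses, topic name, and partition mapping are
placeholders rather than my exact values):

    from kafka import KafkaProducer

    # Placeholder broker addresses and settings, not the exact config.
    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092", "broker2:9092"],
        acks=1,                    # count a file as sent once the leader acks it
        max_request_size=1048576,  # 1 MB default covers the 0.1-0.7 MB files
    )

    def send_log(path, partition):
        # Each log file is sent as a single message to a fixed partition.
        with open(path, "rb") as f:
            payload = f.read()
        return producer.send("logs", value=payload, partition=partition)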

To test how well the system sustains this load over extended periods,
I looped the producer to send the logs indefinitely. But after the
second pass over all of the sample data I have (each pass takes around
6 to 7 hours), the throughput drops drastically to around 4-5
files/sec. I've been monitoring RAM and disk usage, and both drop
sharply, by roughly the same magnitude as the throughput.

I can't understand why the brokers' throughput degrades like this.

All my throughput figures are based on the acks I get back from the
brokers when the log files are appended to the Kafka log.
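
The measurement itself is along these lines (again simplified;
iter_sample_files below is a stand-in for however the sample data is
walked, and send_log is the helper from the sketch above):

    import time

    acked = 0

    def on_ack(record_metadata):
        # One successful ack == one log file appended to the Kafka log.
        global acked
        acked += 1

    start = time.time()
    for path, partition in iter_sample_files():  # stand-in helper
        send_log(path, partition).add_callback(on_ack)

    producer.flush()  # wait for all outstanding acks
    elapsed = time.time() - start
    print("throughput: %.1f files/sec" % (acked / elapsed))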

Any help would be highly appreciated.

Warm Regards,
Prajwal
