Hi folks, I would like some feedback based on your experience running Kafka streaming applications.
We have a replicated Kafka cluster running in a data center in one city. We run a Kafka streaming application that reads from a source topic on that cluster and commits its output to a local database in its own data center. The two data centers are about 1000 miles apart and are connected by a high-latency (20 - 70 ms), 100 Mbps link.

Our source topic receives 10,000 messages per second, and each message is around 4 KB. The streaming application receives these messages, aggregates them, sends the aggregated messages to a changelog topic, and then reads from the changelog topic again to update its local store. This is a continuous process, and changelog topic messages may grow to between 100 KB and 750 KB. So there is a lot of network traffic back and forth between the two data centers.

In such a scenario, is it advisable to run the streaming application over a WAN, or is it better to move it onto the same LAN as the Kafka cluster? We are seeing request timeouts when the application runs over the WAN that we do not see on the LAN, and we would like to know whether the network connection between the two data centers could be the cause.

Please let me know your thoughts.

Thanks,
Sachin
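
P.S. A rough back-of-the-envelope check, in case it helps frame the question. This is only a sketch. It assumes "100 Mbps" means megabits per second, that 1 KB is 1024 bytes, and that messages cross the link uncompressed. It ignores protocol overhead and the changelog traffic entirely, and the names are made up for illustration.

```python
# Figures taken from the description above; everything else is an assumption.
MESSAGES_PER_SECOND = 10_000
MESSAGE_SIZE_BYTES = 4 * 1024   # assumes 1 KB = 1024 bytes
LINK_CAPACITY_MBPS = 100        # assumes megabits per second


def required_mbps(messages_per_second, message_size_bytes):
    """Return the raw payload bandwidth in megabits per second."""
    bits_per_second = messages_per_second * message_size_bytes * 8
    return bits_per_second / 1_000_000


# Bandwidth needed just to read the source topic across the WAN.
source_mbps = required_mbps(MESSAGES_PER_SECOND, MESSAGE_SIZE_BYTES)
utilization = source_mbps / LINK_CAPACITY_MBPS

print(f"source topic read: {source_mbps:.1f} Mbps")
print(f"link utilization:  {utilization:.2f}x of {LINK_CAPACITY_MBPS} Mbps")
```

Under those assumptions, reading the source topic alone needs roughly 328 Mbps, which is more than three times the link capacity before any changelog writes or reads are counted. Compression on the producer side could lower that figure, so the actual number may differ.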