Hi there,

I have several production servers running with different cloud providers (AWS, Rackspace, etc.), and one central big data environment inside the company's network. I need to build an NRT (near real time) monitoring system that streams and archives all the logs from those servers into the central environment.
I have read through the Kafka documentation, and it seems like Kafka Connect <http://kafka.apache.org/documentation.html#quickstart_kafkaconnect> does what I want. However, instead of connecting to a local file, I need to connect to a file on a remote server. Even more challenging, because of the firewall I can only pull from the remote servers into the big data environment; the remote servers cannot push to it. Is there any built-in functionality in Kafka that does this? If not, what is the best practice for architecting it?

I have seen Splunk and Sumologic and am really amazed at how well they work. However, Sumologic at least requires installing a collector on the remote server, which periodically checks the logs and uploads them to Sumologic's servers. That is neat, but it won't work in my scenario because of the firewall.

I have asked something similar on Stack Overflow <http://stackoverflow.com/questions/34498946/kafka-pull-logs-from-remote-servers> in case you are interested in picking up a few points there :)

Best regards,
Bin
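P.S. To make the question more concrete, here is a rough sketch of the kind of pull-based collector I am imagining, running inside the big data environment. It is only an illustration, assuming paramiko and kafka-python; the hostname, log path, topic name, and credentials are placeholders.

# Sketch of a pull-based log collector: open an outbound SSH connection
# to the remote server, tail the log file there, and publish each line
# to a Kafka topic on the local cluster. All names below are placeholders.
import paramiko
from kafka import KafkaProducer

REMOTE_HOST = "app-server-1.example.com"   # placeholder
REMOTE_LOG = "/var/log/app/app.log"        # placeholder
TOPIC = "remote-logs"                      # placeholder

producer = KafkaProducer(bootstrap_servers="kafka-broker:9092")

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(REMOTE_HOST, username="loguser", key_filename="/path/to/key")

# 'tail -F' keeps the command running and streams new lines as they are written
_, stdout, _ = ssh.exec_command("tail -F %s" % REMOTE_LOG)
for line in stdout:
    producer.send(TOPIC, line.rstrip("\n").encode("utf-8"))

The point of the sketch is that the connection is always initiated from inside the network, so only outbound traffic has to pass the firewall; one such collector would run per remote server (or per log file).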