I have a requirement where I need to feed push traffic (comma-separated logs) to Flume at a very high rate. I have three concerns:
1. I am using PHP to send events to Flume through rsyslog. The code I am using is:

    openlog("mylogs", LOG_NDELAY, LOG_LOCAL2);
    syslog(LOG_INFO, "aaid,bid,cid,info1,info2,....");
    closelog();

   I want to attach some fields as headers to the event above ("aaid,bid,cid,info1,info2,...."). I don't see any function in PHP that lets me add headers, so that I could take some action based on the headers alone without parsing the complete message.

2. How do I load balance the traffic? I want the logger to forward the logs to a load balancer, and the load balancer to choose a Flume node based on factors such as current load and CPU utilization. It should also handle failures, diverting traffic if a Flume node goes down. I looked at the Flume-based load balancer, but it provides just two options: round robin and random. Any ideas on how to do this load balancing with failure detection and handling would be very helpful.

3. I want to update a cache in real time from Flume (using an interceptor). I want a hashing-based approach that diverts traffic to specific nodes based on a field or header in the log, so that one node is responsible for updating all rows whose keys fall in the same hash bucket. This is to avoid row-level locking.

I hope I have explained my requirements clearly enough. If anything is unclear, please let me know.

Regards,
Mohit
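
P.S. To make point 3 more concrete, here is a minimal sketch of the kind of routing I have in mind. It is only an illustration, not something taken from Flume: the node names and the pickNode() helper are made up, and I am assuming the first field of the log line (aaid) is the routing key.

```php
<?php
// Hypothetical node list; these hostnames are placeholders.
$flumeNodes = ['flume-node-0', 'flume-node-1', 'flume-node-2'];

/**
 * Map a routing key to one node (hypothetical helper).
 * The same key always maps to the same node as long as the node list is unchanged.
 */
function pickNode(string $key, array $nodes): string
{
    // Mask the CRC so the hash is non-negative on both 32-bit and 64-bit PHP.
    $hash = crc32($key) & 0x7fffffff;
    // Modulo over the node count selects the hash bucket.
    return $nodes[$hash % count($nodes)];
}

// Assumed log layout: the routing key is the first comma-separated field.
$line = 'aaid,bid,cid,info1,info2';
$key  = explode(',', $line, 2)[0];

echo pickNode($key, $flumeNodes), "\n";
```

With plain modulo hashing like this, most keys move to a different node whenever the node count changes, so it does not by itself cover the failure handling I asked about in point 2.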