Hi,

From my testing of Spark Streaming with Flume, it seems that only one of the Spark worker nodes runs a Flume Avro RPC server to receive messages at any given time, rather than every Spark worker running its own Avro RPC server. Is this the case? Our use case would benefit from balancing the load across workers because of our message volume. We would put a load balancer in front of the Spark workers running the Avro RPC servers and essentially round-robin the messages across all of them.
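For context, here is a minimal sketch of the kind of setup I've been testing, plus what I'd like to be able to do. The hostnames, ports, and batch interval are just placeholders for our environment. The second part unions several FlumeUtils.createStream receivers (one per worker host) so that each one binds its own Avro RPC server, which is roughly the topology we'd want to sit behind the load balancer:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeReceiverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeReceiverSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // What I tested: a single Flume stream, which appears to start
    // exactly one receiver (one Avro RPC server) on one worker.
    val single = FlumeUtils.createStream(ssc, "worker1.example.com", 41414)
    single.count().print()

    // What we'd like: one Avro RPC server per worker, with a load
    // balancer round-robinning Flume events across them. One way to
    // approximate this today is to create one stream per worker host
    // and union them (hosts/ports below are placeholders).
    val hosts = Seq("worker1.example.com", "worker2.example.com", "worker3.example.com")
    val streams = hosts.map(h => FlumeUtils.createStream(ssc, h, 41414))
    val unioned = ssc.union(streams)
    unioned.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}

The union approach still requires pinning each receiver to a known host and port, so a load balancer in front of a single logical endpoint that fans out to all workers would be a cleaner fit for us.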

If this is something that is currently not supported, I'd be interested in contributing to the code to make it happen.

- Christophe
