Based on the source code here:
https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala
It looks like, in its current version, FlumeUtils does not support
starting an Avro RPC server on more than one worker.
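
For what it's worth, a single createStream call only ever starts one
receiver, and that receiver binds its Avro RPC server on whichever worker
it happens to be scheduled on. A minimal sketch (the hostname and port
here are made up):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object SingleReceiverExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SingleReceiverExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // One createStream call = one receiver = one Avro RPC server,
    // listening on a single worker at the given host and port.
    val stream = FlumeUtils.createStream(ssc, "worker-1.example.com", 41414)

    stream.count().map(n => "Received " + n + " flume events").print()

    ssc.start()
    ssc.awaitTermination()
  }
}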
- Christophe
On 4/7/14, 12:23 PM, Michael Ernest wrote:
You can configure your sinks to write to one or more Avro sources in a
load-balanced configuration.
https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
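
For example, a load-balancing sink group that round-robins events across
two Avro sinks looks roughly like this (agent, channel, sink, and host
names here are made up):

# Two Avro sinks in a load-balancing sink group.
agent1.sinks = avroSink1 avroSink2
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = avroSink1 avroSink2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true

# Each sink points at one Avro source (here, a Spark Flume receiver).
agent1.sinks.avroSink1.type = avro
agent1.sinks.avroSink1.channel = c1
agent1.sinks.avroSink1.hostname = worker-1.example.com
agent1.sinks.avroSink1.port = 41414

agent1.sinks.avroSink2.type = avro
agent1.sinks.avroSink2.channel = c1
agent1.sinks.avroSink2.hostname = worker-2.example.com
agent1.sinks.avroSink2.port = 41414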
mfe
On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
<christo...@christophe.cc> wrote:
Hi,
From my testing of Spark Streaming with Flume, it seems that only one
Spark worker node runs a Flume Avro RPC server to receive messages at any
given time, as opposed to every Spark worker running one. Is this the
case? Our use case would benefit from spreading the load across workers
because of our message volume: we would put a load balancer in front of
the Spark workers running the Avro RPC servers, essentially
round-robinning the messages across all of them.
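
Concretely, I was imagining something like the following: one Flume stream
per worker host, unioned into a single DStream. This is only a sketch, and
the host names and port are hypothetical:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}

object MultiReceiverExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MultiReceiverExample")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical worker hosts; the load balancer would sit in front of
    // the Avro servers on these machines.
    val hosts = Seq("worker-1.example.com", "worker-2.example.com")

    // One receiver per host, unioned into a single DStream. The catch, as
    // far as I can tell, is that Spark does not pin a receiver to a
    // particular host, so the Avro server for each stream may not come up
    // on the worker we intend.
    val streams = hosts.map(host => FlumeUtils.createStream(ssc, host, 41414))
    val union: DStream[SparkFlumeEvent] = ssc.union(streams)

    union.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}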
If this is not currently supported, I'd be interested in contributing
code to make it happen.
- Christophe