Right, but at least in my case, no Avro RPC server was started on any of the Spark worker nodes except for one. I don't know if that's just a configuration issue with my setup or if it's expected behavior. I would need Spark to start Avro RPC servers on every worker rather than just one.
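One way to approximate this with the current API would be to create one Flume stream per worker and union them, something like the sketch below (host names, port, and batch interval are placeholders, and from what I can tell Spark doesn't guarantee that each receiver actually lands on the matching host):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumePerWorker")
val ssc = new StreamingContext(conf, Seconds(2))

// Placeholder endpoints: one desired Avro RPC server per Spark worker.
val endpoints = Seq("worker1" -> 41414, "worker2" -> 41414, "worker3" -> 41414)

// One receiver per endpoint; each receiver starts its own Avro server.
val streams = endpoints.map { case (host, port) =>
  FlumeUtils.createStream(ssc, host, port)
}

// Union the per-receiver streams into a single DStream for processing.
val events = ssc.union(streams)
events.count().print()

ssc.start()
ssc.awaitTermination()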
- Christophe

On Apr 7, 2014 12:24 PM, "Michael Ernest" <mfern...@cloudera.com> wrote:

> You can configure your sinks to write to one or more Avro sources in a
> load-balanced configuration.
>
> https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
>
> mfe
>
> On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
> <christo...@christophe.cc> wrote:
>
> > Hi,
> >
> > From my testing of Spark Streaming with Flume, it seems that there's
> > only one of the Spark worker nodes that runs a Flume Avro RPC server to
> > receive messages at any given time, as opposed to every Spark worker
> > running an Avro RPC server to receive messages. Is this the case? Our
> > use-case would benefit from balancing the load across workers because
> > of our volume of messages. We would be using a load balancer in front
> > of the Spark workers running the Avro RPC servers, essentially
> > round-robinning the messages across all of them.
> >
> > If this is something that is currently not supported, I'd be interested
> > in contributing to the code to make it happen.
> >
> > - Christophe
>
> --
> Michael Ernest
> Sr. Solutions Consultant
> West Coast
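The load-balancing sink group Michael describes above would look roughly like this in a Flume agent's configuration (agent, sink, and host names here are placeholders):

agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.selector = round_robin
agent.sinkgroups.g1.processor.backoff = true

# Each Avro sink points at one Spark worker's Avro RPC server.
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = worker1
agent.sinks.k1.port = 41414

agent.sinks.k2.type = avro
agent.sinks.k2.hostname = worker2
agent.sinks.k2.port = 41414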