Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Cool. I'll look at making the code change in FlumeUtils and generating a pull request. As for the use case, our message volume is currently about 30 MB per second, which may grow beyond what a 1 Gbit network adapter can handle. - Christophe On Apr 7, 2014 1:51 PM, "Michael Ernest"
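As a rough back-of-the-envelope check on those figures, the sketch below compares 30 MB per second with the raw line rate of a 1 Gbit adapter. It assumes decimal units (1 Gbit = 1000 Mbit) and ignores protocol overhead, so the real usable bandwidth would be somewhat lower. The variable and function names are illustrative only.

```python
# Compare the message volume quoted in the thread with the raw line rate
# of a 1 Gbit/s network adapter (decimal units, protocol overhead ignored).

LINK_GBIT_PER_S = 1.0      # adapter speed mentioned in the thread
CURRENT_MB_PER_S = 30.0    # current message volume mentioned in the thread


def link_capacity_mb_per_s(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to MB/s (1000 Mbit per Gbit, 8 bits per byte)."""
    return gbit_per_s * 1000.0 / 8.0


capacity = link_capacity_mb_per_s(LINK_GBIT_PER_S)   # raw capacity in MB/s
utilisation = CURRENT_MB_PER_S / capacity            # fraction of the link in use
headroom = capacity / CURRENT_MB_PER_S               # growth factor before saturation

print(f"capacity: {capacity:.0f} MB/s")
print(f"current utilisation: {utilisation:.0%}")
print(f"growth factor before saturation: {headroom:.2f}x")
```

Under these assumptions a single adapter tops out at 125 MB per second, so the current volume uses about a quarter of one link and could grow roughly fourfold before a single receiving node becomes the bottleneck.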

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Michael Ernest
I don't see why not. If you were doing something similar with straight Flume, you'd start an agent on each node where you want to receive Avro/RPC events. Without clearer insight into your use case, I'm a little puzzled as to why each Worker needs to be its own receiver, but there's

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Could it be as simple as just changing FlumeUtils to accept a list of host/port number pairs to start the RPC servers on? On 4/7/14, 12:58 PM, Christophe Clapp wrote: Based on the source code here: https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/strea
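To make the "list of host/port number pairs" idea concrete, here is a minimal sketch of the shape such a change could take. It is not the FlumeUtils API and does not use Spark at all: `parse_endpoints`, `create_streams` and `fake_create_stream` are hypothetical names, and `fake_create_stream` merely stands in for whatever call would start one Avro RPC receiver (for instance something along the lines of `FlumeUtils.createStream`).

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Endpoint = Tuple[str, int]


def parse_endpoints(spec: str) -> List[Endpoint]:
    """Parse "host1:port1,host2:port2" into [(host1, port1), (host2, port2)]."""
    endpoints: List[Endpoint] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and stray whitespace
        host, sep, port = item.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


def create_streams(endpoints: List[Endpoint],
                   create_stream: Callable[[str, int], T]) -> List[T]:
    """Start one receiver per endpoint using the supplied factory (hypothetical helper)."""
    return [create_stream(host, port) for host, port in endpoints]


def fake_create_stream(host: str, port: int) -> str:
    """Stand-in for a real receiver factory; returns a label instead of a stream."""
    return f"receiver@{host}:{port}"


endpoints = parse_endpoints("worker1:4141, worker2:4141")
streams = create_streams(endpoints, fake_create_stream)
print(streams)
```

In an actual job the per-endpoint streams would still need to be combined into a single stream before processing; that step is left out here because it depends on the streaming API in use.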

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Based on the source code here: https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala It looks like in its current version, FlumeUtils does not support starting an Avro RPC server on more than one worker. - Christophe On 4/7

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Right, but at least in my case, no Avro RPC server was started on any of the Spark worker nodes except for one. I don't know if that's just some configuration issue with my setup or if it's expected behavior. I would need Spark to start Avro RPC servers on every worker rather than just one. - Chri

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Michael Ernest
You can configure your sinks to write to one or more Avro sources in a load-balanced configuration. https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors mfe On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp wrote: > Hi, > > From my testing of Spark Streaming with Flume, it seems t
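For reference, a load-balanced sink configuration of the kind described in the linked Flume User Guide section might look like the fragment below. This is only a sketch: the agent name `a1`, sink names `k1` and `k2`, channel `c1`, the hostnames and the port are all placeholders, and the source and channel definitions are omitted.

```properties
# Two Avro sinks, each pointing at a different Avro source (placeholder hosts).
a1.sinks = k1 k2

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = worker1.example.com
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = worker2.example.com
a1.sinks.k2.port = 4141

# Group the sinks and spread events across them in round-robin order,
# temporarily backing off from a sink that fails.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
```

Each sink hostname would point at a node where an Avro RPC server is actually listening, so this only spreads load if a receiver is running on more than one node.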