Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Cool. I'll look at making the code change in FlumeUtils and generating a pull request. As for the use case, our message volume is currently about 30 MB per second, which may grow beyond what a 1 Gbit network adapter can handle. - Christophe On Apr 7, 2014 1:51 PM, "Michael Ernest"

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Michael Ernest
I don't see why not. If one were doing something similar with straight Flume, you'd start an agent on each node you care to receive Avro/RPC events. In the absence of clearer insight into your use case, I'm puzzling just a little over why it's necessary for each Worker to be its own receiver, but there's

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Could it be as simple as just changing FlumeUtils to accept a list of host/port number pairs to start the RPC servers on? On 4/7/14, 12:58 PM, Christophe Clapp wrote: Based on the source code here: https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/strea

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Based on the source code here: https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala It looks like in its current version, FlumeUtils does not support starting an Avro RPC server on more than one worker. - Christophe On 4/7

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Right, but at least in my case, no Avro RPC server was started on any of the Spark worker nodes except for one. I don't know if that's just some configuration issue with my setup or if it's expected behavior. I would need Spark to start Avro RPC servers on every worker rather than just one. - Chri

Re: Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Michael Ernest
You can configure your sinks to write to one or more Avro sources in a load-balanced configuration. https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors mfe On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp wrote: > Hi, > > From my testing of Spark Streaming with Flume, it seems t
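For reference, the load-balanced setup Michael describes might look roughly like the fragment below. It is only a sketch based on the sink processor section of the Flume User Guide linked above: the agent name (a1), channel name (c1), sink names (k1, k2), host names, and port are all placeholders, not values from this thread.

```properties
# Hypothetical agent "a1": two Avro sinks, each pointing at a different
# Avro RPC receiver (host names and port are placeholders).
a1.sinks = k1 k2

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = worker1.example.com
a1.sinks.k1.port = 41414

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = worker2.example.com
a1.sinks.k2.port = 41414

# Group the sinks and spread events across them.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
# Temporarily skip a sink that fails, instead of retrying it on every event.
a1.sinkgroups.g1.processor.backoff = true
```

This only helps if an Avro RPC server is actually listening at each listed host and port, which is the open question in the rest of the thread.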

Spark Streaming and Flume Avro RPC Servers

2014-04-07 Thread Christophe Clapp
Hi, From my testing of Spark Streaming with Flume, it seems that only one of the Spark worker nodes runs a Flume Avro RPC server to receive messages at any given time, as opposed to every Spark worker running an Avro RPC server to receive messages. Is this the case? Our use-case