Cool. I'll look at making the code change in FlumeUtils and generating a
pull request.
As far as the use case goes, our message volume is currently about
30 MB per second, and it may grow beyond what a 1 Gbit network adapter can
handle.
- Christophe
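A rough sketch of the kind of change being discussed, assuming the existing
FlumeUtils.createStream(ssc, hostname, port, storageLevel) signature: one
receiver is created per host/port pair and the resulting streams are unioned.
The helper name createStreams below is a placeholder for illustration, not the
actual patch.

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}

object MultiHostFlumeUtils {
  // Placeholder helper: start one Flume receiver (and hence one Avro RPC
  // server) per (hostname, port) pair, then union the resulting streams.
  def createStreams(
      ssc: StreamingContext,
      addresses: Seq[(String, Int)],
      storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
  ): DStream[SparkFlumeEvent] = {
    val streams = addresses.map { case (host, port) =>
      FlumeUtils.createStream(ssc, host, port, storageLevel)
    }
    ssc.union(streams)
  }
}

Note that Spark schedules each receiver itself, so creating multiple receivers
does not by itself guarantee one Avro RPC server on every worker.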
On Apr 7, 2014 1:51 PM, "Michael Ernest" wrote:
I don't see why not. If you were doing something similar with straight
Flume, you'd start an agent on each node where you want to receive Avro/RPC
events. In the absence of clearer insight into your use case, I'm a little
puzzled as to why each Worker needs to be its own receiver,
but there's
Could it be as simple as just changing FlumeUtils to accept a list of
host/port number pairs to start the RPC servers on?
On 4/7/14, 12:58 PM, Christophe Clapp wrote:
Based on the source code here:
https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala
It looks like in its current version, FlumeUtils does not support
starting an Avro RPC server on more than one worker.
- Christophe
On 4/7
Right, but at least in my case, no Avro RPC server was started on any of
the Spark worker nodes except for one. I don't know if that's just a
configuration issue with my setup or if it's the expected behavior. I would
need Spark to start Avro RPC servers on every worker rather than just one.
- Christophe
You can configure your sinks to write to one or more Avro sources in a
load-balanced configuration.
https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
mfe
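For reference, a load-balancing sink group along those lines looks roughly
like the following in a Flume agent's properties file; the agent, channel,
and sink names and the worker hostnames/ports are placeholders.

agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true

# One Avro sink per Avro RPC server (i.e., per Spark receiver)
agent1.sinks.k1.type = avro
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hostname = spark-worker-1.example.com
agent1.sinks.k1.port = 4545

agent1.sinks.k2.type = avro
agent1.sinks.k2.channel = c1
agent1.sinks.k2.hostname = spark-worker-2.example.com
agent1.sinks.k2.port = 4545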
On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
wrote:
> Hi,
>
> From my testing of Spark Streaming with Flume, it seems t