Hi,
Deploying Spark with Flume is pretty simple. What you'd need to do is:
1. Start your Spark Flume DStream receiver on some machine using one of
the FlumeUtils.createStream methods - you specify the hostname and port
of the worker node on which you want the Spark executor to run - say
a.b.c.d:4585. This is where Spark will receive the data from Flume.
2. Once your application has started, start the Flume agent(s) that are
going to be sending the data, with Avro sinks whose hostname is set to
a.b.c.d and port set to 4585.
And you are done!
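For step 1, here is a minimal sketch of the receiver application in Scala,
along the lines of Spark's bundled FlumeEventCount example (the app name,
batch interval, and host/port are just placeholders from the steps above):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeEventCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Bind the Flume receiver on the worker node that the Avro sink
    // will push to - a.b.c.d:4585 in the steps above.
    val stream = FlumeUtils.createStream(ssc, "a.b.c.d", 4585)

    // Count events per batch just to verify data is arriving.
    stream.count().map(c => "Received " + c + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}

For step 2, the avro_sink section of your config below is already the right
shape - just point its hostname/port at the receiver (a.b.c.d and 4585)
rather than at a separate Avro source.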
Tathagata Das wrote:
Hari, can you help?
TD
On Tue, Jul 29, 2014 at 12:13 PM, dapooley <dapoo...@gmail.com> wrote:
Hi,
I am trying to integrate Spark with a Flume log sink and Avro source. The
sink is on one machine (the application), and the source is on another.
Log events are being sent from the application server to the Avro source
server (a file_roll sink on the Avro source writes them out, to verify
they arrive). The aim is to get Spark to also receive the same events
that the Avro source is getting. The steps, I believe, are (a sketch of
the deploy command follows the list):
1. Install/start the Spark master (on the Avro source machine).
2. Write the Spark application and deploy it (on the Avro source machine).
3. Add the Spark application as a worker to the master.
4. Have the Spark application configured to the same port as the Avro source.
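For steps 2-3, this is roughly the deploy command I have in mind (the class
name, jar path, and master host are placeholders, and I'm assuming the
spark-streaming-flume dependency is bundled into the application's
assembly jar):

./bin/spark-submit \
  --class com.example.FlumeEventCount \
  --master spark://<master-host>:7077 \
  target/scala-2.10/flume-app-assembly-1.0.jar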
The test setup uses two Ubuntu VMs on a Windows host.
Flume configuration:
######################### application ##############################
## Tail application log file
# /var/lib/apache-flume-1.5.0-bin/bin/flume-ng agent -n source_agent -c conf -f conf/flume-conf.properties
# http://flume.apache.org/FlumeUserGuide.html#exec-source
source_agent.sources = tomcat
source_agent.sources.tomcat.type = exec
source_agent.sources.tomcat.command = tail -F /var/lib/tomcat/logs/application.log
source_agent.sources.tomcat.batchSize = 1
source_agent.sources.tomcat.channels = memoryChannel
# http://flume.apache.org/FlumeUserGuide.html#memory-channel
source_agent.channels = memoryChannel
source_agent.channels.memoryChannel.type = memory
source_agent.channels.memoryChannel.capacity = 100
## Send to Flume Collector on Analytics Node
# http://flume.apache.org/FlumeUserGuide.html#avro-sink
source_agent.sinks = avro_sink
source_agent.sinks.avro_sink.type = avro
source_agent.sinks.avro_sink.channel = memoryChannel
source_agent.sinks.avro_sink.hostname = 10.0.2.2
source_agent.sinks.avro_sink.port = 41414
######################## avro source ##############################
## Receive Flume events for Spark streaming
# http://flume.apache.org/FlumeUserGuide.html#memory-channel
agent1.channels = memoryChannel
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 100
## Flume Collector on Analytics Node
# http://flume.apache.org/FlumeUserGuide.html#avro-source
agent1.sources = avroSource
agent1.sources.avroSource.type = avro
agent1.sources.avroSource.channels = memoryChannel
agent1.sources.avroSource.bind = 0.0.0.0
agent1.sources.avroSource.port = 41414
# Sinks
agent1.sinks = localout
# http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
agent1.sinks.localout.type = file_roll
agent1.sinks.localout.sink.directory = /home/vagrant/flume/logs
agent1.sinks.localout.sink.rollInterval = 0
agent1.sinks.localout.channel = memoryChannel
Thank you in advance for any assistance,
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-and-Flume-integration-do-I-understand-this-correctly-tp10879.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
--
Thanks,
Hari