Re: Re: About FlumeUtils.createStream

Akhil Das Mon, 23 Feb 2015 21:05:10 -0800

That depends on how many machines you have in your cluster. Say you have 6
workers: the receivers will most likely be distributed across all of the
workers (assuming your topic has 6 partitions). Now when you have more than 6
partitions, say 12, these 6 receivers will each start to consume from 2
partitions at a time. And when you have fewer partitions, say 3, then 3 of
the receivers will be idle.
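
To make the arithmetic concrete, here is a rough sketch in plain Scala. It
assumes each receiver runs a single consumer thread in the same consumer
group and that partitions are split as evenly as possible. The object and
method names are made up for illustration; this is a simplified model, not
Kafka's actual rebalancing code.

```scala
// Simplified model of how a consumer group might split a topic's
// partitions among receivers. Illustration only; names are hypothetical.
object PartitionSplit {

  /** Number of partitions owned by each of `receivers` consumers. */
  def partitionsPerReceiver(partitions: Int, receivers: Int): Seq[Int] = {
    require(partitions >= 0 && receivers > 0, "need receivers > 0")
    val base  = partitions / receivers   // every receiver gets at least this many
    val extra = partitions % receivers   // the first `extra` receivers get one more
    (0 until receivers).map(i => base + (if (i < extra) 1 else 0))
  }

  def main(args: Array[String]): Unit = {
    // 6 receivers against topics with 6, 12 and 3 partitions
    Seq(6, 12, 3).foreach { p =>
      println(s"$p partitions -> " + partitionsPerReceiver(p, 6).mkString(", "))
    }
  }
}
```

With 3 partitions the result contains three zeros, which corresponds to the
three idle receivers.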
On 24 Feb 2015 10:16, "[email protected]" <[email protected]> wrote:


> Hi Akhil, Tathagata,
>
> This leads me to another question. For the Spark Streaming and Kafka
> integration, if there is more than one Receiver in the cluster, such as
>   val streams = (1 to 6).map(_ => KafkaUtils.createStream(ssc,
>     zkQuorum, group, topicMap).map(_._2))
> will these Receivers stay on one cluster node, or will they be
> distributed among the cluster nodes?
>
> ------------------------------
> [email protected]
>
>
> *From:* Akhil Das <[email protected]>
> *Date:* 2015-02-24 12:58
> *To:* Tathagata Das <[email protected]>
> *CC:* user <[email protected]>; bit1129 <[email protected]>
> *Subject:* Re: About FlumeUtils.createStream
>
> I see, thanks for the clarification TD.
> On 24 Feb 2015 09:56, "Tathagata Das" <[email protected]> wrote:
>
>> Akhil, that is incorrect.
>>
>> Spark will listen on the given port for Flume to push data into it.
>> When in local mode, it will listen on localhost:9999.
>> When in some kind of cluster, instead of localhost you will have to give
>> the hostname of the cluster node where you want Flume to forward the data.
>> Spark will launch the Flume receiver on that node (assuming the hostname
>> matching is correct), and listen on port 9999 for receiving data from Flume.
>> So only the configured machine will listen on port 9999.
>>
>> I suggest trying the other stream, FlumeUtils.createPollingStream. More
>> details here:
>> http://spark.apache.org/docs/latest/streaming-flume-integration.html
>>
>>
>>
>> On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das <[email protected]>
>> wrote:
>>
>>> Spark won't listen on 9999, mate. It basically means you have a Flume
>>> source running at port 9999 of your localhost. And when you submit your
>>> application in standalone mode, workers will consume data from that port.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sat, Feb 21, 2015 at 9:22 AM, [email protected] <[email protected]>
>>> wrote:
>>>
>>>>
>>>> Hi,
>>>> In my Spark Streaming application I wrote
>>>> FlumeUtils.createStream(ssc, "localhost", 9999), which
>>>> means Spark will listen on port 9999 and wait for the Flume sink to write
>>>> to it.
>>>> My question is: when I submit the application to the Spark Standalone
>>>> cluster, will port 9999 be opened only on the driver machine, or will all
>>>> the workers also open port 9999 and wait for the Flume data?
>>>>
>>>> ------------------------------
>>>>
>>>>
>>>
>>
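
For reference, the push-based setup Tathagata describes in the thread could
look roughly like the sketch below. It assumes a Spark 1.x application with
the spark-streaming-flume artifact on the classpath. The hostname
"worker-1.example.com", the app name and the 10-second batch interval are
placeholders, not values from the thread. The hostname has to match the
cluster node that the Flume sink is configured to send to.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePushSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder: hostname of the worker node that should run the receiver.
    // In local mode this would be "localhost".
    val receiverHost = "worker-1.example.com"
    val receiverPort = 9999

    val conf = new SparkConf().setAppName("FlumePushSketch")
    val ssc  = new StreamingContext(conf, Seconds(10)) // assumed batch interval

    // Only the node matching receiverHost listens on receiverPort;
    // Flume pushes events to that host and port.
    val events = FlumeUtils.createStream(ssc, receiverHost, receiverPort)

    // Print the number of Flume events received in each batch.
    events.count().map(n => s"Received $n Flume events").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The pull-based alternative, FlumeUtils.createPollingStream, is covered in the
integration guide linked in the thread.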
