That depends on how many machines you have in your cluster. Say you have 6 workers: the receivers will most likely be distributed across all of them (assuming your topic has 6 partitions). If the topic has more partitions than receivers, say 12, then each of the 6 receivers will consume from 2 partitions at a time. If it has fewer partitions, say 3, then 3 of the receivers will be idle.

On 24 Feb 2015 10:16, "[email protected]" <[email protected]> wrote:
> Hi, Akhil, Tathagata,
>
> This leads me to another question. For the Spark Streaming and Kafka
> integration, if there is more than one Receiver in the cluster, such as
>
>     val streams = (1 to 6).map ( _ => KafkaUtils.createStream(ssc,
>       zkQuorum, group, topicMap).map(_._2) ),
>
> will these Receivers stay on one cluster node, or will they be
> distributed among the cluster nodes?
>
> ------------------------------
> [email protected]
>
> *From:* Akhil Das <[email protected]>
> *Date:* 2015-02-24 12:58
> *To:* Tathagata Das <[email protected]>
> *CC:* user <[email protected]>; bit1129 <[email protected]>
> *Subject:* Re: About FlumeUtils.createStream
>
> I see, thanks for the clarification TD.
> On 24 Feb 2015 09:56, "Tathagata Das" <[email protected]> wrote:
>
>> Akhil, that is incorrect.
>>
>> Spark will listen on the given port for Flume to push data into it.
>> When in local mode, it will listen on localhost:9999.
>> When in some kind of cluster, instead of localhost you will have to give
>> the hostname of the cluster node where you want Flume to forward the data.
>> Spark will launch the Flume receiver on that node (assuming the hostname
>> matching is correct), and listen on port 9999 for receiving data from Flume.
>> So only the configured machine will listen on port 9999.
>>
>> I suggest trying the other stream, FlumeUtils.createPollingStream. More
>> details here:
>> http://spark.apache.org/docs/latest/streaming-flume-integration.html
>>
>> On Sat, Feb 21, 2015 at 12:17 AM, Akhil Das <[email protected]>
>> wrote:
>>
>>> Spark won't listen on 9999 mate. It basically means you have a Flume
>>> source running at port 9999 of your localhost. And when you submit your
>>> application in standalone mode, workers will consume data from that port.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sat, Feb 21, 2015 at 9:22 AM, [email protected] <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>> In the Spark Streaming application, I write the code
>>>> FlumeUtils.createStream(ssc, "localhost", 9999), which
>>>> means Spark will listen on port 9999 and wait for the Flume sink to write
>>>> to it.
>>>> My question is: when I submit the application to the Spark Standalone
>>>> cluster, will 9999 be opened only on the driver machine, or will all the
>>>> workers also open port 9999 and wait for the Flume data?
>>>>
>>>> ------------------------------
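
The partition arithmetic in the reply at the top of this thread (6 receivers against 6, 12, or 3 partitions) can be sketched in plain Scala. This is only an illustration: the object `PartitionSplit`, the function `assign`, and the even, range-style split are assumptions made for the example and are not part of Spark or Kafka. The real assignment is decided by Kafka's consumer-group rebalancing.

```scala
// Sketch of how partitions might be split among receivers that share one
// consumer group. The even, range-style split is an assumption made for
// illustration; the real assignment is decided by Kafka's rebalancing.
object PartitionSplit {
  /** Returns, for each receiver index, the partition ids it would read. */
  def assign(receivers: Int, partitions: Int): Vector[Vector[Int]] = {
    require(receivers > 0, "need at least one receiver")
    val base  = partitions / receivers // partitions every receiver gets
    val extra = partitions % receivers // the first `extra` receivers get one more
    (0 until receivers).toVector.map { r =>
      val start = r * base + math.min(r, extra)
      val count = base + (if (r < extra) 1 else 0)
      (start until start + count).toVector
    }
  }

  def main(args: Array[String]): Unit = {
    // The three cases discussed in the thread: 6, 12, and 3 partitions.
    for (p <- Seq(6, 12, 3)) {
      val plan = assign(6, p)
      val idle = plan.count(_.isEmpty)
      println(s"$p partitions: per-receiver counts ${plan.map(_.size).mkString(",")}, idle receivers: $idle")
    }
  }
}
```

Under this assumed split, 12 partitions give every receiver 2 partitions, and 3 partitions leave 3 of the 6 receivers with nothing to read, which matches the behaviour described in the reply.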
