Did you set the Spark master as local[*]? If so, it means the number of executors equals the number of cores on the machine. Perhaps your mac machine has more cores (certainly more than the number of Kinesis shards + 1).
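A quick way to check this is with a plain JVM call, no Spark needed (a sketch; `numShards` here is a hypothetical placeholder for the value the quoted example fetches via describeStream):

```scala
// local[*] resolves to the JVM's view of usable cores on this machine.
val cores = Runtime.getRuntime.availableProcessors

// Hypothetical placeholder: in the real app this comes from
// kinesisClient.describeStream(streamName).getStreamDescription().getShards().size()
val numShards = 2

// Each Kinesis receiver pins one task slot, so shards + 1 slots are needed:
// one per receiver plus at least one to process the batches.
if (cores < numShards + 1)
  println(s"Only $cores core(s): not enough for $numShards receivers plus processing")
else
  println(s"$cores core(s) available, enough for $numShards shards + 1")
```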
Try explicitly setting the master as local[N], where N is the number of Kinesis shards + 1. It should then work on both machines.

On Thu, Nov 27, 2014, 9:46 AM Ashrafuzzaman <ashrafuzzaman...@gmail.com> wrote:

> I was trying on one machine with just sbt run.
>
> And it is working in my mac environment with the same configuration.
>
> I used the sample code from
> https://github.com/apache/spark/blob/master/extras/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala
>
> val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain())
> kinesisClient.setEndpoint(endpointUrl)
> val numShards = kinesisClient.describeStream(streamName).getStreamDescription().getShards().size()
>
> /* In this example, we're going to create 1 Kinesis Worker/Receiver/DStream for each shard. */
> val numStreams = numShards
>
> /* Set up the SparkConf and StreamingContext */
> /* Spark Streaming batch interval */
> val batchInterval = Milliseconds(2000)
> val sparkConfig = new SparkConf().setAppName("KinesisWordCount")
> val ssc = new StreamingContext(sparkConfig, batchInterval)
>
> /* Kinesis checkpoint interval. Same as batchInterval for this example. */
> val kinesisCheckpointInterval = batchInterval
>
> /* Create the same number of Kinesis DStreams/Receivers as the Kinesis stream's shards */
> val kinesisStreams = (0 until numStreams).map { i =>
>   KinesisUtils.createStream(ssc, streamName, endpointUrl, kinesisCheckpointInterval,
>     InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
> }
>
> /* Union all the streams */
> val unionStreams = ssc.union(kinesisStreams)
>
> /* Convert each line of Array[Byte] to String, split into words, and count them */
> val words = unionStreams.flatMap(byteArray => new String(byteArray).split(" "))
>
> /* Map each word to a (word, 1) tuple so we can reduce/aggregate by key. */
> val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
>
> /* Print the first 10 wordCounts */
> wordCounts.print()
>
> /* Start the streaming context and await termination */
> ssc.start()
> ssc.awaitTermination()
>
>
> A.K.M. Ashrafuzzaman
> Lead Software Engineer
> NewsCred <http://www.newscred.com>
>
> (M) 880-175-5592433
> Twitter <https://twitter.com/ashrafuzzaman> | Blog <http://jitu-blog.blogspot.com/> | Facebook <https://www.facebook.com/ashrafuzzaman.jitu>
>
> Check out The Academy <http://newscred.com/theacademy>, your #1 source for free content marketing resources
>
> On Wed, Nov 26, 2014 at 11:26 PM, Aniket Bhatnagar <aniket.bhatna...@gmail.com> wrote:
>
>> What's your cluster size? For streaming to work, it needs shards + 1 executors.
>>
>> On Wed, Nov 26, 2014, 5:53 PM A.K.M. Ashrafuzzaman <ashrafuzzaman...@gmail.com> wrote:
>>
>>> Hi guys,
>>> When we are using Kinesis with 1 shard then it works fine. But when we
>>> use more than 1, it falls into an infinite loop and no data is
>>> processed by Spark Streaming. In the Kinesis DynamoDB table, I can see that
>>> it keeps increasing the leaseCounter, but it does not start processing.
>>>
>>> I am using:
>>> scala: 2.10.4
>>> java version: 1.8.0_25
>>> Spark: 1.1.0
>>> spark-streaming-kinesis-asl: 1.1.0
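The fix suggested at the top of the thread can be sketched like this (a sketch, not tested against a live stream; it assumes `numShards` has already been fetched via describeStream as in the quoted example):

```scala
import org.apache.spark.SparkConf

// Placeholder: in the real app this value comes from
// kinesisClient.describeStream(streamName).getStreamDescription().getShards().size()
val numShards = 2

// Each Kinesis receiver occupies one core for the lifetime of the job,
// so local mode needs shards + 1 threads: one per receiver plus at
// least one left over to process the received batches.
val sparkConfig = new SparkConf()
  .setAppName("KinesisWordCount")
  .setMaster(s"local[${numShards + 1}]") // instead of local[*]
```

With local[*] the thread count depends on whichever machine you run on, which explains why the same code worked on the mac but not elsewhere.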