Could you do a thread dump in the executor that runs the Kinesis receiver
and post it? It would be great if you can provide the executor log as well?

On Tue, Feb 9, 2016 at 3:14 PM, Roberto Coluccio <roberto.coluc...@gmail.com
> wrote:

> Hello,
>
> can anybody kindly help me out a little bit here? I just verified the
> problem is still there on Spark 1.6.0 and emr-4.3.0 as well. It's
> definitely a Kinesis-related issue, since with Spark 1.6.0 I'm successfully
> able to get Streaming drivers to terminate with no issue IF I don't use
> Kinesis and open any Receivers.
>
> Thank you!
>
> Roberto
>
>
> On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <
> roberto.coluc...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm struggling around an issue ever since I tried to upgrade my Spark
>> Streaming solution from 1.4.1 to 1.5+.
>>
>> I have a Spark Streaming app which creates 3 ReceiverInputDStreams
>> leveraging KinesisUtils.createStream API.
>>
>> I used to leverage a timeout to terminate my app
>> (StreamingContext.awaitTerminationOrTimeout(timeout)) gracefully (SparkConf
>> spark.streaming.stopGracefullyOnShutdown=true).
>>
>> I used to submit my Spark app on EMR in yarn-cluster mode.
>>
>> Everything worked fine up to Spark 1.4.1 (on EMR AMI 3.9).
>>
>> Since I upgraded (tried with Spark 1.5.2 on emr-4.2.0 and Spark 1.6.0 on
>> emr-4.3.0) I can't get the app to actually terminate. Logs tells me it
>> tries to, but no confirmation of receivers stop is retrieved. Instead, when
>> the timer gets to the next period, the StreamingContext continues its
>> processing for a while (then it gets killed with a SIGTERM 15. YARN's vmem
>> and pmem killls disabled).
>>
>> ...
>>
>> 16/02/02 21:22:08 INFO ApplicationMaster: Final app status: SUCCEEDED, 
>> exitCode: 0
>> 16/02/02 21:22:08 INFO StreamingContext: Invoking stop(stopGracefully=true) 
>> from shutdown hook
>> 16/02/02 21:22:08 INFO ReceiverTracker: Sent stop signal to all 3 receivers
>> 16/02/02 21:22:18 INFO ReceiverTracker: Waiting for receiver job to 
>> terminate gracefully
>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned shuffle 141
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> 172.31.3.140:50152 in memory (size: 23.9 KB, free: 2.1 GB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-141.ec2.internal:41776 in memory (size: 23.9 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-140.ec2.internal:36295 in memory (size: 23.9 KB, free: 1224.0 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-141.ec2.internal:56428 in memory (size: 23.9 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_217_piece0 on 
>> ip-172-31-3-140.ec2.internal:50542 in memory (size: 23.9 KB, free: 1224.7 MB)
>> 16/02/02 21:22:52 INFO ContextCleaner: Cleaned accumulator 184
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> 172.31.3.140:50152 in memory (size: 3.0 KB, free: 2.1 GB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-141.ec2.internal:41776 in memory (size: 3.0 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-141.ec2.internal:56428 in memory (size: 3.0 KB, free: 1224.9 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-140.ec2.internal:36295 in memory (size: 3.0 KB, free: 1224.0 MB)
>> 16/02/02 21:22:52 INFO BlockManagerInfo: Removed broadcast_218_piece0 on 
>> ip-172-31-3-140.ec2.internal:50542 in memory (size: 3.0 KB, free: 1224.7 MB)
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 680 for time 1454448300000 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 708 for time 1454448300000 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO TransformedDStream: Slicing from 1454448000000 ms to 
>> 1454448300000 ms (aligned to 1454448000000 ms and 1454448300000 ms)
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 777 for time 1454448300000 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO StateDStream: Marking RDD 801 for time 1454448300000 
>> ms for checkpointing
>> 16/02/02 21:25:00 INFO JobScheduler: Added jobs for time 1454448300000 ms
>> 16/02/02 21:25:00 INFO JobGenerator: Checkpointing graph for time 
>> 1454448300000 ms
>> 16/02/02 21:25:00 INFO DStreamGraph: Updating checkpoint data for time 
>> 1454448300000 ms
>> 16/02/02 21:25:00 INFO JobScheduler: Starting job streaming job 
>> 1454448300000 ms.0 from job set of time 1454448300000 ms
>>
>> ...
>>
>>
>> Please, this is really blocking in the upgrade process to latest Spark
>> versions and I really don't know how to work it around.
>>
>> Any help would be very much appreciated.
>>
>> Thank you,
>>
>> Roberto
>>
>>
>>
>

Reply via email to