Hi Hequn & Kien,
        The problem is finally solved.
        It was caused by slow sink writes. Because the job has only 2 tasks, I 
checked the backpressure and found that the source was under high backpressure, 
so I worked on speeding up the sink writes. After that, the end-to-end checkpoint 
duration dropped below 1s and the checkpoint timeouts stopped.
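
A common way to speed up a slow sink is to buffer records and write them in
batches rather than one at a time. The following is only a minimal sketch of
that idea and not necessarily the change made in this job. `BatchingWriter`
and `flushFn` are hypothetical names, and `flushFn` stands in for one bulk
write to whatever external system the sink talks to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects records and writes them out in batches. */
public class BatchingWriter<T> {
    private final int batchSize;
    // Assumption: one call to flushFn is one bulk write to the external system.
    private final Consumer<List<T>> flushFn;
    private final List<T> buffer = new ArrayList<>();

    public BatchingWriter(int batchSize, Consumer<List<T>> flushFn) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flushFn = flushFn;
    }

    /** Buffers one record and flushes once the buffer reaches batchSize. */
    public void write(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes out whatever is buffered; does nothing if the buffer is empty. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand over a copy so the caller's list is not affected by clear().
        flushFn.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

In a real Flink sink the buffer would also have to be flushed when a checkpoint
is taken (for example from `snapshotState` if the sink implements
`CheckpointedFunction`), otherwise buffered records could be lost on failure.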

Best
Henry


> On Oct 24, 2018, at 10:43 PM, 徐涛 <happydexu...@gmail.com> wrote:
> 
> Hequn & Kien,
>       Thanks a lot for your help, I will try it later.
> 
> Best
> Henry
> 
> 
>> On Oct 24, 2018, at 8:18 PM, Hequn Cheng <chenghe...@gmail.com 
>> <mailto:chenghe...@gmail.com>> wrote:
>> 
>> Hi Henry,
>> 
>> @Kien is right. Take a thread dump to see what the TaskManager was doing. 
>> Also check whether GC is happening frequently.
>> 
>> Best, Hequn
>>  
>> 
>> On Wed, Oct 24, 2018 at 5:03 PM 徐涛 <happydexu...@gmail.com 
>> <mailto:happydexu...@gmail.com>> wrote:
>> Hi,
>>         I am running a Flink application with parallelism 64. I left the 
>> checkpoint timeout at its default value of 10 minutes. The state size is less 
>> than 1MB, and I am using the FsStateBackend.
>>         The application triggers some checkpoints, but all of them fail with 
>> "Checkpoint expired before completing". In the checkpoint history I can see 
>> that 63 subtasks acknowledged but one is still shown as n/a, and the 
>> alignment duration is quite long, about 5m27s.
>>         Why does that one subtask not acknowledge? And since the alignment 
>> duration is so long, what influences the alignment duration?
>>         Thanks a lot.
>> 
>> Best
>> Henry
> 
