[jira] [Updated] (FLINK-18748) Savepoint would be queued unexpected

Congxian Qiu(klion26) (Jira) Tue, 28 Jul 2020 22:37:18 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Congxian Qiu(klion26) updated FLINK-18748:
------------------------------------------
    Description: 
Inspired by a [user-zh 
email|http://apache-flink.147419.n8.nabble.com/flink-1-11-rest-api-saveppoint-td5497.html]

After FLINK-17342, whenever a checkpoint/savepoint is triggered, 
{{CheckpointRequestDecider#chooseRequestToExecute}} checks whether the request 
can be executed. The logic is as follows:
{code:java}
Preconditions.checkState(Thread.holdsLock(lock));
// 1. nothing to do if a trigger is already in progress or no request is queued
if (isTriggering || queuedRequests.isEmpty()) {
   return Optional.empty();
}

// 2. too many ongoing checkpoints/savepoints: only a forced request may proceed
if (pendingCheckpointsSizeSupplier.get() >= maxConcurrentCheckpointAttempts) {
   return Optional.of(queuedRequests.first())
      .filter(CheckpointTriggerRequest::isForce)
      .map(unused -> queuedRequests.pollFirst());
}

// 3. check the time elapsed since the last completed checkpoint
long nextTriggerDelayMillis = nextTriggerDelayMillis(lastCompletionMs);
if (nextTriggerDelayMillis > 0) {
   return onTooEarly(nextTriggerDelayMillis);
}

return Optional.of(queuedRequests.pollFirst());
{code}
However, even when {{pendingCheckpointsSizeSupplier.get()}} < 
{{maxConcurrentCheckpointAttempts}}, a savepoint request can still be delayed 
in step 3, because the {{nextTriggerDelayMillis}} check applies to every queued 
request, savepoints included.

I think we should trigger the savepoint immediately if 
{{pendingCheckpointsSizeSupplier.get()}} < {{maxConcurrentCheckpointAttempts}}.
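
To make the proposal concrete, here is a minimal, self-contained sketch of the three steps with an optional shortcut for savepoints. It is not Flink's actual class: {{RequestDeciderSketch}}, {{Request}}, the {{savepointsSkipDelay}} flag and the use of a plain {{Deque}} are all assumptions made for illustration. The {{isForce}} handling in step 2 is omitted, and {{onTooEarly}} is assumed to yield no request.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Hypothetical, simplified stand-in for the decision logic quoted above. */
final class RequestDeciderSketch {

    /** Hypothetical request type; only the savepoint flag matters here. */
    static final class Request {
        final String name;
        final boolean isSavepoint;

        Request(String name, boolean isSavepoint) {
            this.name = name;
            this.isSavepoint = isSavepoint;
        }
    }

    private final Deque<Request> queuedRequests = new ArrayDeque<>();
    private final int maxConcurrentCheckpointAttempts;
    // Assumption: toggles between the quoted behavior (false) and the proposal (true).
    private final boolean savepointsSkipDelay;

    RequestDeciderSketch(int maxConcurrentCheckpointAttempts, boolean savepointsSkipDelay) {
        this.maxConcurrentCheckpointAttempts = maxConcurrentCheckpointAttempts;
        this.savepointsSkipDelay = savepointsSkipDelay;
    }

    void enqueue(Request request) {
        queuedRequests.addLast(request);
    }

    Optional<Request> chooseRequestToExecute(
            boolean isTriggering, int pendingCheckpoints, long nextTriggerDelayMillis) {
        // 1. a trigger is already in progress, or nothing is queued
        if (isTriggering || queuedRequests.isEmpty()) {
            return Optional.empty();
        }
        // 2. too many ongoing checkpoints/savepoints (isForce handling omitted in this sketch)
        if (pendingCheckpoints >= maxConcurrentCheckpointAttempts) {
            return Optional.empty();
        }
        // Proposed change: a savepoint at the head of the queue skips the delay check.
        if (savepointsSkipDelay && queuedRequests.peekFirst().isSavepoint) {
            return Optional.of(queuedRequests.pollFirst());
        }
        // 3. too early since the last completed checkpoint
        //    (stands in for onTooEarly, assumed to yield no request)
        if (nextTriggerDelayMillis > 0) {
            return Optional.empty();
        }
        return Optional.of(queuedRequests.pollFirst());
    }
}
```

With {{savepointsSkipDelay}} set to false, a queued savepoint is held back whenever the delay is positive, which is the behavior reported here. With it set to true, the savepoint is returned as soon as there is capacity, while ordinary checkpoints still wait in step 3.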

  was:
Inspired by an [user-zh 
email|[http://apache-flink.147419.n8.nabble.com/flink-1-11-rest-api-saveppoint-td5497.html]]

After FLINK-17342, when triggering a checkpoint/savepoint, we'll check whether 
the request can be triggered in 
{{CheckpointRequestDecider#chooseRequestToExecute}}, the logic is as follow:
{code:java}
Preconditions.checkState(Thread.holdsLock(lock));
// 1. 
if (isTriggering || queuedRequests.isEmpty()) {
   return Optional.empty();
}

// 2 too many ongoing checkpoitn/savepoint
if (pendingCheckpointsSizeSupplier.get() >= maxConcurrentCheckpointAttempts) {
   return Optional.of(queuedRequests.first())
      .filter(CheckpointTriggerRequest::isForce)
      .map(unused -> queuedRequests.pollFirst());
}

// 3 check the timestamp of last complete checkpoint
long nextTriggerDelayMillis = nextTriggerDelayMillis(lastCompletionMs);
if (nextTriggerDelayMillis > 0) {
   return onTooEarly(nextTriggerDelayMillis);
}

return Optional.of(queuedRequests.pollFirst());
{code}
But if currently {{pendingCheckpointsSizeSupplier.get()}} < 
{{maxConcurrentCheckpointAttempts}}, and the request is a savepoint, the 
savepoint will still wait some time in step 3. 

I think we should trigger the savepoint immediately if 
{{pendingCheckpointSizeSupplier.get()}} < {{maxConcurrentCheckpointAttempts}}.


> Savepoint would be queued unexpected
> ------------------------------------
>
>                 Key: FLINK-18748
>                 URL: https://issues.apache.org/jira/browse/FLINK-18748
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0, 1.11.1
>            Reporter: Congxian Qiu(klion26)
>            Priority: Major
>
> Inspired by a [user-zh 
> email|http://apache-flink.147419.n8.nabble.com/flink-1-11-rest-api-saveppoint-td5497.html]
> After FLINK-17342, whenever a checkpoint/savepoint is triggered, 
> {{CheckpointRequestDecider#chooseRequestToExecute}} checks whether the 
> request can be executed. The logic is as follows:
> {code:java}
> Preconditions.checkState(Thread.holdsLock(lock));
> // 1. nothing to do if a trigger is already in progress or no request is queued
> if (isTriggering || queuedRequests.isEmpty()) {
>    return Optional.empty();
> }
> // 2. too many ongoing checkpoints/savepoints: only a forced request may proceed
> if (pendingCheckpointsSizeSupplier.get() >= maxConcurrentCheckpointAttempts) {
>    return Optional.of(queuedRequests.first())
>       .filter(CheckpointTriggerRequest::isForce)
>       .map(unused -> queuedRequests.pollFirst());
> }
> // 3. check the time elapsed since the last completed checkpoint
> long nextTriggerDelayMillis = nextTriggerDelayMillis(lastCompletionMs);
> if (nextTriggerDelayMillis > 0) {
>    return onTooEarly(nextTriggerDelayMillis);
> }
> return Optional.of(queuedRequests.pollFirst());
> {code}
> However, even when {{pendingCheckpointsSizeSupplier.get()}} < 
> {{maxConcurrentCheckpointAttempts}}, a savepoint request can still be delayed 
> in step 3, because the {{nextTriggerDelayMillis}} check applies to every 
> queued request, savepoints included.
> I think we should trigger the savepoint immediately if 
> {{pendingCheckpointsSizeSupplier.get()}} < {{maxConcurrentCheckpointAttempts}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
