[jira] [Commented] (FLINK-21522) Iterative stream could not work with stop-with-savepoint

Piotr Nowojski (Jira) Mon, 01 Mar 2021 02:36:04 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-21522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292808#comment-17292808
 ]


Piotr Nowojski commented on FLINK-21522:
----------------------------------------

My first guess would be that this is a duplicate of FLINK-2470

Keep also in mind, that checkpointing with iterations is very rough and we will 
probably not be fixing it, but replacing with a completely new mechanism based 
on iterations/cyclic support from [Stateful 
Functions|https://flink.apache.org/stateful-functions.html]

> Iterative stream could not work with stop-with-savepoint
> --------------------------------------------------------
>
>                 Key: FLINK-21522
>                 URL: https://issues.apache.org/jira/browse/FLINK-21522
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.11.3, 1.12.1, 1.13.0
>            Reporter: Kezhu Wang
>            Priority: Major
>
> User reports this in user mail list: 
> [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Stateful-functions-2-2-and-stop-with-savepoint-td41772.html]
> I copied the full mail body here:
> {quote}
>  I have an embedded function with a SinkFunction as an egress, implemented as 
> this pseudo-code:
> val serializationSchema = KafkaSchemaSerializationSchema(... props required 
> to use a Confluent Schema Registry with Avro, auth etc ...)
> return SinkFunctionSpec(EGRESS_ID, FlinkKafkaProducer(serializationSchema, 
> props, AT_LEAST_ONCE))
> Checkpointing and taking a savepoint without stopping work as expected.
> However, when I run "flink stop <job-id>" or even "flink stop --drain 
> <job-id>", the operation never completes, reporting IN_PROGRESS until I hit 
> the "failure-cause: org.apache.flink.runtime.checkpoint.CheckpointException: 
> Checkpoint expired before completing" CompletedException.
> In the "Checkpoint History" it shows only 2 of my 3 operators completed their 
> work:
> Source: my-ingress-ingress -> router (my-ingress) | acknowledge: 1/1 (100%) | 
> end-to-end duration: 638ms | data-size 1.38 KB
> feedback-union -> functions -> Sink: my-egress-egress | acknowledge 0/1 0% | 
> end-to-end duration: n/a | data-size: n/a
> feedback | acknowledge: 1/1 (100%) | end-to-end duration: 626ms | data-size: 
> 0 B
> I've been unable to gain any insights from logs so far. Thoughts?
> {quote}
> I think it is what we overlooked in evaluation of 
> [apache/iceberg#2033|https://github.com/apache/iceberg/issues/2033], 
> FLINK-21132 and FLINK-21133.
> I think the problem is two folds in current implementation:
>  # {{StreamIterationHead}} does not finish itself.
>  # There is a local feedback from {{StreamIterationTail}} to 
> {{StreamIterationHead}} which could cause {{StreamIterationTail}} blocking 
> after {{StreamIterationHead}} finished . Globally speaking, it is a loop. 
> [~pnowojski] emphasized this in FLINK-21132 and FLINK-21133.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-21522) Iterative stream could not work with stop-with-savepoint

Reply via email to