[ https://issues.apache.org/jira/browse/FLINK-24229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450520#comment-17450520 ]
Yuri Gusev edited comment on FLINK-24229 at 11/30/21, 4:22 PM:
---------------------------------------------------------------

Hi [~CrynetLogistics],

We are working on it. The implementation is almost there; we will submit a PR for review this week or early next week.

In our previous implementation we deduplicated entries while aggregating a batch, but now we need to move this into the writer itself, because the batching is done by the AsyncSinkWriter.

I'm not sure I followed your answer on the fatal-exception behaviour. What we would like is to let the user define how to handle a failure once all retries towards DynamoDB for the current batch are exhausted (for example, send the "poisonous" record to a DLQ, or drop it). This is how we achieved it in the old implementation: [WriteRequestFailureHandler.java|https://github.com/YuriGusev/flink/blob/FLINK-16504_dynamodb_connector_rebased/flink-connectors/flink-connector-dynamodb/src/main/java/org/apache/flink/streaming/connectors/dynamodb/WriteRequestFailureHandler.java], [DynamoDbSink.java|https://github.com/YuriGusev/flink/blob/8787c343b615602c989fa793e0f4687ef40e530c/flink-connectors/flink-connector-dynamodb/src/main/java/org/apache/flink/streaming/connectors/dynamodb/DynamoDbSink.java#L355]

At the moment it seems [it will send the failed records back|https://github.com/apache/flink/blob/cb5034b6b8d1a601ae3bc34feb3314518e78aae3/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java#L304] to the queue and retry again, or, on a fatal exception, propagate the error and stop the application.
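To make the proposal concrete, here is a minimal sketch of such a pluggable handler. The interface name and signature are hypothetical, loosely modelled on the linked WriteRequestFailureHandler; only {{WriteRequest}} is a real AWS SDK v2 class.

{code:java}
import java.io.Serializable;
import software.amazon.awssdk.services.dynamodb.model.WriteRequest;

/**
 * Hypothetical callback, invoked once a batch entry has exhausted all
 * retries towards DynamoDB. The implementation decides whether to drop
 * the record, forward it to a dead-letter queue, or fail the job by
 * rethrowing the error.
 */
public interface WriteRequestFailureHandler extends Serializable {
    void onFailedRequest(WriteRequest failedRequest, Throwable error) throws Exception;
}
{code}

A DLQ variant would then be a one-liner at sink construction time, e.g. {{(request, error) -> dlqProducer.send(request)}} (where {{dlqProducer}} is whatever client the user supplies), while a default handler could simply rethrow to keep the current fail-fast behaviour.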
> [FLIP-171] DynamoDB implementation of Async Sink
> ------------------------------------------------
>
>                 Key: FLINK-24229
>                 URL: https://issues.apache.org/jira/browse/FLINK-24229
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / Common
>            Reporter: Zichen Liu
>            Assignee: Zichen Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> h2. Motivation
>
> *User stories:*
> As a Flink user, I’d like to use DynamoDB as a sink for my data pipeline.
>
> *Scope:*
> * Implement an asynchronous sink for DynamoDB by inheriting the AsyncSinkBase class. The implementation can for now reside in its own module in flink-connectors.
> * Implement an asynchronous sink writer for DynamoDB by extending the AsyncSinkWriter. The implementation must deal with failed requests and retry them using the {{requeueFailedRequestEntry}} method. If possible, the implementation should batch multiple requests (WriteRequest objects) to DynamoDB for increased throughput (a sketch follows the references below). The implemented sink writer will be used by the sink class created as part of this story.
> * Java / code-level docs.
> * End-to-end testing: add tests that hit a real AWS instance. (How can we best donate resources to the Flink project to allow this to happen?)
>
> h2. References
>
> More details can be found at [https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]
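To illustrate the writer bullet above, a rough sketch of {{submitRequestEntries}} under the AsyncSinkWriter contract at the revision linked in the comment, where entries handed to the result consumer are re-queued for retry (the consumer's exact type has changed across revisions, so the signature below is an assumption). The {{client}}, {{tableName}}, and {{primaryKeyOf}} helper are hypothetical wiring for the example; the deduplication step reflects the comment above, since DynamoDB's BatchWriteItem rejects batches containing duplicate primary keys.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.dynamodb.model.BatchWriteItemRequest;
import software.amazon.awssdk.services.dynamodb.model.WriteRequest;

// Inside a DynamoDbSinkWriter<InputT> extends AsyncSinkWriter<InputT, WriteRequest>:

private final DynamoDbAsyncClient client = DynamoDbAsyncClient.create(); // assumed wiring
private final String tableName = "example-table";                        // assumed

@Override
protected void submitRequestEntries(
        List<WriteRequest> requestEntries, Consumer<List<WriteRequest>> requestResult) {

    // Deduplicate per primary key (keeping the last write), since batching now
    // happens in AsyncSinkWriter and BatchWriteItem rejects duplicate keys.
    // primaryKeyOf(...) is a hypothetical helper extracting the key attributes.
    Map<String, WriteRequest> deduplicated = new LinkedHashMap<>();
    for (WriteRequest entry : requestEntries) {
        deduplicated.put(primaryKeyOf(entry), entry);
    }

    BatchWriteItemRequest request =
            BatchWriteItemRequest.builder()
                    .requestItems(
                            Collections.singletonMap(
                                    tableName, new ArrayList<>(deduplicated.values())))
                    .build();

    client.batchWriteItem(request)
            .whenComplete(
                    (response, error) -> {
                        if (error != null) {
                            // Hand the whole batch back so AsyncSinkWriter re-queues it.
                            requestResult.accept(requestEntries);
                        } else {
                            // Entries DynamoDB could not process (e.g. throttling) are
                            // handed back for retry; an empty list signals full success.
                            requestResult.accept(
                                    response.unprocessedItems()
                                            .getOrDefault(tableName, Collections.emptyList()));
                        }
                    });
}
{code}

A user-supplied failure handler like the one sketched in the comment above would then be consulted for entries that have already exceeded their retry budget, instead of re-queuing them indefinitely.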