[ https://issues.apache.org/jira/browse/FLINK-29854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Cranmer updated FLINK-29854:
----------------------------------
    Description: 
h3. Background

Currently AsyncSinkWriter supports three mechanisms that trigger a flush to the 
destination:
 * Time based
 * Batch size in bytes
 * Number of records in the batch

For "batch size in bytes" one must implement 
[getSizeInBytes|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java#L202]
 in order for the base to calculate the total batch size. In some cases 
computing the batch size within the AsyncSinkWriter is an expensive operation, 
or not possible. For example, the DynamoDB connector needs to determine the 
serialized size of {{DynamoDbWriteRequest}}. 
(https://github.com/apache/flink-connector-dynamodb/pull/1/files#r1012223894)
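
For context, a simplified sketch of what a connector must implement today is shown below. The {{DynamoDbWriteRequest}} handling and the {{serialize}} helper are hypothetical illustrations of the cost involved, not the actual connector code:

{code:java}
// Simplified sketch of the current contract (not the actual DynamoDB
// connector code): AsyncSinkWriter requires every subclass to estimate
// the size of each request entry, even when that estimate is costly.
@Override
protected long getSizeInBytes(DynamoDbWriteRequest requestEntry) {
    // Hypothetical helper: serializing the request only to measure it is
    // exactly the kind of expensive operation this ticket wants to avoid.
    return serialize(requestEntry).length;
}
{code}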

h3. Scope

Add a feature to make "size in bytes" support optional. This includes:
- Connectors will not be required to implement {{getSizeInBytes}}
- Batches will not be validated for max size
- Records will not be validated for size
- Batches are not flushed when max size is exceeded

The sink implementer can decide if it is appropriate to enable this feature.
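
The exact API is left open by this ticket. Purely as an illustration, the flush decision could fall back to the remaining two triggers when size tracking is disabled; all names below, such as {{sizeTrackingEnabled}}, are hypothetical:

{code:java}
// Hypothetical sketch only; the concrete API is part of this ticket's scope.
// With size tracking disabled, only the record-count trigger (plus the
// separate time-based timer) decides when a batch is flushed.
private boolean shouldFlushOnAdd() {
    if (bufferedRequestEntries.size() >= maxBatchSize) {
        return true; // record-count trigger always applies
    }
    // Size trigger applies only if the connector opted in to size tracking
    return sizeTrackingEnabled && bufferedSizeInBytes >= maxBatchSizeInBytes;
}
{code}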





  was:
h3. Background

Currently AsyncSinkWriter supports three mechanisms that trigger a flush to the 
destination:
 * Time based
 * Batch size in bytes
 * Number of records in the batch

For "batch size in bytes" one must implement 
[getSizeInBytes|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java#L202]
 in order for the base to calculate the total batch size. In some cases 
computing the batch size within the AsyncSinkWriter is an expensive operation, 
or not possible. For example, the DynamoDB connector needs to determine the 
serialized size of {{DynamoDbWriteRequest}}. 
(https://github.com/apache/flink-connector-dynamodb/pull/1/files#r1012223894)

h3. Scope

Add a feature to make "size in bytes" support optional. This includes:
- Connectors will not be required to implement {{getSizeInBytes}}
- Batches will not be validated for max size
- Records will not be validated for size

The sink implementer can decide if it is appropriate to enable this feature.






> Make Record Size Flush Strategy Optional for Async Sink
> -------------------------------------------------------
>
>                 Key: FLINK-29854
>                 URL: https://issues.apache.org/jira/browse/FLINK-29854
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>            Reporter: Danny Cranmer
>            Assignee: Ahmed Hamdy
>            Priority: Major
>
> h3. Background
> Currently AsyncSinkWriter supports three mechanisms that trigger a flush to 
> the destination:
>  * Time based 
>  * Batch size in bytes
>  * Number of records in the batch
> For "batch size in bytes" one must implement 
> [getSizeInBytes|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java#L202]
>  in order for the base to calculate the total batch size. In some cases 
> computing the batch size within the AsyncSinkWriter is an expensive 
> operation, or not possible. For example, the DynamoDB connector needs to 
> determine the serialized size of {{DynamoDbWriteRequest}}. 
> (https://github.com/apache/flink-connector-dynamodb/pull/1/files#r1012223894)
> h3. Scope
> Add a feature to make "size in bytes" support optional. This includes:
> - Connectors will not be required to implement {{getSizeInBytes}}
> - Batches will not be validated for max size
> - Records will not be validated for size
> - Batches are not flushed when max size is exceeded
> The sink implementer can decide if it is appropriate to enable this feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
