[ https://issues.apache.org/jira/browse/FLINK-31946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723237#comment-17723237 ]
Curtis Jensen edited comment on FLINK-31946 at 5/16/23 5:52 PM:
----------------------------------------------------------------

Hello [~liangtl],

Thank you for the reply. To better describe a use case: I have aggregation data for how many times a user logs in from a specific IP address. I also have aggregations for how many times any user logs in from that IP address. These are two separate DynamoDB items with different partition keys. For two different accounts logging in, I might have DynamoDB items like:

{{partition_key | sort_key   | count}}
{{--------------|------------|------}}
{{accountid-xxx | ip-1.1.1.1 | 1      # number of times account xxx logged in from ip 1}}
{{accountid-xxx | ip-1.1.1.2 | 1      # number of times account xxx logged in from ip 2}}
{{accountid-yyy | ip-1.1.1.1 | 1}}
{{ip-1.1.1.1    | total      | 2      # number of times any account logged in from ip 1}}
{{ip-1.1.1.2    | total      | 1      # number of times any account logged in from ip 2}}

When querying counts by account id, I also need the total stats for each IP address the account logs in from, so I have to make an additional query for each IP address. I would like to optimize the query by duplicating the IP total entries under the account partition_key, making a table like:

{{partition_key | sort_key         | count}}
{{--------------|------------------|------}}
{{accountid-xxx | ip-1.1.1.1       | 1}}
{{accountid-xxx | ip-1.1.1.2       | 1}}
{{accountid-xxx | ip-1.1.1.1-total | 2      # duplicate from pk: ip-1.1.1.1}}
{{accountid-xxx | ip-1.1.1.2-total | 1      # duplicate from pk: ip-1.1.1.2}}
{{accountid-yyy | ip-1.1.1.1       | 1}}
{{accountid-yyy | ip-1.1.1.1-total | 2      # duplicate from pk: ip-1.1.1.1}}
{{ip-1.1.1.1    | total            | 2}}
{{ip-1.1.1.2    | total            | 1}}

This would allow me to get all the aggregation data for the account and its IP addresses with one query (by accountid-xxx) instead of 3 queries (by accountid-xxx, ip-1.1.1.1, and ip-1.1.1.2).

I could accomplish this with a GSI, but that would increase my DynamoDB cost. I have been able to accomplish this using a FlatMap function. However, this complicates my code and increases the number of tasks in my Flink application. The simplest and most cost-effective solution would be to be able to insert multiple DynamoDB items from a single aggregation.

> DynamoDB Sink Allow Multiple Item Writes
> ----------------------------------------
>
>                 Key: FLINK-31946
>                 URL: https://issues.apache.org/jira/browse/FLINK-31946
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / DynamoDB
>            Reporter: Curtis Jensen
>            Priority: Minor
>
> In some cases, it is desirable to be able to write aggregation data to
> multiple partition keys. This supports the case of denormalizing data to
> facilitate more efficient read operations.
> However, the DynamoDBSink allows for only a single DynamoDB item to be
> generated for each Flink Element. This appears to be a limitation of the
> ElementConverter more than DynamoDBSink.
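
To make the requested one-to-many write concrete, here is a minimal sketch in plain Python of the fan-out described in the comment. It does not use any Flink or DynamoDB API. The names `LoginAggregate`, `fan_out`, `put_all`, and `query` are hypothetical, and it assumes a single aggregation result already carries both the per-account count and the per-IP total.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Item = Dict[str, object]


@dataclass(frozen=True)
class LoginAggregate:
    """Hypothetical aggregation result; assumed to carry both counts."""
    account_id: str
    ip: str
    account_count: int  # logins by this account from this IP
    ip_total: int       # logins by any account from this IP


def fan_out(agg: LoginAggregate) -> Iterator[Item]:
    """Yield every item derived from one aggregation (one element, many items)."""
    account_pk = f"accountid-{agg.account_id}"
    ip_key = f"ip-{agg.ip}"
    # Per-account count for this IP.
    yield {"partition_key": account_pk, "sort_key": ip_key, "count": agg.account_count}
    # Denormalized copy of the IP total, stored under the account partition.
    yield {"partition_key": account_pk, "sort_key": f"{ip_key}-total", "count": agg.ip_total}
    # Canonical IP total under its own partition key.
    yield {"partition_key": ip_key, "sort_key": "total", "count": agg.ip_total}


def put_all(items: List[Item]) -> Dict[Tuple[str, str], int]:
    """Model a table keyed by (partition_key, sort_key); a later put overwrites."""
    table: Dict[Tuple[str, str], int] = {}
    for item in items:
        table[(str(item["partition_key"]), str(item["sort_key"]))] = int(item["count"])
    return table


def query(table: Dict[Tuple[str, str], int], partition_key: str) -> Dict[str, int]:
    """Return sort_key -> count for one partition, like a single key query."""
    return {sk: count for (pk, sk), count in table.items() if pk == partition_key}


aggregates = [
    LoginAggregate("xxx", "1.1.1.1", 1, 2),
    LoginAggregate("xxx", "1.1.1.2", 1, 1),
    LoginAggregate("yyy", "1.1.1.1", 1, 2),
]
items = [item for agg in aggregates for item in fan_out(agg)]
table = put_all(items)
```

With this data, `query(table, "accountid-xxx")` returns the four account rows, including both `-total` copies, from a single lookup. The canonical `ip-1.1.1.1 | total` item is produced twice (once per account) and collapses to one entry, which is how the sketch models a put overwriting an existing key.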