JingsongLi commented on code in PR #228:
URL: https://github.com/apache/flink-table-store/pull/228#discussion_r925107204

##########
docs/content/docs/development/create-table.md:
##########
@@ -166,10 +166,38 @@
 Partitioned filtering is the most effective way to improve performance, your query statements should contain partition filtering conditions as much as possible.
 
-## Bucket
+## Bucket and Bucket Key

Review Comment:
   Just `Bucket`? `Bucket and Bucket Key` is a little weird.

##########
docs/content/docs/development/create-table.md:
##########
@@ -166,10 +166,38 @@
 Partitioned filtering is the most effective way to improve performance, your query statements should contain partition filtering conditions as much as possible.
 
-## Bucket
+## Bucket and Bucket Key
+
+Bucket is the concept of dividing data into more manageable parts for more efficient queries.
+
+With `N` as bucket number, records are falling into `(0, 1, ..., N-1)` buckets. For each record, which bucket
+it belongs is computed by the hash value of one or more columns (denoted as **bucket key**), and mod by bucket number.
+
+```
+bucket_id = hash_func(bucket_key) % num_of_buckets
+```
+
+Users can specify the bucket key as follows
+
+```sql
+CREATE TABLE MyTable (
+  catalog_id BIGINT,
+  user_id BIGINT,
+  item_id BIGINT,
+  behavior STRING,
+  dt STRING
+) WITH (
+    'bucket-key' = 'catalog_id'
+);
+```
+
+__Note:__
+- If users do not specify the bucket key explicitly
+  - For changelog table, the primary key (if present) or the whole row is used as bucket key.

Review Comment:
   I think it is OK to just say with and without key. changelog / append-only table is the concept of orthogonality.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
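
For reference, the bucket assignment formula quoted in the diff (`bucket_id = hash_func(bucket_key) % num_of_buckets`) can be sketched as below. This is only an illustration under stated assumptions: `zlib.crc32` stands in for `hash_func`, the `repr`-based key encoding and the helper name `bucket_id` are hypothetical, and none of this is claimed to match the hash that Flink Table Store actually uses.

```python
import zlib


def bucket_id(bucket_key_values, num_of_buckets):
    """Map a record's bucket key values to a bucket in [0, num_of_buckets).

    Assumption: CRC32 over a repr() encoding is a stand-in for the real
    hash function, chosen only because it is deterministic across runs.
    """
    if num_of_buckets <= 0:
        raise ValueError("num_of_buckets must be positive")
    # Encode the key columns deterministically (illustrative encoding only).
    encoded = repr(tuple(bucket_key_values)).encode("utf-8")
    # zlib.crc32 returns an unsigned 32-bit integer, so the result is non-negative.
    return zlib.crc32(encoded) % num_of_buckets


# Rows shaped like MyTable in the diff, with 'bucket-key' = 'catalog_id'.
rows = [
    {"catalog_id": 1, "user_id": 10, "item_id": 100, "behavior": "pv", "dt": "2022-07-20"},
    {"catalog_id": 1, "user_id": 11, "item_id": 101, "behavior": "buy", "dt": "2022-07-20"},
    {"catalog_id": 2, "user_id": 12, "item_id": 102, "behavior": "pv", "dt": "2022-07-20"},
]

for row in rows:
    # Only the bucket key column feeds the hash, so rows that share a
    # catalog_id always land in the same bucket.
    print(row["catalog_id"], bucket_id((row["catalog_id"],), 4))
```

The point of the sketch is that the bucket depends only on the bucket key columns and the bucket count, so records with equal bucket keys are always co-located.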