[ https://issues.apache.org/jira/browse/FLINK-27625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lincoln lee updated FLINK-27625:
--------------------------------
    Description:
The hint name discuss thread: https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8

FLINK-27623 adds a global parameter 'table.exec.async-lookup.output-mode' for table users so that all three control parameters related to async I/O can be configured at the same job level.

As planned in that issue, we'd like to go a step further and offer finer-grained control over async join operations than the job-level config allows, by introducing a new join hint: 'ASYNC_LOOKUP'.

To keep the hint intuitive and user-friendly, we want to support both a simple form and a kv form. All options except the table name are optional; an option that is not set falls back to the job-level configuration.

# 1. simple form: (ordered hint option list)
```
ASYNC_LOOKUP('tableName'[, 'output-mode', 'buffer-capacity', 'timeout'])
optional:
output-mode
buffer-capacity
timeout
```
Note: since Calcite currently does not support mixed-type hint options, the table name here needs to be a string instead of an identifier. (For `SqlHint`: The option format can not be mixed in, they should either be all simple identifiers or all literals or all key value pairs.) We can improve this once Calcite supports it.

# 2. kv form: (supports an unordered hint option list)
```
ASYNC_LOOKUP('table'='tableName'[, 'output-mode'='ordered|allow-unordered', 'capacity'='int', 'timeout'='duration'])
optional kvs:
'output-mode'='ordered|allow-unordered'
'capacity'='int'
'timeout'='duration'
```

e.g., if the job-level configuration is:
```
table.exec.async-lookup.output-mode: ORDERED
table.exec.async-lookup.buffer-capacity: 100
table.exec.async-lookup.timeout: 180s
```

then the following hints:
```
1. ASYNC_LOOKUP('dim1', 'allow-unordered', '200', '300s')
2. ASYNC_LOOKUP('dim1', 'allow-unordered', '200')
3. ASYNC_LOOKUP('table'='dim1', 'output-mode'='allow-unordered')
4. ASYNC_LOOKUP('table'='dim1', 'timeout'='300s')
5. ASYNC_LOOKUP('table'='dim1', 'capacity'='300')
```

are equivalent to:
```
1. ASYNC_LOOKUP('dim1', 'allow-unordered', '200', '300s')
2. ASYNC_LOOKUP('dim1', 'allow-unordered', '200', '180s')
3. ASYNC_LOOKUP('table'='dim1', 'output-mode'='allow-unordered', 'capacity'='100', 'timeout'='180s')
4. ASYNC_LOOKUP('table'='dim1', 'output-mode'='ordered', 'capacity'='100', 'timeout'='300s')
5. ASYNC_LOOKUP('table'='dim1', 'output-mode'='ordered', 'capacity'='300', 'timeout'='180s')
```

In addition, if the lookup source implements both the sync and the async table function, the planner prefers the async function when the 'ASYNC_LOOKUP' hint is specified.

  was:
Add query hint for async lookup join for join level control:

e.g.,
{code}
// ordered mode
ASYNC_LOOKUP(dim1, 'ordered', '100', '180s')

// unordered mode
ASYNC_LOOKUP(dim1, 'allow-unordered', '100', '180s')
{code}

TODO: The hint name should be discussed in ML.

> Add query hint for async lookup join
> ------------------------------------
>
>                 Key: FLINK-27625
>                 URL: https://issues.apache.org/jira/browse/FLINK-27625
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API
>            Reporter: lincoln lee
>            Assignee: lincoln lee
>            Priority: Major
>             Fix For: 1.16.0
>
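The fallback rule described above (unset hint options take the job-level value) can be sketched as a small resolver. This is only an illustration of the proposed semantics, not the planner's implementation: the names `AsyncLookupOptions`, `resolve_simple`, `resolve_kv`, and `JOB_DEFAULTS` are hypothetical, and the defaults are copied from the example job configuration in the description.

```python
from dataclasses import dataclass

# Hypothetical container for the fully resolved options of one hinted join.
@dataclass(frozen=True)
class AsyncLookupOptions:
    table: str
    output_mode: str
    capacity: int
    timeout: str

# Assumed job-level settings, taken from the example in the description.
JOB_DEFAULTS = {"output-mode": "ORDERED", "capacity": 100, "timeout": "180s"}

# Positional order of the optional entries in the simple form.
OPTION_ORDER = ("output-mode", "capacity", "timeout")
VALID_MODES = ("ordered", "allow-unordered")


def _merge(table, given, defaults):
    """Overlay the options given in the hint on the job-level defaults."""
    merged = {**defaults, **given}
    mode = str(merged["output-mode"]).lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported output-mode: {mode!r}")
    return AsyncLookupOptions(
        table, mode, int(merged["capacity"]), str(merged["timeout"])
    )


def resolve_simple(*args, defaults=JOB_DEFAULTS):
    """Simple form: table name first, then up to three ordered options."""
    if not args or len(args) > 1 + len(OPTION_ORDER):
        raise ValueError("expected a table name plus at most three options")
    table, rest = args[0], args[1:]
    # zip stops at the shorter sequence, so trailing options stay unset.
    return _merge(table, dict(zip(OPTION_ORDER, rest)), defaults)


def resolve_kv(options, defaults=JOB_DEFAULTS):
    """KV form: unordered key/value pairs; only 'table' is required."""
    given = dict(options)
    if "table" not in given:
        raise ValueError("'table' is required")
    table = given.pop("table")
    unknown = set(given) - set(OPTION_ORDER)
    if unknown:
        raise ValueError(f"unknown hint options: {sorted(unknown)}")
    return _merge(table, given, defaults)
```

Under these assumptions, `resolve_simple('dim1', 'allow-unordered', '200')` and `resolve_kv({'table': 'dim1', 'timeout': '300s'})` reproduce rows 2 and 4 of the equivalence list: the first keeps the job-level timeout of 180s, the second keeps the job-level output mode and capacity.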
-- This message was sent by Atlassian Jira (v8.20.7#820007)