This is an automated email from the ASF dual-hosted git repository. dataroaring pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push: new ff365ca1303 [docs] (DebugPoints) Update docs about Debug Points (#28347) ff365ca1303 is described below commit ff365ca13034f4df84a006c2507bfd6a91c150d4 Author: HowardQin <hao....@esgyn.cn> AuthorDate: Mon Dec 25 09:33:47 2023 +0800 [docs] (DebugPoints) Update docs about Debug Points (#28347) --------- Co-authored-by: qinhao <qin...@newland.com.cn> --- .../http-actions/fe/debug-point-action.md | 243 +++++++++++++++++---- .../http-actions/fe/debug-point-action.md | 215 ++++++++++++++---- 2 files changed, 376 insertions(+), 82 deletions(-) diff --git a/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md b/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md index cac2afdcd25..84ad9bf324a 100644 --- a/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md +++ b/docs/en/docs/admin-manual/http-actions/fe/debug-point-action.md @@ -26,9 +26,17 @@ under the License. # Debug Point -Debug point is used in code test. When enabling a debug point, it can run related code. +Debug point is a piece of code, inserted into FE or BE code, when program running into this code, -Both FE and BE support debug points. +it can change variables or behaviors of the program. + +It is mainly used for unit test or regression test when it is impossible to trigger some exceptions through normal means. + +Each debug point has a name, the name can be whatever you want, there are swithes to enable and disable debug points, + +and you can also pass data to debug points. + +Both FE and BE support debug point, and after inserting debug point code, recompilation of FE or BE is needed. ## Code Example @@ -36,8 +44,8 @@ FE example ```java private Status foo() { - // dbug_fe_foo_do_nothing is the debug point name. - // When it's active,DebugPointUtil.isEnable("dbug_fe_foo_do_nothing") will return true. 
+ // dbug_fe_foo_do_nothing is the debug point name + // when it's active, DebugPointUtil.isEnable("dbug_fe_foo_do_nothing") returns true if (DebugPointUtil.isEnable("dbug_fe_foo_do_nothing")) { return Status.Nothing; } @@ -48,13 +56,13 @@ private Status foo() { } ``` -BE 桩子示例代码 +BE example ```c++ void Status foo() { - // dbug_be_foo_do_nothing is the debug point name. - // When it's active,DEBUG_EXECUTE_IF will execute the code block. - DEBUG_EXECUTE_IF("dbug_be_foo_do_nothing", { return Status.Nothing; }); + // dbug_be_foo_do_nothing is the debug point name + // when it's active, DBUG_EXECUTE_IF will execute the code block + DBUG_EXECUTE_IF("dbug_be_foo_do_nothing", { return Status.Nothing; }); do_foo_action(); @@ -62,32 +70,36 @@ void Status foo() { } ``` -## Global config -To activate debug points, need set `enable_debug_points` to true. +## Global Config + +To enable debug points globally, we need to set `enable_debug_points` to true, + +`enable_debug_points` is located in FE's fe.conf and BE's be.conf. -`enable_debug_points` was located in FE's fe.conf and BE's be.conf。 +## Activate A Specified Debug Point -## Enable Debug Point +After debug points are enabled globally, a http request with a debug point name should be send to FE or BE node, <br/> +only after that, when the program running into the specified debug point, related code can be executed. ### API ``` - POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>] +POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>] ``` ### Query Parameters * `debug_point_name` - Debug point name. Require. + Debug point name. Required. * `timeout` - Timeout in seconds. When timeout, the debug point will be disable. Default is -1, not timeout. Optional. + Timeout in seconds. When timeout, the debug point will be deactivated. Default is -1, never timeout. Optional. * `execute` - Max active times。Default is -1, unlimit active times. Optional. 
+ After activating, the max times the debug point can be executed. Default is -1, unlimited times. Optional. ### Request body @@ -96,24 +108,105 @@ None ### Response - ``` - { - msg: "OK", - code: 0 - } - ``` +``` +{ + msg: "OK", + code: 0 +} +``` ### Examples -Enable debug point `foo`, activate no more than five times. +After activating debug point `foo`, executed no more than five times. - ``` - curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5" +``` +curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5" + +``` + + +## Pass Custom Parameters +When activating debug point, besides "timeout" and "execute" mentioned above, passing custom parameters is also allowed.<br/> +A parameter is a key-value pair in the form of "key=value" in url path, after debug point name glued by charactor '?'.<br/> +See examples below. + +### API + +``` +POST /api/debug_point/add/{debug_point_name}[?k1=v1&k2=v2&k3=v3...] +``` +* `k1=v1` <br/> + k1 is parameter name <br/> + v1 is parameter value <br/> + multiple key-value pairs are concatenated by `&` <br/> + + + +### Request body + +None + +### Response + +``` +{ + msg: "OK", + code: 0 +} +``` + +### Examples +Assuming a FE node with configuration http_port=8030 in fe.conf, <br/> +the following http request activates a debug point named `foo` in FE node and passe parameter `percent` and `duration`: +>NOTE: User name and password may be needed. +``` +curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?percent=0.5&duration=3" +``` + +``` +NOTE: +1. Inside FE and BE code, names and values of parameters are taken as strings. +2. Parameter names and values are case sensitive in http request and FE/BE code. +3. FE and BE share same url paths of REST API, it's just their IPs and Ports are different. 
+``` + +### Use parameters in FE and BE code +Following request activates debug point `OlapTableSink.write_random_choose_sink` in FE and passes parameter `needCatchUp` and `sinkNum`: +``` +curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/OlapTableSink.write_random_choose_sink?needCatchUp=true&sinkNum=3" +``` + +The code in FE checks debug point `OlapTableSink.write_random_choose_sink` and gets parameter values: +```java +private void debugWriteRandomChooseSink(Tablet tablet, long version, Multimap<Long, Long> bePathsMap) { + DebugPoint debugPoint = DebugPointUtil.getDebugPoint("OlapTableSink.write_random_choose_sink"); + if (debugPoint == null) { + return; + } + boolean needCatchup = debugPoint.param("needCatchUp", false); + int sinkNum = debugPoint.param("sinkNum", 0); + ... +} +``` + +Following request activates debug point `TxnManager.prepare_txn.random_failed` in BE and passes parameter `percent`: +``` +curl -X POST "http://127.0.0.1:8040/api/debug_point/add/TxnManager.prepare_txn.random_failed?percent=0.7 +``` + +The code in BE checks debug point `TxnManager.prepare_txn.random_failed` and gets parameter value: +```c++ +DBUG_EXECUTE_IF("TxnManager.prepare_txn.random_failed", + {if (rand() % 100 < (100 * dp->param("percent", 0.5))) { + LOG_WARNING("TxnManager.prepare_txn.random_failed random failed"); + return Status::InternalError("debug prepare txn random failed"); + }} +); +``` + - ``` - ## Disable Debug Point ### API @@ -137,10 +230,10 @@ None ### Response ``` - { - msg: "OK", - code: 0 - } +{ + msg: "OK", + code: 0 +} ``` ### Examples @@ -149,17 +242,17 @@ None Disable debug point `foo`。 - ``` - curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo" +``` +curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo" - ``` +``` ## Clear Debug Points ### API ``` - POST /api/debug_point/clear +POST /api/debug_point/clear ``` @@ -170,16 +263,78 @@ None ### Response - ``` - { - msg: "OK", - code: 0 - } - ``` +``` +{ + msg: "OK", + 
code: 0 +} +``` ### Examples - ``` - curl -X POST "http://127.0.0.1:8030/api/debug_point/clear" - ``` \ No newline at end of file +``` +curl -X POST "http://127.0.0.1:8030/api/debug_point/clear" +``` + +## Debug Points in Regression Test + +>In community's CI system, `enable_debug_points` configuration of FE and BE are true by default. + +The Regression test framework also provides methods to activate and deactivate a particular debug point, <br/> +they are declared as below: +```groovy +// "name" is the debug point to activate, "params" is a list of key-value pairs passed to debug point +def enableDebugPointForAllFEs(String name, Map<String, String> params = null); +def enableDebugPointForAllBEs(String name, Map<String, String> params = null); +// "name" is the debug point to deactivate +def disableDebugPointForAllFEs(String name); +def disableDebugPointForAllFEs(String name); +``` +`enableDebugPointForAllFEs()` or `enableDebugPointForAllBEs()` needs to be called before the test actions you want to generate error, <br/> +and `disableDebugPointForAllFEs()` or `disableDebugPointForAllBEs()` needs to be called afterward. + +### Concurrent Issue + +Enabled debug points affects FE or BE globally, which could cause other concurrent tests to fail unexpectly in your pull request. <br/> +To avoid this, there's a convension that regression tests using debug points must be in directory regression-test/suites/fault_injection_p0, <br/> +and their group name must be "nonConcurrent", as these regression tests will be executed serially by pull request workflow. 
+ +### Examples + +```groovy +// .groovy file of the test case must be in regression-test/suites/fault_injection_p0 +// and the group name must be 'nonConcurrent' +suite('debugpoint_action', 'nonConcurrent') { + try { + // Activate debug point named "PublishVersionDaemon.stop_publish" in all FE + // and pass parameter "timeout" + // "execute" and "timeout" are pre-existing parameters, usage is mentioned above + GetDebugPoint().enableDebugPointForAllFEs('PublishVersionDaemon.stop_publish', [timeout:1]) + + // Activate debug point named "Tablet.build_tablet_report_info.version_miss" in all BE + // and pass parameter "tablet_id", "version_miss" and "timeout" + GetDebugPoint().enableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss', + [tablet_id:'12345', version_miss:true, timeout:1]) + + // Test actions which will run into debug point and generate error + sql """CREATE TABLE tbl_1 (k1 INT, k2 INT) + DUPLICATE KEY (k1) + DISTRIBUTED BY HASH(k1) + BUCKETS 3 + PROPERTIES ("replication_allocation" = "tag.location.default: 1"); + """ + sql "INSERT INTO tbl_1 VALUES (1, 10)" + sql "INSERT INTO tbl_1 VALUES (2, 20)" + order_qt_select_1_1 'SELECT * FROM tbl_1' + + } finally { + // Deactivate debug points + GetDebugPoint().disableDebugPointForAllFEs('PublishVersionDaemon.stop_publish') + GetDebugPoint().disableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss') + } +} +``` + + + diff --git a/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md b/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md index a1c9a59a35b..df68ac003c8 100644 --- a/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md +++ b/docs/zh-CN/docs/admin-manual/http-actions/fe/debug-point-action.md @@ -26,9 +26,13 @@ under the License. 
# 代码打桩 -代码打桩是代码测试使用的。激活木桩后,可以执行木桩代码。木桩的名字是任意取的。 +代码打桩,是指在 FE 或 BE 源码中插入一段代码,当程序执行到这里时,可以改变程序的变量或行为,这样的一段代码称为一个`木桩`。 -FE 和 BE 都支持代码打桩。 +主要用于单元测试或回归测试,用来构造正常方法无法实现的异常。 + +每一个木桩都有一个名称,可以随便取名,可以通过一些机制控制木桩的开关,还可以向木桩传递参数。 + +FE 和 BE 都支持代码打桩,打桩完后要重新编译 BE 或 FE。 ## 木桩代码示例 @@ -54,8 +58,8 @@ BE 桩子示例代码 void Status foo() { // dbug_be_foo_do_nothing 是一个木桩名字, - // 打开这个木桩之后,DEBUG_EXECUTE_IF 将会执行宏参数中的代码块 - DEBUG_EXECUTE_IF("dbug_be_foo_do_nothing", { return Status.Nothing; }); + // 打开这个木桩之后,DBUG_EXECUTE_IF 将会执行宏参数中的代码块 + DBUG_EXECUTE_IF("dbug_be_foo_do_nothing", { return Status.Nothing; }); do_foo_action(); @@ -71,11 +75,12 @@ void Status foo() { ## 打开木桩 +打开总开关后,还需要通过向 FE 或 BE 发送 http 请求的方式,打开或关闭指定名称的木桩,只有这样当代码执行到这个木桩时,相关代码才会被执行。 ### API ``` - POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>] +POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>] ``` @@ -85,10 +90,10 @@ void Status foo() { 木桩名字。必填。 * `timeout` - 超时时间,单位为秒。超时之后,木桩失活。默认值-1表示永远不超时。可填。 + 超时时间,单位为秒。超时之后,木桩失活。默认值-1表示永远不超时。可选。 * `execute` - 木桩最大激活次数。默认值-1表示不限激活次数。可填。 + 木桩最大执行次数。默认值-1表示不限执行次数。可选。 ### Request body @@ -97,30 +102,109 @@ void Status foo() { ### Response - ``` - { - msg: "OK", - code: 0 - } - ``` +``` +{ + msg: "OK", + code: 0 +} +``` ### Examples -打开木桩 `foo`,最多激活5次。 +打开木桩 `foo`,最多执行5次。 - ``` - curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5" +``` +curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5" + +``` - ``` +## 向木桩传递参数 + +激活木桩时,除了前文所述的 timeout 和 execute,还可以传递其它自定义参数。<br/> +一个参数是一个形如 key=value 的 key-value 对,在 url 的路径部分,紧跟在木桩名称后,以字符 '?' 开头。 + +### API + +``` +POST /api/debug_point/add/{debug_point_name}[?k1=v1&k2=v2&k3=v3...] 
+``` +* `k1=v1` + k1为参数名称,v1为参数值,多个参数用&分隔。 + +### Request body + +无 + +### Response + +``` +{ + msg: "OK", + code: 0 +} +``` + +### Examples + +假设 FE 在 fe.conf 中有配置 http_port=8030,则下面的请求激活 FE 中的木桩`foo`,并传递了两个参数 `percent` 和 `duration`: + +``` +curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?percent=0.5&duration=3" +``` + +``` +注意: +1、在 FE 或 BE 的代码中,参数名和参数值都是字符串。 +2、在 FE 或 BE 的代码中和 http 请求中,参数名称和值都是大小写敏感的。 +3、发给 FE 或 BE 的 http 请求,路径部分格式是相同的,只是 IP 地址和端口号不同。 +``` + +### 在 FE 和 BE 代码中使用参数 + +激活 FE 中的木桩`OlapTableSink.write_random_choose_sink`并传递参数 `needCatchUp` 和 `sinkNum`: +>注意:可能需要用户名和密码 +``` +curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/OlapTableSink.write_random_choose_sink?needCatchUp=true&sinkNum=3" +``` + +在 FE 代码中使用木桩 OlapTableSink.write_random_choose_sink 的参数 `needCatchUp` 和 `sinkNum`: +```java +private void debugWriteRandomChooseSink(Tablet tablet, long version, Multimap<Long, Long> bePathsMap) { + DebugPoint debugPoint = DebugPointUtil.getDebugPoint("OlapTableSink.write_random_choose_sink"); + if (debugPoint == null) { + return; + } + boolean needCatchup = debugPoint.param("needCatchUp", false); + int sinkNum = debugPoint.param("sinkNum", 0); + ... 
+} +``` + + +激活 BE 中的木桩`TxnManager.prepare_txn.random_failed`并传递参数 `percent`: +``` +curl -X POST "http://127.0.0.1:8040/api/debug_point/add/TxnManager.prepare_txn.random_failed?percent=0.7 +``` +在 BE 代码中使用木桩 `TxnManager.prepare_txn.random_failed` 的参数 `percent`: +```c++ +DBUG_EXECUTE_IF("TxnManager.prepare_txn.random_failed", + {if (rand() % 100 < (100 * dp->param("percent", 0.5))) { + LOG_WARNING("TxnManager.prepare_txn.random_failed random failed"); + return Status::InternalError("debug prepare txn random failed"); + }} +); +``` + + ## 关闭木桩 ### API ``` - POST /api/debug_point/remove/{debug_point_name} +POST /api/debug_point/remove/{debug_point_name} ``` @@ -137,10 +221,10 @@ void Status foo() { ### Response ``` - { - msg: "OK", - code: 0 - } +{ + msg: "OK", + code: 0 +} ``` ### Examples @@ -149,39 +233,94 @@ void Status foo() { 关闭木桩`foo`。 - ``` - curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo" - - ``` +``` +curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo" +``` ## 清除所有木桩 ### API ``` - POST /api/debug_point/clear +POST /api/debug_point/clear ``` - - ### Request body 无 ### Response - ``` - { - msg: "OK", - code: 0 - } - ``` +``` +{ + msg: "OK", + code: 0 +} +``` ### Examples 清除所有木桩。 - ``` - curl -X POST "http://127.0.0.1:8030/api/debug_point/clear" - ``` +``` +curl -X POST "http://127.0.0.1:8030/api/debug_point/clear" +``` + +## 在回归测试中使用木桩 + +> 提交PR时,社区 CI 系统默认开启 FE 和 BE 的`enable_debug_points`配置。 + +回归测试框架提供方法函数来开关指定的木桩,它们声明如下: + +```groovy +// 打开木桩,name 是木桩名称,params 是一个key-value列表,是传给木桩的参数 +def enableDebugPointForAllFEs(String name, Map<String, String> params = null); +def enableDebugPointForAllBEs(String name, Map<String, String> params = null); +// 关闭木桩,name 是木桩的名称 +def disableDebugPointForAllFEs(String name); +def disableDebugPointForAllFEs(String name); +``` +需要在调用测试 action 之前调用 `enableDebugPointForAllFEs()` 或 `enableDebugPointForAllBEs()` 来开启木桩, <br/> +这样执行到木桩代码时,相关代码才会被执行,<br/> +然后在调用测试 action 之后调用 `disableDebugPointForAllFEs()` 
或 `disableDebugPointForAllBEs()` 来关闭木桩。 + +### 并发问题 + +FE 或 BE 中开启的木桩是全局生效的,同一个 Pull Request 中,并发跑的其它测试,可能会受影响而意外失败。 +为了避免这种情况,我们规定,使用木桩的回归测试,必须放在 regression-test/suites/fault_injection_p0 目录下, +且组名必须设置为 `nonConcurrent`,社区 CI 系统对于这些用例,会串行运行。 + +### Examples + +```groovy +// 测试用例的.groovy 文件必须放在 regression-test/suites/fault_injection_p0 目录下, +// 且组名设置为 'nonConcurrent' +suite('debugpoint_action', 'nonConcurrent') { + try { + // 打开所有FE中,名为 "PublishVersionDaemon.stop_publish" 的木桩 + // 传参数 timeout + // 与上面curl调用时一样,execute 是执行次数,timeout 是超时秒数 + GetDebugPoint().enableDebugPointForAllFEs('PublishVersionDaemon.stop_publish', [timeout:1]) + // 打开所有BE中,名为 "Tablet.build_tablet_report_info.version_miss" 的木桩 + // 传参数 tablet_id, version_miss 和 timeout + GetDebugPoint().enableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss', + [tablet_id:'12345', version_miss:true, timeout:1]) + + // 测试用例,会触发木桩代码的执行 + sql """CREATE TABLE tbl_1 (k1 INT, k2 INT) + DUPLICATE KEY (k1) + DISTRIBUTED BY HASH(k1) + BUCKETS 3 + PROPERTIES ("replication_allocation" = "tag.location.default: 1"); + """ + sql "INSERT INTO tbl_1 VALUES (1, 10)" + sql "INSERT INTO tbl_1 VALUES (2, 20)" + order_qt_select_1_1 'SELECT * FROM tbl_1' + + } finally { + GetDebugPoint().disableDebugPointForAllFEs('PublishVersionDaemon.stop_publish') + GetDebugPoint().disableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss') + } +} +```
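Reading note on the `timeout` and `execute` parameters documented in the diff above: the sketch below models their described semantics (a point deactivates after `timeout` seconds, or after it has been executed `execute` times, with -1 meaning no limit in both cases). It is a minimal illustration under stated assumptions, not Doris's `DebugPointUtil`. The class `DebugPointSketch`, its methods, and the explicit `nowMillis` clock argument are all hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the add/remove/clear semantics described in the docs.
// This is NOT Doris code; every name below is an assumption made for illustration.
class DebugPointSketch {
    // One activated debug point: an optional expiry time and an optional execution budget.
    static final class Point {
        final long expireAtMillis;   // -1 means the point never times out
        long remainingExecutions;    // negative means unlimited executions

        Point(long expireAtMillis, long remainingExecutions) {
            this.expireAtMillis = expireAtMillis;
            this.remainingExecutions = remainingExecutions;
        }
    }

    private static final Map<String, Point> POINTS = new ConcurrentHashMap<>();

    // Models POST /api/debug_point/add/{name}?timeout=<int>&execute=<int>
    static void add(String name, int timeoutSeconds, int execute, long nowMillis) {
        long expireAt = timeoutSeconds < 0 ? -1 : nowMillis + timeoutSeconds * 1000L;
        POINTS.put(name, new Point(expireAt, execute));
    }

    // Models POST /api/debug_point/remove/{name}
    static void remove(String name) {
        POINTS.remove(name);
    }

    // Models POST /api/debug_point/clear
    static void clear() {
        POINTS.clear();
    }

    // Returns true if the point is active at nowMillis; consumes one execution when limited.
    static synchronized boolean isEnable(String name, long nowMillis) {
        Point p = POINTS.get(name);
        if (p == null) {
            return false;
        }
        boolean timedOut = p.expireAtMillis >= 0 && nowMillis >= p.expireAtMillis;
        if (timedOut || p.remainingExecutions == 0) {
            POINTS.remove(name); // expired or budget used up: deactivate
            return false;
        }
        if (p.remainingExecutions > 0) {
            p.remainingExecutions--;
        }
        return true;
    }

    public static void main(String[] args) {
        // Equivalent in spirit to: curl -X POST ".../api/debug_point/add/foo?execute=5"
        add("foo", -1, 5, System.currentTimeMillis());
        for (int i = 1; i <= 6; i++) {
            System.out.println("call " + i + ": " + isEnable("foo", System.currentTimeMillis()));
        }
    }
}
```

Passing the clock in as a parameter is a design choice for this sketch only; it keeps the timeout behaviour deterministic to test without sleeping.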