[ https://issues.apache.org/jira/browse/HIVE-28639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-28639:
----------------------------------
    Labels: pull-request-available  (was: )

> Aggregate storage statistics in Hive LLAP
> -----------------------------------------
>
>                 Key: HIVE-28639
>                 URL: https://issues.apache.org/jira/browse/HIVE-28639
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public(Viewable by anyone)
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Since TEZ-4451, we maintain a thread-local snapshot of the IOStatistics in
> TaskRunner2Callable, which can be reused in LLAP (due to its thread-local
> nature).
> Motivation here: in cloud environments, the stats provided by
> FileSystem.Statistics are not suitable for in-depth debugging: we only have
> bytesRead, bytesWritten, and so on, and in case of throttling and retries, we
> cannot tell what led to performance degradation.
> The proposal here is to utilize the IOStatistics already given by Tez to get
> stats like:
> {code}
> query-executor <14>1 2024-11-20T21:46:01.006Z query-executor-0-0
> query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374
> class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server
> handler 2 on 25000"] IOStatistics: counters=((action_file_opened=3177)
> (action_http_get_request=60363)
> (action_http_head_request=3177)
> (audit_request_execution=63540)
> (audit_span_creation=3178)
> (object_metadata_request=3177)
> (op_open=3177)
> (store_io_request=63540)
> (stream_read_bytes=3781596661)
> (stream_read_close_operations=3177)
> (stream_read_closed=60363)
> (stream_read_opened=60363)
> (stream_read_operations=1107101)
> (stream_read_operations_incomplete=296426)
> (stream_read_remote_stream_drain=60363)
> (stream_read_seek_policy_changed=3177)
> (stream_read_total_bytes=3781596661));
> gauges=();
> minimums=((action_file_opened.min=10)
> (action_http_get_request.min=17)
> (action_http_head_request.min=6)
> (stream_read_remote_stream_drain.min=0));
> maximums=((action_file_opened.max=1200)
> (action_http_get_request.max=378)
> (action_http_head_request.max=1176)
> (stream_read_remote_stream_drain.max=3));
> means=((action_file_opened.mean=(samples=3177, sum=47320, mean=14.8946))
> (action_http_get_request.mean=(samples=60363, sum=1577407, mean=26.1320))
> (action_http_head_request.mean=(samples=3177, sum=46953, mean=14.7790))
> (stream_read_remote_stream_drain.mean=(samples=60363, sum=509, mean=0.0084)));
> query-executor <14>1 2024-11-20T21:46:03.024Z query-executor-0-0
> query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374
> class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server
> handler 2 on 25000"] IOStatistics: counters=((action_http_head_request=14578)
> (action_http_head_request.failures=290)
> (audit_request_execution=36068)
> (audit_span_creation=17857)
> (files_created=3584)
> (ignored_errors=30)
> (object_list_request=17871)
> (object_list_request.failures=15)
> (object_metadata_request=14578)
> (object_put_bytes=2483204365)
> (object_put_request=3619)
> (object_put_request.failures=40)
> (object_put_request_completed=3619)
> (op_create=3584)
> (op_exists=10704)
> (op_mkdirs=3568)
> (store_io_request=38021)
> (store_io_retry=2018)
> (store_io_throttled=310)
> (stream_write_block_uploads=3584)
> (stream_write_bytes=2460606811)
> (stream_write_total_data=4914757178));
> gauges=((stream_write_block_uploads_data_pending=3228222)
> (stream_write_block_uploads_pending=3584));
> minimums=((action_http_head_request.failures.min=5)
> (action_http_head_request.min=5)
> (object_list_request.failures.min=6)
> (object_list_request.min=8)
> (object_put_request.failures.min=125)
> (object_put_request.min=111)
> (op_create.min=16)
> (op_exists.min=15)
> (op_mkdirs.min=10));
> maximums=((action_http_head_request.failures.max=3025)
> (action_http_head_request.max=2760)
> (object_list_request.failures.max=3005)
> (object_list_request.max=5915)
> (object_put_request.failures.max=60008)
> (object_put_request.max=5596)
> (op_create.max=17899)
> (op_exists.max=57540)
> (op_mkdirs.max=5703));
> means=((action_http_head_request.failures.mean=(samples=290, sum=257010,
> mean=886.2414))
> (action_http_head_request.mean=(samples=14288, sum=274566, mean=19.2165))
> (object_list_request.failures.mean=(samples=15, sum=9216, mean=614.4000))
> (object_list_request.mean=(samples=17856, sum=1304800, mean=73.0735))
> (object_put_request.failures.mean=(samples=40, sum=2103009, mean=52575.2250))
> (object_put_request.mean=(samples=3579, sum=748058, mean=209.0131))
> (op_create.mean=(samples=3584, sum=831359, mean=231.9640))
> (op_exists.mean=(samples=10704, sum=1020671, mean=95.3542))
> (op_mkdirs.mean=(samples=3568, sum=277018, mean=77.6396)));
> {code}
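
As a rough illustration of what "aggregating" such statistics could mean, the sketch below sums per-task counter maps and recomputes a mean from its (samples, sum) pair, as the means= entries in the log above report them. It is only a sketch under assumptions: the class IoStatsAggregationSketch and its methods are hypothetical, it uses plain java.util maps rather than the Hadoop IOStatistics classes, and the per-task counter values in main are made up for the example. It is not the change proposed in HIVE-28639.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hive, Tez or Hadoop.
public class IoStatsAggregationSketch {

  /** Sums counters with the same name across several per-task snapshots. */
  public static Map<String, Long> sumCounters(Iterable<Map<String, Long>> snapshots) {
    Map<String, Long> total = new TreeMap<>(); // sorted keys, like the log output
    for (Map<String, Long> snapshot : snapshots) {
      for (Map.Entry<String, Long> e : snapshot.entrySet()) {
        total.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return total;
  }

  /** Recomputes a mean from a (samples, sum) pair; returns 0 when there are no samples. */
  public static double mean(long sum, long samples) {
    return samples == 0 ? 0.0 : (double) sum / samples;
  }

  public static void main(String[] args) {
    // Made-up per-task counters, for illustration only.
    Map<String, Long> taskA = Map.of("store_io_retry", 7L, "store_io_throttled", 2L);
    Map<String, Long> taskB = Map.of("store_io_retry", 5L, "op_mkdirs", 1L);
    System.out.println(sumCounters(List.of(taskA, taskB)));

    // (samples=40, sum=2103009) is the object_put_request.failures entry in the log.
    System.out.println(mean(2103009L, 40L));
  }
}
```

Keeping the (samples, sum) pair rather than only the mean is what makes means from different tasks combinable: the sums and sample counts add up, and the mean is derived afterwards.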