[ https://issues.apache.org/jira/browse/HIVE-28639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-28639:
--------------------------------
        Parent: HIVE-27639
    Issue Type: Sub-task  (was: Improvement)

> Aggregate storage statistics in Hive LLAP
> -----------------------------------------
>
>                 Key: HIVE-28639
>                 URL: https://issues.apache.org/jira/browse/HIVE-28639
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Since TEZ-4451, we maintain a thread-local snapshot of the IOStatistics in
> TaskRunner2Callable, which can be reused in LLAP thanks to its thread-local
> nature.
> Motivation: in cloud environments, the stats provided by
> FileSystem.Statistics are not sufficient for in-depth debugging. We only have
> bytesRead, bytesWritten and similar counters, so in case of throttling and
> retries we cannot tell what led to the performance degradation.
> The proposal is to use the IOStatistics already provided by Tez to get
> stats like:
> {code}
> query-executor <14>1 2024-11-20T21:46:01.006Z query-executor-0-0 query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374 class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server handler 2 on 25000"] IOStatistics: counters=((action_file_opened=3177)
> (action_http_get_request=60363)
> (action_http_head_request=3177)
> (audit_request_execution=63540)
> (audit_span_creation=3178)
> (object_metadata_request=3177)
> (op_open=3177)
> (store_io_request=63540)
> (stream_read_bytes=3781596661)
> (stream_read_close_operations=3177)
> (stream_read_closed=60363)
> (stream_read_opened=60363)
> (stream_read_operations=1107101)
> (stream_read_operations_incomplete=296426)
> (stream_read_remote_stream_drain=60363)
> (stream_read_seek_policy_changed=3177)
> (stream_read_total_bytes=3781596661));
> gauges=();
> minimums=((action_file_opened.min=10)
> (action_http_get_request.min=17)
> (action_http_head_request.min=6)
> (stream_read_remote_stream_drain.min=0));
> maximums=((action_file_opened.max=1200)
> (action_http_get_request.max=378)
> (action_http_head_request.max=1176)
> (stream_read_remote_stream_drain.max=3));
> means=((action_file_opened.mean=(samples=3177, sum=47320, mean=14.8946))
> (action_http_get_request.mean=(samples=60363, sum=1577407, mean=26.1320))
> (action_http_head_request.mean=(samples=3177, sum=46953, mean=14.7790))
> (stream_read_remote_stream_drain.mean=(samples=60363, sum=509, mean=0.0084)));
> query-executor <14>1 2024-11-20T21:46:03.024Z query-executor-0-0 query-executor 1 f886d546-60fc-43c7-b8cb-f92b5b1d6e21 [mdc@38374 class="statistics.IOStatisticsLogging" level="INFO" thread="IPC Server handler 2 on 25000"] IOStatistics: counters=((action_http_head_request=14578)
> (action_http_head_request.failures=290)
> (audit_request_execution=36068)
> (audit_span_creation=17857)
> (files_created=3584)
> (ignored_errors=30)
> (object_list_request=17871)
> (object_list_request.failures=15)
> (object_metadata_request=14578)
> (object_put_bytes=2483204365)
> (object_put_request=3619)
> (object_put_request.failures=40)
> (object_put_request_completed=3619)
> (op_create=3584)
> (op_exists=10704)
> (op_mkdirs=3568)
> (store_io_request=38021)
> (store_io_retry=2018)
> (store_io_throttled=310)
> (stream_write_block_uploads=3584)
> (stream_write_bytes=2460606811)
> (stream_write_total_data=4914757178));
> gauges=((stream_write_block_uploads_data_pending=3228222)
> (stream_write_block_uploads_pending=3584));
> minimums=((action_http_head_request.failures.min=5)
> (action_http_head_request.min=5)
> (object_list_request.failures.min=6)
> (object_list_request.min=8)
> (object_put_request.failures.min=125)
> (object_put_request.min=111)
> (op_create.min=16)
> (op_exists.min=15)
> (op_mkdirs.min=10));
> maximums=((action_http_head_request.failures.max=3025)
> (action_http_head_request.max=2760)
> (object_list_request.failures.max=3005)
> (object_list_request.max=5915)
> (object_put_request.failures.max=60008)
> (object_put_request.max=5596)
> (op_create.max=17899)
> (op_exists.max=57540)
> (op_mkdirs.max=5703));
> means=((action_http_head_request.failures.mean=(samples=290, sum=257010, mean=886.2414))
> (action_http_head_request.mean=(samples=14288, sum=274566, mean=19.2165))
> (object_list_request.failures.mean=(samples=15, sum=9216, mean=614.4000))
> (object_list_request.mean=(samples=17856, sum=1304800, mean=73.0735))
> (object_put_request.failures.mean=(samples=40, sum=2103009, mean=52575.2250))
> (object_put_request.mean=(samples=3579, sum=748058, mean=209.0131))
> (op_create.mean=(samples=3584, sum=831359, mean=231.9640))
> (op_exists.mean=(samples=10704, sum=1020671, mean=95.3542))
> (op_mkdirs.mean=(samples=3568, sum=277018, mean=77.6396)));
> {code}
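
Statistics from all tasks that an LLAP daemon runs could be aggregated using the rules that the log sections suggest. Counters add up. Minimums keep the smaller value and maximums keep the larger one. Means are merged by summing their sample counts and sums, because averaging two means directly would weight them incorrectly. The Java sketch below assumes that each executor thread keeps one snapshot, in the spirit of the thread-local snapshot from TEZ-4451, and that this snapshot is merged into a daemon-wide total when a task finishes.

Several things in the sketch are assumptions. `Snapshot`, `Mean`, `onTaskFinished` and `runDemo` are hypothetical names, not Hadoop's `IOStatisticsSnapshot` API or the actual Tez/Hive code. Summing gauges is also an assumption. The sample numbers are the `action_http_head_request` values from the two log entries above, treated here as two separate tasks.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, self-contained model of an IOStatistics-style snapshot.
// NOT Hadoop's IOStatisticsSnapshot; it only mirrors the log's sections.
public class StatsAggregationSketch {

    // A mean kept as (samples, sum) so that two means merge exactly.
    static final class Mean {
        final long samples;
        final long sum;

        Mean(long samples, long sum) {
            this.samples = samples;
            this.sum = sum;
        }

        Mean plus(Mean other) {
            return new Mean(samples + other.samples, sum + other.sum);
        }

        double value() {
            return samples == 0 ? 0.0 : (double) sum / samples;
        }
    }

    static final class Snapshot {
        final Map<String, Long> counters = new TreeMap<>();
        final Map<String, Long> gauges = new TreeMap<>();
        final Map<String, Long> minimums = new TreeMap<>();
        final Map<String, Long> maximums = new TreeMap<>();
        final Map<String, Mean> means = new TreeMap<>();

        // Merge another snapshot into this one, key by key.
        void aggregate(Snapshot other) {
            other.counters.forEach((k, v) -> counters.merge(k, v, Long::sum));
            other.gauges.forEach((k, v) -> gauges.merge(k, v, Long::sum)); // assumption: gauges are summed
            other.minimums.forEach((k, v) -> minimums.merge(k, v, Long::min));
            other.maximums.forEach((k, v) -> maximums.merge(k, v, Long::max));
            other.means.forEach((k, v) -> means.merge(k, v, Mean::plus));
        }
    }

    // One snapshot per executor thread (assumed to mirror the TEZ-4451 idea).
    static final ThreadLocal<Snapshot> PER_THREAD = ThreadLocal.withInitial(Snapshot::new);

    // Hypothetical hook: called on the executor thread when a task finishes.
    static void onTaskFinished(Snapshot daemonTotal) {
        Snapshot local = PER_THREAD.get();
        synchronized (daemonTotal) {
            daemonTotal.aggregate(local);
        }
        PER_THREAD.remove(); // the next task on this thread starts with empty stats
    }

    // A simulated task that records one operation's statistics, then reports them.
    static Runnable task(Snapshot total, long count, long min, long max, long samples, long sum) {
        return () -> {
            Snapshot s = PER_THREAD.get();
            s.counters.merge("action_http_head_request", count, Long::sum);
            s.minimums.merge("action_http_head_request.min", min, Long::min);
            s.maximums.merge("action_http_head_request.max", max, Long::max);
            s.means.merge("action_http_head_request.mean", new Mean(samples, sum), Mean::plus);
            onTaskFinished(total);
        };
    }

    // Runs two simulated tasks on two threads and returns the merged total.
    static Snapshot runDemo() {
        Snapshot total = new Snapshot();
        Thread t1 = new Thread(task(total, 3177, 6, 1176, 3177, 46953));
        Thread t2 = new Thread(task(total, 14578, 5, 2760, 14288, 274566));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        Snapshot total = runDemo();
        Mean m = total.means.get("action_http_head_request.mean");
        System.out.printf("head requests=%d min=%d max=%d mean=%.4f%n",
                total.counters.get("action_http_head_request"),
                total.minimums.get("action_http_head_request.min"),
                total.maximums.get("action_http_head_request.max"),
                m.value());
    }
}
```

Executor threads record statistics into their own maps without locking. Only the hand-off to the daemon-wide total is serialized.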

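With these statistics, the throttling and retry questions from the motivation become simple ratios. For the second log entry above, the figures are:

- 2018 retries out of 38021 store I/O requests, about 5.3%.
- 310 throttled requests out of 38021, about 0.8%.
- 40 failed PUTs out of 3619, about 1.1%.

The sketch below computes such ratios from the `counters=` section of the logged text. It assumes the log format shown above, and `parseCounters` and `ratio` are hypothetical helpers. A real integration would more likely read the statistics object directly instead of parsing log output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers for reading the counters= section of an
// IOStatisticsLogging line such as the ones quoted above.
public class IoStatsRatios {

    // One "(name=value)" entry, e.g. "(store_io_retry=2018)".
    private static final Pattern ENTRY = Pattern.compile("\\(([A-Za-z0-9_.]+)=(\\d+)\\)");

    // Parses only the counters section; gauges, minimums and maximums are ignored.
    static Map<String, Long> parseCounters(String line) {
        Map<String, Long> counters = new LinkedHashMap<>();
        int start = line.indexOf("counters=(");
        if (start < 0) {
            return counters;
        }
        // The section ends at the first ");" after it, e.g. "...(store_io_throttled=310));".
        int end = line.indexOf(");", start);
        String section = line.substring(start, end < 0 ? line.length() : end);
        Matcher m = ENTRY.matcher(section);
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    // numerator / denominator, or 0 when the denominator is missing or zero.
    static double ratio(Map<String, Long> counters, String numerator, String denominator) {
        long d = counters.getOrDefault(denominator, 0L);
        return d == 0 ? 0.0 : (double) counters.getOrDefault(numerator, 0L) / d;
    }

    public static void main(String[] args) {
        // A subset of the counters from the second log entry above.
        String line = "IOStatistics: counters=((object_put_request=3619)"
                + "(object_put_request.failures=40)(store_io_request=38021)"
                + "(store_io_retry=2018)(store_io_throttled=310));"
                + " gauges=((stream_write_block_uploads_pending=3584));";
        Map<String, Long> c = parseCounters(line);
        System.out.printf("retry ratio: %.2f%%%n",
                100 * ratio(c, "store_io_retry", "store_io_request"));
        System.out.printf("throttled ratio: %.2f%%%n",
                100 * ratio(c, "store_io_throttled", "store_io_request"));
        System.out.printf("PUT failure ratio: %.2f%%%n",
                100 * ratio(c, "object_put_request.failures", "object_put_request"));
    }
}
```
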


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
