[jira] [Updated] (HIVE-28638) Refactor stats handling in StatsRecordingThreadPool

Jira Sun, 24 Nov 2024 08:08:13 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-28638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


László Bodor updated HIVE-28638:
--------------------------------
    Description: 
File system statistics in LLAP are handled as follows:
1. get all statistics before the task callable runs (contains thread-local
data), using FileSystem.getAllStatistics()
2. if more than one entry belongs to the same scheme, the entries are merged
3. run the task
4. get all statistics again and subtract the pre-task state, accounting for
the possibility of multiple Statistics objects for the same scheme

This logic is currently tightly coupled with StatsRecordingThreadPool, which
calls into LlapUtil, so it is not testable at all and pollutes both
StatsRecordingThreadPool and LlapUtil. In addition, the exposed stat fields
(bytesRead, bytesWritten, ...) are also present in different classes, which
makes them harder to maintain.

The whole before/after logic can be moved to a separate class that holds the
statistics as its state, making StatsRecordingThreadPool much cleaner.
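
A minimal sketch of what such a separate class could look like, written
against stand-in types so that it runs without Hadoop on the classpath. The
names SchemeStats and FsStatsDelta are hypothetical and are not taken from the
actual patch; SchemeStats only imitates the per-scheme entries returned by
FileSystem.getAllStatistics(), and only two of the stat fields are modeled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for one entry of FileSystem.getAllStatistics(). */
final class SchemeStats {
  final String scheme;
  final long bytesRead;
  final long bytesWritten;

  SchemeStats(String scheme, long bytesRead, long bytesWritten) {
    this.scheme = scheme;
    this.bytesRead = bytesRead;
    this.bytesWritten = bytesWritten;
  }

  SchemeStats plus(SchemeStats o) {
    return new SchemeStats(scheme, bytesRead + o.bytesRead, bytesWritten + o.bytesWritten);
  }

  SchemeStats minus(SchemeStats o) {
    return new SchemeStats(scheme, bytesRead - o.bytesRead, bytesWritten - o.bytesWritten);
  }
}

/** Hypothetical holder of the pre-task state; computes the per-task delta. */
final class FsStatsDelta {
  // Assumption: in LLAP this supplier would wrap FileSystem.getAllStatistics().
  private final Supplier<List<SchemeStats>> source;
  private Map<String, SchemeStats> before = new HashMap<>();

  FsStatsDelta(Supplier<List<SchemeStats>> source) {
    this.source = source;
  }

  /** Steps 1 and 2: take a snapshot and merge entries that share a scheme. */
  void snapshotBefore() {
    before = merge(source.get());
  }

  /** Step 4: take a second snapshot and subtract the pre-task state per scheme. */
  Map<String, SchemeStats> delta() {
    Map<String, SchemeStats> result = new HashMap<>();
    for (Map.Entry<String, SchemeStats> e : merge(source.get()).entrySet()) {
      SchemeStats pre = before.get(e.getKey());
      // A scheme first seen after the task has no pre-task state to subtract.
      result.put(e.getKey(), pre == null ? e.getValue() : e.getValue().minus(pre));
    }
    return result;
  }

  private static Map<String, SchemeStats> merge(List<SchemeStats> all) {
    Map<String, SchemeStats> merged = new HashMap<>();
    for (SchemeStats s : all) {
      merged.merge(s.scheme, s, SchemeStats::plus);
    }
    return merged;
  }
}
```

With this shape, the thread pool would only call snapshotBefore(), run the
task (step 3), and then read delta(); the merge and subtract logic can be
unit tested by passing a fake supplier.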

  was:
File system statistics in LLAP are handled as follows:
1. get all statistics before the task callable runs (contains thread-local
data), using FileSystem.getAllStatistics()
2. if more than one entry belongs to the same scheme, the entries are merged
3. run the task
4. get all statistics again, subtract the pre-task state, accounting for the
possibility of multiple Statistics objects for the same scheme

This logic is currently tightly coupled with StatsRecordingThreadPool, which
calls into LlapUtil, so it is not testable at all and pollutes both
StatsRecordingThreadPool and LlapUtil.

The whole before/after logic can be moved to a separate class that holds the
statistics as its state, making StatsRecordingThreadPool much cleaner.


> Refactor stats handling in StatsRecordingThreadPool
> ---------------------------------------------------
>
>                 Key: HIVE-28638
>                 URL: https://issues.apache.org/jira/browse/HIVE-28638
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public(Viewable by anyone) 
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> File system statistics in LLAP are handled as follows:
> 1. get all statistics before the task callable runs (contains thread-local
> data), using FileSystem.getAllStatistics()
> 2. if more than one entry belongs to the same scheme, the entries are merged
> 3. run the task
> 4. get all statistics again and subtract the pre-task state, accounting for
> the possibility of multiple Statistics objects for the same scheme
>
> This logic is currently tightly coupled with StatsRecordingThreadPool, which
> calls into LlapUtil, so it is not testable at all and pollutes both
> StatsRecordingThreadPool and LlapUtil. In addition, the exposed stat fields
> (bytesRead, bytesWritten, ...) are also present in different classes, which
> makes them harder to maintain.
>
> The whole before/after logic can be moved to a separate class that holds the
> statistics as its state, making StatsRecordingThreadPool much cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
