[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225276#comment-14225276 ]

Marcelo Vanzin commented on HIVE-8574:
--------------------------------------

Hey [~chengxiang li], I'd like to better understand how Hive will use these 
metrics, so I can come up with the proper fix here.

I see two approaches:

* Add an API to clean up the metrics. This keeps the current "collect all 
metrics" approach, but adds APIs to delete the metrics (sketched after this 
list). This assumes that Hive will always process the metrics of finished 
jobs, even if just to ask for them to be deleted.

* Suggested by [~xuefuz]: add a timeout after a job finishes for cleaning up 
the metrics (also covered in the sketch below). This means Hive has a window 
after a job finishes during which the data is available; after that, it's gone.
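To make the trade-off concrete, here's a rough sketch of what both options 
could look like on the client side. Everything below is hypothetical (the 
class, fields, and method names are made up for illustration), not the actual 
spark-client code:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical store for per-job metrics; not the real spark-client API.
class MetricsStore {
  static class Metrics { /* placeholder for the real metrics payload */ }

  private final ConcurrentMap<Integer, List<Metrics>> metrics =
      new ConcurrentHashMap<Integer, List<Metrics>>();
  private final ConcurrentMap<Integer, Long> finishTime =
      new ConcurrentHashMap<Integer, Long>();
  private final long retentionMs;

  MetricsStore(long retentionMs) {
    this.retentionMs = retentionMs;
  }

  // Option 1: explicit cleanup API. Hive calls this once it has processed
  // (or decided to discard) the metrics of a finished job.
  void removeJobMetrics(int jobId) {
    metrics.remove(jobId);
    finishTime.remove(jobId);
  }

  // Option 2: timeout-based cleanup. Record when the job finished...
  void jobFinished(int jobId) {
    finishTime.put(jobId, System.currentTimeMillis());
  }

  // ...and periodically drop anything older than the retention window.
  // ConcurrentHashMap iterators tolerate concurrent removal, so this is
  // safe to run from a background sweeper thread.
  void evictExpired() {
    long cutoff = System.currentTimeMillis() - retentionMs;
    for (Map.Entry<Integer, Long> e : finishTime.entrySet()) {
      if (e.getValue() < cutoff) {
        finishTime.remove(e.getKey());
        metrics.remove(e.getKey());
      }
    }
  }
}
{code}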

I could also add some internal checks so that the collection doesn't keep 
accumulating data indefinitely if data is never deleted; e.g., track only the 
last "x" finished jobs, evicting the oldest when a new job starts (see the 
sketch below).

What do you think?



> Enhance metrics gathering in Spark Client [Spark Branch]
> --------------------------------------------------------
>
>                 Key: HIVE-8574
>                 URL: https://issues.apache.org/jira/browse/HIVE-8574
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>
> The current implementation of metrics gathering in the Spark client is a 
> little hacky. First, it's awkward to use (and the implementation is also 
> pretty ugly). Second, it will just collect metrics indefinitely, so in the 
> long term it turns into a huge memory leak.
> We need a simplified interface and some mechanism for disposing of old 
> metrics.


