[ https://issues.apache.org/jira/browse/HIVE-22402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mollitor updated HIVE-22402: ---------------------------------- Status: Patch Available (was: Open) > Deprecate and Replace Hive PerfLogger > ------------------------------------- > > Key: HIVE-22402 > URL: https://issues.apache.org/jira/browse/HIVE-22402 > Project: Hive > Issue Type: Improvement > Affects Versions: 4.0.0 > Reporter: David Mollitor > Assignee: David Mollitor > Priority: Major > Attachments: HIVE-22402.1.patch, HIVE-22402.2.patch, > HIVE-22402.3.patch, HIVE-22402.4.patch, HIVE-22402.4.patch > > > Recently I wanted to add some additional capability, and add more, > performance logging to support my troubleshooting efforts. I started looking > at PerfLogger and started to examine its usage. I discovered a few things: > # Since 'loggers' must be open and closed manually, I found a couple of > places where loggers were opened, but not closed, rendering them useless > # Since 'loggers' must be closed manually, I found a few places where an > early-return or Exception thrown would cause a logger to not be closed, > thereby rendering it useless > # Session information is not logged, so it can be difficult to precisely > pinpoint which session is taking lots of time > # PerfLogger overloaded. Most of the time, it's being used as a simple timer > mechanism with automatic logging in SLF4J debug. However, it is also a facade > over the Hive Metrics subsystem and timing results are automatically > published to Metrics and then there becomes this dependency on a 'logger' to > be able to access metric data as well. > The last bullet is the most challenging part and why I propose to deprecate > the Hive {{PerfLogger}} and not simply remove it. I am proposing a new > system... a {{PerfTimer}} that is allows for Java 8's try-with-resources > feature to protect against the developer having to care about manually close > measurements and not having to carefully consider all early-exits. The base > implementation logs to SLF4J. An extended version automatically publishes to > the Hive Metric subsystem as well. > The current Hive {{PerfLogger}} has a bit of a clunky system for allowing > plugable implementations. However, the current default implementation has a > side-effect of also publishing timing information to the Hive Metrics > subsystem. There are code sections that look up various timers in the Metrics > Subsytem and publish the results back to the client. Since, in theory, the > implementation is plugable, any other implementation that does not also have > this side-effect of also publishing to the Metrics Subsystem will break these > non-optional code paths. Also, these code paths create and interact with > {{PerfLoggers in a static way, and then the publishing code pulls the data > from the {{PerfLogger}}}} (as a facade to the Metrics subsystem) in a static > way. Therefore, when I tried to replace the entire {{PerfLogger}} code, I > came across an issue because there is not (and should not) be a way to just > statically pull this information down from any point in the code. Information > that is required for publishing should be passed around within some sort of > context object, separate from the Metrics subsystem. There was no obvious way > to string a new {{PerfTimer}} to all the required locations. I propose > marking the {{PerfLogger}} as deprecated and leaving these complex section > alone. Instead, replace only the simple "I want a timer" use cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)