[ https://issues.apache.org/jira/browse/HIVE-22402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Mollitor updated HIVE-22402: ---------------------------------- Description: Recently I wanted to add some additional capability, and add more, performance logging to support my troubleshooting efforts. I started looking at PerfLogger and started to examine its usage. I discovered a few things: # Since 'loggers' must be open and closed manually, I found a couple of places where loggers were opened, but not closed, rendering them useless # Since 'loggers' must be closed manually, I found a few places where an early-return or Exception thrown would cause a logger to not be closed, thereby rendering it useless # Session information is not logged, so it can be difficult to precisely pinpoint which session is taking lots of time # PerfLogger overloaded. Most of the time, it's being used as a simple timer mechanism with automatic logging in SLF4J debug. However, it is also a facade over the Hive Metrics subsystem and timing results are automatically published to Metrics and then there becomes this dependency on a 'logger' to be able to access metric data as well. The last bullet is the most challenging part and why I propose to deprecate the Hive {{PerfLogger}} and not simply remove it. I am proposing a new system... a {{PerfTimer}} that is allows for Java 8's try-with-resources feature to protect against the developer having to care about manually close measurements and not having to carefully consider all early-exits. The base implementation logs to SLF4J. An extended version automatically publishes to the Hive Metric subsystem as well. The current Hive {{PerfLogger}} has a bit of a clunky system for allowing plugable implementations. However, the current default implementation has a side-effect of also publishing timing information to the Hive Metrics subsystem. There are code sections that look up various timers in the Metrics Subsytem and publish the results back to the client. Since, in theory, the implementation is plugable, any other implementation that does not also have this side-effect of also publishing to the Metrics Subsystem will break these non-optional code paths. Also, these code paths create and interact with {{PerfLoggers in a static way, and then the publishing code pulls the data from the {{PerfLogger}}}} (as a facade to the Metrics subsystem) in a static way. Therefore, when I tried to replace the entire {{PerfLogger}} code, I came across an issue because there is not (and should not) be a way to just statically pull this information down from any point in the code. Information that is required for publishing should be passed around within some sort of context object, separate from the Metrics subsystem. There was no obvious way to string a new {{PerfTimer}} to all the required locations. I propose marking the {{PerfLogger}} as deprecated and leaving these complex section alone. Instead, replace only the simple "I want a timer" use cases. was: Recently I wanted to add some additional capability, and add more, performance logging to support my troubleshooting efforts. I started looking at PerfLogger and started to examine its usage. I discovered a few things: # Since 'loggers' must be open and closed manually, I found a couple of places where loggers were opened, but not closed, rendering them useless # Since 'loggers' must be closed manually, I found a few places where an early-return or Exception thrown would cause a logger to not be closed, thereby rendering it useless # Session information is not logged, so it can be difficult to precisely pinpoint which session is taking lots of time # PerfLogger overloaded. Most of the time, it's being used as a simple timer mechanism with automatic logging in SLF4J debug. However, it is also a facade over the Hive Metrics subsystem and timing results are automatically published to Metrics and then there becomes this dependency on a 'logger' to be able to access metric data as well. The last bullet is the most challenging part and why I propose to deprecate the Hive {{PerfLogger}} and not simply remove it. I am proposing a new system... a {{PerfTimer}} that is allows for Java 8's try-with-resources feature to protect against the developer having to care about manually close measurements and not having to carefully consider all early-exits. The base implementation logs to SLF4J. An extended version automatically publishes to the Hive Metric subsystem as well. The Hive {{PerfLogger}} has a bit of a clunky system for allowing plugable implementations. However, there are sections of the code that rely on the {{PerfLogger}}'s ability to publish to the Metrics subsystem. These code section look up various timers in the Metrics Subsytem and publish the results back to the client. Since, in theory, the implementation is plugable, any other implementation that does not publish to the Metrics Subsystem will break these non-optional code paths. Also, these code paths create and interact with {{PerfLoggers}} in a static way, and then the publishing code pulls the data from the {{PerfLogger}} (as a facade to the Metrics subsystem) in a static way. Therefore, when I tried to replace the entire {{PerfLogger}} code, I came across an issue on this task because there are places that just statically access and modify this performance metric information. Moving forward, information that is required for publishing should be passed around within some sort of context object, separate from the Metrics subsystem. There was no obvious way to string a new {{PerfTimer}} to all the required locations and will have to be left for another day when those sections of the code are modified in a more holistic way. I propose marking the {{PerfLogger}} as deprecated, so it does not continue to be used for new development, and leaving these complex section alone. Instead, replace only the simple "I want a timer" use cases with this new \{{PerfTimer}} facility > Deprecate and Replace Hive PerfLogger > ------------------------------------- > > Key: HIVE-22402 > URL: https://issues.apache.org/jira/browse/HIVE-22402 > Project: Hive > Issue Type: Improvement > Affects Versions: 4.0.0 > Reporter: David Mollitor > Assignee: David Mollitor > Priority: Major > Attachments: HIVE-22402.1.patch, HIVE-22402.2.patch, > HIVE-22402.3.patch, HIVE-22402.4.patch > > > Recently I wanted to add some additional capability, and add more, > performance logging to support my troubleshooting efforts. I started looking > at PerfLogger and started to examine its usage. I discovered a few things: > # Since 'loggers' must be open and closed manually, I found a couple of > places where loggers were opened, but not closed, rendering them useless > # Since 'loggers' must be closed manually, I found a few places where an > early-return or Exception thrown would cause a logger to not be closed, > thereby rendering it useless > # Session information is not logged, so it can be difficult to precisely > pinpoint which session is taking lots of time > # PerfLogger overloaded. Most of the time, it's being used as a simple timer > mechanism with automatic logging in SLF4J debug. However, it is also a facade > over the Hive Metrics subsystem and timing results are automatically > published to Metrics and then there becomes this dependency on a 'logger' to > be able to access metric data as well. > The last bullet is the most challenging part and why I propose to deprecate > the Hive {{PerfLogger}} and not simply remove it. I am proposing a new > system... a {{PerfTimer}} that is allows for Java 8's try-with-resources > feature to protect against the developer having to care about manually close > measurements and not having to carefully consider all early-exits. The base > implementation logs to SLF4J. An extended version automatically publishes to > the Hive Metric subsystem as well. > The current Hive {{PerfLogger}} has a bit of a clunky system for allowing > plugable implementations. However, the current default implementation has a > side-effect of also publishing timing information to the Hive Metrics > subsystem. There are code sections that look up various timers in the Metrics > Subsytem and publish the results back to the client. Since, in theory, the > implementation is plugable, any other implementation that does not also have > this side-effect of also publishing to the Metrics Subsystem will break these > non-optional code paths. Also, these code paths create and interact with > {{PerfLoggers in a static way, and then the publishing code pulls the data > from the {{PerfLogger}}}} (as a facade to the Metrics subsystem) in a static > way. Therefore, when I tried to replace the entire {{PerfLogger}} code, I > came across an issue because there is not (and should not) be a way to just > statically pull this information down from any point in the code. Information > that is required for publishing should be passed around within some sort of > context object, separate from the Metrics subsystem. There was no obvious way > to string a new {{PerfTimer}} to all the required locations. I propose > marking the {{PerfLogger}} as deprecated and leaving these complex section > alone. Instead, replace only the simple "I want a timer" use cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)