[ https://issues.apache.org/jira/browse/HIVE-24688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated the description of HIVE-24688:
---------------------------------------------------

> Optimise ObjectInspectorUtils.copyToStandardObject
> --------------------------------------------------
>
>                 Key: HIVE-24688
>                 URL: https://issues.apache.org/jira/browse/HIVE-24688
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>         Attachments: Screen Shot 2021-01-27 at 9.52.32 AM.png
>
> It's not necessarily copyToStandardObject itself that should be optimized,
> but we need to consider optimizing the attached code path.
> In a customer case, 3 reducer tasks handling skewed keys run practically
> forever, and most of their time is spent on this code path, which puts heavy
> pressure on the GC. At the moment I'm open to any kind of optimization:
> 1. Do we need to copy the Text at all? Can't we get a reference back instead?
> !Screen Shot 2021-01-27 at 9.52.32 AM.png|width=652,height=280!
> {code:java}
> public Object copyObject(Object o) {
>   ...
>   if (o instanceof Text) {
>     String str = ((Text)o).toString();
>     HiveVarcharWritable hcw = new HiveVarcharWritable();
>     hcw.set(str, ((VarcharTypeInfo)typeInfo).getLength());
>     return hcw;
>   }
> {code}
> Here we end up decoding a Text to a String (toString()) and encoding it back
> (hcw.set) just to enforce a max length. I guess there is a better way: e.g.
> if the Text is already within the max length, we can simply byte-copy its
> value.
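A minimal sketch of the byte-copy idea from the description, assuming the varchar max length counts characters (Unicode code points) rather than bytes, and that the Text holds valid UTF-8. VarcharCopySketch and copyWithMaxLength are hypothetical names, not Hive API; the byte[]/length pair stands in for Text.getBytes()/Text.getLength() (the backing array of a Text can be longer than its valid length). The value is truncated at a code point boundary by scanning for UTF-8 lead bytes, so it is never decoded to a String and re-encoded:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Hive: copies a UTF-8 value (as held by a
// Text) while enforcing a varchar-style max length, without decoding it.
final class VarcharCopySketch {

  private VarcharCopySketch() {}

  /**
   * Returns a copy of the first {@code length} bytes of {@code utf8},
   * truncated to at most {@code maxChars} code points. Assumes valid UTF-8.
   */
  static byte[] copyWithMaxLength(byte[] utf8, int length, int maxChars) {
    // Fast path: each code point needs at least one byte, so if the byte
    // length fits, the code point count fits as well: plain byte copy.
    if (length <= maxChars) {
      return Arrays.copyOf(utf8, length);
    }
    int chars = 0;
    for (int i = 0; i < length; i++) {
      // Bytes of the form 10xxxxxx continue a code point; any other byte
      // starts a new one.
      if ((utf8[i] & 0xC0) != 0x80) {
        if (chars == maxChars) {
          // Code point number maxChars + 1 starts here: cut just before it.
          return Arrays.copyOf(utf8, i);
        }
        chars++;
      }
    }
    // At most maxChars code points: the value already fits.
    return Arrays.copyOf(utf8, length);
  }

  public static void main(String[] args) {
    byte[] ascii = "abc".getBytes(StandardCharsets.UTF_8);
    // 3 bytes with a limit of 5: fast path, no scan.
    System.out.println(new String(copyWithMaxLength(ascii, ascii.length, 5), StandardCharsets.UTF_8)); // abc

    byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes, 5 code points
    byte[] cut = copyWithMaxLength(accented, accented.length, 3);
    System.out.println(cut.length); // 4 (the first 3 code points, U+00E9 takes 2 bytes)
  }
}
{code}

The sketch still returns a fresh array rather than a reference, since the caller may reuse the source Text for the next row; whether handing back a reference is safe on this code path is exactly the open question in point 1.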