[ https://issues.apache.org/jira/browse/HIVE-26294?focusedWorklogId=792415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792415 ]
ASF GitHub Bot logged work on HIVE-26294: ----------------------------------------- Author: ASF GitHub Bot Created on: 19/Jul/22 02:30 Start Date: 19/Jul/22 02:30 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3433: URL: https://github.com/apache/hive/pull/3433#discussion_r923998397 ########## ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java: ########## @@ -64,6 +65,27 @@ public UDFSubstr() { r = new Text(); } + public Text evaluate(Text t, LongWritable pos, LongWritable len) { + + if ((t == null) || (pos == null) || (len == null)) { + return null; + } + + r.clear(); + if ((len.get() <= 0)) { + return r; + } + + String s = t.toString(); + int[] index = makeIndex(pos.get(), len.get(), s.length()); + if (index == null) { + return r; + } + + r.set(s.substring(index[0], index[1])); + return r; + } Review Comment: Java String accepts int parameters only https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int- How about creating a common `evaluate` which receives int parameters and both the `LongWritable` and `IntWritable` version call it? ``` public Text evaluate(Text t, IntWritable pos, IntWritable len) { if ((t == null) || (pos == null) || (len == null)) { return null; } return evaluate(t, pos.get(), len.get()); } public Text evaluate(Text t, LongWritable pos, LongWritable len) { if ((t == null) || (pos == null) || (len == null)) { return null; } // Handle long int overflow return evaluate(t, (int)pos.get(), (int)len.get()); } private Text evaluate(Text t, int pos, int len) { ... } ``` ########## ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java: ########## @@ -85,6 +107,31 @@ public Text evaluate(Text t, IntWritable pos, IntWritable len) { return r; } + private int[] makeIndex(long pos, long len, int inputLen) { + if ((Math.abs(pos) > inputLen)) { + return null; + } + + long start, end; + + if (pos > 0) { + start = pos - 1; + } else if (pos < 0) { + start = inputLen + pos; + } else { + start = 0; + } + + if ((inputLen - start) < len) { + end = inputLen; + } else { + end = start + len; + } + index[0] = (int) start; + index[1] = (int) end; + return index; + } Review Comment: This is unnecessary if both `LongWritable` and `IntWritable` `evaluate` call the common one. See my previous comment. ########## ql/src/test/queries/clientpositive/udf_substr.q: ########## @@ -76,3 +76,8 @@ SELECT substr("abc 玩玩玩 abc", 5), substr("abc 玩玩玩 abc", 5, 3) FROM src tablesample (1 rows); + +SELECT + substr('ABC', cast(1 as bigint), cast(0 as bigint)), + substr('ABC', cast(4 as bigint)) +FROM src tablesample (1 rows); Review Comment: Could you please add a test when one of the long params has a value which doesn't falls into the int range? How should we handle that? ########## ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java: ########## @@ -112,10 +159,34 @@ private int[] makeIndex(int pos, int len, int inputLen) { private final IntWritable maxValue = new IntWritable(Integer.MAX_VALUE); + private final LongWritable maxLongValue = new LongWritable(Long.MAX_VALUE); + public Text evaluate(Text s, IntWritable pos) { return evaluate(s, pos, maxValue); } + public Text evaluate(Text s, LongWritable pos) { + return evaluate(s, pos, maxLongValue); + } + + public BytesWritable evaluate(BytesWritable bw, LongWritable pos, LongWritable len) { + + if ((bw == null) || (pos == null) || (len == null)) { + return null; + } + + if ((len.get() <= 0)) { + return new BytesWritable(); + } + + int[] index = makeIndex(pos.get(), len.get(), bw.getLength()); + if (index == null) { + return new BytesWritable(); + } + + return new BytesWritable(Arrays.copyOfRange(bw.getBytes(), index[0], index[1])); + } Review Comment: The pattern for the `Text` version of the `evaluate` can be applied here. Issue Time Tracking ------------------- Worklog Id: (was: 792415) Time Spent: 40m (was: 0.5h) > Allow substr to take bigint as parameters > ----------------------------------------- > > Key: HIVE-26294 > URL: https://issues.apache.org/jira/browse/HIVE-26294 > Project: Hive > Issue Type: Improvement > Components: Types > Reporter: Steve Carlin > Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Small enhancement > Impala allows a bigint as an argument for the substr function. We should > allow Hive to allow bigint arguments too. -- This message was sent by Atlassian Jira (v8.20.10#820010)