[jira] [Work logged] (HIVE-26294) Allow substr to take bigint as parameters

ASF GitHub Bot (Jira) Mon, 18 Jul 2022 19:31:06 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-26294?focusedWorklogId=792415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792415
 ]


ASF GitHub Bot logged work on HIVE-26294:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Jul/22 02:30
            Start Date: 19/Jul/22 02:30
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on code in PR #3433:
URL: https://github.com/apache/hive/pull/3433#discussion_r923998397


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -64,6 +65,27 @@ public UDFSubstr() {
     r = new Text();
   }
 
+  public Text evaluate(Text t, LongWritable pos, LongWritable len) {
+
+    if ((t == null) || (pos == null) || (len == null)) {
+      return null;
+    }
+
+    r.clear();
+    if ((len.get() <= 0)) {
+      return r;
+    }
+
+    String s = t.toString();
+    int[] index = makeIndex(pos.get(), len.get(), s.length());
+    if (index == null) {
+      return r;
+    }
+
+    r.set(s.substring(index[0], index[1]));
+    return r;
+  }

Review Comment:
   Java String accepts int parameters only
   
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int-
   
   How about creating a common `evaluate` which receives int parameters and 
both the `LongWritable` and `IntWritable` version call it?
   
   ```
   public Text evaluate(Text t, IntWritable pos, IntWritable len) {
   
       if ((t == null) || (pos == null) || (len == null)) {
         return null;
       }
   
      return evaluate(t, pos.get(), len.get());
   }
     
   public Text evaluate(Text t, LongWritable pos, LongWritable len) {
   
       if ((t == null) || (pos == null) || (len == null)) {
         return null;
       }
   
      // Handle long int overflow
      return evaluate(t, (int)pos.get(), (int)len.get());
   }
   
   private Text evaluate(Text t, int pos, int len) {
   ...
   }
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -85,6 +107,31 @@ public Text evaluate(Text t, IntWritable pos, IntWritable 
len) {
     return r;
   }
 
+  private int[] makeIndex(long pos, long len, int inputLen) {
+    if ((Math.abs(pos) > inputLen)) {
+      return null;
+    }
+
+    long start, end;
+
+    if (pos > 0) {
+      start = pos - 1;
+    } else if (pos < 0) {
+      start = inputLen + pos;
+    } else {
+      start = 0;
+    }
+
+    if ((inputLen - start) < len) {
+      end = inputLen;
+    } else {
+      end = start + len;
+    }
+    index[0] = (int) start;
+    index[1] = (int) end;
+    return index;
+  }

Review Comment:
   This is unnecessary if both `LongWritable` and `IntWritable` `evaluate` call 
the common  one. See my previous comment.



##########
ql/src/test/queries/clientpositive/udf_substr.q:
##########
@@ -76,3 +76,8 @@ SELECT
   substr("abc 玩玩玩 abc", 5),
   substr("abc 玩玩玩 abc", 5, 3)
 FROM src tablesample (1 rows);
+
+SELECT
+  substr('ABC', cast(1 as bigint), cast(0 as bigint)),
+  substr('ABC', cast(4 as bigint))
+FROM src tablesample (1 rows);

Review Comment:
   Could you please add a test when one of the long params has a value which 
doesn't falls into the int range? How should we handle that?



##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -112,10 +159,34 @@ private int[] makeIndex(int pos, int len, int inputLen) {
 
   private final IntWritable maxValue = new IntWritable(Integer.MAX_VALUE);
 
+  private final LongWritable maxLongValue = new LongWritable(Long.MAX_VALUE);
+
   public Text evaluate(Text s, IntWritable pos) {
     return evaluate(s, pos, maxValue);
   }
 
+  public Text evaluate(Text s, LongWritable pos) {
+    return evaluate(s, pos, maxLongValue);
+  }
+
+  public BytesWritable evaluate(BytesWritable bw, LongWritable pos, 
LongWritable len) {
+
+    if ((bw == null) || (pos == null) || (len == null)) {
+      return null;
+    }
+
+    if ((len.get() <= 0)) {
+      return new BytesWritable();
+    }
+
+    int[] index = makeIndex(pos.get(), len.get(), bw.getLength());
+    if (index == null) {
+      return new BytesWritable();
+    }
+
+    return new BytesWritable(Arrays.copyOfRange(bw.getBytes(), index[0], 
index[1]));
+  }

Review Comment:
   The pattern for the `Text` version of the `evaluate` can be applied here.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 792415)
    Time Spent: 40m  (was: 0.5h)

> Allow substr to take bigint as parameters
> -----------------------------------------
>
>                 Key: HIVE-26294
>                 URL: https://issues.apache.org/jira/browse/HIVE-26294
>             Project: Hive
>          Issue Type: Improvement
>          Components: Types
>            Reporter: Steve Carlin
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Small enhancement
> Impala allows a bigint as an argument for the substr function. We should 
> allow Hive to allow bigint arguments too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Work logged] (HIVE-26294) Allow substr to take bigint as parameters

Reply via email to