Very nice! Thank you, Chandeep.
On Sat, Mar 12, 2016 at 8:20 PM, Chandeep Singh <c...@chandeep.com> wrote: > Not sure if there is a function for that but I wrote a UDF to do so - > https://github.com/chandeepsingh/Hive-UDFs > > hive> ADD JAR hive-udfs-1.0-uber.jar; > Added [hive-udfs-1.0-uber.jar] to class path > Added resources: [hive-udfs-1.0-uber.jar] > > hive> CREATE TEMPORARY FUNCTION array_dedup AS > 'com.hive.udfs.UdfArrayDeDup'; > OK > Time taken: 0.015 seconds > > hive> *SELECT array_dedup(array("blah","blah","blah")) from table1 limit > 1;* > OK > ["blah"] > Time taken: 0.502 seconds, Fetched: 1 row(s) > > Here is the code: > > package com.hive.udfs; > > /** > * > * @author chandeepsingh > * Remove duplicates from an array > */ > > import java.util.ArrayList; > import org.apache.hadoop.hive.ql.exec.UDFArgumentException; > import org.apache.hadoop.hive.ql.metadata.HiveException; > import org.apache.hadoop.hive.ql.udf.generic.GenericUDF; > import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector; > import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; > import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils; > > import java.util.HashSet; > import java.util.List; > > public class UdfArrayDeDup extends GenericUDF { > > ListObjectInspector arrayOI = null; > > @Override > public ObjectInspector initialize(ObjectInspector[] arguments) > throws UDFArgumentException { > > arrayOI = (ListObjectInspector) arguments[0]; > return ObjectInspectorUtils.getStandardObjectInspector(arrayOI); > } > > @Override > public Object evaluate(DeferredObject[] arguments) throws > HiveException { > > List<?> myArr = (List<?>) > ObjectInspectorUtils.copyToStandardObject(arguments[0].get(), arrayOI); > HashSet<Object> myHashSet = new HashSet<>(); > > myHashSet.addAll(myArr); > > if (myHashSet != null) { > return new ArrayList<>(myHashSet); > } else { > return null; > } > > } > > @Override > public String getDisplayString(String[] input) { > return new String(); > } > } > > > > On Mar 13, 2016, at 1:30 AM, Rex X <dnsr...@gmail.com> wrote: > > For the first question, is there any way to use "set" instead of an > "array" to dedupe all elements? > > "select array(1,1)" will return "[1,1]", not "[1]". > > > > On Sat, Mar 12, 2016 at 5:26 PM, Rex X <dnsr...@gmail.com> wrote: > >> Thank you, Chandeep. Yes, my first problem solved. >> How about the second one? Is there any way to append an element to an >> existing array? >> >> >> >> On Sat, Mar 12, 2016 at 5:10 PM, Chandeep Singh <c...@chandeep.com> wrote: >> >>> If you only want the array while you’re querying table1 your example >>> should work. If you want to add AB to the table you’ll probably need to >>> create a new table by selecting everything you need from table1. >>> >>> hive> select * from table1 limit 1; >>> OK >>> temp1 temp2 temp3 >>> >>> hive> select f1, array(f2, f3) AS AB from table1 limit 1; >>> OK >>> temp1 [“temp2”,"temp3"] >>> >>> >>> On Mar 13, 2016, at 12:33 AM, Rex X <dnsr...@gmail.com> wrote: >>> >>> How to make the following work? >>> >>> 1. combine columns A and B to make one array as a new column AB. Both >>> column A and B are string types. >>> >>> select >>> string_columnA, >>> string_columnB, >>> *array(string_columnA, string_columnB) *as AB >>> from Table1; >>> >>> 2. append columnA to an existing array-type column B >>> >>> select >>> string_columnA, >>> array_columnB, >>> array_flatmerge(string_columnA, array_columnB) as AB >>> from Table2; >>> >>> In fact, I should say "set" instead of "array" above, since I expect no >>> duplicates. >>> >>> Any idea? >>> >>> >>> >> > >