-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/
-----------------------------------------------------------

(Updated July 7, 2016, 5:03 a.m.)


Review request for hive and Carl Steinbach.


Changes
-------

Added the UDF name to the show_functions q.out file.


Repository: hive-git


Description
-------

Problem Statement:

When working with complex data structures such as Avro, we often encounter
arrays that contain multiple tuples, where each tuple has a struct schema.

Suppose the struct schema looks like this:

{
        "name": "employee",
        "type": [{
                "type": "record",
                "name": "Employee",
                "namespace": "com.company.Employee",
                "fields": [{
                        "name": "empId",
                        "type": "int"
                }, {
                        "name": "empName",
                        "type": "string"
                }, {
                        "name": "age",
                        "type": "int"
                }, {
                        "name": "salary",
                        "type": "double"
                }]
        }]
}


When we run a Hive query, the complex column then appears as an array of
Employee objects.
Example:
        // array<struct<empId,empName,age,salary>>
        Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]


In day-to-day business use cases we frequently need to sort such a tuple array
by one or more specific fields, such as empId or salary.


Proposal:

I have developed a UDF, 'sort_array_field', which sorts a tuple array by one
or more fields in natural order.

Example:
        1. Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
        output:
        array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]

        2. Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
        output:
        array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]

        3. Select sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
        output:
        array[struct(500,Boo,30,50990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
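
The ordering shown in the examples above can be sketched in plain Java. This is only an illustration of multi-field, natural-order sorting and is not the GenericUDFSortArrayField code from the patch: the class SortArrayFieldSketch, its helper methods, and the use of a Map to stand in for a Hive struct are all assumptions made for the sketch. Field names follow the Avro schema above (empId, empName, age, salary).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the UDF's behavior; NOT the patch's implementation.
final class SortArrayFieldSketch {

    // Models one struct as an ordered field-name -> value map (an assumption).
    static Map<String, Object> employee(int empId, String empName, int age, double salary) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("empId", empId);
        row.put("empName", empName);
        row.put("age", age);
        row.put("salary", salary);
        return row;
    }

    // Compares two structs field by field, in the order the names are given,
    // using each value's natural (Comparable) ordering.
    static int compareRows(Map<String, Object> a, Map<String, Object> b, String... fields) {
        for (String field : fields) {
            if (!a.containsKey(field) || !b.containsKey(field)) {
                throw new IllegalArgumentException("unknown field: " + field);
            }
            @SuppressWarnings("unchecked")
            Comparable<Object> left = (Comparable<Object>) a.get(field);
            int result = left.compareTo(b.get(field));
            if (result != 0) {
                return result; // first differing field decides the order
            }
        }
        return 0; // equal on every requested field
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<Map<String, Object>> sortByFields(List<Map<String, Object>> rows, String... fields) {
        if (fields.length == 0) {
            throw new IllegalArgumentException("at least one field name is required");
        }
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort((a, b) -> compareRows(a, b, fields));
        return sorted;
    }

    // Extracts the empName of each struct, for compact printing.
    static List<Object> names(List<Map<String, Object>> rows) {
        List<Object> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            out.add(row.get("empName"));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
            employee(100, "Foo", 20, 20990),
            employee(500, "Boo", 30, 80990),
            employee(500, "Boo", 30, 50990),
            employee(700, "Harry", 25, 40990),
            employee(100, "Tom", 35, 70990));

        // Single field: prints [Foo, Harry, Boo, Tom, Boo]
        System.out.println(names(sortByFields(rows, "salary")));

        // Two fields: the two "Boo" rows tie on empName, so salary breaks the tie.
        // Prints 50990.0
        System.out.println(sortByFields(rows, "empName", "salary").get(0).get("salary"));
    }
}
```

Comparing field by field and stopping at the first difference is what makes later field names act as tie-breakers, as in example 2, where the two Boo rows are ordered by salary.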


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 1ab914d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayField.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayField.java PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_sort_array_field_wrong1.q PRE-CREATION 
  ql/src/test/queries/clientnegative/udf_sort_array_field_wrong2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_sort_array_field.q PRE-CREATION 
  ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 
  ql/src/test/results/clientnegative/udf_sort_array_field_wrong1.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/udf_sort_array_field_wrong2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_sort_array_field.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/49619/diff/


Testing
-------

JUnit test cases and .q query files are attached.


Thanks,

Simanchal Das
