-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/
-----------------------------------------------------------
(Updated July 8, 2016, 12:35 p.m.)


Review request for hive, Ashutosh Chauhan and Carl Steinbach.


Changes
-------

Renamed the UDF to sort_array_by and addressed all the review comments.


Repository: hive-git


Description (updated)
-------

Problem Statement:

When working with complex data structures such as Avro, we often encounter arrays that contain multiple tuples, where each tuple has a struct schema. Suppose the struct schema is as follows:

{
  "name": "employee",
  "type": [{
    "type": "record",
    "name": "Employee",
    "namespace": "com.company.Employee",
    "fields": [{
      "name": "empId",
      "type": "int"
    }, {
      "name": "empName",
      "type": "string"
    }, {
      "name": "age",
      "type": "int"
    }, {
      "name": "salary",
      "type": "double"
    }]
  }]
}

When we run a Hive query, the complex array then appears as an array of employee objects.

Example:

//(array<struct<empId,empName,age,salary>>)
Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]

When implementing day-to-day business use cases, we run into problems such as sorting a tuple array by specific field[s] (empId, name, salary, etc.) in ASC or DESC order.

Proposal:

I have developed a UDF 'sort_array_by' that sorts a tuple array by one or more user-specified fields in ASC or DESC order; the default is ascending order.

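To make the intended ordering concrete, here is a minimal sketch of the multi-field sort in plain Java. It is not the code in GenericUDFSortArrayByField.java: the class SortArrayBySketch, the helpers employee and byField, and the use of a Map per struct are all hypothetical stand-ins chosen for illustration. It also assumes that the single ASC/DESC flag applies to the whole composite key and that no field value is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortArrayBySketch {

    // Hypothetical helper: models one struct as a field-name -> value map.
    public static Map<String, Object> employee(int empId, String empName, int age, double salary) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("empId", empId);
        row.put("empName", empName);
        row.put("age", age);
        row.put("salary", salary);
        return row;
    }

    // Compares two structs on one field. Assumes the field exists in both,
    // is non-null, and holds mutually comparable values of the same type.
    @SuppressWarnings("unchecked")
    public static Comparator<Map<String, Object>> byField(String field) {
        return (a, b) -> ((Comparable<Object>) a.get(field)).compareTo(b.get(field));
    }

    // Returns a sorted copy of rows. Fields are applied left to right as
    // primary key, secondary key, and so on. Assumption: one order flag
    // covers the whole composite key.
    public static List<Map<String, Object>> sortArrayBy(
            List<Map<String, Object>> rows, boolean ascending, String... fields) {
        if (fields.length == 0) {
            throw new IllegalArgumentException("at least one sort field is required");
        }
        Comparator<Map<String, Object>> cmp = byField(fields[0]);
        for (int i = 1; i < fields.length; i++) {
            cmp = cmp.thenComparing(byField(fields[i]));
        }
        if (!ascending) {
            cmp = cmp.reversed();
        }
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(cmp); // List.sort is stable, so equal keys keep input order
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                employee(100, "Foo", 20, 20990),
                employee(500, "Boo", 30, 50990),
                employee(700, "Harry", 25, 40990),
                employee(100, "Tom", 35, 70990));
        // Sort by salary in ascending order and print one struct per line.
        for (Map<String, Object> row : sortArrayBy(rows, true, "salary")) {
            System.out.println(row);
        }
    }
}
```

With the sample array from this description, sorting on salary in ascending order puts the employees in the order Foo, Harry, Boo, Tom. Chaining comparators with thenComparing keeps each extra field as a tie-breaker only, which is the behaviour the proposal describes for multiple sort fields.
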
Example:

1. Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
   output: array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]

2. Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
   output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]

3. Select sort_array_by(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age","ASC");
   output: array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 1ab914d
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java PRE-CREATION
  ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION
  ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION
  ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION
  ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION
  ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40
  ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out PRE-CREATION
  ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out PRE-CREATION
  ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out PRE-CREATION
  ql/src/test/results/clientpositive/show_functions.q.out a811747
  ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/49619/diff/


Testing
-------

JUnit test cases and query (.q) files are attached.


Thanks,

Simanchal Das