-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/#review141130
-----------------------------------------------------------

ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java (line 427)
<https://reviews.apache.org/r/49619/#comment206540>

    To me, "sort_array_field" makes it sound like this function sorts the
    elements in an array field, as opposed to sorting an array on a
    particular field, which is what it actually does. I think the purpose
    of this function would be clearer if the name were changed to
    'sort_array_on_field' or 'sort_array_by' (I prefer the latter).


ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 1)
<https://reviews.apache.org/r/49619/#comment206556>

    Is this really necessary?


ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 9)
<https://reviews.apache.org/r/49619/#comment206557>

    No need for this. Please remove.


ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 16)
<https://reviews.apache.org/r/49619/#comment206559>

    The rows should have different struct values.


ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 25)
<https://reviews.apache.org/r/49619/#comment206558>

    Consider using named_struct() instead of struct(). This will allow
    you to provide names for the struct fields.


ql/src/test/results/beelinepositive/show_functions.q.out (line 183)
<https://reviews.apache.org/r/49619/#comment206555>

    The number of rows is off by 8. This looks like a bug, though not one
    caused by this patch.


ql/src/test/results/beelinepositive/show_functions.q.out (line 184)
<https://reviews.apache.org/r/49619/#comment206553>

    It looks like you're stripping whitespace out of the patch. I suspect
    this is the cause of the failure in show_functions.q.


- Carl Steinbach


On July 7, 2016, 5:07 a.m., Simanchal Das wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49619/
> -----------------------------------------------------------
>
> (Updated July 7, 2016, 5:07 a.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Carl Steinbach.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> Problem Statement:
>
> When working with complex data structures such as Avro, we often
> encounter arrays that contain multiple tuples, where each tuple has a
> struct schema.
>
> Suppose the struct schema looks like this:
>
> {
>   "name": "employee",
>   "type": [{
>     "type": "record",
>     "name": "Employee",
>     "namespace": "com.company.Employee",
>     "fields": [{
>       "name": "empId",
>       "type": "int"
>     }, {
>       "name": "empName",
>       "type": "string"
>     }, {
>       "name": "age",
>       "type": "int"
>     }, {
>       "name": "salary",
>       "type": "double"
>     }]
>   }]
> }
>
>
> Then, when we run our Hive query, the complex array looks like an array
> of Employee objects.
> Example:
> //(array<struct<empId,empName,age,salary>>)
>
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
>
>
> When implementing day-to-day business use cases, we run into problems
> such as sorting a tuple array by specific field(s) like empId, salary,
> etc.
>
>
> Proposal:
>
> I have developed a UDF 'sort_array_field' which sorts a tuple array by
> one or more fields in natural order.
>
> Example:
> 1.Select
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
> output:
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>
> 2.Select
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
> output:
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>
> 3.Select
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
> output:
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
>
>
> Diffs
> -----
>
>   itests/src/test/resources/testconfiguration.properties 1ab914d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayField.java PRE-CREATION
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayField.java PRE-CREATION
>   ql/src/test/queries/clientnegative/udf_sort_array_field_wrong1.q PRE-CREATION
>   ql/src/test/queries/clientnegative/udf_sort_array_field_wrong2.q PRE-CREATION
>   ql/src/test/queries/clientpositive/udf_sort_array_field.q PRE-CREATION
>   ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40
>   ql/src/test/results/clientnegative/udf_sort_array_field_wrong1.q.out PRE-CREATION
>   ql/src/test/results/clientnegative/udf_sort_array_field_wrong2.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/udf_sort_array_field.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/49619/diff/
>
>
> Testing
> -------
>
> JUnit test cases and query.q files are attached
>
>
> Thanks,
>
> Simanchal Das
>
>
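
For readers following the thread, the ordering described in the proposal (sort an array of structs by one or more fields, in natural order) can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the GenericUDFSortArrayField code from the patch: the class SortArrayBySketch and the helpers employee() and sortBy() are hypothetical names, each struct is modeled as a Map keyed by the field names from the Avro schema in the description, and every sort field is assumed to be present, non-null, and of one Comparable type across all rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the UDF implementation under review.
public class SortArrayBySketch {

    // Models one struct<empId,empName,age,salary> element as a field-name -> value map.
    public static Map<String, Object> employee(int empId, String empName, int age, double salary) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("empId", empId);
        row.put("empName", empName);
        row.put("age", age);
        row.put("salary", salary);
        return row;
    }

    // Returns a sorted copy of rows, comparing the named fields in the order given.
    // Assumes each named field is present, non-null, and of the same Comparable type in all rows.
    @SuppressWarnings("unchecked")
    public static List<Map<String, Object>> sortBy(List<Map<String, Object>> rows, String... fields) {
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort((a, b) -> {
            for (String field : fields) {
                int c = ((Comparable<Object>) a.get(field)).compareTo(b.get(field));
                if (c != 0) {
                    return c; // the first field that differs decides the order
                }
            }
            return 0; // equal on every field; List.sort is stable, so input order is kept
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
            employee(100, "Foo", 20, 20990),
            employee(500, "Boo", 30, 50990),
            employee(700, "Harry", 25, 40990),
            employee(100, "Tom", 35, 70990));

        List<Object> names = new ArrayList<>();
        for (Map<String, Object> row : sortBy(rows, "salary")) {
            names.add(row.get("empName"));
        }
        System.out.println(names); // prints [Foo, Harry, Boo, Tom]
    }
}
```

Passing several field names, e.g. sortBy(rows, "empName", "salary"), breaks ties on the first field by the second, which is the behavior example 2 in the description relies on. Addressing fields by name in this way is also what named_struct() would make possible in the .q test.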