----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49619/#review141413 -----------------------------------------------------------
Some more comments. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java (line 56) <https://reviews.apache.org/r/49619/#comment206912> A couple notes: 1. I think the example should actually return a value that is different than the input. It would also be good to include more than two elements in the input. If screen space is an issue I recommend only including a single element in each of the structs in the example, which I think has the added benefit of making the example clearer by not distracting the reader with irrelevant details. 2. It looks like the default sorting order (ASC) is actually the reverse of what I would expect it to be, i.e. I expect 'b' to come before 'g'. 3. Related to point (2), I think it's important to ensure that the sorting order of this UDF is consisent with ORDER BY, e.g. for a table t containing a single row with a single array of struct field a_struct_array, the queries "SELECT a_struct FROM t LATERAL VIEW explode(a_struct_array) structTable AS a_struct ORDER BY a_struct.col1 DESC" should return the same results as "SELECT a_struct FROM t LATERAL VIEW explode(sort_array_by(a_struct_array, 'col1', 'DESC')) structTable AS a_struct". Note that I probably didn't get the syntax for LATERAL VIEW and explode() correct. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java (line 93) <https://reviews.apache.org/r/49619/#comment206913> Unnecessary string concatenation operators. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java (line 104) <https://reviews.apache.org/r/49619/#comment206914> Unnecessary "+" operator. ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java (line 1) <https://reviews.apache.org/r/49619/#comment206915> Does this unit test provide any additional coverage or advantages over the q file tests? Is it necessary to have both? Note that I am a strong advocate of end-to-end qfile tests over unit tests, which is an opinion that not everyone holds. - Carl Steinbach On July 8, 2016, 12:35 p.m., Simanchal Das wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/49619/ > ----------------------------------------------------------- > > (Updated July 8, 2016, 12:35 p.m.) > > > Review request for hive, Ashutosh Chauhan and Carl Steinbach. > > > Repository: hive-git > > > Description > ------- > > Problem Statement: > > When we are working with complex structure of data like avro. > Most of the times we are encountering array contains multiple tuples and each > tuple have struct schema. > Suppose here struct schema is like below: > { > "name": "employee", > "type": [{ > "type": "record", > "name": "Employee", > "namespace": "com.company.Employee", > "fields": [{ > "name": "empId", > "type": "int" > }, { > "name": "empName", > "type": "string" > }, { > "name": "age", > "type": "int" > }, { > "name": "salary", > "type": "double" > }] > }] > } > > Then while running our hive query complex array looks like array of employee > objects. > Example: > //(array<struct<empId,empName,age,salary>>) > > Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)] > > When we are implementing business use cases day to day life we are > encountering problems like sorting a tuple array by specific field[s] like > empId,name,salary,etc by ASC or DESC order. > Proposal: > I have developed a udf 'sort_array_by' which will sort a tuple array by one > or more fields in ASC or DESC order provided by user ,default is ascending > order . > Example: > 1.Select > sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC"); > output: > array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)] > > 2.Select > sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC"); > output: > array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)] > > 3.Select > sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC"); > output: > array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)] > > > Diffs > ----- > > itests/src/test/resources/testconfiguration.properties 1ab914d > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java > PRE-CREATION > > ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java > PRE-CREATION > ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION > ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION > ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION > ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION > ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 > ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out > PRE-CREATION > ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out > PRE-CREATION > ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out > PRE-CREATION > ql/src/test/results/clientpositive/show_functions.q.out a811747 > ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION > > Diff: https://reviews.apache.org/r/49619/diff/ > > > Testing > ------- > > Junit test cases and query.q files are attached > > > Thanks, > > Simanchal Das > >