> On July 9, 2016, 8:53 p.m., Carl Steinbach wrote: > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java, > > line 56 > > <https://reviews.apache.org/r/49619/diff/3/?file=1439691#file1439691line56> > > > > A couple notes: > > > > 1. I think the example should actually return a value that is different > > than the input. It would also be good to include more than two elements in > > the input. If screen space is an issue I recommend only including a single > > element in each of the structs in the example, which I think has the added > > benefit of making the example clearer by not distracting the reader with > > irrelevant details. > > > > 2. It looks like the default sorting order (ASC) is actually the > > reverse of what I would expect it to be, i.e. I expect 'b' to come before > > 'g'. > > > > 3. Related to point (2), I think it's important to ensure that the > > sorting order of this UDF is consisent with ORDER BY, e.g. for a table t > > containing a single row with a single array of struct field a_struct_array, > > the queries "SELECT a_struct FROM t LATERAL VIEW explode(a_struct_array) > > structTable AS a_struct ORDER BY a_struct.col1 DESC" should return the same > > results as "SELECT a_struct FROM t LATERAL VIEW > > explode(sort_array_by(a_struct_array, 'col1', 'DESC')) structTable AS > > a_struct". Note that I probably didn't get the syntax for LATERAL VIEW and > > explode() correct.
1. Sorry there was a typo. 2. corrected the sorting order in example. 3. As per your instruction I have added test example of LATERAL VIEW explode(array) and explode(udf). Which gives same results. > On July 9, 2016, 8:53 p.m., Carl Steinbach wrote: > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java, > > line 93 > > <https://reviews.apache.org/r/49619/diff/3/?file=1439691#file1439691line93> > > > > Unnecessary string concatenation operators. removed > On July 9, 2016, 8:53 p.m., Carl Steinbach wrote: > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java, > > line 104 > > <https://reviews.apache.org/r/49619/diff/3/?file=1439691#file1439691line104> > > > > Unnecessary "+" operator. removed > On July 9, 2016, 8:53 p.m., Carl Steinbach wrote: > > ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java, > > line 1 > > <https://reviews.apache.org/r/49619/diff/3/?file=1439692#file1439692line1> > > > > Does this unit test provide any additional coverage or advantages over > > the q file tests? Is it necessary to have both? > > > > Note that I am a strong advocate of end-to-end qfile tests over unit > > tests, which is an opinion that not everyone holds. These are kind of same as q file. I feels test cases on Test classes are good for doing unit testing while development and takes less time to identify problem comparare to q files. Any ways I have removed some test cases from Test class. - Simanchal ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/49619/#review141413 ----------------------------------------------------------- On July 8, 2016, 12:35 p.m., Simanchal Das wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/49619/ > ----------------------------------------------------------- > > (Updated July 8, 2016, 12:35 p.m.) > > > Review request for hive, Ashutosh Chauhan and Carl Steinbach. > > > Repository: hive-git > > > Description > ------- > > Problem Statement: > > When we are working with complex structure of data like avro. > Most of the times we are encountering array contains multiple tuples and each > tuple have struct schema. > Suppose here struct schema is like below: > { > "name": "employee", > "type": [{ > "type": "record", > "name": "Employee", > "namespace": "com.company.Employee", > "fields": [{ > "name": "empId", > "type": "int" > }, { > "name": "empName", > "type": "string" > }, { > "name": "age", > "type": "int" > }, { > "name": "salary", > "type": "double" > }] > }] > } > > Then while running our hive query complex array looks like array of employee > objects. > Example: > //(array<struct<empId,empName,age,salary>>) > > Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)] > > When we are implementing business use cases day to day life we are > encountering problems like sorting a tuple array by specific field[s] like > empId,name,salary,etc by ASC or DESC order. > Proposal: > I have developed a udf 'sort_array_by' which will sort a tuple array by one > or more fields in ASC or DESC order provided by user ,default is ascending > order . > Example: > 1.Select > sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC"); > output: > array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)] > > 2.Select > sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC"); > output: > array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)] > > 3.Select > sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC"); > output: > array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)] > > > Diffs > ----- > > itests/src/test/resources/testconfiguration.properties 1ab914d > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java > PRE-CREATION > > ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java > PRE-CREATION > ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION > ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION > ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION > ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION > ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 > ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out > PRE-CREATION > ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out > PRE-CREATION > ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out > PRE-CREATION > ql/src/test/results/clientpositive/show_functions.q.out a811747 > ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION > > Diff: https://reviews.apache.org/r/49619/diff/ > > > Testing > ------- > > Junit test cases and query.q files are attached > > > Thanks, > > Simanchal Das > >