Re: Review Request 49619: sorting of tuple array using multiple fields

Simanchal Das Mon, 11 Jul 2016 01:34:53 -0700


> On July 9, 2016, 8:53 p.m., Carl Steinbach wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java,
> >  line 56
> > <https://reviews.apache.org/r/49619/diff/3/?file=1439691#file1439691line56>
> >
> >     A couple notes:
> >     
> >     1. I think the example should actually return a value that is different 
> > than the input. It would also be good to include more than two elements in 
> > the input. If screen space is an issue I recommend only including a single 
> > element in each of the structs in the example, which I think has the added 
> > benefit of making the example clearer by not distracting the reader with 
> > irrelevant details.
> >     
> >     2. It looks like the default sorting order (ASC) is actually the 
> > reverse of what I would expect it to be, i.e. I expect 'b' to come before 
> > 'g'.
> >     
> >     3. Related to point (2), I think it's important to ensure that the 
> > sorting order of this UDF is consisent with ORDER BY, e.g. for a table t 
> > containing a single row with a single array of struct field a_struct_array, 
> > the queries "SELECT a_struct FROM t LATERAL VIEW explode(a_struct_array) 
> > structTable AS a_struct ORDER BY a_struct.col1 DESC" should return the same 
> > results as "SELECT a_struct FROM t LATERAL VIEW 
> > explode(sort_array_by(a_struct_array, 'col1', 'DESC')) structTable AS 
> > a_struct". Note that I probably didn't get the syntax for LATERAL VIEW and 
> > explode() correct.


1. Sorry there was a typo.
2. corrected the sorting order in example.
3. As per your instruction I have added test example of LATERAL VIEW 
explode(array) and explode(udf). Which gives same results.


> On July 9, 2016, 8:53 p.m., Carl Steinbach wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java,
> >  line 93
> > <https://reviews.apache.org/r/49619/diff/3/?file=1439691#file1439691line93>
> >
> >     Unnecessary string concatenation operators.

removed


> On July 9, 2016, 8:53 p.m., Carl Steinbach wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java,
> >  line 104
> > <https://reviews.apache.org/r/49619/diff/3/?file=1439691#file1439691line104>
> >
> >     Unnecessary "+" operator.

removed


> On July 9, 2016, 8:53 p.m., Carl Steinbach wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java,
> >  line 1
> > <https://reviews.apache.org/r/49619/diff/3/?file=1439692#file1439692line1>
> >
> >     Does this unit test provide any additional coverage or advantages over 
> > the q file tests? Is it necessary to have both?
> >     
> >     Note that I am a strong advocate of end-to-end qfile tests over unit 
> > tests, which is an opinion that not everyone holds.

These are kind of same as q file. I feels test cases on Test classes are good 
for doing unit testing while development and takes less time to identify 
problem comparare to q files.
Any ways I have removed some test cases from Test class.


- Simanchal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/#review141413
-----------------------------------------------------------


On July 8, 2016, 12:35 p.m., Simanchal Das wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49619/
> -----------------------------------------------------------
> 
> (Updated July 8, 2016, 12:35 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Carl Steinbach.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem Statement:
> 
> When we are working with complex structure of data like avro.
> Most of the times we are encountering array contains multiple tuples and each 
> tuple have struct schema.
> Suppose here struct schema is like below:
> {
>       "name": "employee",
>       "type": [{
>               "type": "record",
>               "name": "Employee",
>               "namespace": "com.company.Employee",
>               "fields": [{
>                       "name": "empId",
>                       "type": "int"
>               }, {
>                       "name": "empName",
>                       "type": "string"
>               }, {
>                       "name": "age",
>                       "type": "int"
>               }, {
>                       "name": "salary",
>                       "type": "double"
>               }]
>       }]
> }
> 
> Then while running our hive query complex array looks like array of employee 
> objects.
> Example: 
>       //(array<struct<empId,empName,age,salary>>)
>       
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> 
> When we are implementing business use cases day to day life we are 
> encountering problems like sorting a tuple array by specific field[s] like 
> empId,name,salary,etc by ASC or DESC order.
> Proposal:
> I have developed a udf 'sort_array_by' which will sort a tuple array by one 
> or more fields in ASC or DESC order provided by user ,default is ascending 
> order .
> Example:
>       1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary","ASC");
>       output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>       
>       2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","ASC");
>       output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
>       3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age,"ASC");
>       output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 1ab914d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayByField.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayByField.java
>  PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong1.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong2.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_by_wrong3.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/udf_sort_array_by.q PRE-CREATION 
>   ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong1.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_sort_array_by_wrong3.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out a811747 
>   ql/src/test/results/clientpositive/udf_sort_array_by.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/49619/diff/
> 
> 
> Testing
> -------
> 
> Junit test cases and query.q files are attached
> 
> 
> Thanks,
> 
> Simanchal Das
> 
>

Re: Review Request 49619: sorting of tuple array using multiple fields

Reply via email to