-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49619/#review141130
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java (line 427)
<https://reviews.apache.org/r/49619/#comment206540>

    To me "sort_array_field" makes it sound like this function sorts the 
elements in an array field, as opposed to sorting an array on a particular 
field, which is what it actually does. I think the purpose of this function 
would be clearer if the name were changed to 'sort_array_on_field' or 
'sort_array_by' (I prefer the latter).
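
    To make the distinction concrete, here is a minimal Java sketch of 
"sorting an array on a particular field". It is not the code from the patch: 
the class SortArrayBySketch and the helpers sortBy, employee and 
compareValues are hypothetical names used only for illustration, and structs 
are modelled as plain maps keyed by the field names from the Avro schema in 
the review request (empId, empName, age, salary).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models each struct as a Map from field name to value.
final class SortArrayBySketch {

    // Builds one "struct" with the fields named in the Avro schema.
    static Map<String, Object> employee(int empId, String empName, int age, double salary) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("empId", empId);
        row.put("empName", empName);
        row.put("age", age);
        row.put("salary", salary);
        return row;
    }

    // Returns a sorted copy of rows, ordered by the first field, with ties
    // broken by each following field, all in natural (ascending) order.
    static List<Map<String, Object>> sortBy(List<Map<String, Object>> rows, String... fields) {
        Comparator<Map<String, Object>> cmp = (a, b) -> 0;
        for (String field : fields) {
            cmp = cmp.thenComparing((a, b) -> compareValues(a.get(field), b.get(field)));
        }
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(cmp); // List.sort is stable, so full ties keep input order
        return sorted;
    }

    // Assumes both values are non-null and mutually comparable.
    @SuppressWarnings("unchecked")
    private static int compareValues(Object x, Object y) {
        return ((Comparable<Object>) x).compareTo(y);
    }
}
```

    Calling sortBy(rows, "empName", "salary") orders the structs by name 
and then by salary; the values inside each struct are never reordered, which 
is why a name like 'sort_array_by' describes the behaviour more accurately.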



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 1)
<https://reviews.apache.org/r/49619/#comment206556>

    Is this really necessary?



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 9)
<https://reviews.apache.org/r/49619/#comment206557>

    No need for this. Please remove.



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 16)
<https://reviews.apache.org/r/49619/#comment206559>

    The rows should have different struct values.



ql/src/test/queries/clientpositive/udf_sort_array_field.q (line 25)
<https://reviews.apache.org/r/49619/#comment206558>

    Consider using named_struct() instead of struct(). This will allow you to 
provide names for the struct fields.



ql/src/test/results/beelinepositive/show_functions.q.out (line 183)
<https://reviews.apache.org/r/49619/#comment206555>

    The number of rows is off by 8. This looks like a bug, though not one 
caused by this patch.



ql/src/test/results/beelinepositive/show_functions.q.out (line 184)
<https://reviews.apache.org/r/49619/#comment206553>

    It looks like you're stripping whitespace out of the patch. I suspect this 
is the cause of the failure in show_functions.q


- Carl Steinbach


On July 7, 2016, 5:07 a.m., Simanchal Das wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49619/
> -----------------------------------------------------------
> 
> (Updated July 7, 2016, 5:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Carl Steinbach.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem Statement:
> 
> When working with complex structured data such as Avro, we often encounter 
> arrays that contain multiple tuples, where each tuple has a struct schema.
> 
> Suppose the struct schema looks like this:
> 
> {
>       "name": "employee",
>       "type": [{
>               "type": "record",
>               "name": "Employee",
>               "namespace": "com.company.Employee",
>               "fields": [{
>                       "name": "empId",
>                       "type": "int"
>               }, {
>                       "name": "empName",
>                       "type": "string"
>               }, {
>                       "name": "age",
>                       "type": "int"
>               }, {
>                       "name": "salary",
>                       "type": "double"
>               }]
>       }]
> }
> 
> 
> Then, when we run a Hive query, the complex array looks like an array of 
> employee objects.
> Example: 
>       //(array<struct<empId,empName,age,salary>>)
>       
> Array[Employee(100,Foo,20,20990),Employee(500,Boo,30,50990),Employee(700,Harry,25,40990),Employee(100,Tom,35,70990)]
> 
> 
> When implementing day-to-day business use cases, we run into problems such 
> as sorting a tuple array by specific field[s] like empId, salary, etc.
> 
> 
> Proposal:
> 
> I have developed a UDF 'sort_array_field' which sorts a tuple array by 
> one or more fields in natural order.
> 
> Example:
>       1.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Salary");
>       output: 
> array[struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(500,Boo,30,50990),struct(100,Tom,35,70990)]
>       
>       2.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,80990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary");
>       output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
>       3.Select 
> sort_array_field(array[struct(100,Foo,20,20990),struct(500,Boo,30,50990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)],"Name","Salary","Age");
>       output: 
> array[struct(500,Boo,30,50990),struct(500,Boo,30,80990),struct(100,Foo,20,20990),struct(700,Harry,25,40990),struct(100,Tom,35,70990)]
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 1ab914d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 2f4a94c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArrayField.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSortArrayField.java
>  PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_field_wrong1.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/udf_sort_array_field_wrong2.q 
> PRE-CREATION 
>   ql/src/test/queries/clientpositive/udf_sort_array_field.q PRE-CREATION 
>   ql/src/test/results/beelinepositive/show_functions.q.out 4f3ec40 
>   ql/src/test/results/clientnegative/udf_sort_array_field_wrong1.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/udf_sort_array_field_wrong2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/udf_sort_array_field.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/49619/diff/
> 
> 
> Testing
> -------
> 
> JUnit test cases and query.q files are attached.
> 
> 
> Thanks,
> 
> Simanchal Das
> 
>
