[ 
https://issues.apache.org/jira/browse/HIVE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726475#comment-13726475
 ] 

Edward Capriolo commented on HIVE-2482:
---------------------------------------

[~thejas] Thank you for your comment. I am going to both agree and disagree 
with you, from my perspective on this issue.

* I use hive_test to test my UDFs: https://github.com/edwardcapriolo/hive_test
* At one point we added a plugin developer kit to Hive which allowed 
annotation-based testing of UDFs. It was later removed; there were reports that 
it was flaky. I was not paying much attention at that time, but I probably 
would have advocated that it not be removed.

Now, I do agree with you that we can get better coverage of some things outside 
end-to-end tests, but believe it or not, functions are not one of them.

Why do I say this? A few reasons:
* Most functions are not functional in the pure, stateless sense.
* They actually have state: conf at initialization and reusable objects shared 
between calls to evaluate.
* UDAFs have entire aggregation buffer systems.
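
To make the state point concrete, here is a minimal sketch in plain Java. 
PrefixingFunction is a hypothetical stand-in I made up for illustration; it is 
not Hive's UDF or GenericUDF API. It only mimics the two habits named above: 
configuration captured at initialization, and one output object reused across 
calls to evaluate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UDF; this is NOT Hive's UDF/GenericUDF API.
class PrefixingFunction {
    // State captured once at initialization (plays the role of conf).
    private String prefix;
    // Single output buffer shared by every call to evaluate().
    private final StringBuilder reused = new StringBuilder();

    void initialize(String prefix) {
        this.prefix = prefix;
    }

    // Overwrites and returns the same StringBuilder on every call.
    StringBuilder evaluate(String input) {
        if (prefix == null) {
            throw new IllegalStateException("initialize() was not called");
        }
        reused.setLength(0);
        reused.append(prefix).append(input);
        return reused;
    }
}

public class StatefulFunctionDemo {
    public static void main(String[] args) {
        PrefixingFunction f = new PrefixingFunction();
        f.initialize("0x");

        // A naive test that holds on to results without copying them.
        List<StringBuilder> kept = new ArrayList<>();
        kept.add(f.evaluate("CAFE"));
        kept.add(f.evaluate("BEEF"));

        // Both entries alias the same buffer, so the first result is gone.
        System.out.println(kept.get(0) + " " + kept.get(1)); // 0xBEEF 0xBEEF
        System.out.println(kept.get(0) == kept.get(1));      // true
    }
}
```

A unit test for a function like this has to cover call ordering, a missing 
initialize, and copying results before the next call. That is the kind of 
corner case I mean when I say functions are not purely functional.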

To your specific points:
1) Welcome to my life, I have been complaining about our test infrastructure 
for years. Honestly, now that we have a build system we can test UDFs fairly 
fast, and there is not a huge volume of them anyway.

2) That can be true. Again, I use hive_test and I am not against having unit + 
end-to-end tests.

3) I agree with this to an extent, but even in a real unit test one still has 
to write Assert.assertEquals( something, somethingElse ), so you still eyeball 
something. From a review standpoint it's easier to eyeball the .out than tens 
or hundreds of asserts.

Again, I am not against having more traditional unit tests and writing code in 
a functional style that is easier to document and reason about, but I think 
that to cover all the corner cases of exceptions and cleaning out private state 
properly, the unit tests will be uglier than the q tests.

I am talking on hive-dev about the project split up. This is one of the things 
I want to do: move all the end-to-end tests to a final project and really step 
up the unit-style testing.

There are lots of things we can do to make the tests faster:
* move all the UDFs into 1 big test :) save the overhead of launching multiple 
tests
* optimize 'select udf(column) from table limit 1' <-- we should be able to 
make that test scream

Anyway, unlike the past, where stuff like this sat on the queue forever, we now 
have a build bot and I am dedicated to seeing patches reviewed and committed 
fast (especially ones like these).

BTW at minimum there is show_functions.q, so every time you add a function you 
at least have to touch that test.



                
> Convenience UDFs for binary data type
> -------------------------------------
>
>                 Key: HIVE-2482
>                 URL: https://issues.apache.org/jira/browse/HIVE-2482
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.9.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Mark Wagner
>         Attachments: HIVE-2482.1.patch
>
>
> HIVE-2380 introduced the binary data type in Hive. It would be good to have 
> the following UDFs to make it more useful:
> * UDFs to convert to/from hex string
> * UDFs to convert to/from string using a specific encoding
> * UDFs to convert to/from base64 string
> * UDFs to convert to/from non-string types using a particular serde

