[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603988#comment-14603988 ]

Nishant Kelkar commented on HIVE-9557:
--------------------------------------

The TestCliDriver tests actually fail with the following error:

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hive.cli.TestCliDriver
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 71.797 sec <<< FAILURE! - in org.apache.hadoop.hive.cli.TestCliDriver
testCliDriver_udf_cosine_similarity(org.apache.hadoop.hive.cli.TestCliDriver)  Time elapsed: 0.346 sec  <<< FAILURE!
junit.framework.AssertionFailedError: Unexpected exception junit.framework.AssertionFailedError: Client Execution failed with error code = 10014 running

select
cosine_similarity('kitten', 'sitting', ' '),
cosine_similarity('sitting kitten', 'kitten sitting', ' '),
cosine_similarity('sitting kitten', 'sitting kittens', ' '),
cosine_similarity('two#delimiters,here', 'two#delimiters#,here,too', '#,'),
cosine_similarity('test string', '', ' '),
cosine_similarity(cast(null as string), 'test string', ' '),
cosine_similarity('test string', cast(null as string), ','),
cosine_similarity(cast(null as string), cast(null as string), ' '),
cosine_similarity('a string', 'another string', '')
See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
        at junit.framework.Assert.fail(Assert.java:57)
        at org.apache.hadoop.hive.ql.QTestUtil.failed(QTestUtil.java:1984)
        at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:152)
        at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_cosine_similarity(TestCliDriver.java:134)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:255)
        at junit.framework.TestSuite.run(TestSuite.java:250)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}

> create UDF to measure strings similarity using Cosine Similarity algo
> ---------------------------------------------------------------------
>
>                 Key: HIVE-9557
>                 URL: https://issues.apache.org/jira/browse/HIVE-9557
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Nishant Kelkar
>              Labels: CosineSimilarity, SimilarityMetric, UDF
>         Attachments: udf_cosine_similarity-v01.patch
>
>
> Algorithm description: http://en.wikipedia.org/wiki/Cosine_similarity
> {code}
> --one word different, total 2 words
> str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
> {code}
> reference implementation:
> https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java
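
For reference, the set-based cosine similarity that the issue description points at can be sketched roughly as below. This is only a sketch under stated assumptions, not the code in the attached patch: the class name `CosineSimilaritySketch`, the methods `tokens` and `cosine`, the treatment of the third argument as a set of single-character delimiters, and the null/empty handling are all hypothetical choices made for illustration. The score is the number of distinct tokens shared by both strings divided by the square root of the product of the two distinct-token counts.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class CosineSimilaritySketch {

    // Split s on any single character in delimiters and collect distinct tokens.
    // Assumption: each character of the delimiter string is a separate delimiter.
    static Set<String> tokens(String s, String delimiters) {
        Set<String> out = new HashSet<String>();
        StringTokenizer st = new StringTokenizer(s, delimiters);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Returns |A intersect B| / sqrt(|A| * |B|) over the distinct token sets.
    // Assumption: any null argument yields null, and an empty token set yields 0.0.
    static Double cosine(String a, String b, String delimiters) {
        if (a == null || b == null || delimiters == null) {
            return null;
        }
        Set<String> ta = tokens(a, delimiters);
        Set<String> tb = tokens(b, delimiters);
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<String>(ta);
        common.retainAll(tb);
        return common.size() / Math.sqrt((double) ta.size() * tb.size());
    }

    public static void main(String[] args) {
        // Example from the issue description: one shared token out of two per side.
        System.out.println(cosine("Test String1", "Test String2", " "));
        // Same tokens in a different order.
        System.out.println(cosine("sitting kitten", "kitten sitting", " "));
    }
}
```

With these assumptions, the 'Test String1' / 'Test String2' pair from the description shares one of two tokens per side, giving 1 / sqrt(2 * 2) = 0.5, which matches the 0.5f quoted in the issue.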



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
