[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600195#comment-14600195 ]
Alexander Pivovarov commented on HIVE-9557:
-------------------------------------------

Rename clientnegative/udf_cosine_similarity.q to clientnegative/udf_cosine_similarity_error_1.q, then run:
{code}
# build hive
mvn clean install -Phadoop-2,dist -DskipTests

# build itests
cd itests
mvn clean install -Phadoop-2 -DskipTests

# build qtest
cd qtest
mvn clean install -Phadoop-2 -DskipTests

# run the q test; this overwrites the q.out file
mvn test -Dtest=TestCliDriver -Dqfile=udf_cosine_similarity.q,show_functions.q -Dtest.output.overwrite=true -Phadoop-2

# run the negative q file test
mvn test -Dtest=TestNegativeCliDriver -Dqfile=udf_cosine_similarity_error_1.q -Dtest.output.overwrite=true -Phadoop-2
{code}

> create UDF to measure strings similarity using Cosine Similarity algo
> ---------------------------------------------------------------------
>
>                 Key: HIVE-9557
>                 URL: https://issues.apache.org/jira/browse/HIVE-9557
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Nishant Kelkar
>              Labels: CosineSimilarity, SimilarityMetric, UDF
>         Attachments: udf_cosine_similarity-v01.patch
>
>
> algo description http://en.wikipedia.org/wiki/Cosine_similarity
> {code}
> --one word different, total 2 words
> str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f
> {code}
> reference implementation:
> https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
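
For reference, the arithmetic in the issue description can be reproduced with a small sketch. It assumes the set-based form of cosine similarity: the number of shared tokens divided by the square root of the product of the two token counts. The function name is borrowed from the description; the whitespace tokenization, case sensitivity, and empty-input handling are assumptions made for illustration and are not taken from the attached patch.

```python
import math


def str_sim_cosine(s1: str, s2: str) -> float:
    """Token-set cosine similarity (illustrative sketch, not the Hive UDF)."""
    # Assumption: tokens are whitespace-separated and compared case-sensitively.
    tokens1 = set(s1.split())
    tokens2 = set(s2.split())
    if not tokens1 or not tokens2:
        # Assumption: similarity involving an empty string is treated as 0.0.
        return 0.0
    common = len(tokens1 & tokens2)
    # Shared tokens over the geometric mean of the two token-set sizes.
    return common / math.sqrt(len(tokens1) * len(tokens2))


example = str_sim_cosine("Test String1", "Test String2")
```

With the two strings from the description, one of the two tokens on each side is shared, so the sketch yields 1 / sqrt(2 * 2) = 0.5, matching the expected value above.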