Hiten Java created PIG-3668:
-------------------------------
Summary: COR built-in function when at least one of the coefficient
values is NaN
Key: PIG-3668
URL: https://issues.apache.org/jira/browse/PIG-3668
Project: Pig
Issue Type: Bug
Components: internal-udfs
Affects Versions: 0.11.1, 0.12.0, 0.11
Reporter: Hiten Java
Priority: Trivial
When multiple columns are passed to COR for correlation analysis and the
coefficient for one of the column pairs is NaN, the coefficients for all other
pairs are not computed.
The Pearson coefficient is NaN when all values in a given column are the same,
because that column's variance, and therefore the denominator, is zero.
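To illustrate why a constant column yields NaN, here is a minimal Python sketch of the textbook Pearson formula. It is not Pig's COR implementation; the function name and sample data are made up for this illustration. Python raises an error on 0.0 / 0.0, so the sketch returns NaN explicitly to mirror what IEEE 754 double arithmetic produces in Java.

```python
import math

def pearson(xs, ys):
    """Textbook Pearson coefficient; NaN when either column has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # A constant column makes the denominator zero (0/0 in the formula).
        return float("nan")
    return cov / denom

constant = [5.0, 5.0, 5.0, 5.0]   # hypothetical column with identical values
varying = [1.0, 2.0, 3.0, 4.0]    # hypothetical column with distinct values

r_constant = pearson(constant, varying)
r_self = pearson(varying, varying)
print(r_constant, r_self)
```

The first result is NaN because of the constant column; the second pair is unaffected and correlates normally.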
Example:
A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
B = group A all;
c = foreach B generate group, FLATTEN(COR(
    (bag{tuple(double)}) A.col_1,
    (bag{tuple(double)}) A.col_2,
    (bag{tuple(double)}) A.col_3,
    (bag{tuple(double)}) A.col_4));
If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients
for all other combinations are also reported as NaN.
This happens because of the 'return null' statements in the catch blocks on
lines 157 and 235 of org.apache.pig.builtin.COR.java.
If those catch blocks are removed, the correlation analysis continues for
the remaining columns. (Apache Pig 0.12.0)
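The difference between aborting on the first failure and continuing per pair can be sketched as follows. This is a Python illustration of the control flow only, not the code in COR.java; the function names, the exception type, and the sample columns are all assumptions made for the example.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Textbook Pearson coefficient; raises when a column has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))
    if denom == 0.0:
        raise ZeroDivisionError("zero variance in at least one column")
    return cov / denom

def correlate_all_or_nothing(columns):
    """Mimics the reported behaviour: one failing pair discards every result."""
    results = {}
    try:
        for a, b in combinations(columns, 2):
            results[(a, b)] = pearson(columns[a], columns[b])
    except ZeroDivisionError:
        return None  # analogous to 'return null' in a catch block
    return results

def correlate_per_pair(columns):
    """Handles the failure per pair, so the remaining pairs are still computed."""
    results = {}
    for a, b in combinations(columns, 2):
        try:
            results[(a, b)] = pearson(columns[a], columns[b])
        except ZeroDivisionError:
            results[(a, b)] = float("nan")  # only this pair is marked NaN
    return results

# Hypothetical data: col_2 is constant, so every pair involving it fails.
columns = {
    "col_1": [1.0, 2.0, 3.0, 4.0],
    "col_2": [7.0, 7.0, 7.0, 7.0],
    "col_3": [2.0, 4.0, 6.0, 8.0],
}

aborted = correlate_all_or_nothing(columns)
per_pair = correlate_per_pair(columns)
print(aborted)
print(per_pair)
```

With per-pair handling, only the combinations involving the constant column come back as NaN, while col_1 and col_3 still get a coefficient.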