Aihua Xu created HIVE-10720:
-------------------------------

             Summary: Pig using HCatLoader to access RCFile and perform join but get incorrect result.
                 Key: HIVE-10720
                 URL: https://issues.apache.org/jira/browse/HIVE-10720
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
    Affects Versions: 1.3.0
            Reporter: Aihua Xu

{noformat}
Create table tbl1 (key string, value string) stored as rcfile;
Create table tbl2 (key string, value string);
insert into tbl1 values('1', 'value1');
insert into tbl2 values('1', 'value2');
{noformat}

Pig script:
{noformat}
tbl1 = LOAD 'tbl1' USING org.apache.hive.hcatalog.pig.HCatLoader();
tbl2 = LOAD 'tbl2' USING org.apache.hive.hcatalog.pig.HCatLoader();

src_tbl1 = FILTER tbl1 BY (key == '1');
prj_tbl1 = FOREACH src_tbl1 GENERATE
    key as tbl1_key,
    value as tbl1_value,
    '333' as tbl1_v1;

src_tbl2 = FILTER tbl2 BY (key == '1');
prj_tbl2 = FOREACH src_tbl2 GENERATE
    key as tbl2_key,
    value as tbl2_value;

result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
prj_result = FOREACH result GENERATE
    prj_tbl1::tbl1_key AS key1,
    prj_tbl1::tbl1_value AS value1,
    prj_tbl1::tbl1_v1 AS v1,
    prj_tbl2::tbl2_key AS key2,
    prj_tbl2::tbl2_value AS value2;

dump prj_result;
{noformat}

Running this Pig script produces varying incorrect results, or no result at all, even though the join should return a result.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)