Re: Assign unique link ID

2015-10-31 Thread ayan guha
Hi, the way I see it, your dedup condition needs to be defined. If it is variable, then the joining approach is no good either. You may want to stub columns (for example, by putting a default value in the joining clause) to achieve this. If not, you would probably state the problem with all other condit

Re: Assign unique link ID

2015-10-31 Thread Sarath Chandra
Thanks for the reply, Ayan. I had this idea earlier, but the problem is that the number of columns used for joining will vary depending on some data conditions. Their data types will also differ. So I don't see how to define the UDF, as we need to specify upfront the argument count and

Re: Assign unique link ID

2015-10-31 Thread ayan guha
Can this be a solution? 1. Write a function which takes a string and converts it to an MD5 hash. 2. From your base table, generate a string out of all the columns you have used for joining, so records 1 and 4 should generate the same hash value. 3. Group by this new ID (you have already linked the reco
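
The hashing idea in the steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code discussed in the thread: the helper names `link_id` and `group_by_link_id`, the choice of `TRN_NO` and `BRANCH` as joining columns, and the sample rows are all hypothetical. Because the key columns are passed as a list and every value is converted with `str()`, the number and data types of the joining columns can vary between calls.

```python
import hashlib
from collections import defaultdict


def link_id(record, key_columns, sep="|"):
    """Return an MD5 hex digest built from the values of key_columns."""
    # str() lets columns of different types share one key string.
    # Missing or None values become an empty string (an assumption).
    parts = [
        "" if record.get(col) is None else str(record[col])
        for col in key_columns
    ]
    return hashlib.md5(sep.join(parts).encode("utf-8")).hexdigest()


def group_by_link_id(records, key_columns):
    """Map each link ID to the RECORD_IDs that share it."""
    groups = defaultdict(list)
    for rec in records:
        groups[link_id(rec, key_columns)].append(rec["RECORD_ID"])
    return dict(groups)


# Hypothetical rows; only a few columns are shown.
records = [
    {"RECORD_ID": 1, "SOURCE_TYPE": "S1", "TRN_NO": 55, "BRANCH": "25602"},
    {"RECORD_ID": 2, "SOURCE_TYPE": "S1", "TRN_NO": 56, "BRANCH": "25602"},
    {"RECORD_ID": 4, "SOURCE_TYPE": "S2", "TRN_NO": 55, "BRANCH": "25602"},
]

# Records 1 and 4 agree on both key columns, so they share one link ID.
groups = group_by_link_id(records, ["TRN_NO", "BRANCH"])
```

The separator should be a character that cannot occur inside the key values, otherwise two different column combinations could produce the same string. In Spark SQL, the built-in `concat_ws` and `md5` functions could probably play the same role as `link_id`, though that is untested here.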

Assign unique link ID

2015-10-31 Thread Sarath Chandra
Hi All, I have a Hive table where data from 2 different sources (S1 and S2) gets accumulated. Sample data below:
RECORD_ID|SOURCE_TYPE|TRN_NO|DATE1|DATE2|BRANCH|REF1|REF2|REF3|REF4|REF5|REF6|DC_FLAG|AMOUNT|CURRENCY
1|S1|55|19-Oct-2015|19-Oct-2015|25602|999||41106|47311|379|9|004|999|99