I would suggest to use a subquery WITH unique_currency AS ( SELECT CCY_CD, MAX(CNTRY_DESC) as CNTRY_DESC FROM CURRENCY GROUP BY CCY_CD )
and then perform your left join on it. Some SQL engine (e.g. Presto) have aggregation functions like arbitrary(col) that take any value and are a little less costly than a max. Sometimes, they also have functions like max_by(x, y) <https://prestodb.io/docs/current/functions/aggregate.html#max_by> that would allow you to get the most recent description. It is a shame that this function is not included in Hive yet, but still you can find some UDAF implementations on github <https://github.com/dataiku/dataiku-hive-udf#first_of_group-last_of_group>. On Wed, Aug 23, 2017 at 3:37 PM, Ramasubramanian Narayanan < ramasubramanian.naraya...@gmail.com> wrote: > Hi, > > Need your suggestion on the below. > > Have two tables TXN and CURRENCY. > > Need all records in TXN and hence doing Left Join with CURRENCY. > > *Two problems :* > 1. CURRENCY table may contain duplicate records hence it needs to be > handled through RANK or some other function. > 2. If we equate TXN_DT between EFF_ST_DT and EFF_END_DT in the 'where > clause' then we will loose the EUR records which should not happen. > > Please suggest a solution to over come both the problems. For duplicated > records it is fine if we select any of the CNTRY_DESC. > > *Table : CURRENCY* > > *Table : TXN* > > *CCY_CD* > > *CNTRY_DESC* > > *EFF_ST_DT* > > *EFF_END_DT* > > *ROW_NUM* > > *CCY_CD* > > *TXN_DT* > > INR > > Indian Rupee > > 1-Jan-15 > > 20-Feb-16 > > 1 > > INR > > 16-Feb-17 > > INR > > Indian Rupee New > > 21-Feb-16 > > 20-Feb-99 > > 2 > > USD > > 16-Feb-17 > > USD > > US Dollar > > 1-Jan-15 > > 20-Feb-16 > > 3 > > SGD > > 16-Feb-17 > > SGD > > Singapore Dollar > > 1-Jan-15 > > 20-Feb-17 > > 4 > > EUR > > 16-Feb-17 > > SGD > > Singapore Dollar New > > 15-Feb-17 > > 20-Feb-99 > > SGD > > Singapore Dollar Latest > > 16-Feb-17 > > 16-Feb-17 > > *Expected Output* > > *ROW_NUM* > > *CCY_CD* > > *TXN_DT* > > *CNTRY_DESC* > > 1 > > INR > > 16-Feb-17 > > Indian Rupee > > 2 > > USD > > 16-Feb-17 > > US Dollar > > 3 > > SGD > > 16-Feb-17 > > Singapore Dollar Latest (Any of three valid valid is fine) > > 4 > > EUR > > 16-Feb-17 > > <Null> > > > > *Query : *Select ROW_NUM,CCY_CD,TXN_DT,CNTRY_DESC > from CURRENCY LEFT JOIN TXN on (CURRENCY.CCY_CD = TXN.CCY_CD) > where > TXN_DT between EFF_ST_DT and EFF_END_DT; > > > > This query will drop the "EUR" record because of the where clause used. > It cannot be handled with case statement instead of 'where clause' as we > have more than one record for 'SGD' when TXN_DT is 16-FEB. > > regards, > Rams >