Raihan,

There is no need to implement a custom mapper or reducer. If you are running into performance problems, consider using bucketed tables and doing a bucketed map join or a sort-merge bucket map join. A good overview of join performance can be found in these slides from Facebook: https://cwiki.apache.org/Hive/presentations.data/Hive%20Summit%202011-join.pdf. Basically, you need to choose a join strategy that fits your data.
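As a rough sketch of what the bucketed approach could look like in HiveQL: the table names (table1_bucketed, table2_bucketed), the columns (id, val), and the bucket count of 256 are all made up for illustration, so adjust them to your schema. This assumes both tables are joined on a single key and bucketed and sorted on it with the same number of buckets. The SET properties are the ones used by Hive releases of this era; whether the MAPJOIN hint is honored and whether the enforce settings are needed depends on your Hive version.

```sql
-- Hypothetical schema: both tables keyed by `id`, with one compared column `val`.
-- Make INSERTs populate buckets and sort order correctly (older Hive releases).
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- Bucket and sort both tables on the join key, with the same bucket count.
CREATE TABLE table1_bucketed (id BIGINT, val STRING)
CLUSTERED BY (id) SORTED BY (id ASC) INTO 256 BUCKETS;

CREATE TABLE table2_bucketed (id BIGINT, val STRING)
CLUSTERED BY (id) SORTED BY (id ASC) INTO 256 BUCKETS;

INSERT OVERWRITE TABLE table1_bucketed SELECT id, val FROM table1;
INSERT OVERWRITE TABLE table2_bucketed SELECT id, val FROM table2;

-- Enable the bucketed map join and its sort-merge variant.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Report keys present in both tables whose values differ.
SELECT /*+ MAPJOIN(t2) */
       t1.id, t1.val AS table1_val, t2.val AS table2_val
FROM table1_bucketed t1
JOIN table2_bucketed t2 ON t1.id = t2.id
WHERE t1.val <> t2.val;
```

This only catches rows that exist in both tables with different values. Finding rows missing from Table2 would need an outer join, and whether that still qualifies for a bucketed map join is something to verify with EXPLAIN on your version.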
Regards,
Esteban.

--
Cloudera, Inc.

On Thu, Jul 12, 2012 at 2:18 PM, Raihan Jamal <jamalrai...@gmail.com> wrote:

> Sending it again. As I haven't got any reply on this. Any personal
> experience will be appreciated.
>
> *Raihan Jamal*
>
> On Mon, Jul 9, 2012 at 3:37 PM, Raihan Jamal <jamalrai...@gmail.com> wrote:
>
>> *Problem Statement:-*
>>
>> I need to compare two tables Table1 and Table2 and they both store same
>> thing. So I need to compare Table2 with Table1 as Table1 is the main
>> table through which comparisons need to be made. So after comparing I need
>> to make a report that Table2 has some sort of discrepancy. And these two
>> tables has lots of data, around TB of data. So currently I have written
>> HiveQL to do the comparisons and get the data back.
>>
>> So my question is which is better in terms of PERFORMANCE, writing a CUSTOM
>> MAPPER and REDUCER to do this kind of job or the HiveQL that I wrote will
>> be fine as I will be joining these two tables on millions of records. As
>> far as I know HiveQL internally (behind the scenes) generates optimized
>> custom map-reducer and submits for execution and gets back the results.
>>
>> *Raihan Jamal*