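The "small" table can be any size; it only needs to be the smaller of the two. You want the small table to be /path/to/table/b here, because the mappers are split over table A and each mapper reads all of table B, so using the larger table as the mapper input will result in more parallelism. There is a ticket on Hive theta join that you might want to look at.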
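To make the parallelism point concrete, here is a rough in-memory sketch in plain Java, not a Hadoop job. It assumes both tables are lists of strings; the names CrossJoinSketch, mapSplit, crossJoin and splitSize are made up for illustration, and the tab separator between the two rows is my own choice. Each "split" of table A plays the role of one mapper, and every mapper walks all of table B.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CrossJoinSketch {

    // One "mapper": pairs every row of its split of table A with every row of table B.
    // In the MR job, the inner loop would re-read /path/to/table/b from the file system.
    static List<String> mapSplit(List<String> splitOfA, List<String> tableB) {
        List<String> out = new ArrayList<>();
        for (String a : splitOfA) {
            for (String b : tableB) {
                out.add(a + "\t" + b); // separator is an assumption
            }
        }
        return out;
    }

    // Cuts table A into splits of at most splitSize rows (splitSize must be > 0)
    // and processes the splits in parallel. More rows in A means more splits,
    // which is why the larger table should be the mapper input.
    static List<String> crossJoin(List<String> tableA, List<String> tableB, int splitSize) {
        int numSplits = (tableA.size() + splitSize - 1) / splitSize;
        return IntStream.range(0, numSplits)
                .parallel()
                .mapToObj(i -> tableA.subList(
                        i * splitSize,
                        Math.min(tableA.size(), (i + 1) * splitSize)))
                .flatMap(split -> mapSplit(split, tableB).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tableA = Arrays.asList("a1", "a2", "a3"); // the larger table
        List<String> tableB = Arrays.asList("b1", "b2");       // the "small" table
        for (String row : crossJoin(tableA, tableB, 2)) {
            System.out.println(row);
        }
    }
}
```

The number of parallel units here depends only on the size of table A, so swapping the two tables would leave you with fewer, longer-running workers.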

On Thu, Jul 10, 2014 at 10:23 PM, Malligarjunan S <malligarju...@gmail.com> wrote:

> Hello Edward,
>
> Thank you very much for the update.
> What size do you mean by a small table? In our case the small table will
> have a minimum of 1 million records.
> Can we use this UDTF? How much time improvement will there be?
>
> Appreciate your help!
> Thanks and Regards
> SankarS
>
>
> On Thu, Jul 10, 2014 at 11:26 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> There is no magic. Hopefully one table is smaller than the other. You
>> could make a UDTF to do something like what this MR job is doing.
>>
>> Make a mapper that runs over table A.
>> InputFormat.setInputPath("/path/to/table/a")
>>
>> Then inside the mapper
>>
>> private Conf c
>> setup(Conf c){
>>   this.c = c
>> }
>> public void map(Text key, Text value, Collector out){
>>   FileSystem fs = FileSystem.get(c);
>>   file f = fs.open("/path/to/table/b")
>>   for (line in f){
>>     out.collect(value + line);
>>   }
>> }
>>
>>
>> On Thu, Jul 10, 2014 at 12:56 PM, Malligarjunan S <
>> malligarju...@gmail.com> wrote:
>>
>>> Hello Edward,
>>>
>>> Thank you very much for helping me.
>>> I am new to Hive. Could you please provide the sample map reduce job?
>>>
>>> Regards,
>>> Sankar S
>>>
>>>
>>> On Thu, Jul 10, 2014 at 8:19 AM, Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>>
>>>> Hive cross product stinks. I have a map reduce job that will do it.
>>>>
>>>>
>>>> On Wednesday, July 9, 2014, Navis류승우 <navis....@nexr.com> wrote:
>>>>
>>>>> Yes, 2M x 1M makes 2T pairings in a single reducer.
>>>>>
>>>>> Thanks,
>>>>> Navis
>>>>>
>>>>>
>>>>> 2014-07-10 1:50 GMT+09:00 Malligarjunan S <malligarju...@gmail.com>:
>>>>>
>>>>>> Hello All,
>>>>>> Is it expected behavior for Hive to take so much time?
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Sankar S
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 8, 2014 at 11:23 PM, Malligarjunan S <
>>>>>> malligarju...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> Can anyone help me answer my question posted on Stack Overflow?
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/24416373/hive-udf-performance-too-slow
>>>>>>> It is pretty urgent. Please help me.
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Sankar S.
>>>>>>>
>>>>
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
>>>>