Chandra,

You don't necessarily need Java to implement the mapper/reducer. Check out the answer in this post:
http://stackoverflow.com/questions/6178614/custom-map-reduce-program-on-hive-whats-the-rulehow-about-input-and-output

Also, in my sample, A.column1, A.column2 ==> mymapper ==> key, value: mymapper simply reads each row from stdin and converts it to a key/value pair.
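To make the streaming contract concrete, here is a minimal mapper sketch in Java. The class name and the two-column layout are illustrative placeholders, not my exact code: Hive pipes the MAP clause's columns to the process's stdin as tab-separated rows, and whatever the process writes to stdout as tab-separated fields becomes the "AS key, value" output.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class MyMapper {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            // Each input line is one row: the MAP clause's columns, tab-separated.
            String[] cols = line.split("\t", -1);
            // Illustrative logic: first column becomes the key, second the value.
            String key = cols[0];
            String value = cols.length > 1 ? cols[1] : "";
            // key<TAB>value on stdout maps to the "AS key, value" columns.
            System.out.println(key + "\t" + value);
        }
    }
}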
Chen

On Mon, Feb 3, 2014 at 5:51 AM, Bogala, Chandra Reddy <chandra.bog...@gs.com> wrote:

> Hi Wang,
>
> This is my first time trying MAP and REDUCE inside a Hive query. Is it
> possible to share your mymapper and myreducer code, so that I can
> understand how the columns (A.column1, A....) are converted to key/value?
> Also, can you point me to some documents where I can read more about it?
>
> Thanks,
> Chandra
>
> *From:* Chen Wang [mailto:chen.apache.s...@gmail.com]
> *Sent:* Monday, February 03, 2014 12:26 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Hadoop streaming with insert dynamic partition generate many small files
>
> It seems that hive.exec.reducers.bytes.per.reducer was still not big
> enough: I added another 0, and now I get only one file under each
> partition.
>
> On Sun, Feb 2, 2014 at 10:14 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
>
> Hi,
>
> I am using a Java mapper and reducer to read from one table and write to
> another:
>
> FROM (
>   FROM (
>     SELECT column1, ...
>     FROM table1
>     WHERE ( partition > 6 AND partition < 12 )
>   ) A
>   MAP A.column1, A....
>   USING 'java -cp .:my.jar mymapper.mymapper'
>   AS key, value
>   CLUSTER BY key
> ) map_output
> INSERT OVERWRITE TABLE target_table PARTITION(partition)
> REDUCE
>   map_output.key,
>   map_output.value
> USING 'java -cp .:myjar.jar myreducer.myreducer'
> AS column1, column2;
>
> It is all working fine, except that many (20-30) small files are generated
> under each partition. I am setting
> SET hive.exec.reducers.bytes.per.reducer=1280000000;
> hoping to get one file big enough for each partition, but it does not seem
> to have any effect. I still get 20-30 small files under each folder, each
> around 7 KB.
>
> How can I force it to generate only one big file per partition? Does this
> have anything to do with the streaming? I recall that in the past, when I
> read directly from a table with a UDF and wrote to another table, it
> generated only one big file for the target partition. Not sure why that is.
>
> Any help appreciated.
>
> Thanks,
> Chen
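For the reducer side, a minimal sketch under the same assumptions (the per-key row count is a placeholder aggregate, not the actual logic): because of CLUSTER BY key, all rows for a given key arrive at the same reducer consecutively as key<TAB>value lines on stdin, so the process can detect key boundaries itself and emit one output row per group. The tab-separated output fields map to the "AS column1, column2" columns of the REDUCE clause.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class MyReducer {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String currentKey = null;
        long count = 0; // placeholder aggregate: number of rows per key
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split("\t", -1);
            String key = fields[0];
            if (currentKey != null && !currentKey.equals(key)) {
                // Key boundary reached: emit the finished group as column1<TAB>column2.
                System.out.println(currentKey + "\t" + count);
                count = 0;
            }
            currentKey = key;
            count++;
        }
        // Flush the last group, if any input was seen.
        if (currentKey != null) {
            System.out.println(currentKey + "\t" + count);
        }
    }
}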