Re: Merging small files with dynamic partitions

2010-11-12 Thread Dave Brondsema
I copied Hadoop19Shims' implementation of getCombineFileInputFormat (HIVE-1121) into Hadoop18Shims and it worked, if anyone is interested. And hopefully we can upgrade our Hadoop version soon :) On Fri, Nov 12, 2010 at 12:44 PM, Dave Brondsema wrote: > It seems that I can't use this with Hadoop

Re: Merging small files with dynamic partitions

2010-11-12 Thread Dave Brondsema
It seems that I can't use this with Hadoop 0.18 since the Hadoop18Shims.getCombineFileInputFormat returns null, and SemanticAnalyzer.java sets HIVEMERGEMAPREDFILES to false if CombineFileInputFormat is not supported. Is that right? Maybe I can copy the Hadoop19Shims implementation of getCombineFi

Re: Merging small files with dynamic partitions

2010-11-10 Thread yongqiang he
I think the problem was solved in Hive trunk. You can just try Hive trunk. On Wed, Nov 10, 2010 at 10:05 AM, Dave Brondsema wrote: > Hi, has there been any resolution to this? I'm having the same trouble. > With Hive 0.6 and Hadoop 0.18 and a dynamic partition > insert, hive.merge.mapredfiles d

Re: Merging small files with dynamic partitions

2010-11-10 Thread Dave Brondsema
Hi, has there been any resolution to this? I'm having the same trouble. With Hive 0.6 and Hadoop 0.18 and a dynamic partition insert, hive.merge.mapredfiles doesn't work. It works fine for a static partition insert. What I'm seeing is that even when I set hive.merge.mapredfiles=true, the jobcon
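For readers following the thread, the setup being described might look roughly like the Hive CLI session below. This is only a sketch: the table names `page_views_staging` and `page_views` and the partition column `dt` are hypothetical and do not come from the original messages. The `set` properties are standard Hive settings, but whether the merge step actually runs depends on the Hive and Hadoop versions, as the rest of the thread discusses.

```sql
-- Enable dynamic partition inserts (nonstrict allows all partition columns to be dynamic).
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Ask Hive to merge small output files at the end of a map-reduce job.
set hive.merge.mapredfiles=true;

-- Hypothetical tables: the partition value for dt is taken from the last SELECT column.
INSERT OVERWRITE TABLE page_views PARTITION (dt)
SELECT user_id, url, dt
FROM page_views_staging;

-- Print the current value to check whether the setting is still in effect.
set hive.merge.mapredfiles;
```

Running `set hive.merge.mapredfiles;` with no value prints the setting as the CLI session currently sees it, which is one way to compare the requested value against what the job configuration ends up with.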

Re: Merging small files with dynamic partitions

2010-10-15 Thread Sammy Yu
Hi guys, Thanks for the response. I tried running without hive.mergejob.maponly with the same result. I've attached the explain extended output. I am running this query on EC2 boxes; however, it's not running on EMR. Hive is running on top of a Hadoop 0.20.2 setup. Thanks, Sammy On Fri, O

Re: Merging small files with dynamic partitions

2010-10-15 Thread Ning Zhang
The output file shows it only has 2 jobs (the mapreduce job and the move task). This indicates that the plan does not have merge enabled. Merge should consist of a ConditionalTask and 2 sub tasks (a MR task and a move task). Can you send the plan of the query? One thing I noticed is that you

Re: Merging small files with dynamic partitions

2010-10-15 Thread Edward Capriolo
Sammy, This is not the exact remedy you were looking for, but my company open sourced our file crusher utility. http://www.jointhegrid.com/hadoop_filecrush/index.jsp We use it to good effect to turn many small files into one. Works with text and sequence files, and custom writables. Edward On