I recommend trying a daily partitioning scheme instead of an hourly one. We had a similar setup, ran into the same problem, and ultimately found that daily works fine for us, even with the larger file sizes. At the very least it is worth evaluating.
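For what it's worth, here is roughly the shape of what I mean. The table name, columns, and S3 paths below are made up for illustration (not taken from your setup); you would point each daily partition at the directory holding that day's hourly files:

    -- Hypothetical schema and locations, just to show the daily layout.
    CREATE EXTERNAL TABLE events_daily (
      event_time STRING,
      payload    STRING
    )
    PARTITIONED BY (dt STRING)   -- one partition per day, e.g. dt='2012-01-05'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3://my-bucket/events/';

    -- Each daily partition points at the directory containing that day's hourly files.
    ALTER TABLE events_daily ADD PARTITION (dt='2012-01-05')
      LOCATION 's3://my-bucket/events/2012-01-05/';

That takes the partition count from ~15k down to a few hundred without touching the files themselves, as long as all of a day's hourly files sit directly under one daily directory.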
Sent from my iPhone

On Jan 5, 2012, at 2:23 PM, Matt Vonkip <mattvon...@yahoo.com> wrote:

> Shoot, I meant to reply to the group, not respond to Mark directly. (Mark
> replied offline to me; I'm not sure of the etiquette of pasting that
> response in here as well!)
>
> Hi Mark, thanks for the response! I tried using the memory-intensive
> bootstrap action and got a different error; however, I'm not sure whether
> it represents progress in the right direction or a regression. (I thought
> the memory-intensive script was for memory-intensive map-reduce jobs, not
> table DDL, so I am wondering if it made things even worse.)
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> As for the other suggestion, I agree that 15k partitions (and growing) is
> unruly; but the files are not small! Each is over one gigabyte and
> represents one hour from the past twenty months. I would imagine others
> must have similar setups and have found some way around my issue. Also,
> since this worked on the older Hadoop/Hive stack, I suspect there is some
> configuration item I should be able to tweak.
>
> In the meantime, I am tempted to drop the entire database and recreate it
> from scratch (since all the tables are external anyway). If no solution is
> found, we will probably look into some kind of hybrid system where older
> data is archived in other tables and a union is used in queries.
>
> Sincerely,
> Matt
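On the hybrid idea at the end: one way to keep queries unchanged is a view that glues the current table and the archive table together with UNION ALL. A rough sketch, with hypothetical table and column names (both tables would need matching schemas):

    -- events_recent holds the newer partitions; events_archive holds the older months.
    -- Names are made up for illustration, not from your setup.
    CREATE VIEW events_all AS
    SELECT *
    FROM (
      SELECT event_time, payload, dt FROM events_recent
      UNION ALL
      SELECT event_time, payload, dt FROM events_archive
    ) u;

Queries can then go against events_all, and partitions can be moved between the two underlying tables without the queries changing.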