Yes, I understand that is how it works today, and I'll use a cron job or something to create these partitions as needed. I'll also look into the bucketing-by-hour aspect.
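For anyone following along, the cron-driven approach amounts to running something like the following each day. The table name, columns, and paths here are hypothetical, just to illustrate the shape of it:

    -- Hypothetical external table over /data/events/YYYY/MM/DD
    -- (names and schema are illustrative, not from this thread):
    CREATE EXTERNAL TABLE IF NOT EXISTS events (
      ts BIGINT,
      payload STRING
    )
    PARTITIONED BY (year STRING, month STRING, day STRING)
    LOCATION '/data/events';

    -- What the daily cron job would run, e.g. via `hive -e "..."`:
    ALTER TABLE events ADD IF NOT EXISTS
      PARTITION (year='2013', month='04', day='15')
      LOCATION '/data/events/2013/04/15';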
That said, I'm suggesting that perhaps an alternative implementation (and associated abstraction/pluggability) may be of value.

On Mon, Apr 15, 2013 at 9:53 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> Whenever you create a partition in Hive, it needs to be registered with
> the metadata store. So the short answer is that partitions are looked up
> in the metadata store instead of in the actual source data.
> Having a lot of partitions (around 10,000+) does slow Hive down. I have
> normally not seen anyone using hourly partitions. You may want to look at
> adding a daily partition and bucketing by hour.
>
> But if you are adding data directly into partition directories, then
> there is no alternative other than adding the partitions to the metadata
> store manually via ALTER TABLE ... ADD PARTITION.
>
> If you are using HCatalog as the metadata store, it does provide an API
> to register your partitions, so you can automate your data loading and
> registration in a single flow.
>
> Others will correct me if I have made any wrong assumptions.
>
>
> On Mon, Apr 15, 2013 at 8:15 PM, Steve Hoffman <ste...@goofy.net> wrote:
>
> > Looking for some pointers on where the partitioning is figured out in
> > the source when a query is executed.
> > I'm investigating an alternative partitioning scheme based on date
> > patterns (using external tables).
> >
> > The situation is that I have data being written to some HDFS root
> > directory with a dated path pattern (i.e. YYYY/MM/DD). Today I have to
> > run an ALTER TABLE to insert this partition every day. It gets worse if
> > you have hourly partitions. This seems like it could be described once
> > (root + date partition pattern in the metastore).
> >
> > So I'm looking for some pointers on where in the code this is currently
> > handled.
> >
> > Thanks,
> > Steve
>
>
> --
> Nitin Pawar
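For completeness, the daily-partition-plus-hourly-buckets layout Nitin suggests might look roughly like the sketch below (again, hypothetical names; `events_staging` is made up for the example). One caveat: buckets are only populated correctly when the data is written through Hive with hive.enforce.bucketing enabled, so this may not fit the write-directly-to-HDFS case I described above.

    -- Sketch of a daily partition bucketed by hour; illustrative only:
    CREATE TABLE events_daily (
      ts BIGINT,
      hour INT,
      payload STRING
    )
    PARTITIONED BY (dt STRING)
    CLUSTERED BY (hour) INTO 24 BUCKETS;

    -- Load through Hive so the buckets are actually enforced:
    SET hive.enforce.bucketing = true;
    INSERT OVERWRITE TABLE events_daily PARTITION (dt='2013-04-15')
    SELECT ts, hour, payload FROM events_staging;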