Yes, I understand that is how it works today, and I'll use a cron job or
something similar to create these partitions as needed.  I'll also look into
bucketing by hour.

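For the record, that cron job would presumably look something like the
following (table name and paths are made up):

    # crontab entry: shortly after midnight, register the new day's partition
    # (cron requires % to be escaped as \%)
    15 0 * * * hive -e "ALTER TABLE logs ADD IF NOT EXISTS PARTITION (dt='$(date +\%Y-\%m-\%d)') LOCATION '/data/logs/$(date +\%Y/\%m/\%d)'"
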
That said, I'm suggesting that an alternative implementation (and an
associated abstraction/pluggability point) may be of value.

On Mon, Apr 15, 2013 at 9:53 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> Whenever you create a partition in Hive, it needs to be registered with the
> metadata store, so the short answer is that partition information is looked
> up in the metadata store rather than in the actual source data.
> Having a lot of partitions (around 10000+) does slow Hive down. I have
> normally not seen anyone using hourly partitions. You may want to look at
> adding daily partitions and bucketing by hour.
>
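> For example, the daily-partition-plus-hourly-bucket layout would be declared
> roughly like this (schema and names are just illustrative):
>
>     CREATE TABLE logs (
>       ts  BIGINT,
>       hr  INT,
>       msg STRING
>     )
>     PARTITIONED BY (dt STRING)
>     CLUSTERED BY (hr) INTO 24 BUCKETS;
>
> With hive.enforce.bucketing=true, inserts hash rows into the 24 hourly
> buckets within each daily partition, so the metastore sees one partition per
> day instead of 24.
>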
> But if you are adding data directly into partition directories, then there
> is no alternative to registering the partitions in the metadata store
> manually via ALTER TABLE ... ADD PARTITION.
>
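> That manual step would be something along these lines (names illustrative):
>
>     ALTER TABLE logs ADD PARTITION (dt='2013-04-15')
>       LOCATION '/data/logs/2013/04/15';
>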
> If you are using HCatalog as the metadata store, then it does provide an API
> to register your partitions, so you can automate your data loading and the
> partition registration together in a single flow.
>
> Others will correct me if I have made any wrong assumptions.
>
>
> On Mon, Apr 15, 2013 at 8:15 PM, Steve Hoffman <ste...@goofy.net> wrote:
>
> > Looking for some pointers on where the partitioning is figured out in the
> > source when a query is executed.
> > I'm investigating an alternative partitioning scheme based on date
> > patterns (using external tables).
> >
> > The situation is that I have data being written to some HDFS root
> > directory with some dated pattern (i.e. YYYY/MM/DD).  Today I have to run
> > an ALTER TABLE to add this partition every day.  It gets worse if you have
> > hourly partitions.  This seems like it could be described once (root +
> > date partition pattern in the metastore).
> >
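> > To make that concrete, the setup today is roughly the following (names
> > changed):
> >
> >     CREATE EXTERNAL TABLE logs (ts BIGINT, msg STRING)
> >     PARTITIONED BY (y STRING, m STRING, d STRING)
> >     LOCATION '/data/logs';
> >
> >     -- repeated by hand (or cron) for every new directory the writer creates:
> >     ALTER TABLE logs ADD PARTITION (y='2013', m='04', d='15')
> >       LOCATION '/data/logs/2013/04/15';
> >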
> > So looking for some pointers on where in the code this is currently
> > handled.
> >
> > Thanks,
> > Steve
> >
>
>
>
> --
> Nitin Pawar
>
