> e.g. a file produced by the Camus job:
> /user/[hive.user]/output/partition_month_utc=2015-03/partition_day_utc=2015-03-11/partition_minute_bucket=2015-03-11-02-09/
Bhavesh, how do you get Camus to write into a directory hierarchy like this? Is it reading the partition values from your messages' timestamps?

> On Mar 11, 2015, at 11:29, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
>
> Hi Yang,
>
> We do this today with Camus to Hive (without Avro), just plain old
> tab-separated log lines.
>
> We use the hive -f command to add the partitions to the Hive table. A
> bash shell script adds the time buckets to the Hive table before the
> Camus job runs:
>
>     for partition in "${@//\//,}"; do
>       echo "ALTER TABLE \${env:TABLE_NAME} ADD IF NOT EXISTS PARTITION ($partition);"
>     done | hive -f /dev/stdin
>
> e.g. a file produced by the Camus job:
> /user/[hive.user]/output/partition_month_utc=2015-03/partition_day_utc=2015-03-11/partition_minute_bucket=2015-03-11-02-09/
>
> The above adds the Hive partitions before the Camus job runs. It works,
> and you can have any schema:
>
>     CREATE EXTERNAL TABLE IF NOT EXISTS ${env:TABLE_NAME} (
>       SOME TABLE FIELDS...
>     )
>     PARTITIONED BY (
>       partition_month_utc STRING,
>       partition_day_utc STRING,
>       partition_minute_bucket STRING
>     )
>     ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>     STORED AS SEQUENCEFILE
>     LOCATION '${env:TABLE_LOCATION_CAMUS_OUTPUT}'
>     ;
>
> I hope this helps! You will have to construct the Hive query according
> to the partitions you define.
>
> Thanks,
>
> Bhavesh
>
> On Wed, Mar 11, 2015 at 7:24 AM, Andrew Otto <ao...@wikimedia.org> wrote:
>
>>> Hive lets you specify custom patterns for partitions. You can use this
>>> in combination with MSCK REPAIR TABLE to automatically detect and load
>>> the partitions into the metastore.
>>
>> I tried this yesterday, and as far as I can tell it doesn't work with a
>> custom partition layout, at least not with external tables. MSCK REPAIR
>> TABLE reports that there are directories in the table's location that
>> are not partitions of the table, but it wouldn't actually add the
>> partitions unless the directory layout matched Hive's default
>> (key1=value1/key2=value2, etc.).
>>
>> On Mar 9, 2015, at 17:16, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>>
>>> If I understood your question correctly, you want to be able to read
>>> the output of Camus in Hive and know the partition values. If my
>>> understanding is right, you can do so as follows.
>>>
>>> Hive lets you specify custom patterns for partitions. You can use this
>>> in combination with MSCK REPAIR TABLE to automatically detect and load
>>> the partitions into the metastore.
>>>
>>> Take a look at this SO answer:
>>> http://stackoverflow.com/questions/24289571/hive-0-13-external-table-dynamic-partitioning-custom-pattern
>>>
>>> Does that help?
>>>
>>> On Mon, Mar 9, 2015 at 1:42 PM, Yang <teddyyyy...@gmail.com> wrote:
>>>
>>>> I believe many users like us export the output from Camus as a Hive
>>>> external table, but the dir structure of Camus is like
>>>> /YYYY/MM/DD/xxxxxx
>>>>
>>>> while Hive generally expects /year=YYYY/month=MM/day=DD/xxxxxx if you
>>>> define the table to be partitioned by (year, month, day). Otherwise
>>>> you'd have to add the partitions created by Camus through a separate
>>>> command. But in the latter case, would a Camus job create more than
>>>> one partition? And how would we find out the YYYY/MM/DD values from
>>>> outside? Well, you could always do something with hadoop dfs -ls and
>>>> grep the output, but that's not very clean.
>>>>
>>>> thanks
>>>> yang
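
Andrew's finding above (MSCK REPAIR TABLE only recognizes the default
key1=value1/key2=value2 layout) doesn't leave the default Camus
/YYYY/MM/DD layout stuck: partitions can still be attached one at a time
with an explicit LOCATION clause, which works regardless of directory
naming. A minimal sketch, assuming a hypothetical external table
camus_logs rooted at /camus/mytopic (both names are placeholders, not
from the thread):

    ALTER TABLE camus_logs ADD IF NOT EXISTS
      PARTITION (year='2015', month='03', day='11')
      LOCATION '/camus/mytopic/2015/03/11';

Because the LOCATION is explicit, Hive never has to infer the partition
values from the path, so the directory layout can be anything.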
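
To make Yang's "hadoop dfs -ls and then grep" idea concrete, here is a
sketch of a shell loop that discovers the date directories and emits the
corresponding ALTER TABLE statements. It assumes the same hypothetical
table and root as above, and that hadoop fs -ls -R prints the path as
the last field (true of recent Hadoop 2.x CLIs, but worth verifying on
your version):

    #!/usr/bin/env bash
    TABLE=camus_logs        # hypothetical table name
    ROOT=/camus/mytopic     # hypothetical Camus output root

    # List every /YYYY/MM/DD directory under the root, then turn each
    # one into an ALTER TABLE statement with an explicit LOCATION.
    hadoop fs -ls -R "$ROOT" | awk '{print $NF}' \
      | grep -E "^$ROOT/[0-9]{4}/[0-9]{2}/[0-9]{2}$" \
      | while read -r dir; do
          d=$(basename "$dir")
          m=$(basename "$(dirname "$dir")")
          y=$(basename "$(dirname "$(dirname "$dir")")")
          echo "ALTER TABLE $TABLE ADD IF NOT EXISTS PARTITION (year='$y', month='$m', day='$d') LOCATION '$dir';"
        done | hive -f /dev/stdin

This is the same pattern as Bhavesh's script, just with the partition
values scraped from HDFS instead of passed in as arguments.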
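
Conversely, when the directory names are already in Hive's key=value
form, as in Bhavesh's partition_month_utc=.../partition_day_utc=...
layout, the per-partition ALTER TABLE loop can in principle be replaced
by a single repair call after each Camus run, since that is exactly the
layout Andrew observed MSCK handling (table name again hypothetical):

    MSCK REPAIR TABLE camus_logs;

The partition key names in the directories must match the PARTITIONED BY
columns exactly for this to work.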