Re: date format question

2012-09-17 Thread Ashish Thusoo
You could use the unix_timestamp function ... unix_timestamp(ts, ) >= unix_timestamp("2012-09-10", 'yyyy-MM-dd') ... something along those lines. Also check out https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-DateFunctions for more datetime functions in Hive. Ashish On Mon,

Re: Hive client / thrift service query submission auditing

2012-09-12 Thread Ashish Thusoo
Hey Matt, We did something similar at Facebook to capture the information on who ran what on the clusters and dumped that out to an audit db. Specifically, we were using Hive post-execution hooks to achieve that: http://hive.apache.org/docs/r0.7.0/api/org/apache/hadoop/hive/ql/hooks/PostExecute.html

Re: Converting rows into dynamic colums in Hive

2012-08-07 Thread Ashish Thusoo
You should be able to do this in Hive using a group by on alpha and then using a combination of max and the if function... something along the following lines: select alpha, max(abc), max(pqr), ... ( select alpha, if (beta == 'abc', Gamma, NULL) as abc, if (beta == 'pqr', Gamma, NULL) as pqr,
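The group-by plus max/if idea in the snippet above can be tried out locally. What follows is only a minimal sketch, not the query from the thread: it runs against SQLite through Python's standard sqlite3 module rather than Hive, it uses CASE WHEN where the Hive snippet uses if(), and the table name t and the helper pivot_rows are hypothetical names introduced for illustration.

```python
import sqlite3


def pivot_rows(rows):
    """Pivot (alpha, beta, gamma) rows into one row per alpha.

    Assumption: only the two beta values 'abc' and 'pqr' from the
    snippet are turned into columns; SQLite stands in for Hive here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (alpha TEXT, beta TEXT, gamma INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT alpha, MAX(abc) AS abc, MAX(pqr) AS pqr
        FROM (
            -- Tag each row: keep gamma only in the column matching beta.
            SELECT alpha,
                   CASE WHEN beta = 'abc' THEN gamma ELSE NULL END AS abc,
                   CASE WHEN beta = 'pqr' THEN gamma ELSE NULL END AS pqr
            FROM t
        ) AS tagged
        GROUP BY alpha
        ORDER BY alpha
    """
    # MAX skips NULLs, so each group collapses to its single non-NULL value.
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(pivot_rows([("a", "abc", 1), ("a", "pqr", 2), ("b", "abc", 3)]))
```

A group that has no row for a given beta value ends up with NULL (None in Python) in that column, since MAX over only NULLs is NULL.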

Re: SQL help

2012-05-24 Thread Ashish Thusoo
Hi Mohit, Hive does not support window functions afaik. The following link might be useful if you can bring that in... https://github.com/hbutani/SQLWindowing/wiki Not sure if this is being brought into trunk at some point... Ashish On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia wrote: > I a

Re: how to select without Mapreduce after index build?

2012-05-11 Thread Ashish Thusoo
Indexing in Hive works through map/reduce. There are no active components in Hive (such as the region servers in Hbase), so the way the index is basically used is by running the map/reduce job on the table that holds the index data to get all the relevant offsets into the main table and then using

Re: Passing date as hive configuration variable

2012-05-10 Thread Ashish Thusoo
I think you have to put quotes around the variable to tell Hive that you are comparing against a string... Ashish On May 10, 2012 2:06 PM, "Saurabh S" wrote: > > I'm having a hard time passing a date as a hive environment variable. > > The setting is this: The table I'm querying is partitioned o

Re: Dimensional Data Model on Hive

2012-05-10 Thread Ashish Thusoo
Also, if most of the things that you will be doing are full scans as opposed to needle-in-a-haystack queries, there is usually no point in paying the overhead of running HBase region servers. Only if your data is heavily accessed by a key is the overhead of HBase justified. Another case could be when pa

Re: Dynamic partitions in Hive

2012-04-25 Thread Ashish Thusoo
According to https://cwiki.apache.org/Hive/dynamicpartitions.html, for dynamic partitions the partition clause must look like PARTITION(year, month, edate); the actual expressions should be included in the select list. So in your example the select list should look something like SELECT sh.EVENT

Re: Hive on Standalone Machine

2012-04-25 Thread Ashish Thusoo
Hive needs the hadoop jars to talk to hadoop. The machine that it is installed on has to have those jars installed. However, it does not need to be a "part" of the hadoop cluster in the sense that it does not need to have a TaskTracker or DataNode running. The machine can operate purely as a client

Re: Logging MySQL queries

2011-05-23 Thread Ashish Thusoo
You will have to write a pre-execute or post-execute hook to do this. The hook API is at http://hive.apache.org/docs/r0.7.0/api/org/apache/hadoop/hive/ql/hooks/package-summary.html and then specify your

Re: Can Hive 0.7 Rebuild partitions ?

2011-05-19 Thread Ashish Thusoo
afaik there is nothing like that currently. File a feature for this on the JIRA? Ashish On May 19, 2011, at 2:25 AM, Jasper Knulst wrote: > Hi, > > I have a partitioned external table on Hive 0.7. New subfolders are regularly > added to the base table HDFS folder. > I now have to perform this

Re: Strategy for Loading Apache Logs

2011-05-11 Thread Ashish Thusoo
You could always have another sub partition under the daily partition. This sub partition could be the timestamp of when you did the load. So when you run the statement you would create a new sub partition within the date partition, and in effect you end up doing an append to the Hive partition.

Re: Implementing conditional and control statements in Hive

2011-05-11 Thread Ashish Thusoo
With streaming, UDFs or UDTFs you can get almost any kind of control flow you want without having those features implemented in Hive proper. For a UDF, UDAF or UDTF you use Java for the implementation. In streaming you can use any language of your choice. Not sure if this addresses stuff? Ashish On

Re: Cross join in Hive.

2011-05-02 Thread Ashish Thusoo
You could probably just say (1 = 1) in the on clause for the join: set hive.mapred.mode=nonstrict; select ... from T1 join T2 on (1 = 1); Ashish On May 1, 2011, at 10:27 PM, Raghunath, Ranjith wrote: Forgot to mention the condition for the inner join should be the column set to 1 in the fir
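To see why an always-true on clause behaves like a cross join, here is a small sketch under stated assumptions: it uses SQLite via Python's sqlite3 module in place of Hive, so the Hive-only hive.mapred.mode setting from the snippet is not involved, and the tables t1 and t2 and the helper cross_join are made up for illustration.

```python
import sqlite3


def cross_join(left, right):
    """Return every (a, b) pairing of the two input lists.

    Assumption: single-column tables t1(a) and t2(b) stand in for the
    T1 and T2 of the snippet; SQLite stands in for Hive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (a TEXT)")
    conn.execute("CREATE TABLE t2 (b TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?)", [(v,) for v in left])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(v,) for v in right])
    # The join condition is true for every pair of rows, so nothing is
    # filtered out and the result is the full Cartesian product.
    rows = conn.execute(
        "SELECT t1.a, t2.b FROM t1 JOIN t2 ON (1 = 1) ORDER BY t1.a, t2.b"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for pair in cross_join(["x", "y"], ["1", "2", "3"]):
        print(pair)
```

The result has len(left) * len(right) rows, which is worth keeping in mind before trying this on large tables.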

Re: insert - Hadoop vs. Hive

2011-03-30 Thread Ashish Thusoo
If the data is already in the right format you should use LOAD syntax in Hive. This basically moves files into hdfs (so it should be no less performant than hdfs). If the data is not in the correct format or it needs to be transformed then the insert statement needs to be used. Ashish On Mar 3

Re: left outer join and nulls

2011-02-18 Thread Ashish Thusoo
You could use the following construct: if ( T.c is null, 0, T.c). Check out the conditional functions at http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF#Conditional_Functions Ashish On Feb 18, 2011, at 12:01 AM, Ca
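The null-to-zero construct above can be exercised on a tiny left outer join. This is a hedged sketch rather than the poster's query: SQLite (through Python's sqlite3 module) stands in for Hive, CASE WHEN stands in for Hive's if(), and the tables lhs and t, the join key k, and the helper left_join_zero_fill are hypothetical.

```python
import sqlite3


def left_join_zero_fill(left_keys, right_rows):
    """Left outer join lhs to t on k, replacing a missing t.c with 0.

    Assumption: lhs(k) and t(k, c) are illustrative tables; only the
    column name c comes from the snippet.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lhs (k TEXT)")
    conn.execute("CREATE TABLE t (k TEXT, c INTEGER)")
    conn.executemany("INSERT INTO lhs VALUES (?)", [(k,) for k in left_keys])
    conn.executemany("INSERT INTO t VALUES (?, ?)", right_rows)
    # Unmatched lhs rows get NULL for t.c; the CASE maps that NULL to 0,
    # mirroring if (T.c is null, 0, T.c) in the Hive construct.
    rows = conn.execute(
        """
        SELECT lhs.k,
               CASE WHEN t.c IS NULL THEN 0 ELSE t.c END AS c
        FROM lhs LEFT OUTER JOIN t ON lhs.k = t.k
        ORDER BY lhs.k
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(left_join_zero_fill(["a", "b"], [("a", 5)]))
```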

Re: Efficient mechanism to simulate the row level updates in Hive

2011-02-16 Thread Ashish Thusoo
This is quite difficult to do in Hive on Hadoop. Hive over Hadoop really does not support row level updates so basically you are reduced to periodically merging the raw stream of updates with the main table and generating a new snapshot of the table. Another possible approach could be to use hba
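The periodic merge described above can be pictured with a small in-memory sketch. It only illustrates the idea and makes several assumptions that are not in the thread: rows are identified by a key, each update carries a timestamp, the newest update per key wins, and deletes are not modeled. The function name merge_snapshot is hypothetical, and a real job would run over files in Hadoop rather than Python dicts.

```python
def merge_snapshot(snapshot, updates):
    """Build a new snapshot from the old one plus a stream of updates.

    snapshot: dict mapping key -> value (the current main table).
    updates:  iterable of (key, timestamp, value) tuples in any order.
    Assumption: for each key the update with the largest timestamp
    wins; keys with no update are carried over unchanged.
    """
    latest = {}
    for key, ts, value in updates:
        # Keep only the most recent update seen so far for this key.
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, value)

    # Copy instead of mutating, so the old snapshot stays intact,
    # much like writing a new snapshot table next to the old one.
    merged = dict(snapshot)
    for key, (_ts, value) in latest.items():
        merged[key] = value
    return merged


if __name__ == "__main__":
    old = {"a": 1, "b": 2}
    stream = [("a", 10, 5), ("a", 11, 7), ("c", 3, 9)]
    print(merge_snapshot(old, stream))
```

Leaving the old snapshot untouched means readers can keep using it until the new one is complete.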

Re: [VOTE] Bylaws for Apache Hive Project

2010-10-27 Thread Ashish Thusoo
With 10 +1s and 0 -1s this vote passes. Thanks, Ashish On Oct 26, 2010, at 10:40 AM, Ashish Thusoo wrote: I have also added the clarification on the approval process for code changes. So far we have 6 binding +1s and 4 non-binding +1s. There are no -1s. The vote will be open till tomorrow 3

RE: [VOTE] Bylaws for Apache Hive Project

2010-10-26 Thread Ashish Thusoo
From: Carl Steinbach [c...@cloudera.com] Sent: Monday, October 25, 2010 8:40 PM To: user@hive.apache.org Subject: Re: [VOTE] Bylaws for Apache Hive Project +1 On Mon, Oct 25, 2010 at 2:55 PM, Alan Gates <ga...@yahoo-inc.com> wrote: On Oct 25, 2010, at 2:18 PM, Ashish Thusoo wrote:

RE: [VOTE] Bylaws for Apache Hive Project

2010-10-25 Thread Ashish Thusoo
un, Oct 24, 2010 at 2:23 PM, Zheng Shao wrote: > +1 > > > On Oct 22, 2010, at 3:34 PM, Ashish Thusoo wrote: > >> I knew I was going to miss a pig somewhere... :) >> >> Ashish >> >> Sent from my iPhone >> >> On Oct 22, 2010, at 2:55 PM, "

Re: [VOTE] Bylaws for Apache Hive Project

2010-10-22 Thread Ashish Thusoo
I knew I was going to miss a pig somewhere... :) Ashish Sent from my iPhone On Oct 22, 2010, at 2:55 PM, "John Sichi" wrote: > Hive users etc are encouraged to vote too :) > > JVS (gotta love cut-and-paste) > > On Oct 22, 2010, at 2:51 PM, Ashish Thusoo wrote: >

[VOTE] Bylaws for Apache Hive Project

2010-10-22 Thread Ashish Thusoo
Hi Folks, I propose that we adopt the following bylaws for the Apache Hive Project https://cwiki.apache.org/HIVE/bylaws.html These are basically a cut-and-paste job of the Apache Pig bylaws that were recently proposed by Alan Gates. We will keep the vote open for 6 business days. In order for