Re: Order by Sort by partitioned columns

2012-05-15 Thread Sukhendu Chakraborty
I think in Hive the partitioned columns are virtual. They are not physical data columns but a directory name corresponding to the partition key value, to facilitate partition pruning during select. On May 14, 2012 6:29 AM, "Mark Grover" wrote: > Hi Shin, > If you could list the query that failed and the q

Re: What's the right data storage/representation?

2012-05-15 Thread Mark Grover
Hi Jon, First off, processing the data from (say, Apache) logs in Hive and storing aggregates in a reporting server like you mentioned is a fairly common paradigm. You have some large scale data (Apache logs) and some dimension data (user data). The problem you really have is how to make use of

Re: What's the right data storage/representation?

2012-05-15 Thread shrikanth shankar
Hive tables can sit on top of S3 storage, so you don't really need a separate export process. Thanks, Shrikanth On May 15, 2012, at 11:35 AM, Jon Palmer wrote: > That seems like a very reasonable approach. However, if we use a technology > like Amazon Elastic Map Reduce my Hive cluster is (potenti

RE: What's the right data storage/representation?

2012-05-15 Thread Jon Palmer
That seems like a very reasonable approach. However, if we use a technology like Amazon Elastic Map Reduce my Hive cluster is (potentially) going to be destroyed and recreated. As a result I'd really need to export the update history Hive table to some other store (like S3) so that it can be re-

Re: how to select without Mapreduce after index build?

2012-05-15 Thread shrikanth shankar
For one, your data size is so small that I am not sure that indexes would help (the fixed cost of the extra MR job would probably overshadow any benefits from indexes). AFAIK the difference b/w compact and bitmap indexes is how they store the mapping from values to the rows in which the val

Re: What's the right data storage/representation?

2012-05-15 Thread shrikanth shankar
I would agree on keeping track of the history of updates in a separate table in Hive (you may not need to maintain it in the application tier). This pattern seems to be the "Slowly Changing Dimension" pattern used in other (more traditional) Data Warehouses... I suspect the challenge here would

Re: What's the right data storage/representation?

2012-05-15 Thread Owen O'Malley
On Tue, May 15, 2012 at 5:11 AM, Jon Palmer wrote: > I can see a few potential solutions: > > 1.   Don’t solve it. Accept that you have some artifacts in your > reporting data that cannot be recovered from the source data. > > 2.   Create status and location history tables in the applicati

Re: Date format - any easier way

2012-05-15 Thread Philip Tromans
I knocked up the following when we were experimenting with Hive. I've been meaning to go and tidy it up for a while, but using it with a separator of "" (empty string) should have the desired effect. (Obviously the UDF throws an exception if the array is empty, been meaning to fix that for a while.

Re: Date format - any easier way

2012-05-15 Thread Nitin Pawar
I will write a UDF for array concatenation and upload it to Git if no one has one already. On Tue, May 15, 2012 at 7:24 PM, Zoltán Tóth-Czifra < zoltan.tothczi...@softonic.com> wrote: > Matt, thanks! > > Luckily the order of the parts of the date is correct (reordering them > would be the

RE: Date format - any easier way

2012-05-15 Thread Zoltán Tóth-Czifra
Matt, thanks! Luckily the order of the parts of the date is correct (reordering them would be the same craziness). Finally it is: regexp_replace( date_sub( to_date( from_unixtime( unix_timestamp() ) ), 1 ), "[-]", "" ) Nitin, concat apparently doesn't take arrays, and I did not find any other

RE: Date format - any easier way

2012-05-15 Thread Tucker, Matt
What about wrapping it in regexp_replace(..., "[-]", "") ? It may not be the cleanest, but I'd recommend passing variables from the shell :) Matt Tucker From: Zoltán Tóth-Czifra [mailto:zoltan.tothczi...@softonic.com] Sent: Tuesday, May 15, 2012 9:27 AM To: user@hive.apache.org Subject: RE: Dat

Re: Date format - any easier way

2012-05-15 Thread Nitin Pawar
Maybe something like this will work. Can you try using concat(split(date_sub(),"-"))? split returns the array and then you can concat them as you want. If this does not work for you, writing a simple UDF is easy as well. Thanks, nitin On Tue, May 15, 2012 at 6:56 PM, Zoltán Tóth-Czifra < zoltan

RE: Date format - any easier way

2012-05-15 Thread Zoltán Tóth-Czifra
Nitin, Thank you. As you see below I know and use this function. My problem is that it doesn't give YYYYMMDD format, but YYYY-MM-DD instead, and formatting is not trivial, as you can see. From: Nitin Pawar [nitinpawar...@gmail.com] Sent: Tuesday, May 15, 2

Re: Date format - any easier way

2012-05-15 Thread Nitin Pawar
You may want to have a look at this function: date_sub(string startdate, int days) Subtract a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30' On Tue, May 15, 2012 at 6:41 PM, Zoltán Tóth-Czifra < zoltan.tothczi...@softonic.com> wrote: > Hi guys, > > Thank you very much in a

Date format - any easier way

2012-05-15 Thread Zoltán Tóth-Czifra
Hi guys, Thank you very much in advance for your help. My problem in short is getting the date for yesterday in a YYYYMMDD format. As I use this format for partitions, I need this format in quite some queries. So far I have this: concat( year( date_sub( to_date( from_unixtime( unix_timestamp(
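
One reply in this thread recommends passing the value in from the shell instead of building it inside the query. A minimal sketch of that idea, assuming Python is available on the shell side; the function name `yesterday_yyyymmdd` is hypothetical and not taken from any message in the thread. It follows the same steps as the Hive expression: subtract one day, take the hyphenated date string, strip the hyphens.

```python
from datetime import date, timedelta


def yesterday_yyyymmdd(today: date) -> str:
    """Return the day before `today` as a string with no separators."""
    # Hypothetical helper: subtract one day, render as YYYY-MM-DD,
    # then remove the hyphens (the regexp_replace step in the thread).
    return (today - timedelta(days=1)).isoformat().replace("-", "")


if __name__ == "__main__":
    # Print the value so a shell script can capture it.
    print(yesterday_yyyymmdd(date.today()))
```

The printed value could then be captured in a shell variable and handed to Hive as a variable, so the query itself only references the partition value.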

What's the right data storage/representation?

2012-05-15 Thread Jon Palmer
All, I'm a relative newcomer to Hadoop/Hive. We have a very standard setup of multiple webapp servers backed by a MySQL database. We are evaluating Hive as a high scale solution for our relatively sophisticated reporting and analytics needs. However, it's not clear what the best practices are a

Re: Is my Use Case possible with Hive?

2012-05-15 Thread Nitin Pawar
The problem with the Hive server over JDBC currently is that it does not handle concurrent connections seamlessly and chokes on larger numbers of parallel query executions. For this one reason, I had actually written a pipeline kind of infra using shell scripts which used to run queries a

Re: Is my Use Case possible with Hive?

2012-05-15 Thread Bhavesh Shah
Thanks all for the replies. Just now I tried the following: 1) I opened two Hive CLIs. hive> 2) I have one query which takes 7 jobs for execution. I submitted that query to both CLIs. 3) One of the Hive CLIs took 147.319 seconds and the second one took 161.542 seconds. 4) Later I trie

Edit access to Wiki

2012-05-15 Thread Lars Francke
Hi, I'd like to document HIVE-2810 and possibly HIVE-1634 in the wiki. Could I get edit access? My username is lars.francke. I think it'd be great to have a policy to accept patches only with documentation. There's a lot of stuff in Hive that people I know don't use because it's not documented an