RE: Best practices for storing data on Hive

2011-09-12 Thread Steven Wong
ns do. -Original Message- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Monday, September 12, 2011 10:09 AM To: user@hive.apache.org Cc: Steven Wong; Travis Powell; Baiju Devani; Bob Tiernay Subject: Re: Best practices for storing data on Hive Thanks, Steven. So, am I correct in understa

Re: Best practices for storing data on Hive

2011-09-12 Thread Mark Grover
use on bucketed column? All of the above? Thanks in advance! Mark - Original Message - From: "Edward Capriolo" To: user@hive.apache.org Cc: "Travis Powell", "Baiju Devani", "Bob Tiernay" Sent: Thursday, September 8, 2011 9:26:10 PM Subject: Re: Best p

RE: Best practices for storing data on Hive

2011-09-09 Thread Steven Wong
4:18 AM To: user@hive.apache.org Cc: Travis Powell; Baiju Devani; Bob Tiernay Subject: Re: Best practices for storing data on Hive Edward, Steven or anyone else on the mailing list: Is it possible to optimize queries like the one below with bucketing? select * from where user_id='blah'

Re: Best practices for storing data on Hive

2011-09-09 Thread Mark Grover
the above? Thanks in advance! Mark - Original Message - From: "Edward Capriolo" To: user@hive.apache.org Cc: "Travis Powell", "Baiju Devani", "Bob Tiernay" Sent: Thursday, September 8, 2011 9:26:10 PM Subject: Re: Best practices for storing data on Hive

Re: Best practices for storing data on Hive

2011-09-08 Thread Edward Capriolo
@hive.apache.org > Cc: Travis Powell; Baiju Devani; Bob Tiernay > Subject: Re: Best practices for storing data on Hive > > Thanks for your reply, Travis. > > I was under the impression that for Hive to make use of sorted structure > of data (i.e. for the table named "data"

RE: Best practices for storing data on Hive

2011-09-08 Thread Steven Wong
(assuming that user_id is not a partition column or indexed column). -Original Message- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Tuesday, September 06, 2011 2:36 PM To: user@hive.apache.org Cc: Travis Powell; Baiju Devani; Bob Tiernay Subject: Re: Best practices for storing da

Re: Best practices for storing data on Hive

2011-09-06 Thread Mark Grover
@oanda.com] Sent: Tuesday, September 06, 2011 12:39 PM To: user@hive.apache.org Cc: wd; Bob Tiernay; Baiju Devani Subject: Re: Best practices for storing data on Hive Thanks for the response, wd. I would REALLY APPRECIATE if other people can share their views as well. Here are the possible solut

RE: Best practices for storing data on Hive

2011-09-06 Thread Aggarwal, Vaibhav
>However, given the amount of users that visit our website (hundreds of thousands of unique users every day), this would lead to a large number of partitions (and rather small file sizes, ranging from a couple of bytes to a couple of KB). From the documentation I've read online, it seems th

RE: Best practices for storing data on Hive

2011-09-06 Thread Aggarwal, Vaibhav
Hi You could choose to have the second table (for user ids) partitioned by date also. table_root/userid=ab/date=2010-12-31/ That way you can split your data set by both a userid and a date. You can use dynamic partitions to transform existing date partitioned table into userid/date partition
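
A minimal sketch of the dynamic-partition approach described above, in HiveQL. The table names `visits_by_date` and `visits_by_user`, the columns `visit_time` and `url`, and the partition column name `dt` (used here in place of `date`) are assumptions for illustration; none of them come from the thread. It also assumes the existing table is partitioned by `dt` and carries `userid` as a regular column.

```sql
-- Allow Hive to derive partition values from the query output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical target table, partitioned by user id and then by date,
-- giving a layout like table_root/userid=ab/dt=2010-12-31/.
CREATE TABLE visits_by_user (
  visit_time STRING,
  url        STRING
)
PARTITIONED BY (userid STRING, dt STRING);

-- Dynamic partition columns go last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE visits_by_user PARTITION (userid, dt)
SELECT visit_time, url, userid, dt
FROM visits_by_date;
```

Hive caps the number of dynamic partitions a single statement may create (`hive.exec.max.dynamic.partitions` and `hive.exec.max.dynamic.partitions.pernode`), so with many distinct user ids those limits would likely need raising, or the rewrite would need to be run in smaller batches.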

RE: Best practices for storing data on Hive

2011-09-06 Thread Travis Powell
Original Message- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Tuesday, September 06, 2011 12:39 PM To: user@hive.apache.org Cc: wd; Bob Tiernay; Baiju Devani Subject: Re: Best practices for storing data on Hive Thanks for the response, wd. I would REALLY APPRECIATE if other people can

Re: Best practices for storing data on Hive

2011-09-06 Thread Mark Grover
Thanks for the response, wd. I would REALLY APPRECIATE if other people can share their views as well. Here are the possible solutions that I have thought about to the problem (see original email for description of problem): 1) Multiple partitions: We would partition the table by day and userI

Re: Best practices for storing data on Hive

2011-09-04 Thread wd
Hive supports more than one partition column; have you tried that? Maybe you can create two partitions named date and user. Hive 0.7 also supports indexes; maybe you can give that a try. On Sat, Sep 3, 2011 at 1:18 AM, Mark Grover wrote: > Hello folks, > I am fairly new to Hive and am wondering if you could shar
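
For reference, the index suggestion might look roughly like the HiveQL below. This is only a sketch: the table name `visits`, its columns, the partition column `dt`, and the index name `visits_user_idx` are hypothetical and do not appear in the thread.

```sql
-- Hypothetical table partitioned by date, with user_id as a regular column.
CREATE TABLE visits (
  user_id    STRING,
  visit_time STRING,
  url        STRING
)
PARTITIONED BY (dt STRING);

-- Compact index on user_id; WITH DEFERRED REBUILD creates it empty.
CREATE INDEX visits_user_idx
ON TABLE visits (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- Populate the index from the data currently in the table.
ALTER INDEX visits_user_idx ON visits REBUILD;
```

Whether a query filtering on `user_id` actually uses the index depends on the Hive version and its optimizer settings, and the index has to be rebuilt after new data is loaded. Index support was removed entirely in Hive 3.0, so this applies only to older releases.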