ns do.
-Original Message-
From: Mark Grover [mailto:mgro...@oanda.com]
Sent: Monday, September 12, 2011 10:09 AM
To: user@hive.apache.org
Cc: Steven Wong; Travis Powell; Baiju Devani; Bob Tiernay
Subject: Re: Best practices for storing data on Hive
Thanks, Steven.
So, am I correct in understa
use on bucketed
column? All of the above?
Thanks in advance!
Mark
- Original Message -
From: "Edward Capriolo"
To: user@hive.apache.org
Cc: "Travis Powell", "Baiju Devani", "Bob
Tiernay"
Sent: Thursday, September 8, 2011 9:26:10 PM
Subject: Re: Best p
4:18 AM
To: user@hive.apache.org
Cc: Travis Powell; Baiju Devani; Bob Tiernay
Subject: Re: Best practices for storing data on Hive
Edward, Steven or anyone else on the mailing list:
Is it possible to optimize queries like the one below with bucketing?
select * from where user_id='blah'
the above?
Thanks in advance!
Mark
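
For readers following the bucketing question, here is a minimal HiveQL sketch of the kind of table being asked about. The table names (events, events_bucketed), the payload column, and the bucket count of 32 are hypothetical and do not come from the thread; only user_id does. Whether Hive actually skips buckets for an equality filter depends on the Hive version and execution engine, so the last query only shows the case under discussion, not a guaranteed optimization.

```sql
-- Hypothetical names throughout; only user_id comes from the thread.
CREATE TABLE events_bucketed (
  user_id STRING,
  payload STRING
)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- On older Hive releases this setting makes inserts honor the bucket spec.
SET hive.enforce.bucketing = true;

-- Rewrite the (hypothetical) unbucketed source table into the bucketed one.
INSERT OVERWRITE TABLE events_bucketed
SELECT user_id, payload
FROM events;

-- The query shape from the question.
SELECT * FROM events_bucketed WHERE user_id = 'blah';
```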
- Original Message -
From: "Edward Capriolo"
To: user@hive.apache.org
Cc: "Travis Powell" , "Baiju Devani" ,
"Bob Tiernay"
Sent: Thursday, September 8, 2011 9:26:10 PM
Subject: Re: Best practices for storing data on Hive
@hive.apache.org
> Cc: Travis Powell; Baiju Devani; Bob Tiernay
> Subject: Re: Best practices for storing data on Hive
>
> Thanks for your reply, Travis.
>
> I was under the impression that for Hive to make use of sorted structure
> of data (i.e. for the table named "data"
(assuming that user_id is
not a partition column or indexed column).
-Original Message-
From: Mark Grover [mailto:mgro...@oanda.com]
Sent: Tuesday, September 06, 2011 2:36 PM
To: user@hive.apache.org
Cc: Travis Powell; Baiju Devani; Bob Tiernay
Subject: Re: Best practices for storing da
@oanda.com]
Sent: Tuesday, September 06, 2011 12:39 PM
To: user@hive.apache.org
Cc: wd; Bob Tiernay; Baiju Devani
Subject: Re: Best practices for storing data on Hive
Thanks for the response, wd.
I would REALLY APPRECIATE if other people can share their views as well.
Here are the possible solut
>However, given the amount of users that visit our website (hundreds of
>thousands of unique users every day), this would lead to a large number of
>partitions (and rather small file sizes, ranging from a couple of bytes to a
>couple of KB). From the documentation I've read online, it seems th
Hi
You could choose to have the second table (for user ids) partitioned by date
also.
table_root/userid=ab/date=2010-12-31/
That way you can split your data set by both a userid and a date.
You can use dynamic partitions to transform existing date partitioned table
into userid/date partition
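
A rough HiveQL sketch of the dynamic-partition approach suggested above, assuming a source table already partitioned by date. The table names (events_by_date, events_by_user) and the payload column are hypothetical, and the date partition is called dt here instead of date purely for illustration.

```sql
-- Hypothetical tables: events_by_date is partitioned by dt only;
-- events_by_user is partitioned by userid, then dt.
CREATE TABLE events_by_user (
  payload STRING
)
PARTITIONED BY (userid STRING, dt STRING);

-- Allow every partition column to be resolved dynamically.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Dynamic partition columns go last in the SELECT,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE events_by_user PARTITION (userid, dt)
SELECT payload, userid, dt
FROM events_by_date;
```

With many distinct user ids, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised before the insert succeeds.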
-Original Message-
From: Mark Grover [mailto:mgro...@oanda.com]
Sent: Tuesday, September 06, 2011 12:39 PM
To: user@hive.apache.org
Cc: wd; Bob Tiernay; Baiju Devani
Subject: Re: Best practices for storing data on Hive
Thanks for the response, wd.
I would REALLY APPRECIATE if other people can share their views as well.
Here are the possible solutions that I have thought about to the problem
(see original email for description of problem):
1) Multiple partitions: We would partition the table by day and userI
Hive supports more than one partition column; have you tried that? Maybe you
can create two partitions named date and user.
Hive 0.7 also supports indexes; maybe you can give that a try.
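
As a hedged illustration of the index suggestion, a compact index on the lookup column could be declared roughly as follows. The table name events and the index name idx_events_user_id are hypothetical; user_id is the column discussed in the thread.

```sql
-- Hypothetical table and index names; user_id is the column from the thread.
CREATE INDEX idx_events_user_id
ON TABLE events (user_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- Populate the index; it stays empty until it is rebuilt.
ALTER INDEX idx_events_user_id ON events REBUILD;
```

This applies to Hive releases that ship indexing (0.7 onward); the feature was removed in Hive 3.0.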
On Sat, Sep 3, 2011 at 1:18 AM, Mark Grover wrote:
> Hello folks,
> I am fairly new to Hive and am wondering if you could shar