We run a multi-AZ RDS instance hosting our metastore, which is shared by
multiple EMR clusters. We utilize RDS's backup/snapshot feature, although we
haven't encountered a need to restore from backup for real yet (knock on wood).
-Original Message-
From: Sam Wilson [mailto:swil...@moneta
Fellow users,
I created the table as follows using the mapreduce output file
CREATE EXTERNAL TABLE mytable (
word string, count int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
LOCATION 's3://mydata/';
This is what I have in my reduce method; key is of type Text
We also do #4. Initially we had lots of conversations about all the other
options and we should do this or that... Ultimately we focused on just going
live as quickly as possible and getting more involved in the setup later.
Since then the only thing we've needed to do is hack a few of the basel
Mark,
We do 4), basically. We have a simple hive script that does all the "create
external table" statements, and we run that script as step 1 of the EMR jobs we
spin up. Then our "real" processing takes over in step 2 and beyond. We're only
working with about 50 tables, so it's pretty manageab
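
A setup script like the one described could be generated rather than written by hand. The following is only a minimal sketch: the helper name, table names, columns, and S3 locations are all hypothetical, and the tab-delimited row format is an assumption, not something stated in this message.

```python
def create_external_table_sql(name, columns, location):
    """Render one CREATE EXTERNAL TABLE statement.

    `columns` is a list of (column_name, hive_type) pairs.
    """
    column_list = ", ".join(f"{col} {hive_type}" for col, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {name} ({column_list})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{location}';"
    )


# Hypothetical table definitions, for illustration only.
TABLES = [
    ("page_views", [("url", "string"), ("views", "int")], "s3://example-bucket/page_views/"),
    ("users", [("user_id", "bigint"), ("name", "string")], "s3://example-bucket/users/"),
]

# Join the statements into one script that could run as the first job step.
script = "\n\n".join(create_external_table_sql(*table) for table in TABLES)
print(script)
```

The resulting text could be saved to a file and run as the first step of a cluster, before the real processing steps.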
Hi all,
I am trying to get an idea of what people do for setting up Hive metastore when
using Amazon EMR.
For those of you using Amazon EMR:
1) Do you have a dedicated RDS instance external to your EMR Hive+Hadoop
cluster that you use as a persistent metastore for all your cluster
instantiatio
Sorry, query 1 should be:
create table tmp__imp as select requestbegintime, count(*) from impressions2
where requestbegintime<'1239572996000' group by requestbegintime;
-Original Message-
From: Lu, Wei [mailto:w...@microstrategy.com]
Sent: Wednesday, March 07, 2012 9:08 AM
To: user@hive.
Hi, Mark
Query 1 is:
1) create table tmp__imp as select requestbegintime, count(*) from impressions2
where requestbegintime<'1239572996000';
from tmp__imp
insert OVERWRITE LOCAL DIRECTORY '/disk2/is1' select * where
requestbegintime<'1239572956000'
insert OVERWRITE LOCAL DIRECTORY '/disk2/is
Hi users,
I have a sequence file produced by MapReduce with Text, IntWritable key/value
pairs. I tried to create an external Hive table using the file, but Hive can't
read it.
Thank you
Sent from my iPhone
Hi all:
Is there any software or website that can rewrite a complex query with a
subquery in the WHERE clause so that the subquery is in the FROM clause
instead, so that Hive can support it?
Best Regards
Eugene z. Von
I guess LazyBinaryColumnarSerDe does not save space, but it is CPU efficient.
Your tests align with internal tests we ran a long time ago.
On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai wrote:
> Hi,
>
> Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
> general?
>
> Let me make my questio
When you start the CLI you are in the default database. This is rooted at
hive.metastore.warehouse.dir, which is typically set to /user/hive/warehouse.
If you create a database, the default location is /user/hive/warehouse/
+ databasename + ".db"
Although when you create the database you can set the locatio
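
The default layout described above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, the warehouse root is assumed to be /user/hive/warehouse, and a location set explicitly at database creation time is not modeled.

```python
import posixpath


def default_database_dir(db_name, warehouse_dir="/user/hive/warehouse"):
    """Return the default HDFS directory for a Hive database.

    Assumes no custom location was given when the database was created.
    """
    if db_name == "default":
        # The default database lives directly at the warehouse root.
        return warehouse_dir
    return posixpath.join(warehouse_dir, db_name + ".db")


print(default_database_dir("default"))
print(default_database_dir("sales"))  # "sales" is a made-up database name
```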
Hello,
How are different databases distinguished within Hive? Do they correspond
to different HDFS directories?
Thank you in advance for your reply,
Mahsa
Farah – The easiest way to dump data to a file is with a query like the
following:
hive> INSERT OVERWRITE LOCAL DIRECTORY 'DIRECTORY_NAME' SELECT * from
TABLE_NAME;
The drawback of this is that Hive uses ^A as the separator by default. In
the past what I found easiest was to just run a simple sed
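
As an alternative to a sed one-liner, the separator conversion can be sketched in Python. This assumes the dumped files use the ^A (\x01) separator mentioned above; the function name and the sample rows are hypothetical. Using the csv module means a field that itself contains a comma gets quoted instead of silently splitting into two columns.

```python
import csv
import io

HIVE_DEFAULT_DELIMITER = "\x01"  # ^A, the separator described above


def ctrl_a_to_csv(lines, out):
    """Write ^A-delimited lines to `out` as comma-separated rows."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        writer.writerow(line.rstrip("\n").split(HIVE_DEFAULT_DELIMITER))


# Made-up sample rows standing in for a file from the output directory.
sample = ["hello\x013\n", "a,b\x017\n"]
buf = io.StringIO()
ctrl_a_to_csv(sample, buf)
print(buf.getvalue(), end="")
```

For a table with millions of rows, the same function can be fed an open file object instead of a list, so the data is streamed line by line.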
Farah – can you configure the remote server as a client machine? You would
just need to install Hadoop with a configuration pointing to your cluster,
and then install Hive. You'd then be able to execute all Hive commands
against your cluster. Note that you won't run any daemons on this node, so
yo
Hi Wei,
In query 1, it's invalid to have requestbegintime in the select list if it's not in
the group by clause. There doesn't seem to be a group by clause there. Is that
the right query?
Mark
Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
"Best Trad
Hi,
Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
general?
Let me make my question more specific.
I generated two tables from the table lineitem of TPC-H
using ColumnarSerDe and LazyBinaryColumnarSerDe as follows...
CREATE TABLE lineitem_rcfile_lazybinary
ROW FORMAT SERDE
What's the easiest way to get a flat file out of a table in Hive?
I have a table in Hive that has millions of rows. I want to get a dump of this
table in flat file format, and it should be comma separated.
Does anyone know the syntax to do it?
Thanks for the help!
Farah Omer
Senior DB Engin
Hi,
I tried to aggregate data from the table impressions2, and then need to save
the results to two different local files (or tables).
I tried three methods; only the first one succeeded:
1) create a new table and store aggregation results to it, and then use
multi-insert to split the results t