RE: Amazon EMR Best Practices for Hive metastore

2012-03-06 Thread Steven Wong
We run a multi-AZ RDS instance hosting our metastore, which is shared by multiple EMR clusters. We utilize RDS's backup/snapshot feature, although we haven't encountered a need to restore from backup for real yet (knock on wood). -Original Message- From: Sam Wilson [mailto:swil...@moneta

Re: Hive table creation over sequence file

2012-03-06 Thread Weishung Chung
Fellow users, I created the table as follows using the mapreduce output file: CREATE EXTERNAL TABLE mytable ( word string, count int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS SEQUENCEFILE LOCATION 's3://mydata/'; This is what I have in my reduce method, key is of type Text

Re: Amazon EMR Best Practices for Hive metastore

2012-03-06 Thread Sam Wilson
We also do #4. Initially we had lots of conversations about all the other options and whether we should do this or that... Ultimately we focused on just going live as quickly as possible and getting more involved in the setup later. Since then the only thing we've needed to do is hack a few of the basel

RE: Amazon EMR Best Practices for Hive metastore

2012-03-06 Thread Jeff Sternberg
Mark, We do 4), basically. We have a simple hive script that does all the "create external table" statements, and we run that script as step 1 of the EMR jobs we spin up. Then our "real" processing takes over in step 2 and beyond. We're only working with about 50 tables, so it's pretty manageab

Amazon EMR Best Practices for Hive metastore

2012-03-06 Thread Mark Grover
Hi all, I am trying to get an idea of what people do for setting up Hive metastore when using Amazon EMR. For those of you using Amazon EMR: 1) Do you have a dedicated RDS instance external to your EMR Hive+Hadoop cluster that you use as a persistent metastore for all your cluster instantiatio

RE: how to split query result into several smaller tables without creating temp table??

2012-03-06 Thread Lu, Wei
Sorry, query 1 should be: create table tmp__imp as select requestbegintime, count(*) from impressions2 where requestbegintime<'1239572996000' group by requestbegintime; -Original Message- From: Lu, Wei [mailto:w...@microstrategy.com] Sent: Wednesday, March 07, 2012 9:08 AM To: user@hive.

RE: how to split query result into several smaller tables without creating temp table??

2012-03-06 Thread Lu, Wei
Hi, Mark Query 1 is: 1) create table tmp__imp as select requestbegintime, count(*) from impressions2 where requestbegintime<'1239572996000'; from tmp__imp insert OVERWRITE LOCAL DIRECTORY '/disk2/is1' select * where requestbegintime<'1239572956000' insert OVERWRITE LOCAL DIRECTORY '/disk2/is

Hive table creation over sequence file

2012-03-06 Thread Wei Shung Chung
Hi users, I have a sequence file produced by mapreduce with Text, IntWritable key/value pairs. I tried to create an external Hive table using the file, but Hive can't read it. Thank you Sent from my iPhone

About Re-write complex SQL query

2012-03-06 Thread shule ney
Hi all: Is there any software or website that can rewrite a complex query with a subquery in the WHERE clause so that the subquery is in the FROM clause instead, so that it can be supported by Hive? Best Regards Eugene z. Von

Re: ColumnarSerDe and LazyBinaryColumnarSerDe

2012-03-06 Thread yongqiang he
I guess LazyBinaryColumnarSerDe does not save space, but it is CPU efficient. Your tests align with internal tests we ran a long time ago. On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai wrote: > Hi, > > Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in > general? > > Let me make my questio

Re: Hive databases

2012-03-06 Thread Edward Capriolo
When you start the CLI you are in the default database. This is rooted at hive.metastore.warehouse.dir, which is typically /user/hive/warehouse. If you create a database, the default location is /user/hive/warehouse/ + databasename + ".db", although when you create the database you can set the locatio

Hive databases

2012-03-06 Thread mahsa mofidpoor
Hello, How are different databases distinguished within Hive? Do they correspond to different HDFS directories? Thank you in advance for your reply, Mahsa

Re: How to get a flat file out of a table in Hive

2012-03-06 Thread Jonathan Seidman
Farah – The easiest way to dump data to a file is with a query like the following: hive> INSERT OVERWRITE LOCAL DIRECTORY 'DIRECTORY_NAME' SELECT * from TABLE_NAME; The drawback of this is that Hive uses ^A as the separator by default. In the past what I found easiest was to just run a simple sed
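As an alternative to a sed one-liner, the conversion from Hive's default ^A (Ctrl-A, `\x01`) separator to commas can be sketched in Python. This is only an illustrative sketch: the function name `ctrl_a_to_csv` and the sample rows are hypothetical and not from the thread, and it assumes the dumped files are plain text with one row per line. Using the `csv` module means that fields which themselves contain commas get quoted rather than corrupting the output.

```python
import csv
import io

# Hive's default field separator for text output is Ctrl-A (^A).
HIVE_DEFAULT_DELIMITER = "\x01"


def ctrl_a_to_csv(lines, out):
    """Write ^A-delimited text lines to `out` as comma-separated rows.

    `lines` is any iterable of strings (for example an open file);
    `out` is any writable text stream.
    """
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        # Drop the trailing newline, then split on the Hive delimiter.
        fields = line.rstrip("\n").split(HIVE_DEFAULT_DELIMITER)
        writer.writerow(fields)


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a file written by
    # INSERT OVERWRITE LOCAL DIRECTORY.
    sample = ["hello\x011\n", "a,b\x012\n"]
    buf = io.StringIO()
    ctrl_a_to_csv(sample, buf)
    print(buf.getvalue(), end="")
```

In practice one would pass each file from the output directory as `lines` and an open output file as `out`.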

Re: How to load a table from external server....

2012-03-06 Thread Jonathan Seidman
Farah – can you configure the remote server as a client machine? You would just need to install Hadoop with a configuration pointing to your cluster, and then install Hive. You'd then be able to execute all Hive commands against your cluster. Note that you won't run any daemons on this node, so yo

Re: how to split query result into several smaller tables without creating temp table??

2012-03-06 Thread Mark Grover
Hi Wei, In query 1, it's invalid to have requestbegintime in the select list if it's not in the group by clause. There doesn't seem to be a group by clause there. Is that the right query? Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com "Best Trad

ColumnarSerDe and LazyBinaryColumnarSerDe

2012-03-06 Thread Yin Huai
Hi, Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in general? Let me make my question more specific. I generated two tables from the table lineitem of TPC-H using ColumnarSerDe and LazyBinaryColumnarSerDe as follows... CREATE TABLE lineitem_rcfile_lazybinary ROW FORMAT SERDE

How to get a flat file out of a table in Hive

2012-03-06 Thread Omer, Farah
What's the easiest way to get a flat file out of a table in Hive? I have a table in Hive that has millions of rows. I want to get a dump of this table in flat file format, and it should be comma separated. Does anyone know the syntax to do it? Thanks for the help! Farah Omer Senior DB Engin

how to split query result into several smaller tables without creating temp table??

2012-03-06 Thread Lu, Wei
Hi, I tried to do aggregation based on Table impressions2, and then need to save the results to two different local files (or tables). I tried three methods, only the first one succeeded: 1) create a new table and store aggregation results to it, and then use multi-insert to split the results t