Re: Extract and Load to Hadoop via Pipes

2011-05-26 Thread Time Less
I had trouble with Sqoop, so here's what I do (Perl): $cmd = qq#echo "select * from $tableName where $dateColumn >= '$dayStart 00:00:00' and $dateColumn < '$dayEnd 00:00:00'" \\ | mysql -h $dwIP --quick -B --skip-column-names --user=$USER --password=$PASS $databaseName \\ | ssh hdfs\@$

RE: Multiple Hive queries at once

2011-05-26 Thread Steven Wong
There are jiras saying that Hive Server has issues with concurrent queries, such as HIVE-80 and HIVE-1019. (Hive Server, not Hive CLI.) I don't use Hive Server heavily, so I cannot confirm or deny. From: jonathan.hw...@accenture.com [mailto:jonathan.hw...@accenture.com] Sent: Wednesday, May 25,

Re: questions about statistics in 0.7

2011-05-26 Thread Ning Zhang
On May 26, 2011, at 1:28 PM, Guy Bayes wrote: > Crap sorry hit send too early > > questions > 1: Job overhead of generating statistics on the fly with set hive.stats.autogather=true;? Overhead is minimal. The only noticeable overhead is inserting a row into an RDBMS/HBase at the end of a task. At the

Re: Hive assert()?

2011-05-26 Thread Edward Capriolo
You can write a UDF. If the UDF throws an exception, that will end your Hive job. On Thu, May 26, 2011 at 5:46 PM, Igor Tatarinov wrote: > Here is one example. I want to make sure I don't have negative prices in my > data. I would like to write something like: > > assert_empty(select * fr
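A minimal sketch of the UDF idea above, not a tested Hive deployment. The class name AssertTrue and its two-argument evaluate signature are hypothetical. In a real setup the class would extend org.apache.hadoop.hive.ql.exec.UDF and be registered with CREATE TEMPORARY FUNCTION; the superclass is left out here so the sketch compiles without Hive on the classpath.

```java
// Hypothetical assertion helper in the style of a classic Hive UDF.
// Assumption: a real version would declare "extends org.apache.hadoop.hive.ql.exec.UDF";
// that is omitted so this compiles standalone.
final class AssertTrue {

    // Returns TRUE when the condition holds. Throws when the condition is
    // false or NULL, so that the task evaluating the row fails.
    public Boolean evaluate(Boolean condition, String message) {
        if (condition == null || !condition) {
            throw new RuntimeException("Assertion failed: " + message);
        }
        return Boolean.TRUE;
    }
}
```

Registered under an illustrative name such as assert_true_msg, it could be applied per row, for example to the condition price > 0, so the check rides along with a normal scan of the table instead of needing a separate subquery.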

Re: Hive assert()?

2011-05-26 Thread Igor Tatarinov
Here is one example. I want to make sure I don't have negative prices in my data. I would like to write something like: assert_empty(select * from Prices where price <= 0); as part of my Hive script. I expect my job to fail if there are negative prices in the data. I understand there is high cos

Re: Providing multiple hints in the same query

2011-05-26 Thread Viral Bajaria
You should be able to comma-separate multiple hints, i.e. /*+ MAPJOIN(a), MAPJOIN(b), MAPJOIN(c) */ -Viral On Thu, May 26, 2011 at 2:26 PM, Shantian Purkad wrote: > Hello, > > We want to provide hints to hive to use mapside join on multiple tables. > something like /*+ MAPJOIN(a, b,c) */ > > Is

Providing multiple hints in the same query

2011-05-26 Thread Shantian Purkad
Hello, We want to provide hints to Hive to use a map-side join on multiple tables, something like /*+ MAPJOIN(a, b,c) */ Is it possible? If yes, what is the syntax? Thanks and Regards, Shantian

Re: Hive assert()?

2011-05-26 Thread Alex Kozlov
1) Would `select count(1) from (query)` do the same thing? I am a bit confused about the semantics of assert: is it just no rows, or some kind of syntax error check? 2) Hive is not an OLTP system and is not optimized for single-row inserts (or updates, for that matter). In a trivial implementation one wo

Hive assert()?

2011-05-26 Thread Igor Tatarinov
I would like to implement some kind of assert functionality in Hive QL. Here is how I do it in MySQL. I can assert that a given query returns no (bad) rows by creating a table with one row containing '1' and a unique index. Then, I try to insert into that table select 1 from (query). If the query

Re: questions about statistics in 0.7

2011-05-26 Thread Guy Bayes
Crap sorry hit send too early questions 1: Job overhead of generating statistics on the fly with set hive.stats.autogather=true;? 2: Are stat descriptions in describe table extended implemented? I've gathered stats on a table but do not see the expected entries (rowNum = , etc) in the describe st

questions about statistics in 0.7

2011-05-26 Thread Guy Bayes
Hello all, I'm new to this list. I was wondering if anyone could answer a couple of questions about the implementation of statistics in 0.7? I've reviewed http://wiki.apache.org/hadoop/Hive/StatsDev and have the following q

Re: HiveQL for 'rank() over (partition by ... order by ...)'?

2011-05-26 Thread thinker0
Hi. -- /** * @author thinker0 * * TiaraUDFRank. */ @Description(name = "t_row_rank", value = "_FUNC_() - Returns a gener

Re: how to invoke hive command line client?

2011-05-26 Thread 김영우
AFAIK, Hive standalone server allows multiple clients to make connections. See https://issues.apache.org/jira/browse/HIVE-73 - Youngwoo 2011/5/26 jinhang du > So what's the difference between a embedded server and a standalone server? > Can you help me understand it? > > > 2011/5/26 김영우 > >> S