Re: Mapjoin Usage Question

2011-01-19 Thread hadoop n00b
Thanks Leo. Does the smaller table go into the mapjoin hint? Actually, when I ran a test query with the bigger table in the hint, it performed better. On Thu, Jan 20, 2011 at 12:40 PM, Leo Alekseyev wrote: > You can only specify one table, and make sure to include its name, > i.e. /*+ mapjoin(t

Re: Mapjoin Usage Question

2011-01-19 Thread Leo Alekseyev
You can only specify one table, and make sure to include its name, i.e. /*+ mapjoin(t2)*/. For more info see http://wiki.apache.org/hadoop/Hive/JoinOptimization and http://www.slideshare.net/aiolos127/join-optimization-in-hive. Also, you are using a relatively old version of Hive, but I'll let m

Mapjoin Usage Question

2011-01-19 Thread hadoop n00b
Hi, how do I use the mapjoin hint in a query? Say I have two tables, t1 and t2, where t2 is the smaller table. Do I specify t2 in the mapjoin hint? select /*+ mapjoin(b)*/ * from t1 a join t2 b on (a.id = b.id) If I am joining two smaller tables, can I specify two clauses in the mapjoin? /*+mapjoi
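A rough way to see why the hint normally names the smaller table: in a map-side join, the hinted table is loaded into an in-memory hash table on every mapper, and the other table is streamed past it. The hinted side therefore has to fit in memory. The sketch below is a plain-Python hash join that shows this build/probe split. It is not Hive's implementation, and `map_side_join`, the row dictionaries, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]

def map_side_join(streamed: Iterable[Row], hinted: Iterable[Row], key: str) -> Iterator[Tuple[Row, Row]]:
    """Inner hash join: `hinted` is held in memory, `streamed` is read exactly once."""
    # Build phase: index the hinted (small) table by its join key.
    in_memory: Dict[object, List[Row]] = {}
    for row in hinted:
        in_memory.setdefault(row[key], []).append(row)
    # Probe phase: each streamed row looks up its matches; no shuffle or reduce step is needed.
    for row in streamed:
        for match in in_memory.get(row[key], []):
            yield row, match

# Hypothetical data: t1 is the big table, t2 (alias b in the hint) the small one.
t1 = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
t2 = [{"id": 2, "w": "x"}, {"id": 3, "w": "y"}]
pairs = list(map_side_join(t1, t2, "id"))
```

Memory use grows only with the hinted table. The streamed table can be arbitrarily large because it is never held in full.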

Re: partitioned column join does not work as expected

2011-01-19 Thread Viral Bajaria
Thanks again. I think I figured out the bug (not sure if it's a bug or a known limitation when creating a third-level join); we need another table c to recreate my scenario. table_a create table table_a(a_id bigint, common_id bigint, int_a int, int_b int, int_c int, int_d int,

Re: Problem starting Hive with local metastore on mysql

2011-01-19 Thread vipul sharma
Yup. Thanks for your help J-D, much appreciated! On Wed, Jan 19, 2011 at 4:41 PM, Jean-Daniel Cryans wrote: > Have a looksee here http://wiki.apache.org/hadoop/Hive/FAQ > > J-D > > On Wed, Jan 19, 2011 at 4:38 PM, vipul sharma > wrote: > > Thanks! > > > > Now I am hitting mysql bug of max key len

Re: Problem starting Hive with local metastore on mysql

2011-01-19 Thread Jean-Daniel Cryans
Have a looksee here http://wiki.apache.org/hadoop/Hive/FAQ J-D On Wed, Jan 19, 2011 at 4:38 PM, vipul sharma wrote: > Thanks! > > Now I am hitting mysql bug of max key length: Specified key was too long; > max key length is 767 bytes > > 11/01/19 16:34:47 ERROR DataNucleus.Datastore: Error throw

Re: Problem starting Hive with local metastore on mysql

2011-01-19 Thread vipul sharma
Thanks! Now I am hitting mysql bug of max key length: Specified key was too long; max key length is 767 bytes 11/01/19 16:34:47 ERROR DataNucleus.Datastore: Error thrown executing CREATE TABLE `SD_PARAMS` ( `SD_ID` BIGINT NOT NULL, `PARAM_KEY` VARCHAR(256) BINARY NOT NULL, `PARAM_VALU
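The error comes down to simple arithmetic. An index on `PARAM_KEY VARCHAR(256)` must reserve room for the widest possible value, and MySQL's `utf8` reserves up to 3 bytes per character. That gives 256 × 3 = 768 bytes, one more than the 767-byte limit in the message. The sketch below performs that check. It assumes the metastore tables use `utf8` and that a single-byte charset such as `latin1` is the alternative. Switching the metastore database to `latin1` is a commonly cited workaround, but this thread does not confirm it.

```python
# Worst-case bytes per character that MySQL reserves in index keys.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

# Limit quoted in the error: "max key length is 767 bytes".
MAX_KEY_BYTES = 767

def index_key_bytes(varchar_len: int, charset: str) -> int:
    """Bytes an index on a VARCHAR(varchar_len) column needs in the given charset."""
    return varchar_len * BYTES_PER_CHAR[charset]

def fits_in_index(varchar_len: int, charset: str) -> bool:
    """True if the column's worst-case key size is within the 767-byte limit."""
    return index_key_bytes(varchar_len, charset) <= MAX_KEY_BYTES

if __name__ == "__main__":
    for cs in ("utf8", "latin1"):
        print(cs, index_key_bytes(256, cs), fits_in_index(256, cs))
```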

Re: Problem starting Hive with local metastore on mysql

2011-01-19 Thread Jean-Daniel Cryans
Try setting these in your hive-site: datanucleus.transactionIsolation = repeatable-read and datanucleus.valuegeneration.transactionIsolation = repeatable-read. J-D On Wed, Jan 19, 2011 at 4:05 PM, vipul sharma wrote: > Hi, > > we had been running cloudera distribution of hadoop. We installed
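In hive-site.xml, each of those settings is a `<property>` element containing a `<name>` and a `<value>`. The sketch below renders the two DataNucleus settings from this reply in that layout, using only Python's standard `xml.etree.ElementTree`. The helper name `hive_site_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two settings suggested in this reply.
SETTINGS = {
    "datanucleus.transactionIsolation": "repeatable-read",
    "datanucleus.valuegeneration.transactionIsolation": "repeatable-read",
}

def hive_site_properties(settings: dict) -> ET.Element:
    """Build a Hadoop-style <configuration> element holding one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root

if __name__ == "__main__":
    print(ET.tostring(hive_site_properties(SETTINGS), encoding="unicode"))
```

Paste the resulting `<property>` elements inside the existing `<configuration>` element of hive-site.xml rather than replacing the file.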

Problem starting Hive with local metastore on mysql

2011-01-19 Thread vipul sharma
Hi, we had been running Cloudera's distribution of Hadoop. We installed Hive following this document https://wiki.cloudera.com/display/DOC/Hive+Installation. hive-site.xml was later modified to store the metastore in MySQL, with a config very similar to the one in this blog http://blog.milford.io/2010/06/install

Re: On compressed storage : why are sequence files bigger than text files?

2011-01-19 Thread Ajo Fod
I didn't do the test you suggested, but with the sequence file case: - the size of what should have been compressed was bigger than the uncompressed data - it didn't have a .deflate suffix - in contrast to the text file case, where I got 10x compression or so. Cheers, -Ajo On Wed, Jan 19, 2011 at 11:30

RE: On compressed storage : why are sequence files bigger than text files?

2011-01-19 Thread Steven Wong
Here's a simple check -- look inside one of your sequence files: hadoop fs -cat /your/seq/file | head If it is compressed, the header will contain the compression codec's name and the data will look like gibberish. Otherwise, it is not compressed. -----Original Message----- From: Ajo Fod [mailto:aj
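The header check Steven describes can also be done programmatically. The sketch below assumes the SequenceFile header layout used by Hadoop 0.20 (format version 5 or later), which is:

- the bytes `SEQ` and a version byte,
- the key and value class names as vint-length-prefixed strings,
- two booleans for record and block compression,
- the codec class name, present only when the file is compressed.

The function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

def read_vlong(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a Hadoop WritableUtils variable-length integer; return (value, next_pos)."""
    first = struct.unpack_from("b", buf, pos)[0]  # first byte is read as signed
    pos += 1
    if first >= -112:                      # small values are stored in a single byte
        return first, pos
    negative = first < -120
    size = (-119 - first) if negative else (-111 - first)  # total bytes, incl. first
    value = 0
    for _ in range(size - 1):              # remaining bytes are big-endian
        value = (value << 8) | buf[pos]
        pos += 1
    return (~value if negative else value), pos

def read_text(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a vint-length-prefixed UTF-8 string, as written by Hadoop's Text.writeString."""
    length, pos = read_vlong(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length

def sequence_file_compression(header: bytes) -> Tuple[bool, bool, Optional[str]]:
    """Return (compressed, block_compressed, codec_class) from the start of a SequenceFile."""
    if header[:3] != b"SEQ":
        raise ValueError("not a SequenceFile (missing 'SEQ' magic)")
    if header[3] < 5:
        raise ValueError("this sketch only handles header versions 5 and later")
    pos = 4
    _key_class, pos = read_text(header, pos)
    _value_class, pos = read_text(header, pos)
    compressed = header[pos] != 0
    block_compressed = header[pos + 1] != 0
    pos += 2
    codec = None
    if compressed:
        codec, pos = read_text(header, pos)
    return compressed, block_compressed, codec
```

To try it, copy a file locally (for example with `hadoop fs -get`) and pass in its first few kilobytes, e.g. `open(path, "rb").read(4096)`. A result of `(False, False, None)` would match Ajo's observation that the sequence files were not compressed.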

Re: partitioned column join does not work as expected

2011-01-19 Thread Appan Thirumaligai
EXPLAIN select t1.some_string,t2.some_string,sum(t1.total_count),sum(t2.total_count) from table_a t1 join table_b t2 on t1.part_col = t2.part_col and t1.common_id = t2.common_id where t1.part_col >= 'mypart' and t2.part_col >= 'mypart' group by t1.some_string,t2.some_string; OK ABSTRACT SYNTAX
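When a join result looks wrong, it can help to compute the expected answer for a handful of rows outside Hive and compare the two. Below is a minimal reference evaluation of the query above in Python. It assumes rows are dictionaries keyed by the column names used in the query; the function name and sample data are hypothetical. Note that `part_col >= 'mypart'` is a string comparison, so it is lexicographic (for example, '10' sorts before '9').

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]

def expected_result(table_a: List[Row], table_b: List[Row], min_part: str) -> Dict[Tuple, Tuple[int, int]]:
    """Evaluate the query naively: join on (part_col, common_id), filter, group, sum."""
    sums = defaultdict(lambda: [0, 0])
    # WHERE t1.part_col >= min_part AND t2.part_col >= min_part (lexicographic compare)
    a_rows = [r for r in table_a if r["part_col"] >= min_part]
    b_rows = [r for r in table_b if r["part_col"] >= min_part]
    for t1 in a_rows:
        for t2 in b_rows:
            # ON t1.part_col = t2.part_col AND t1.common_id = t2.common_id
            if t1["part_col"] == t2["part_col"] and t1["common_id"] == t2["common_id"]:
                acc = sums[(t1["some_string"], t2["some_string"])]  # GROUP BY both strings
                acc[0] += t1["total_count"]   # sum(t1.total_count)
                acc[1] += t2["total_count"]   # sum(t2.total_count)
    return {k: (v[0], v[1]) for k, v in sums.items()}
```

A nested loop is deliberately used instead of anything clever. For a few dozen hand-written rows, it is easy to trust, which is the point of a reference result.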

Re: partitioned column join does not work as expected

2011-01-19 Thread Viral Bajaria
Thanks Appan for verifying. I will do some more tests on my side too and let you know the results. I tried a different version of the query where I joined two sub-queries for the same partitions, and the data comes out correct. I will see if I can post the real-world example to the list, be

Re: partitioned column join does not work as expected

2011-01-19 Thread Appan Thirumaligai
Viral, I tried the queries below (similar to yours) and I get the expected results when I do the join. I ran my queries after building hive from the latest source and hadoop 0.20+. create table table_a(a_id bigint, common_id bigint, some_string string,total_count bigint) partitioned by

Re: how do I use multiple reducers in hive?

2011-01-19 Thread Edward Capriolo
On Wed, Jan 19, 2011 at 12:00 PM, Ajo Fod wrote: > The wiki probably needs to be fixed: > For 32 buckets, I need to set the following flags. > >>>set hive.merge.mapfiles = false; >>>set mapred.map.tasks=32; > > ... the set mapred.reduce.tasks ... is irrelevant. > > The query mechanism should ide

Re: Drop table if exists

2011-01-19 Thread Ajo Fod
Ah! OK. Thanks. -Ajo. On Wed, Jan 19, 2011 at 9:03 AM, Ping Zhu wrote: > I think only Hive 0.7 or later accepts the syntax drop table if exists. > http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Drop_Table > > > On Wed, Jan 19, 2011 at 8:54 AM, Ajo Fod wrote: >> >> I don't think this works.

Re: Drop table if exists

2011-01-19 Thread Ping Zhu
I think only Hive 0.7 or later accepts the syntax drop table if exists. http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Drop_Table On Wed, Jan 19, 2011 at 8:54 AM, Ajo Fod wrote: > I don't think this works. > >> drop table if exists ; > ... it seems to fail on the if exists part. > > Is anyo

Re: how do I use multiple reducers in hive?

2011-01-19 Thread Ajo Fod
The wiki probably needs to be fixed: For 32 buckets, I need to set the following flags. >>set hive.merge.mapfiles = false; >>set mapred.map.tasks=32; ... the set mapred.reduce.tasks ... is irrelevant. The query mechanism should ideally set this automatically!! Cheers, -Ajo On Wed, Jan 19, 2
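Why might 32 buckets produce only 2 files? Roughly, each task that writes the table emits its own output file, while the bucket a row belongs to is chosen by hashing the bucketing column. The sketch below shows that assignment. It assumes an `int` bucketing column whose Hive hash code is taken to be the value itself, and the `(hash & Integer.MAX_VALUE) % numBuckets` rule used by Hadoop-style hash partitioning. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

def bucket_for(hash_code: int, num_buckets: int) -> int:
    """Bucket index: clear the sign bit (like Java's hash & Integer.MAX_VALUE), then take the modulus."""
    return (hash_code & 0x7FFFFFFF) % num_buckets

def rows_per_bucket(keys: Iterable[int], num_buckets: int) -> Counter:
    """Count how many rows land in each bucket for a sequence of int keys."""
    return Counter(bucket_for(k, num_buckets) for k in keys)
```

Rows spread over all 32 bucket numbers no matter how many tasks run. Getting one file per bucket therefore depends on having 32 writing tasks, which is consistent with Ajo's finding that `mapred.map.tasks=32` (with `hive.merge.mapfiles` off) was the setting that mattered.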

Drop table if exists

2011-01-19 Thread Ajo Fod
I don't think this works. >> drop table if exists ; ... it seems to fail on the if exists part. Is anyone's experience different? I'm using CDH3 ... Hive 0.5.0. -Ajo
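If the same scripts must run on both sides of the 0.7 boundary Ping mentions, one option is to choose the statement based on the Hive version. The sketch below assumes you already know the version string (e.g. "0.5.0" for the CDH3 build in this thread). `drop_table_sql` is a hypothetical helper, and the sketch makes no claim about how older releases handle a plain `DROP TABLE` on a missing table.

```python
import re
from typing import Tuple

def hive_version(text: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a version string such as '0.5.0'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return int(match.group(1)), int(match.group(2))

def drop_table_sql(table: str, version: str) -> str:
    """Use IF EXISTS only on Hive 0.7 or later, per the thread; otherwise a plain DROP TABLE."""
    if hive_version(version) >= (0, 7):
        return f"DROP TABLE IF EXISTS {table};"
    return f"DROP TABLE {table};"
```

Comparing integer tuples rather than raw strings matters here. As strings, "0.10" sorts before "0.7".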

Re: how do I use multiple reducers in hive?

2011-01-19 Thread Edward Capriolo
On Wed, Jan 19, 2011 at 10:46 AM, Ajo Fod wrote: > I've 2 questions: > 1) how to raise the number of reducers? > 2) why are there only 2 bucket files per partition even though I > specified 32 buckets? > > > I've set the following and don't see an increase in the number of reducers. >>>set hive.ex

how do I use multiple reducers in hive?

2011-01-19 Thread Ajo Fod
I've 2 questions: 1) how to raise the number of reducers? 2) why are there only 2 bucket files per partition even though I specified 32 buckets? I've set the following and don't see an increase in the number of reducers. >>set hive.exec.reducers.max=32; >>set mapred.reduce.tasks=32; Could this b
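For question 1, here is a rough model of how Hive of this era picks the reducer count:

- An explicit `mapred.reduce.tasks` value (anything other than the default -1) is used as-is.
- Otherwise, Hive estimates ceil(input bytes / `hive.exec.reducers.bytes.per.reducer`), floored at 1 and capped at `hive.exec.reducers.max`.

This is a simplified sketch, not Hive's code. Some query shapes, such as a global ORDER BY, always run with a single reducer, so that is worth ruling out when the settings above seem to have no effect.

```python
def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int,
                      max_reducers: int,
                      requested: int = -1) -> int:
    """Simplified reducer-count rule.

    requested: mapred.reduce.tasks (-1 means "let Hive decide")
    bytes_per_reducer: hive.exec.reducers.bytes.per.reducer
    max_reducers: hive.exec.reducers.max
    """
    if requested >= 0:
        return requested
    estimate = -(-total_input_bytes // bytes_per_reducer)  # ceiling division
    return min(max_reducers, max(1, estimate))
```

With a small input and a large bytes-per-reducer value, the estimate stays low even when `hive.exec.reducers.max` is raised, because the max is only a cap.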

Re: Making temporary functions- permanent

2011-01-19 Thread Edward Capriolo
On Wed, Jan 19, 2011 at 2:37 AM, Guy Doulberg wrote: > Hey All again, > > > > I bet I am not the first one to ask this question, but I could not find an > answer anywhere. > > > > I am using the following temporary function: > > CREATE TEMPORARY FUNCTION jeval AS 'org.apache.hadoop.hive.ql.udf.UDF