Trying to understand partition level grants.

2012-03-22 Thread Edward Capriolo
Time taken: 0.14 seconds hive> create table authorization_part (key int, value string) partitioned by (ds string); OK Time taken: 0.055 seconds hive> ALTER TABLE authorization_part SET TBLPROPERTIES ("PARTITION_LEVEL_PRIVILEGE"="TRUE"); OK Time taken: 0.252 seconds hive> set hive.security.authoriza

# of failed Reduce Tasks exceeded allowed limit

2012-03-22 Thread Vik Raguttahalli
Hello all, I am running the following Hive query on a 10-node cluster against a very large dataset (6.6 billion records): create table tst as select a,b,c,d,..w, sum(case when x= 'C' then 1 else 0 end) as CS, sum(case when y = 'I' then 1 else 0 end) as IP, sum(case when z= 'A' then 1 els

Re: Optimization on bucketized/sorted tables

2012-03-22 Thread Mark Grover
Hi Michael, This JIRA is along the lines of your questions: https://issues.apache.org/jira/browse/HIVE-2846 The following is based on my understanding, so take it with a grain of salt :-) You're right. The 4 kinds of queries you pointed out can potentially be optimized if the source table(s) are

Re: Create Partitioned Table w/ Partition= Substring of Raw Data

2012-03-22 Thread Gabi D
Mark, thanks for elaborating. I was unaware of the dynamic partitioning option; it sounds great! Gabi On Thu, Mar 22, 2012 at 3:33 PM, Mark Grover wrote: > Hi Dan, > Gabi is right. > > To solve your problem, you could have a non-partitioned table on the raw > data and run a Hive query that

Re: Snappy Error

2012-03-22 Thread Jagat
Hi, did you install Snappy following the instructions on the website? Just for reference, quoting from there: 1. Expand hadoop-snappy-0.0.1-SNAPSHOT.tar.gz file Copy (recursively) the lib directory of the expanded tarball in the /lib of all Hadoop nodes $ cp -r hadoop-snappy-0.0.1-SNAPS

Re: Snappy Error

2012-03-22 Thread Edward Capriolo
The codec has to be in the TaskTracker's Hadoop lib directory and listed in io.compression.codecs, and you have to restart the TaskTracker for it to pick this up. On Thu, Mar 22, 2012 at 7:42 AM, Zizon Qiu wrote: > seems the tasktracker could not > locate org.apache.hadoop.io.compress.SnappyCodec. > did yo

Re: Create Partitioned Table w/ Partition= Substring of Raw Data

2012-03-22 Thread Mark Grover
Hi Dan, Gabi is right. To solve your problem, you could have a non-partitioned table on the raw data and run a Hive query that reads this raw data and inserts it into a partitioned table. Dynamic partitioning could come in handy in that case. Look at https://cwiki.apache.org/Hive/tutorial.

Re: Snappy Error

2012-03-22 Thread Zizon Qiu
It seems the TaskTracker could not locate org.apache.hadoop.io.compress.SnappyCodec. Did you deploy it on every TaskTracker or package it into the MapReduce job jar? On Thu, Mar 22, 2012 at 7:30 PM, hadoop hive wrote: > Hi folks, > > I followed all the steps and built and installed Snappy, and after creatin

Re: What is the rule of job name generation in Hive?

2012-03-22 Thread Nitin Pawar
In Hive, by default the job name is set to your query. You can set a more meaningful name with mapred.job.name='jobname'. Thanks, Nitin On Thu, Mar 22, 2012 at 12:48 PM, Felix.徐 wrote: > Hi all, I find that the job names of Hive are like this " INSERT > OVERWRITE TABLE u...userID,neighborid(Stage-4) "

Re: Create Partitioned Table w/ Partition= Substring of Raw Data

2012-03-22 Thread Gabi D
Dan, the partition value is not derived from your raw data; you assign a value to the partition when you put the data in. So what you need to do is this: Create table mytable (Time string, OtherData string) Partitioned by (danDate string); (never a good idea to give fields a name that's a reserv

What is the rule of job name generation in Hive?

2012-03-22 Thread Felix . 徐
Hi all, I find that Hive job names look like this: " INSERT OVERWRITE TABLE u...userID,neighborid(Stage-4) ". What is the rule for generating such a name?