[ANNOUNCE] Apache Hive 0.7.1 Released

2011-06-21 Thread Carl Steinbach
The Apache Hive team is pleased to announce the release of Hive 0.7.1. Apache Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc querying, and analysis of large datasets stored in Hadoop-compatible file systems. Hive 0.7.1 is available in both binary and so

Re: Hive running out of memory

2011-06-21 Thread Igor Tatarinov
Yes, that's probably it. I found a related JIRA: https://issues.apache.org/jira/browse/HIVE-1316 It doesn't look like the EMR installation has this fix. I am going to increase the heap size and see if that helps. On Tue, Jun 21, 2011 at 1:52 PM, Steven Wong wrote: > Is the OOM in the Hive client?

RE: Hive running out of memory

2011-06-21 Thread Steven Wong
Is the OOM in the Hive client? If so, you should try increasing its max heap size by setting the env var HADOOP_HEAPSIZE. One place to set it is hive-env.sh; see /home/hadoop/.versions/hive-0.7/conf/hive-env.sh.template for more info. From: Igor Tatarinov [mailto:i...@decide.com] Sent: Tues
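
A minimal sketch of the setting suggested above, assuming hive-env.sh.template has been copied to hive-env.sh in the same conf directory. The value 2048 is an illustrative assumption and does not come from the thread; HADOOP_HEAPSIZE is given in MB.

```shell
# hive-env.sh (assumed to be a copy of hive-env.sh.template)
# HADOOP_HEAPSIZE sets the maximum JVM heap size, in MB.
# 2048 is a hypothetical example value; pick one that fits the client machine.
export HADOOP_HEAPSIZE=2048
```

The same variable can also be exported in the shell before launching the Hive client, which is a quick way to try a value before making it permanent.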

Re: After submitted the query Hive Server is down need to submit same query again

2011-06-21 Thread Shouguo Li
60 mins seems too long. there are many reasons why a task runs slow. you have to supply more info for us to help you :) how much data is that query crunching through? are you sure hive/hadoop is running in remote mode? is the cluster balanced? what do the hive/hadoop configs look like, e.g. mapred.tas

Re: [Hive-80] Multiple client connections in Standalone mode

2011-06-21 Thread Shouguo Li
that's an odd error case... hope someone can point you in the right direction for fixing it. in the meantime, if you don't have to use mysql as the metastore, try derby (http://wiki.apache.org/hadoop/HiveDerbyServerMode). i followed that page and set up a derby server, and it works without any problems. :) good luc

After submitted the query Hive Server is down need to submit same query again

2011-06-21 Thread Chinna
Hi all, I am using Hive Server. Once I have submitted a query (it takes 60 mins), if Hive Server goes down after 50 mins I cannot get the results of the submitted query. In this case I need to submit the same query again. The same problem occurs with the CLI. Any suggestions or any one hand

Re: Issue on using hive Dynamic Partitions on larger tables

2011-06-21 Thread Bejoy Ks
Hey guys, I was able to resolve this by grouping and distributing records to reducers using DISTRIBUTE BY. My modified query is as follows: FROM parameter_def p INSERT OVERWRITE TABLE parameter_part PARTITION(location) SELECT p.seq_id,p.lead_id,p.arr_datetime,p.computed_value,p.del_

[Hive-80] Multiple client connections in Standalone mode

2011-06-21 Thread Vikramsinh Katkar
Hello, I have set up Hive server (0.6.0) in standalone mode with the metastore in MySQL. While executing JDBC SQL queries, we discovered that: 1. Create table fails with the error "table already exists", even if the table did not exist. 2. However, the table gets created. 3. Any SQL(sele

Hive running out of memory

2011-06-21 Thread Igor Tatarinov
I have a table with 3 levels of partitioning and about 10,000 files (one file at every 'leaf'). I am using EMR and the table is stored in S3. For some reason, Hive can't even start running a simple query that creates a local copy of a subset of the big table. Does this look like an EMR-specific is