Re: FW: Big table join optimization

2014-01-30 Thread Stephen Sprague
let's see the minimal query that shows your problem, with some comments about the cardinality of the tables in the join. maybe there's a crude workaround using a temp table or some such device if nothing jumps out at us. On Thu, Jan 30, 2014 at 4:07 PM, Guy Doulberg wrote: > > hi guys > > I a

Re: Issue with Hive and table with lots of column

2014-01-30 Thread Edward Capriolo
Ok, here are the problem(s). Thrift has frame size limits, and Thrift has to buffer rows into memory. Hive thrift has a heap size, and it needs to be big in this case. Your client needs a big heap size as well. The way to do this query, if it is possible, may be turning row lateral, potentially by treating it

FW: Big table join optimization

2014-01-30 Thread Guy Doulberg
hi guys, I am trying to optimize a Hive join query. I have a join of two big tables, and the join between them is taking too long: no matter how many reducers I set, there are always two reducers struggling to finish at the end of the job. The job doesn't always finish; sometimes it fails with memory prob
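One common explanation for "the same couple of reducers always finish last, no matter how many reducers I set" is skew in the join key. Rows are routed to reducers by hashing the key (Hadoop's default HashPartitioner takes the key's hash modulo the number of reducers), so every row sharing a key lands on the same reducer. Nothing in the thread confirms that this is Guy's problem. The Python sketch below only models that routing, and the key data in it is made up.

```python
from collections import Counter


def reducer_loads(keys, num_reducers):
    """Count how many rows each reducer receives when rows are
    hash-partitioned on their join key (hash modulo reducer count)."""
    loads = Counter(hash(key) % num_reducers for key in keys)
    return [loads[r] for r in range(num_reducers)]


# Hypothetical data: one "hot" join key (1) covers 900 of 1000 rows;
# the other 100 rows have distinct keys 100..199.
keys = [1] * 900 + list(range(100, 200))

for n in (10, 50):
    loads = reducer_loads(keys, n)
    print(n, "reducers -> busiest reducer gets", max(loads), "of", len(keys), "rows")
```

Raising the reducer count from 10 to 50 only moves the busiest reducer from 910 to 902 rows. All the rows for the hot key stay together on one reducer, so adding reducers cannot split that work.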

Re: Simple UDF to return array

2014-01-30 Thread Sunita Arvind
Thanks Roberto. Will try that out. regards Sunita On Thu, Jan 30, 2014 at 10:14 AM, Roberto Congiu wrote: > Hi Sunita, > yes, it's definitely possible and you should use Generic UDFs. > I wrote one UDF that takes n arrays (each one with the same number of > elements) and returns an array of str

Re: Issue with Hive and table with lots of column

2014-01-30 Thread Stephen Sprague
oh. thinking some more about this i forgot to ask some other basic questions. a) what storage format are you using for the table (text, sequence, rcfile, orc or custom)? "show create table " would yield that. b) what command is causing the stack trace? my thinking here is rcfile and orc are co

Re: Issue with Hive and table with lots of column

2014-01-30 Thread Stephen Sprague
thanks for the information. Up-to-date hive. Cluster on the smallish side. And, well, it sure looks like a memory issue :) rather than an inherent hive limitation, that is. So. I can only speak as a user (i.e. not a hive developer), but what i'd be interested in knowing next is: is this via running hi

Re: Simple UDF to return array

2014-01-30 Thread Roberto Congiu
Hi Sunita, yes, it's definitely possible and you should use Generic UDFs. I wrote one UDF that takes n arrays (each one with the same number of elements) and returns an array of structs which is usually used in a lateral view. A good article on how to write a generic UDF is this one: http://www.ba
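Roberto's UDF is a Java GenericUDF. The Python function below only models the transformation he describes: n arrays with the same number of elements go in, and one array of structs comes out, which is then typically exploded with a lateral view. The function name and field names are hypothetical, and each struct is modeled as a dict.

```python
def zip_arrays_to_structs(field_names, *arrays):
    """Model of an 'n arrays in, array of structs out' UDF.

    Element i of the result is a struct (modeled as a dict) holding
    element i of every input array, keyed by the matching field name.
    """
    if len(field_names) != len(arrays):
        raise ValueError("need exactly one field name per input array")
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("all input arrays must have the same number of elements")
    return [dict(zip(field_names, values)) for values in zip(*arrays)]


print(zip_arrays_to_structs(["id", "score"], [1, 2, 3], [0.5, 0.7, 0.9]))
```

The call above yields one struct per position: `[{'id': 1, 'score': 0.5}, {'id': 2, 'score': 0.7}, {'id': 3, 'score': 0.9}]`.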

Re: Hive metastore is creating mysql instead of Derby.

2014-01-30 Thread Prasad Mujumdar
Hi, Looks like you are passing the 'dbType' as derby; however, the Metastore connection URL is configured (in hive-site.xml) for mysql. Both Hive and schemaTool will use the metastore URL and driver configured in hive-site to connect to the database. If you intend to use derby as backend, please
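Prasad's point is that the backend is decided by the connection URL in hive-site.xml, not by the `-dbType` flag. A small Python check for that mismatch might look like the following. It assumes the URL is stored under the standard `javax.jdo.option.ConnectionURL` property. The helper names are hypothetical, and the check is a diagnostic sketch, not part of Hive or schematool.

```python
import xml.etree.ElementTree as ET
from typing import Optional

URL_PROPERTY = "javax.jdo.option.ConnectionURL"


def configured_backend(hive_site_xml: str) -> Optional[str]:
    """Return the database type named in the metastore JDBC URL
    (the segment right after 'jdbc:'), or None if it is not set."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == URL_PROPERTY:
            url = (prop.findtext("value") or "").strip()
            scheme, _, rest = url.partition(":")
            if scheme == "jdbc" and rest:
                return rest.partition(":")[0]
    return None


def db_type_matches(hive_site_xml: str, db_type: str) -> bool:
    """True if the -dbType passed to schematool agrees with hive-site.xml."""
    return configured_backend(hive_site_xml) == db_type
```

With a URL such as `jdbc:mysql://localhost/metastore` this reports `mysql`, so `-dbType derby` would be flagged as a mismatch.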

Re: disable internal tables

2014-01-30 Thread Alex Nastetsky
Thanks. But if I assign a group of the users to /apps/hive/warehouse then they can still create internal tables, which is what I am trying to prevent. I am on version 0.12.0.2.0.6.0. On Thu, Jan 30, 2014 at 11:55 AM, Peyman Mohajerian wrote: > This is a known issue, it still will write somethin

Re: disable internal tables

2014-01-30 Thread Peyman Mohajerian
This is a known issue: it will still write something at '/apps/hive/warehouse'. It's best to assign a common group to your hive and hdfs users and assign that group to both of these directories. I heard this issue is fixed in 0.12 or 0.13; others can confirm. On Thu, Jan 30, 2014 at 8:27 AM, Alex N

disable internal tables

2014-01-30 Thread Alex Nastetsky
Hi, I am trying to force all Hive tables to be created as EXTERNAL. The way I am doing this is by giving the warehouse location (/apps/hive/warehouse in my case) permissions 000 (completely inaccessible). But then when I try to create an external table, I see that it still trie

Fwd: Simple UDF to return array

2014-01-30 Thread Sunita Arvind
Can someone please say whether this is doable or not? Is a generic UDF the only option? How would using a generic vs. a simple UDF make any difference, since I would be returning the same object either way? Thank you Sunita -- Forwarded message -- From: *Sunita Arvind* Date: Wednesday, J

Performance problem with HBase

2014-01-30 Thread Бородин Владимир
Hi all! I'm having a performance problem with querying data from hbase using hive. I use CDH 4.5 (hbase-0.94.6, hive-0.10.0 and hadoop-yarn-2.0.0) on a cluster of 10 hosts. Right now it stores 3 TB of data in an hbase table, which now consists of 1000+ regions. One record in it looks like this: hba

Re: Issue with Hive and table with lots of column

2014-01-30 Thread David Gayou
We are using Hive 0.12.0, but it doesn't work any better on Hive 0.11.0 or Hive 0.10.0. Our Hadoop version is 1.1.2. Our cluster is 1 master + 4 slaves, each with 1 dual-core Xeon CPU (with hyperthreading, so 4 cores per machine) and 16 GB RAM. The error message I get is: 2014-01-29 12:41:09,086 ERROR

Re: delete duplicate records in Hive table

2014-01-30 Thread Raj hadoop
Hi Nitin, Thanks a ton for the quick response. Could you please share the SQL syntax for this, if any? Thanks, Raj. On Thu, Jan 30, 2014 at 3:29 PM, Nitin Pawar wrote: > easiest way to do is .. write it in a temp table and then select uniq of > each column and writing to real table > > > On Thu, Jan 30

Re: delete duplicate records in Hive table

2014-01-30 Thread Nitin Pawar
easiest way to do it is .. write it to a temp table, then select the unique rows (distinct over every column) and write them to the real table On Thu, Jan 30, 2014 at 3:19 PM, Raj hadoop wrote: > Hi, > > Can someone help me how to delete duplicate records in Hive table, > > I know that delete and update are not supported by h
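As Raj notes, Hive has no DELETE, so in Hive the write-back step would be an `INSERT OVERWRITE TABLE` of the distinct rows rather than the DELETE plus INSERT used here. The sketch below uses SQLite from Python's standard library only to show Nitin's pattern end to end: copy the distinct rows aside, clear the original table, then write them back. It is not Hive syntax, and the table and column names are hypothetical.

```python
import sqlite3


def dedupe_table(conn: sqlite3.Connection, table: str) -> None:
    """Remove exact duplicate rows from `table` via a temp table:
    copy the distinct rows aside, empty the table, then write them back.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    tmp = f"{table}_dedup"
    conn.execute(f"CREATE TEMP TABLE {tmp} AS SELECT DISTINCT * FROM {table}")
    conn.execute(f"DELETE FROM {table}")
    conn.execute(f"INSERT INTO {table} SELECT * FROM {tmp}")
    conn.execute(f"DROP TABLE {tmp}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")])
dedupe_table(conn, "events")
print(conn.execute("SELECT id, name FROM events ORDER BY id").fetchall())
```

The five inserted rows collapse to `[(1, 'a'), (2, 'b'), (3, 'c')]`. Because DISTINCT compares whole rows, two rows that differ in any column are both kept.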

delete duplicate records in Hive table

2014-01-30 Thread Raj hadoop
Hi, Can someone help me delete duplicate records in a Hive table? I know that delete and update are not supported by Hive, but if someone knows an alternative, please help me with this. Thanks, Raj.

Hive metastore is creating mysql instead of Derby.

2014-01-30 Thread unmesha sreeveni
Hive metastore is creating mysql instead of Derby. schematool -dbType derby -initSchema Metastore connection URL:jdbc:mysql://localhost/metastore Metastore Connection Driver :com.mysql.jdbc.Driver Metastore connection User: hive schematool -dbType derby -info Metastore connection URL: