Re: hive cli escaping TAB and NEW LINE Characters.

2013-05-06 Thread Stephen Sprague
Maybe I misread your original post. Didn't you say you were parsing the hive client output? You don't have to change the way you're writing the data - you only have to change the output hive emits. So, for example, when producing hive output I presume you do something like this currently: h

Re: HIVE-3979 in Hive 0.11

2013-05-06 Thread John Omernik
Bummer, ok thank you for fixing the release notes. :) On Mon, May 6, 2013 at 12:43 AM, Carl Steinbach wrote: > Hi John, > > This is a mistake in the release notes. It will be fixed in the next 0.11 > release candidate. > > Thanks. > > Carl > > > On Sat, May 4, 2013 at 6:18 AM, John Omernik wrot

Re: Hive Queries on S3 Data not working after moving to Hive metastore on CDH4

2013-05-06 Thread Himanish Kushary
Any ideas regarding this? For now, I have resolved this issue by putting the Amazon credentials into the Cloudera Manager Hive service safety valve and deploying the new client configs to the Hive gateway nodes. But this restricts me to using only one Amazon account for the Hive operations. - H

RE: Table vs View

2013-05-06 Thread Connell, Chuck
I am not sure about speed, but my understanding is that tables are real things, they exist as files on disk and can be reused. Views are temporary entities that are created on-the-fly and cannot be reused later. Chuck From: Peter Chu [mailto:pete@outlook.com] Sent: Monday, May 06, 2013 2:4

Re: Table vs View

2013-05-06 Thread Nitin Pawar
Views in Hive are similar to those in any RDBMS schema. Normally, a view is created to provide a well-defined interface over an inconsistently defined table, so that a modification to the table definition does not alter the view definition. Another use case would be: suppose you have 100 columns in a tabl
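To make the table-versus-view distinction in this thread concrete, here is a minimal HiveQL sketch. The table `employees`, its columns, and the names `active_employees_v` and `active_employees_t` are hypothetical and chosen only for illustration; they do not come from the thread.

```sql
-- Hypothetical source table and column names, for illustration only.

-- A view stores only the query definition in the metastore;
-- the underlying SELECT runs each time the view is queried.
CREATE VIEW active_employees_v AS
SELECT emp_id, name, dept
FROM employees
WHERE status = 'active';

-- A table created with CREATE TABLE AS SELECT runs the query once
-- and writes the result out as files in the warehouse.
CREATE TABLE active_employees_t AS
SELECT emp_id, name, dept
FROM employees
WHERE status = 'active';
```

In a funnel of derived datasets, the view variant avoids storing intermediate results, while the table variant avoids recomputing them on every read; which is faster overall depends on how often each stage is reused.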

Table vs View

2013-05-06 Thread Peter Chu
On Hive, I am using lots of tables that build other tables, sort of like a funnel, funneling data to get the parts I want. I am wondering what the advantage is of creating views vs. creating tables. Is it faster to use views compared to tables? Peter

Re: Hive Group By Limitations

2013-05-06 Thread John Meagher
"Not quite sure but I think each group by will give another M/R job." It will be done in a single M/R job no matter how many fields are in the GROUP BY clause. On Mon, May 6, 2013 at 2:07 PM, Peter Chu wrote: > In Hive, I cannot perform a SELECT GROUP BY on fields not in the GROUP BY > clause. >

Re: Hive Group By Limitations

2013-05-06 Thread Nitin Pawar
The best way to do all this would be to run a distinct and group by alongside a join (it's just a guess, but other guys will suggest a more detailed approach). On Mon, May 6, 2013 at 11:57 PM, Peter Chu wrote: > Thanks Nitin and Michael, > > The reason I asked is because I cannot help but wonder if it

RE: Hive Group By Limitations

2013-05-06 Thread Peter Chu
Thanks Nitin and Michael, The reason I asked is because I cannot help but wonder if it takes extra time with all those group by columns. Say, for example, I have an employees table with 10 columns pertaining to employees, but there could be duplicates; I need to de-dup it by performing a group by

Re: Hive Group By Limitations

2013-05-06 Thread Nitin Pawar
Hi Peter, In Hive, if you are running a group by, then all the select columns have to be in the group by clause. This limitation applies to plain column references only, not to column operations like count etc. All the columns for the group by do go to a single map reduce job and it does not launch m

Re: Hive Group By Limitations

2013-05-06 Thread Michael Malak
--- On Mon, 5/6/13, Peter Chu wrote: > In Hive, I cannot perform a SELECT GROUP BY on fields not in the GROUP BY > clause. Although MySQL allows it, it is not ANSI SQL. http://stackoverflow.com/questions/1225144/why-does-mysql-allow-group-by-queries-without-aggregate-functions

Hive Group By Limitations

2013-05-06 Thread Peter Chu
In Hive, I cannot perform a SELECT GROUP BY on fields not in the GROUP BY clause. Example: SELECT st.a, st.b, st.c, st.d FROM some_table st GROUP BY st.a; -- This does not work. To make it work, I would need to add the other fields in the group by clause. Not quite sure but I think each group b
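As a sketch of the two usual ways to make the query above valid, reusing the poster's `some_table` and columns `a` through `d`: either list every selected column in the GROUP BY clause, or wrap the non-grouped columns in an aggregate. The choice of `max` in the second form is only an assumption for illustration; the right aggregate depends on the data.

```sql
-- Option 1: every non-aggregated column in the SELECT list
-- also appears in the GROUP BY clause.
SELECT st.a, st.b, st.c, st.d
FROM some_table st
GROUP BY st.a, st.b, st.c, st.d;

-- Option 2: group by st.a only and aggregate the remaining columns.
-- max() is an arbitrary choice here, used purely for illustration.
SELECT st.a, max(st.b) AS b, max(st.c) AS c, max(st.d) AS d
FROM some_table st
GROUP BY st.a;
```

Option 1 returns one row per distinct (a, b, c, d) combination, so it also de-duplicates fully identical rows; option 2 returns exactly one row per value of `a`.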

RE: Hive QL - NOT IN, NOT EXIST

2013-05-06 Thread Peter Chu
Thanks Stephen, Will start a cluster today to see if it helps. Peter Date: Mon, 6 May 2013 00:05:45 -0700 Subject: Re: Hive QL - NOT IN, NOT EXIST From: java...@gmail.com To: user@hive.apache.org Hi Peter, Looks like mapjoin does not work with outer join so streamtable is instead a possible ap

RE: hive cli escaping TAB and NEW LINE Characters.

2013-05-06 Thread Valluri, Sathish
This is the idea I had thought of, but in our scenario we have less control over writing Avro data with delimited TABS and NEWLINES (encoding tabs and newlines with other characters), since Avro data can be pumped onto the Warehouse system from many sources and if we have to implement this k

Re: [VOTE] Apache Hive 0.11.0 Release Candidate 1

2013-05-06 Thread Prasad Mujumdar
-1 (non-binding) My apologies, but HIVE-4505 is a regression that IMHO should be addressed. thanks Prasad On Tue, Apr 30, 2013 at 5:18 PM, Ashutosh Chauhan wrote: > Hey all, > > Based on feedback from folks, I have respun release candidate, RC1. > Please take a look. It basically fixes the si

Re: Hive QL - NOT IN, NOT EXIST

2013-05-06 Thread Stephen Boesch
Hi Peter, Looks like mapjoin does not work with outer join so streamtable is instead a possible approach. You would stream the larger table through the smaller one: can you see whether the following helps your perf issue? select /*+ streamtable(message) */ f.uuid from message m right outer j
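For readers following the NOT IN / NOT EXISTS subject of this thread, a common way to express that kind of filter with joins alone is an outer join followed by a NULL check. The sketch below is not the query from the message above, which is truncated; the tables `big_t` and `small_t` and the column `id` are hypothetical.

```sql
-- Hypothetical tables big_t(id, ...) and small_t(id, ...), for illustration only.
-- Returns the ids in big_t that have no matching row in small_t,
-- i.e. the join-based equivalent of a NOT EXISTS filter.
SELECT b.id
FROM big_t b
LEFT OUTER JOIN small_t s ON (b.id = s.id)
WHERE s.id IS NULL;
```

The outer join keeps every row of `big_t`; rows without a match carry NULL in the `small_t` columns, and the WHERE clause keeps only those.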