Maybe I misread your original post. Didn't you say you were parsing the
Hive client output?
You don't have to change the way you're writing the data; you only have to
change the output Hive emits.
So, for example, when producing Hive output I presume you currently do
something like this:
h
Bummer. OK, thank you for fixing the release notes. :)
On Mon, May 6, 2013 at 12:43 AM, Carl Steinbach wrote:
> Hi John,
>
> This is a mistake in the release notes. It will be fixed in the next 0.11
> release candidate.
>
> Thanks.
>
> Carl
>
>
> On Sat, May 4, 2013 at 6:18 AM, John Omernik wrot
Any ideas regarding this?
For now, I have resolved this issue by putting the Amazon credentials into
the Cloudera Manager Hive service safety valve and deploying the new client
configs to the Hive gateway nodes.
But this restricts me to using only one Amazon account for Hive
operations.
- H
I am not sure about speed, but my understanding is that tables are real things:
they exist as files on disk and can be reused. Views are virtual: the view's
query is run on the fly each time the view is referenced, and its results are
not stored for later reuse.
Chuck
From: Peter Chu [mailto:pete@outlook.com]
Sent: Monday, May 06, 2013 2:4
Views in Hive are similar to those in any RDBMS schema.
Normally a view is created to provide a well-defined interface over
a table whose definition is inconsistent or may change, so that modifications
to the table definition do not alter the view definition.
Another use case would be: suppose you have 100 columns in a tabl
In Hive, I am using lots of tables that build other tables, sort of like a
funnel, funneling data to get the parts I want.
I am wondering what the advantage is of creating views vs. creating tables.
Is it faster to use views compared to tables?
Peter
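A minimal sketch of the table-vs-view difference discussed in this thread, using Python's built-in sqlite3 module as a stand-in for Hive. This is an assumption for illustration only: HiveQL also has CREATE VIEW and CREATE TABLE ... AS SELECT, but sqlite is not Hive and this says nothing about Hive performance. The table and column names are hypothetical. A view stores only its query, so it sees rows inserted later; a CTAS table is a snapshot taken when it was created.

```python
import sqlite3


def view_vs_table_demo():
    """Show that a view is re-evaluated on read while a CTAS table is a snapshot."""
    con = sqlite3.connect(":memory:")
    # Hypothetical base table.
    con.execute("CREATE TABLE employees (id INTEGER, dept TEXT)")
    con.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "eng"), (2, "ops")])

    # The view stores only the query text; nothing is materialized.
    con.execute("CREATE VIEW eng_view AS SELECT id FROM employees WHERE dept = 'eng'")
    # CREATE TABLE ... AS SELECT copies the matching rows right now.
    con.execute("CREATE TABLE eng_table AS SELECT id FROM employees WHERE dept = 'eng'")

    # A later change to the base table.
    con.execute("INSERT INTO employees VALUES (3, 'eng')")

    view_ids = [r[0] for r in con.execute("SELECT id FROM eng_view ORDER BY id")]
    table_ids = [r[0] for r in con.execute("SELECT id FROM eng_table ORDER BY id")]
    con.close()
    return view_ids, table_ids


print(view_vs_table_demo())  # ([1, 3], [1])
```

The view picks up row 3 because its query runs again on every read; the CTAS table does not, because it only holds the rows copied at creation time.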
"Not quite sure but I think each group by will give another M/R job."
It will be done in a single M/R job no matter how many fields are in
the GROUP BY clause.
On Mon, May 6, 2013 at 2:07 PM, Peter Chu wrote:
> In Hive, I cannot perform a SELECT GROUP BY on fields not in the GROUP BY
> clause.
>
The best way to do all this would be to run a DISTINCT and a GROUP BY alongside a
join (it's just a guess; other folks will suggest a more detailed
approach).
On Mon, May 6, 2013 at 11:57 PM, Peter Chu wrote:
> Thanks Nitin and Michael,
>
> The reason I asked is because I cannot help but wonder if it
Thanks Nitin and Michael,
The reason I asked is that I cannot help but wonder if it takes extra time
with all those GROUP BY columns.
Say, for example, I have an employees table with 10 columns pertaining to
employees, but there could be duplicates; I need to de-dup it by performing a
GROUP BY
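A minimal sketch of the de-dup case, again using sqlite3 as a stand-in for Hive (an assumption; the employees table and its columns are hypothetical). Grouping by every selected column collapses exact duplicate rows and returns the same result as SELECT DISTINCT.

```python
import sqlite3


def dedup_demo():
    """Compare GROUP BY on every column with SELECT DISTINCT."""
    con = sqlite3.connect(":memory:")
    # Hypothetical employees table containing one exact duplicate row.
    con.execute("CREATE TABLE employees (id INTEGER, name TEXT, dept TEXT)")
    con.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "ann", "eng"), (1, "ann", "eng"), (2, "bob", "ops")],
    )
    # De-dup by grouping on all selected columns.
    grouped = con.execute(
        "SELECT id, name, dept FROM employees GROUP BY id, name, dept ORDER BY id"
    ).fetchall()
    # The same de-dup expressed with DISTINCT.
    distinct = con.execute(
        "SELECT DISTINCT id, name, dept FROM employees ORDER BY id"
    ).fetchall()
    con.close()
    return grouped, distinct
```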
Hi Peter,
In Hive, if you are running a GROUP BY, then all the selected columns have to
be in the GROUP BY clause. This limitation applies to plain column references
only, not to aggregate expressions such as count.
All the GROUP BY columns are handled in a single MapReduce job, and it does
not launch m
--- On Mon, 5/6/13, Peter Chu wrote:
> In Hive, I cannot perform a SELECT GROUP BY on fields not in the GROUP BY
> clause.
Although MySQL allows it, it is not ANSI SQL.
http://stackoverflow.com/questions/1225144/why-does-mysql-allow-group-by-queries-without-aggregate-functions
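Under the ANSI rule that Hive follows, a column that is not in the GROUP BY clause can still appear in the select list if it is wrapped in an aggregate. A minimal sketch with sqlite3 as a stand-in (an assumption: sqlite is lenient like MySQL and would also accept the unaggregated form, so only the rewritten, ANSI-style query is shown). The some_table/a/b/c names echo Peter's example; the data is hypothetical.

```python
import sqlite3


def aggregate_non_grouped_demo():
    """Group by a only, aggregating every other selected column."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE some_table (a TEXT, b INTEGER, c INTEGER)")
    con.executemany(
        "INSERT INTO some_table VALUES (?, ?, ?)",
        [("x", 1, 10), ("y", 2, 99), ("y", 3, 30)],
    )
    # b and c are not in GROUP BY, so each is wrapped in an aggregate.
    rows = con.execute(
        "SELECT a, max(b), max(c), count(*) FROM some_table GROUP BY a ORDER BY a"
    ).fetchall()
    con.close()
    return rows
```

Each aggregate is computed independently: for group y the result is (3, 99), and those two values come from different rows. If you need whole rows per key, aggregating column by column is not enough.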
In Hive, I cannot perform a SELECT GROUP BY on fields not in the GROUP BY
clause.
Example: SELECT st.a, st.b, st.c, st.d FROM some_table st GROUP BY st.a; --
This does not work.
To make it work, I would need to add the other fields in the group by clause.
Not quite sure but I think each group b
Thanks Stephen,
Will start a cluster today to see if it helps.
Peter
Date: Mon, 6 May 2013 00:05:45 -0700
Subject: Re: Hive QL - NOT IN, NOT EXIST
From: java...@gmail.com
To: user@hive.apache.org
Hi Peter, Looks like mapjoin does not work with outer join so streamtable is
instead a possible ap
This is the idea I had thought of, but in our scenario we have little control
over writing Avro data with delimited TABs and NEWLINEs (encoding tabs and
newlines with other characters).
Since Avro data can be pumped into the warehouse system from many sources,
if we have to implement this k
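One reading of the encoding idea in this thread, sketched in Python: escape backslash, TAB and NEWLINE inside each field before emitting tab-delimited, newline-terminated rows, and undo the escaping when parsing the client output. The function names and the backslash escape scheme are assumptions for illustration, not Hive's own output format. Inside Hive, a similar mapping could in principle be applied in the SELECT (for example with regexp_replace), which changes what Hive emits rather than how the Avro data is written.

```python
def escape_field(value):
    """Hypothetical helper: make a field safe for tab/newline-delimited output."""
    # Escape the backslash first so the escapes added below stay unambiguous.
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def unescape_field(value):
    """Reverse escape_field by scanning backslash escapes left to right."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append({"t": "\t", "n": "\n", "\\": "\\"}.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def format_row(fields):
    """Emit one row; escaped fields never contain a raw TAB or NEWLINE."""
    return "\t".join(escape_field(f) for f in fields)


def parse_output(text):
    """Parse newline-terminated rows back into lists of original field values."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final row terminator
    return [[unescape_field(f) for f in line.split("\t")] for line in lines]
```

Usage: `parse_output("".join(format_row(r) + "\n" for r in rows))` gives back `rows`, even when fields contain tabs, newlines or backslashes.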
-1 (non-binding)
My apologies, but HIVE-4505 is a regression that IMHO should be addressed.
thanks
Prasad
On Tue, Apr 30, 2013 at 5:18 PM, Ashutosh Chauhan wrote:
> Hey all,
>
> Based on feedback from folks, I have respun release candidate, RC1.
> Please take a look. It basically fixes the si
Hi Peter,
It looks like mapjoin does not work with outer joins, so streamtable is
a possible approach instead: you would stream the larger table through the
smaller one.
Can you see whether the following helps your perf issue?
select /*+ streamtable(message) */ f.uuid from message m right outer j
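The NOT IN / outer join equivalence behind this thread can be sketched with sqlite3 as a stand-in for Hive. The query in the message is truncated, so this is not that query: `seen` is a hypothetical table, the sketch uses LEFT OUTER JOIN (swap the table sides for RIGHT OUTER JOIN), and the Hive-specific STREAMTABLE hint is omitted. Both forms return the message rows that have no match.

```python
import sqlite3


def anti_join_demo():
    """Compare NOT IN with the equivalent outer-join anti-join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE message (uuid TEXT NOT NULL)")
    con.execute("CREATE TABLE seen (uuid TEXT NOT NULL)")  # hypothetical table
    con.executemany("INSERT INTO message VALUES (?)", [("u1",), ("u2",), ("u3",)])
    con.executemany("INSERT INTO seen VALUES (?)", [("u2",)])

    not_in = con.execute(
        "SELECT uuid FROM message WHERE uuid NOT IN (SELECT uuid FROM seen) "
        "ORDER BY uuid"
    ).fetchall()
    # Keep message rows whose join partner is missing (NULL after the outer join).
    outer = con.execute(
        "SELECT m.uuid FROM message m LEFT OUTER JOIN seen s ON m.uuid = s.uuid "
        "WHERE s.uuid IS NULL ORDER BY m.uuid"
    ).fetchall()
    con.close()
    return not_in, outer
```

The two forms are only equivalent when the subquery column contains no NULLs: if seen.uuid held a NULL, NOT IN would return no rows at all, while the outer-join version would still return u1 and u3. That is why the sketch declares the columns NOT NULL.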