Hi,
I am new to Hive.
I am using a Flume agent to collect log4j logs and send them to HDFS.
Now I want to load the log4j-format logs from HDFS into Hive tables.
Each attribute in a log statement (timestamp, level, class name,
etc.) should be loaded into a separate column in the Hive table.
jS,
See if this helps:
http://search-hadoop.com/m/l1usr1MAHX32&subj=Re+Severely+hit+by+curse+of+last+reducer+
Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
e: mgro...@oanda.com
Hi list,
I am trying to run a join query on my 10-node cluster. My query looks as
follows:
select * from A JOIN B on (A.a = B.b)
size of A = 15 million rows
size of B = 1 million rows
The problem is that A.a and B.b have around 25-30 distinct values per column
which implies that they have high selecti
or
select
  A,
  CASE
    WHEN B IN (1,2) THEN 'Type A'
    ELSE 'Type B'
  END AS B,
  C
from table_a
group by
  A,
  CASE
    WHEN B IN (1,2) THEN 'Type A'
    ELSE 'Type B'
  END,
  C
Using a column alias defined in the select clause is not valid in the
group by clause, so the CASE expression has to be repeated there.
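
If repeating the CASE expression gets unwieldy, one common alternative is to compute it in a subquery and group by the resulting column in the outer query. The sketch below is only an illustration: it uses Python's sqlite3 as a local stand-in so the two query shapes can be run and compared without a cluster, the sample rows are made up, and the names B_type and t are hypothetical. SQLite is more lenient than Hive about aliases, so this checks that the two forms agree, not that Hive rejects the alias.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Hive (assumption: only the
# query shapes matter here, not the engine). Sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (A TEXT, B INTEGER, C TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?)",
    [("x", 1, "p"), ("x", 2, "p"), ("x", 3, "p"), ("y", 5, "q")],
)

# Form 1: repeat the CASE expression in GROUP BY, as in the query above.
# B_type is a hypothetical alias chosen so it does not shadow column B.
repeated_sql = """
SELECT A,
       CASE WHEN B IN (1, 2) THEN 'Type A' ELSE 'Type B' END AS B_type,
       C
FROM table_a
GROUP BY A,
         CASE WHEN B IN (1, 2) THEN 'Type A' ELSE 'Type B' END,
         C
"""

# Form 2: compute the CASE once in a subquery (aliased t), then group by
# the derived column by name in the outer query.
derived_sql = """
SELECT A, B_type, C
FROM (
    SELECT A,
           CASE WHEN B IN (1, 2) THEN 'Type A' ELSE 'Type B' END AS B_type,
           C
    FROM table_a
) t
GROUP BY A, B_type, C
"""

# Sort in Python so the comparison does not depend on engine row order.
rows_repeated = sorted(conn.execute(repeated_sql).fetchall())
rows_derived = sorted(conn.execute(derived_sql).fetchall())
print(rows_repeated == rows_derived)
```

The subquery form keeps the CASE logic in one place, at the cost of an extra level of nesting.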
On 4 December 2011 09:21, Mapred Learn wrote:
> Hi,
> I