Whoops, I pasted the wrong line in for the hcat table records. Should have been:
-rw-r--r-- 3 nbamford hive110 2014-10-22 07:52
/user/hive/warehouse/atest_hcat/part-m-355114470
From: Nathan Bamford
Sent: Wednesday, October 22, 2014 12:50 PM
To: user@h
Hello,
I've been puzzling away at a permissions issue I get through the hcatalog
interface for a while now, and I think perhaps I've found a bug.
When I create a table via the hive cli as user nbamford the directory created
in hdfs has the owner I expect, nbamford:
drwxrwxrwt - nbamfo
lause --- ON (1=1). The where clause --- 1=1?
a) Code heatmap - how many users used some "code":
WITH jt1 AS (
  SELECT * FROM json_table
  WHERE 1=1
    AND code in (1,3,54,300,222,111)
    AND partitionDate > '20140922' AND
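For reference, a complete form of such a "heatmap" query might look like the sketch below. Only the filter on code and partitionDate comes from the thread; the daily aggregation and the upper date bound are assumptions:

```sql
-- Hypothetical completion of the heatmap query: count distinct users
-- per code per day. The customerId column is taken from the later
-- query in this thread; the rest of the shape is assumed.
WITH jt1 AS (
  SELECT * FROM json_table
  WHERE 1=1
    AND code IN (1,3,54,300,222,111)
    AND partitionDate > '20140922' AND partitionDate <= '20141022'
)
SELECT code, partitionDate, COUNT(DISTINCT customerId) AS users
FROM jt1
GROUP BY code, partitionDate;
```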
It could be the serde that is slow and not the compression. If your input XML
is in multiline records then you may want to write a bit of RecordReader code
to process the multiline XML yourself, just to see if it makes any difference
to the processing speed.
https://github.com/sanjaysubramanian/big_da
Hi guys,
Not sure if you have run into this problem. We have a 6-node cluster with CDH5.
It takes about 8 hours to process 80 MB of compressed files (in .deflate
format), while it is much faster (less than 1 hour) to process the
uncompressed files. I think there must be something wrong with my settings.
Any
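One likely cause: .deflate (raw DEFLATE) files are not splittable, so each file is processed by a single mapper regardless of cluster size. A minimal sketch of one workaround, rewriting the data into a splittable compression format (the table names here are hypothetical, not from the thread):

```sql
-- Sketch: rewrite non-splittable .deflate input into bzip2, which is
-- splittable, so multiple mappers can read each file in parallel.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec;
CREATE TABLE logs_bzip2 AS SELECT * FROM logs_deflate;
```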
I have seen cases where, if the table is partitioned or the data in the table
is large, it will take a lot of time to delete. It is also the case that Hive
contacts Hadoop during deletion of a table, and if this connection is delayed
it will take more time.
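One way to narrow down where the time goes is to drop the partitions separately before dropping the table, splitting the metastore and HDFS work into observable steps. A sketch under assumed names (table and partition values are hypothetical):

```sql
-- Sketch: drop partitions one at a time to see which step is slow,
-- then drop the (now much smaller) table itself.
ALTER TABLE big_table DROP PARTITION (partitionDate='20140922');
ALTER TABLE big_table DROP PARTITION (partitionDate='20140923');
DROP TABLE big_table;
```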
On 22 October 2014 15:14, Dima Machlin wrote:
Hi,
Hive (0.12) seems to be very inconsistent in the time it takes to drop a
table.
The tables' sizes vary between 5-10 GB.
Here is an example of a few drops done in a row:
[hadoop@server ~]$ hive
Logging initialized using configuration in
file:/opt/mapr/hive/hive-0.12/conf/hive-log4j.prope
I can think of two approaches. One is to use HBase and use the
HBaseStorageHandler in Hive to read the data; handling hierarchical
structures is easier in HBase. Or, in Hive, just store the data from left to
right in a single row with a flexible number of columns. You can also store
it in JSON or XML and u
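The HBase route above can be sketched as a Hive table backed by the HBaseStorageHandler. The table name, row key, and column family here are assumptions, not from the thread:

```sql
-- Sketch: map a Hive table onto an HBase table. Exposing a whole
-- column family ("cf") as a MAP gives a flexible number of columns
-- per row, which is convenient for hierarchical data.
CREATE EXTERNAL TABLE hier_data (rowkey STRING, attrs MAP<STRING,STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
TBLPROPERTIES ("hbase.table.name" = "hier_data");
```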
rOfEntries,
  B.NumberOfAllEntries
FROM
(
  SELECT
    code,
    COUNT(DISTINCT customerId) AS NumberOfDistinctCustomers,
    COUNT(*) AS NumberOfEntries
  FROM json_table
  WHERE 1=1
    AND code in (1,3,54,300,222,111)
    AND partitionDate > '20140922' AND partitionDate <= '20141022'
  GROUP BY cod
Not sure. The issue you mentioned requires specifying additional columns,
whereas the one I mentioned returns obviously incorrect results, which seems
to be a much more severe issue.
Can anybody try to replicate this? If it's really the case on non-Amazon
Hive, I'll file a bug report in Jira.
On Tue,