Re: hcatalog table permissions error

2014-10-22 Thread Nathan Bamford
Whoops, I pasted the wrong line in for the hcat table records. Should have been: -rw-r--r-- 3 nbamford hive110 2014-10-22 07:52 /user/hive/warehouse/atest_hcat/part-m-355114470 From: Nathan Bamford Sent: Wednesday, October 22, 2014 12:50 PM To: user@h

hcatalog table permissions error

2014-10-22 Thread Nathan Bamford
Hello, I've been puzzling away at a permissions issue I get through the hcatalog interface for a while now, and I think perhaps I've found a bug. When I create a table via the hive cli as user nbamford the directory created in hdfs has the owner I expect, nbamford: drwxrwxrwt - nbamfo

Re: Optimize hive external tables with serde

2014-10-22 Thread Sanjay Subramanian
lause --- ON (1=1) The where clause --- 1=1 a) Code heatmap - how many users used some "code" WITH jt1 AS (SELECT * FROM json_table WHERE 1=1 AND code in (1,3,54,300,222,111) AND partitionDate > '20140922' AND

Re: It's extremely slow when hive reads compression files

2014-10-22 Thread Sanjay Subramanian
It could be the serde that is slow and not the compression. If your input XML is in multiline records then you may want to write a bit of RecordReader code to process the multiline XML yourself, just to see if it makes any difference to the processing speed. https://github.com/sanjaysubramanian/big_da

It's extremely slow when hive reads compression files

2014-10-22 Thread Yan Fang
Hi guys, Not sure if you have run into this problem. We have a 6-node cluster with CDH5. It takes about 8 hours to process 80MB of compressed files (in .deflate format), while it is much faster (less than 1 hour) to process the uncompressed files. I think there must be something wrong with my settings. Any

Re: dropping tables can take long time

2014-10-22 Thread Sreenath
I have seen cases where, if the table is partitioned or the data in the table is large, it takes a lot of time to delete. Also, Hive contacts Hadoop during deletion of a table, and if this connection is delayed it will take more time. On 22 October 2014 15:14, Dima Machlin wr

dropping tables can take long time

2014-10-22 Thread Dima Machlin
Hi, Hive (0.12) seems to be very inconsistent in how long it takes to drop a table. The tables' sizes vary between 5 and 10 GB. Here is an example of a few drops done in a row: [hadoop@server ~]$ hive Logging initialized using configuration in file:/opt/mapr/hive/hive-0.12/conf/hive-log4j.prope

Re: USE CASE:: Hierarchical Structure in Hive and Java

2014-10-22 Thread Peyman Mohajerian
I can think of two approaches. One is to use HBase and use HBaseStorageHandler in Hive to read the data; handling hierarchical structures is easier in HBase. The other is, in Hive, to just store the data from left to right in a single row with a flexible number of columns. You can also store it in json or xml and u

Re: Optimize hive external tables with serde

2014-10-22 Thread ptrstpppp
rOfEntries, B.NumberOfAllEntries FROM ( SELECT code, COUNT(DISTINCT customerId) AS NumberOfDistinctCustomers, COUNT(*) AS NumberOfEntries FROM json_table WHERE 1=1 AND code in (1,3,54,300,222,111) AND partitionDate > '20140922' AND partitionDate <= '20141022' GROUP BY cod

Re: Possible bug with max() together with rank() and grouping sets

2014-10-22 Thread Michal Krawczyk
Not sure. The issue you mentioned requires specifying additional columns, whereas the one I mentioned returns obviously incorrect results, which seems to be a much more severe issue. Can anybody try to replicate this? If it's really the case on non-Amazon Hive I'll send a bug report on Jira. On Tue,