Re: Using snappy compresscodec in hive

2018-07-23 Thread Gopal Vijayaraghavan
> "TBLPROPERTIES ("orc.compress"="Snappy"); " That doesn't use the Hadoop SnappyCodec, but uses a pure-java version (which is slower, but always works). The Hadoop snappyCodec needs libsnappy installed on all hosts. Cheers, Gopal

Using snappy compresscodec in hive

2018-07-23 Thread Zhefu Peng
Hi, Here is something that has confused me recently: I have not installed or built Snappy on my Hadoop cluster, yet when I tested and compared the compression ratios of the Parquet and ORC storage formats, I could still set the compression method for both storage formats, for example, using "TB

what's the best practice to create an external hive table based on a csv file on HDFS with 618 columns in header?

2018-07-23 Thread Raymond Xie
We are using Cloudera CDH 5.11. I have seen solutions for small xlsx files with only a handful of columns in the header; in my case, the CSV file to be loaded into a new Hive table has 618 columns. 1. Would it be saved as Parquet by default if I upload it (saving it as CSV first) through HUE-> File B

Re: Ranger for standalone hive metastore

2018-07-23 Thread Vihang Karajgaonkar
I am not super-familiar with Ranger, but do you see any errors in the HMS logs? Assuming Ranger is trying to connect to HMS, it should log some exceptions if the connection is not successful. It would also be helpful to look for errors in the Ranger logs. On Mon, Jul 23, 2018 at 4:21 AM, Sandhya Agarwal wrot

Re: Performance in hive fetch

2018-07-23 Thread Sowjanya Kakarala
Hi Shawn, How long is it taking to run the actual query if you create a temp table or something with the result? The time I mentioned was only for running the actual queries. How many rows are returned? As I mentioned, it depends on the dates we give: if it's 20 days, then 20 records based on the id.

RE: Performance in hive fetch

2018-07-23 Thread Shawn Weeks
How long is it taking to run the actual query if you create a temp table or something with the result? How many rows are returned? We need to narrow down whether it's the fetch taking a while or the actual query. Thanks Shawn From: Sowjanya Kakarala Sent: Monday, July 23, 2018 10:01 AM To: user@hive.ap

Performance in hive fetch

2018-07-23 Thread Sowjanya Kakarala
Hi Guys, I am trying to fetch data from Hive through Python code based on dates and id. Fetching 20 days up to the current day for 7 tables together takes 30 seconds. Fetching a year's worth of data for 7 tables together takes 3 minutes 26 seconds. My tables are stored as ORC and tr

Ranger for standalone hive metastore

2018-07-23 Thread Sandhya Agarwal
Hello, I am trying to enable the Ranger Hive plugin 2.0.0 for a standalone Hive metastore 3.0.0. I do not see the link happening, even though both my Ranger admin and Hive metastore services were restarted and are running without any errors after enabling the ranger-hive-plugin. Any pointers? Thank y