Hive encoding utf-8

2017-01-30 Thread chandra sekar
Dear All, I am using Hive 1.2.1 and collecting tweets for analysis. Some of the text columns contain a mix of Chinese and English characters. My objective is to keep only the English text. I applied the following setting to the Hive table: setpropertY.SERIALISATION.ENCODING = 'utf-
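
A minimal sketch of one way to approach the filtering goal above, done outside Hive on an exported text file. It assumes that "English only" can be approximated as "printable ASCII only", which is a simplification: English tweets containing emoji or accented characters would also be dropped. The function name keep_ascii_lines is hypothetical and not part of Hive.

```shell
# Hypothetical helper: print only the lines of a file that consist
# entirely of printable ASCII characters and whitespace.
keep_ascii_lines() {
    # LC_ALL=C makes grep treat the input as single bytes, so every byte
    # of a multi-byte UTF-8 character falls outside [:print:] and [:space:].
    # -a forces text mode; -v drops any line containing such a byte.
    # "|| true" keeps the exit status zero when no line survives.
    LC_ALL=C grep -a -v '[^[:print:][:space:]]' "$1" || true
}
```

For example, keep_ascii_lines tweets.txt > tweets_english.txt would write the ASCII-only lines to a new file (both file names are illustrative).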

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread Gopal Vijayaraghavan
> Gopal : (yarn logs -application $APPID) doesn't contain a line > containing HISTORY so it doesn't produce svg file. Should I turn on > some option to get the lines containing HISTORY in yarn application > log? There's a config option tez.am.log.level=INFO which controls how much data is wri
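
As a rough sketch of the check discussed above, the following counts how many lines of a saved log dump contain HISTORY. It assumes the application log was saved to a file first; the file name app.log and the function name count_history_lines are hypothetical.

```shell
# Hypothetical helper: count lines containing HISTORY in a saved YARN log.
# Assumes the log was captured beforehand, for example with:
#   yarn logs -application "$APPID" > app.log
count_history_lines() {
    # grep -c prints the number of matching lines (0 when there are none);
    # "|| true" hides grep's non-zero exit status in the zero-match case.
    grep -c 'HISTORY' "$1" || true
}
```

A result of 0 from count_history_lines app.log would match the situation described in the quoted question, where no HISTORY lines were present.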

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread 김동원
As the size can cause confusion, as you pointed out, let me explain it briefly for others. The benchmark size comes from the scale factor of dsdgen (tpc-"ds" "d"ata "gen"erator). If you take a look at http://eastcirclek.blogspot.kr/2016/12/loading-tpc-ds-data-into-mysql.html

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread Goden Yao
ORC works well with Presto too, at least. Can you explain a little how you ran a 1TB benchmark on a Presto cluster with 5*80 = 400GB of total memory? Did you use compression to fit it all in memory, or partitioned data, etc.? On Mon, Jan 30, 2017 at 3:50 PM Dongwon Kim wrote: > Goun : Just to make al

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread Dongwon Kim
Goun : Just to make all the engines use the same data, and I usually store data in ORC. I know that it can bias the results in favor of Hive. I did Spark experiments with Parquet, and Spark works better with Parquet, as is commonly believed (not included in the results though). Goden : Oops, 128GB main

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Ricardo Fajardo
Attached the hive-site.xml configuration file. From: Vivek Shrivastava Sent: Monday, January 30, 2017 4:10:42 PM To: user@hive.apache.org Subject: Re: Pls Help me - Hive Kerberos Issue If this is working then your kerberos setup is ok. I suspect configuration is

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Vivek Shrivastava
If this is working then your Kerberos setup is OK. I suspect the problem is in the HiveServer2 configuration. What is the authentication and security setup in the Hive config? Please see if you can attach it. On Mon, Jan 30, 2017 at 2:33 PM, Ricardo Fajardo < ricardo.faja...@autodesk.com> wrote: > [cloudera@quickstart

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread Goden Yao
Was the master 128MB or 128GB memory? On Mon, Jan 30, 2017 at 3:24 AM Gopal Vijayaraghavan wrote: > > > Hive LLAP shows better performance than Presto and Spark for most > queries, but it shows very poor performance on the execution of query 72. > > My suspicion will be the inventory x catal

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Ricardo Fajardo
[cloudera@quickstart bin]$ [cloudera@quickstart bin]$ hadoop fs -ls Java config name: null Native config name: /etc/krb5.conf Loaded from native config Found 20 items drwxr-xr-x - cloudera cloudera 0 2016-06-13 17:51 checkpoint -rw-r--r-- 1 cloudera cloudera 3249 2016-05-11 16:19

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Vivek Shrivastava
If you are using AES256, then please update the Java unlimited strength jar files. What is the output of the hadoop ls command after exporting the environment variable below? export HADOOP_OPTS="-Dsun.security.krb5.debug=true" hadoop fs -ls / On Mon, Jan 30, 2017 at 2:21 PM, Ricardo Fajardo < ricardo.

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Ricardo Fajardo
I made the changes but I am getting the same error. Klist: [cloudera@quickstart bin]$ klist -fe Ticket cache: FILE:/tmp/krb5cc_501 Default principal: t_fa...@ads.autodesk.com Valid starting Expires Service principal 01/30/17 11:56:20 01/30/17 21:56:24 krbtgt/ads.autodesk@ads.

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Vivek Shrivastava
You can comment out both default_tkt_enctypes and default_tgs_enctypes; the default value will then become aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac-md5 camellia256-cts-cmac camellia128-cts-cmac des-cbc-crc des-cbc-md5 des-cbc-md4 . Then do kdestroy kinit klist -fev your

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Ricardo Fajardo
I don't have any particular reason for selecting the arcfour encryption type. If changing it will make this work, I can do that. Values from krb5.conf: [Libdefaults] default_realm = ADS.AUTODESK.COM krb4_config = /etc/krb.conf krb4_realms = /etc/krb.realms kdc_timesync

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Vivek Shrivastava
Any particular reason for selecting the arcfour encryption type? Could you please post the default values (e.g. enc_type) from krb5.conf? On Mon, Jan 30, 2017 at 10:57 AM, Ricardo Fajardo < ricardo.faja...@autodesk.com> wrote: > > 1. klist -fe > > [cloudera@quickstart bin]$ klist -fe > Ticket cache: FILE:

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Ricardo Fajardo
1. klist -fe [cloudera@quickstart bin]$ klist -fe Ticket cache: FILE:/tmp/krb5cc_501 Default principal: t_fa...@ads.autodesk.com Valid starting Expires Service principal 01/30/17 10:52:37 01/30/17 20:52:43 krbtgt/ads.autodesk@ads.autodesk.com renew until 01/31/17 10:52:37, F

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Vivek Shrivastava
Please paste the output of 1. klist -fe 2. relevant entries from HiveServer2 log On Mon, Jan 30, 2017 at 10:11 AM, Ricardo Fajardo < ricardo.faja...@autodesk.com> wrote: > I could not resolve the problem. > > > I have debugged the code and I found out that: > > > 1. On the org.apache.hadoop.hive.

Re: Pls Help me - Hive Kerberos Issue

2017-01-30 Thread Ricardo Fajardo
I could not resolve the problem. I have debugged the code and I found out that: 1. On the org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge class line 208 UserGroupInformation.getCurrentUser return (). Two ( .. This method always returns the user of the operating system but an

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread goun na
Thanks for sharing the benchmark results. May I ask why you chose ORC? 2017-01-30 19:57 GMT+09:00 김동원 : > Hi, > > Recently I did some experiments using Hive, Spark, and Presto using TPC-DS > benchmark > and I'd like to share the result with the community: http://www. > slideshare.net/ssuser6bb12d/hi

Re: Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread Gopal Vijayaraghavan
> Hive LLAP shows better performance than Presto and Spark for most queries, > but it shows very poor performance on the execution of query 72. My suspicion will be the inventory x catalog_sales x warehouse join - assuming the column statistics are present and valid. If you could send the

Experimental results using TPC-DS (versus Spark and Presto)

2017-01-30 Thread 김동원
Hi, Recently I did some experiments with Hive, Spark, and Presto using the TPC-DS benchmark, and I'd like to share the results with the community: http://www.slideshare.net/ssuser6bb12d/hive-presto-and-spark-on-tpcds-benchmark

Re: Hive Tez on External Table running on Single Mapper

2017-01-30 Thread Gopal Vijayaraghavan
> > 'skip.header.line.count'='1', Try removing that config option. I've definitely seen footer markers disabling file splitting, possibly header also does. Cheers, Gopal
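
If the option is removed, the header row would be read as ordinary data. One possible workaround, which the message above does not propose and is offered here only as an assumption, is to strip the header before the file is placed in the table location. The function name strip_header is hypothetical.

```shell
# Hypothetical helper: print a delimited text file without its first line,
# so the table no longer needs 'skip.header.line.count'.
strip_header() {
    # tail -n +2 starts output at line 2, dropping the header row.
    tail -n +2 "$1"
}
```

For example, strip_header data.csv > data_noheader.csv would produce a headerless copy to upload (both file names are illustrative). Whether this restores splitting across multiple mappers would still need to be checked on the actual table.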