Re: Tez jobs on YARN failing sporadically..

2016-06-28 Thread saquib khan
Unsubscribe On Tuesday, June 28, 2016, Gautam wrote: > Hello, > > We have Tez being used for one of our main ETL workflows and have been > using it for a couple of months now. We recently started seeing the following > error for a query that regularly runs and hasn't been changed in any way. > It's a

Re: Tez jobs on YARN failing sporadically..

2016-06-28 Thread Gautam
Software versions: - Hive: 1.1.0 - Tez: 0.7.1 - Hadoop: 2.6.0 On Tue, Jun 28, 2016 at 5:58 PM, Gautam wrote: > Hello, > > We have Tez being used for one of our main ETL workflows and have been > using it for a couple of months now. We recently started seeing the following > error for a query tha

Tez jobs on YARN failing sporadically..

2016-06-28 Thread Gautam
Hello, We have Tez being used for one of our main ETL workflows and have been using it for a couple of months now. We recently started seeing the following error for a query that runs regularly and hasn't been changed in any way. It's a job that counts an hour's worth of data in an M-R-R flow. This erro

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
Thanks a lot. Let me give it a try. Regards Sanjiv Singh Mob : +091 9990-447-339 On Tue, Jun 28, 2016 at 5:32 PM, Markovitz, Dudu wrote: > There’s a distributed algorithm for window functions that is based on the > ORDER BY clause rather than the PARTITION BY clause. > > I doubt it is implemen

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
There’s a distributed algorithm for window functions that is based on the ORDER BY clause rather than the PARTITION BY clause. I doubt it is implemented in Hive, but it’s worth a shot. select *, row_number() over (order by rand()) as ETL_ROW_ID from INTER_ETL; For unique
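Dudu's suggestion hinges on numbering rows without funnelling every row through a single reducer. One generic way to get gap-free numbers in parallel is the two-pass technique sketched below (this is not a description of how Hive or Tez implement `row_number()`): count the rows in each partition, turn the counts into starting offsets with an exclusive prefix sum, then let each partition number its own rows from its offset. Plain Python lists stand in for partitions, and `assign_consecutive_ids` is a hypothetical helper.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def assign_consecutive_ids(
    partitions: Sequence[Sequence[str]],
) -> List[List[Tuple[int, str]]]:
    """Give every row a gap-free, 1-based ID without a global sort.

    Only the per-partition row counts are shared between passes, so the
    expensive work (numbering rows) stays parallel across partitions.
    """
    # Pass 1: each partition only reports how many rows it holds.
    counts = [len(rows) for rows in partitions]
    # Exclusive prefix sum: partition i starts after all rows of partitions 0..i-1.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: every partition numbers its own rows starting from its offset.
    return [
        [(offset + i + 1, row) for i, row in enumerate(rows)]
        for offset, rows in zip(offsets, partitions)
    ]


if __name__ == "__main__":
    parts = [["a", "b"], [], ["c", "d", "e"]]
    for part in assign_consecutive_ids(parts):
        print(part)
```

The design choice is that only one small list of counts must be gathered centrally; the partitions themselves never need to see each other's rows.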

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
ETL_ROW_ID is supposed to be a consecutive number. I need to check whether using a unique (but non-consecutive) number would break any logic. If ETL_ROW_ID only needs to be unique, what are the best options available? What if it has to be a consecutive number? Regards Sanjiv Singh On Tue, Ju

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
I’m guessing ETL_ROW_ID should be unique but not necessarily contain only consecutive numbers? From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 10:57 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and l
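If ETL_ROW_ID only has to be unique, the counting pass can be dropped entirely. A minimal sketch of one generic approach, assuming each partition (task) knows its own index and holds fewer than 2**32 rows: pack the partition index into the high bits and the row's local position into the low bits, so no two partitions can ever emit the same value. `assign_unique_ids` and `LOCAL_BITS` are hypothetical names, and the outer loop only simulates tasks that would run independently.

```python
from typing import List, Sequence, Tuple

# Assumption: no single partition holds 2**LOCAL_BITS rows or more.
LOCAL_BITS = 32


def assign_unique_ids(partitions: Sequence[Sequence[str]]) -> List[Tuple[int, str]]:
    """Unique (not consecutive) IDs that need no coordination between partitions.

    ID = (partition index << LOCAL_BITS) | position within the partition.
    """
    out: List[Tuple[int, str]] = []
    for part_id, rows in enumerate(partitions):
        if len(rows) >= 1 << LOCAL_BITS:
            raise ValueError(f"partition {part_id} exceeds {LOCAL_BITS} local bits")
        # Each partition could run this inner loop on its own.
        for local_idx, row in enumerate(rows):
            out.append(((part_id << LOCAL_BITS) | local_idx, row))
    return out


if __name__ == "__main__":
    print([i for i, _ in assign_unique_ids([["a", "b"], ["c"]])])
```

The IDs have large gaps between partitions, which is exactly the property traded away in exchange for skipping any global step.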

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
Hi Dudu, You are correct... ROW_NUMBER() is the main culprit. ROW_NUMBER() OVER is not fast enough with a large result set; any good solution? Regards Sanjiv Singh On Tue, Jun 28, 2016 at 3:42 PM, Markovitz, Dudu wrote: > The row_number operation seems to be skewed.

Hive Query Error: Cannot obtain block length

2016-06-28 Thread Arun Patel
I am trying to do log analytics on the logs created by Flume. Hive queries are failing with the error below. The "hadoop fs -cat" command works on all these open files. Is there a way to read these open files? My requirement is to read the data from open files too. I am using Tez as the execution engine.

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
The row_number operation seems to be skewed. Dudu From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 8:54 PM To: user@hive.apache.org Subject: Query Performance Issue : Group By and Distinct and load on reducer Hi All, I am having a performance issue with data skew o

RE: Hive error : Can not convert struct<> to

2016-06-28 Thread Markovitz, Dudu
The staging table has no partitions, so there is no issue there. Also, the error specifically refers to the conversion between the struct types. Dudu FAILED: SemanticException [Error 10044]: Line 2:23 Cannot insert into target table because column number/types are different ''CA'': Cannot convert c

Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
Hi All, I am having a performance issue with data skew in the DISTINCT statement in Hive. See the query below with the DISTINCT operator. Original query: SELECT DISTINCT SD.

Re: What is the best way to store IPv6 address in Hive?

2016-06-28 Thread Devopam Mittra
My best bet would be the string data type itself, with partitioning to aid partial search. Please do consider that an IPv6 address is more complicated than an IPv4 address in terms of searching. Regards Dev On 28 Jun 2016 9:35 pm, "Igor Kuzmenko" wrote: > Currently I'm using ORC transactional tables, and

What is the best way to store IPv6 address in Hive?

2016-06-28 Thread Igor Kuzmenko
Currently I'm using ORC transactional tables, and I need to store a lot of data containing IP addresses. With IPv4 it can be an Integer (exactly 4 bytes), but what about IPv6? Obviously it should be space-efficient and easy to search for an exact match. As an extra feature it would be good to do fast search
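As an alternative to the string representation suggested in the reply above, a 128-bit IPv6 address fits exactly into two 64-bit integers. A minimal sketch, assuming the address is stored as two BIGINT columns (hypothetical names `ip_hi` and `ip_lo`): that is 16 bytes per address, and an exact match becomes two integer equality predicates. Because Hive's BIGINT is a signed 64-bit type, each unsigned half is shifted down by 2**63 rather than bit-reinterpreted, which keeps numeric order equal to address order so range predicates remain usable. The helper names are hypothetical; only Python's standard `ipaddress` module is used.

```python
import ipaddress
from typing import Tuple

_BIAS = 1 << 63           # maps unsigned 0..2**64-1 onto signed -2**63..2**63-1
_MASK64 = (1 << 64) - 1   # selects the low 64 bits


def ipv6_to_bigints(addr: str) -> Tuple[int, int]:
    """Split an IPv6 address into (high, low) halves that fit signed 64-bit columns.

    Subtracting the bias (instead of reinterpreting the sign bit) preserves
    ordering: comparing (high, low) tuples compares the addresses.
    """
    value = int(ipaddress.IPv6Address(addr))
    return (value >> 64) - _BIAS, (value & _MASK64) - _BIAS


def bigints_to_ipv6(hi: int, lo: int) -> str:
    """Inverse of ipv6_to_bigints; returns the compressed text form."""
    value = ((hi + _BIAS) << 64) | (lo + _BIAS)
    return str(ipaddress.IPv6Address(value))


if __name__ == "__main__":
    hi, lo = ipv6_to_bigints("2001:db8::1")
    print(hi, lo)
    print(bigints_to_ipv6(hi, lo))
```

With this layout every address inside one /64 network shares the same `ip_hi` value, which gives a cheap prefix filter; the trade-off against a string column is that values must be converted on the way in and out.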

Re: External_Tables_Disadvantages

2016-06-28 Thread Ajay Chander
Hi Team, Any insights on this one? Thank you On Monday, June 27, 2016, Ajay Chander wrote: > Hi Everyone, > > I would like to know the disadvantages of using External tables in Hive. I > was told that "Managing security with Sentry will be very limited for > external tables". Is it true? Can some

Re: Hive error : Can not convert struct<> to

2016-06-28 Thread Gopal Vijayaraghavan
> PARTITION(state='CA') > SELECT * WHERE se.adr.st='CA' > FAILED: SemanticException [Error 10044]: Line 2:23 Cannot insert into >target table because column number/types are different ''CA'': The error is bogus, but the issue has to do with the "SELECT *". Inserts where a partition is specified

RE: Hive error : Can not convert struct<> to

2016-06-28 Thread Markovitz, Dudu
Hi The fields' names are part of the struct definition. Different names, different types of structs. Dudu e.g. Setup create table t1 (s struct); create table t2 (s struct); insert into table t1 select named_struct('c1',1,'c2',2);

WebHCat Hive POST callback not being called

2016-06-28 Thread Pau Tallada
Hi, I'm trying to use the WebHCat REST API to post a long query to Hive, and have an endpoint in my app called upon completion. The POST request is like this: POST /templeton/v1/hive HTTP/1.1 Host: data.astro:50111 Cache-Control: no-cache Postman-Token: 7531c7e9-f0e6-ce58-4482-4f5a0cc98b52 Conte

Hive error : Can not convert struct<> to

2016-06-28 Thread Kuldeep Chitrakar
Hi, I have a staging table: hive (revise)> desc employees_se; OK name string salary float subordinates array deductions map adr struct I am trying to insert the data into the partitioned table employees as hive (revise)> desc e