Re: ORC Transaction Table - Spark

2017-08-23 Thread Aviral Agarwal
So, is there no way right now for Spark to read Hive 2.x data? On Thu, Aug 24, 2017 at 12:17 AM, Eugene Koifman wrote: > This looks like you have some data written by Hive 2.x and Hive 1.x code > trying to read it. > > That is not supported. > > > > *From: *Aviral Agarwal > *Reply-To:

Re: How to optimize multiple count( distinct col) in Hive SQL

2017-08-23 Thread panfei
After decreasing mapreduce.reduce.shuffle.parallelcopies from 20 to 5, everything seems to run well, with no OOM. 2017-08-23 17:19 GMT+08:00 panfei : > The full error stack is (which is described here: https://issues.apache.org/jira/browse/MAPREDUCE-6108) : > > this error can not reproduce ever

Re: LEFT JOIN and WHERE CLAUSE - How to handle

2017-08-23 Thread peter zhang
How about splitting your txn data into two parts: one for the transactions that have currency info (just use a join), and the other for the transactions that have no matching currency info. Then use a UNION ALL operator to combine the two parts, as below: SELECT ROW_NUM,CCY_CD,TXN_DT,CNTRY_DESC FROM CURRENCY JOIN TXN ON (CU
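The two-part approach described above can be sketched as follows. This is only an illustration under stated assumptions: the original query is cut off, so the join condition (matching on CCY_CD with TXN_DT between EFF_ST_DT and EFF_END_DT) is taken from other messages in the thread, and the NOT EXISTS form of the second part is a guess at what "can't find currency info" means. SQLite is used here purely as a runnable stand-in for Hive; the function name is hypothetical, and Hive may need the unmatched part expressed differently depending on its version. Duplicate CURRENCY rows are not handled by this sketch.

```python
import sqlite3


def join_then_union_unmatched(txn_rows, currency_rows):
    """Return every TXN row once, with CNTRY_DESC when a currency row matches.

    txn_rows:      (ROW_NUM, CCY_CD, TXN_DT) tuples, dates as ISO strings.
    currency_rows: (CCY_CD, CNTRY_DESC, EFF_ST_DT, EFF_END_DT) tuples.
    Table layouts are assumptions based on the column names in the thread.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TXN (ROW_NUM INTEGER, CCY_CD TEXT, TXN_DT TEXT)")
    conn.execute(
        "CREATE TABLE CURRENCY "
        "(CCY_CD TEXT, CNTRY_DESC TEXT, EFF_ST_DT TEXT, EFF_END_DT TEXT)"
    )
    conn.executemany("INSERT INTO TXN VALUES (?, ?, ?)", txn_rows)
    conn.executemany("INSERT INTO CURRENCY VALUES (?, ?, ?, ?)", currency_rows)

    query = """
    -- Part 1: transactions that have currency info (plain inner join).
    SELECT t.ROW_NUM AS ROW_NUM, t.CCY_CD AS CCY_CD,
           t.TXN_DT AS TXN_DT, c.CNTRY_DESC AS CNTRY_DESC
    FROM TXN t
    JOIN CURRENCY c ON t.CCY_CD = c.CCY_CD
    WHERE t.TXN_DT BETWEEN c.EFF_ST_DT AND c.EFF_END_DT
    UNION ALL
    -- Part 2: transactions with no matching currency row (assumed form).
    SELECT t.ROW_NUM, t.CCY_CD, t.TXN_DT, NULL
    FROM TXN t
    WHERE NOT EXISTS (
        SELECT 1 FROM CURRENCY c
        WHERE t.CCY_CD = c.CCY_CD
          AND t.TXN_DT BETWEEN c.EFF_ST_DT AND c.EFF_END_DT
    )
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    # Sort in Python so the result order does not depend on the engine.
    rows.sort(key=lambda r: r[0])
    return rows


# Hypothetical sample data, not taken from the thread.
sample_txn = [
    (1, "USD", "2017-08-01"),  # inside the effective date range
    (2, "USD", "2016-01-01"),  # outside the effective date range
    (3, "XXX", "2017-08-03"),  # no currency row at all
]
sample_currency = [("USD", "United States", "2017-01-01", "2017-12-31")]
union_result = join_then_union_unmatched(sample_txn, sample_currency)
```

With this sample data, rows 2 and 3 come from the second part and carry a NULL CNTRY_DESC, so no transaction is dropped.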

One column into multiple column.

2017-08-23 Thread Deepak Khandelwal
Can someone tell me the best way to implement the below in Hive? How can we take input from column c1 of table t1, where c1 has multiple values delimited by a pipe? Each of the delimited values from column c1 of table t1 needs to be inserted into a separate column in table t2. I can write a UDF for this but th

Re: ORC Transaction Table - Spark

2017-08-23 Thread Eugene Koifman
This looks like you have some data written by Hive 2.x and Hive 1.x code trying to read it. That is not supported. From: Aviral Agarwal Reply-To: "user@hive.apache.org" Date: Wednesday, August 23, 2017 at 12:24 AM To: "user@hive.apache.org" Subject: Re: ORC Transaction Table - Spark Hi, Yes

Re: LEFT JOIN and WHERE CLAUSE - How to handle

2017-08-23 Thread Furcy Pin
Ho, in that case... (First I notice that you say you want all records in TXN but in the query you give, you perform your join the other way round.) This is a typical use case that SQL is not very good at handling... The solutions I see are: - use RANK as you suggested. Note that Hive is smart

Re: LEFT JOIN and WHERE CLAUSE - How to handle

2017-08-23 Thread Ramasubramanian Narayanan
Hi, TXN.TXN_DT should be between CURRENCY.EFF_ST_DT and CURRENCY.EFF_END_DT. It needs to be equated. regards, Rams On Wed, Aug 23, 2017 at 7:55 PM, Furcy Pin wrote: > I would suggest to use a subquery > > WITH unique_currency AS ( > SELECT > CCY_CD, > MAX(CNTRY_DESC) as CNTRY_DESC >

Re: LEFT JOIN and WHERE CLAUSE - How to handle

2017-08-23 Thread Furcy Pin
I would suggest using a subquery: WITH unique_currency AS ( SELECT CCY_CD, MAX(CNTRY_DESC) as CNTRY_DESC FROM CURRENCY GROUP BY CCY_CD ) and then perform your left join on it. Some SQL engines (e.g. Presto) have aggregation functions like arbitrary(col) that take any value and are a
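A minimal sketch of the subquery suggestion above: deduplicate CURRENCY by CCY_CD first, then left join TXN to the deduplicated result so each transaction appears exactly once. The assumptions are that CCY_CD is the join key, that TXN carries ROW_NUM and TXN_DT (names taken from other messages in the thread), and that SQLite can stand in for Hive for this query. The function name and sample data are hypothetical.

```python
import sqlite3


def left_join_with_unique_currency(txn_rows, currency_rows):
    """Left join TXN to a deduplicated view of CURRENCY.

    txn_rows:      (ROW_NUM, CCY_CD, TXN_DT) tuples.
    currency_rows: (CCY_CD, CNTRY_DESC) tuples, possibly with duplicates.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TXN (ROW_NUM INTEGER, CCY_CD TEXT, TXN_DT TEXT)")
    conn.execute("CREATE TABLE CURRENCY (CCY_CD TEXT, CNTRY_DESC TEXT)")
    conn.executemany("INSERT INTO TXN VALUES (?, ?, ?)", txn_rows)
    conn.executemany("INSERT INTO CURRENCY VALUES (?, ?)", currency_rows)

    query = """
    WITH unique_currency AS (
        -- One row per currency code; MAX() just picks one description.
        SELECT CCY_CD, MAX(CNTRY_DESC) AS CNTRY_DESC
        FROM CURRENCY
        GROUP BY CCY_CD
    )
    SELECT t.ROW_NUM, t.CCY_CD, t.TXN_DT, c.CNTRY_DESC
    FROM TXN t
    LEFT JOIN unique_currency c ON t.CCY_CD = c.CCY_CD
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    # Sort in Python so the result order does not depend on the engine.
    rows.sort(key=lambda r: r[0])
    return rows


# Hypothetical sample data: USD is duplicated in CURRENCY, XXX is missing.
dedup_txn = [
    (1, "USD", "2017-08-01"),
    (2, "EUR", "2017-08-02"),
    (3, "XXX", "2017-08-03"),
]
dedup_currency = [
    ("USD", "United States"),
    ("USD", "United States"),
    ("EUR", "Eurozone"),
]
dedup_result = left_join_with_unique_currency(dedup_txn, dedup_currency)
```

Because the GROUP BY collapses the duplicated USD rows before the join, the output has one row per transaction, and the unmatched XXX transaction is kept with a NULL description.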

LEFT JOIN and WHERE CLAUSE - How to handle

2017-08-23 Thread Ramasubramanian Narayanan
Hi, Need your suggestion on the below. Have two tables TXN and CURRENCY. Need all records in TXN and hence doing Left Join with CURRENCY. *Two problems :* 1. CURRENCY table may contain duplicate records hence it needs to be handled through RANK or some other function. 2. If we equate TXN_DT bet

Re: How to optimize multiple count( distinct col) in Hive SQL

2017-08-23 Thread panfei
The full error stack is below (it is described here: https://issues.apache.org/jira/browse/MAPREDUCE-6108). This error does not reproduce every time; after retrying several times, the job finished successfully. 2017-08-23 17:16:03,574 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running chil

Re: ORC Transaction Table - Spark

2017-08-23 Thread Aviral Agarwal
Hi, Yes, it is caused by the wrong naming convention of the delta directory: /apps/hive/warehouse/foo.db/bar/year=2017/month=5/delta_0645253_0645253_0001 How do I solve this? Thanks! Aviral Agarwal On Tue, Aug 22, 2017 at 11:50 PM, Eugene Koifman wrote: > Could you do recursive “ls” in your table