Hi Wlodeck, Let us look at this.
In Oracle I have two tables channels and sales. This code works in Oracle 1 select c.channel_id, sum(c.channel_id * (select count(1) from sales s WHERE c.channel_id = s.channel_id)) As R 2 from channels c 3* group by c.channel_id s...@mydb.mich.LOCAL> / CHANNEL_ID R ---------- ---------- 2 516050 3 1620984 4 473664 5 0 9 18666 I have the same tables In Hive but the same query crashes! hive> select c.channel_id, sum(c.channel_id * (select count(1) from sales s WHERE c.channel_id = s.channel_id)) As R > from channels c > group by c.channel_id > ; NoViableAltException(232@[435:1: precedenceEqualExpression : ( ( LPAREN precedenceBitwiseOrExpression COMMA )=> precedenceEqualExpressionMutiple | precedenceEqualExpressionSingle );]) The solution is to use a temporary table to keep the sum/group by from sales table as an intermediate stage (temporary tables are session specific and they are created and dropped after you finish the session) hive> create temporary table tmp as select channel_id, count(channel_id) as total from sales group by channel_id; Ok the rest is pretty easy hive> select c.channel_id, c.channel_id * t.total as results > from channels c, tmp t > where c.channel_id = t.channel_id; 2.0 2800432.0 3.0 8802300.0 4.0 2583552.0 9.0 104013.0 HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 14 March 2016 at 14:22, ws <wlodek_sie...@yahoo.com> wrote: > Hive 1.2.1.2.3.4.0-3485 > Spark 1.5.2 > Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit > Production > > ### > SELECT > f.description, > f.item_number, > sum(f.df_a * (select count(1) from e.mv_A_h_a where hb_h_name = r.h_id)) > as df_a > FROM e.eng_fac_atl_sc_bf_qty f, wv_ATL_2_qty_df_rates r > where f.item_number NOT LIKE 'HR%' AND f.item_number NOT LIKE 'UG%' AND > f.item_number NOT LIKE 'DEV%' > group by > f.description, > f.item_number > ### > > This query works fine in oracle but not Hive or Spark. > So the problem is: "sum(f.df_a * (select count(1) from e.mv_A_h_a where > hb_h_name = r.h_id)) as df_a" field. > > > Thanks, > Wlodek > -- > > > On Sunday, March 13, 2016 7:36 PM, Mich Talebzadeh < > mich.talebza...@gmail.com> wrote: > > > Depending on the version of Hive on Spark engine. > > As far as I am aware the latest version of Hive that I am using (Hive 2) > has improvements compared to the previous versions of Hive (0.14,1.2.1) on > Spark engine. > > As of today I have managed to use Hive 2.0 on Spark version 1.3.1. So it > is not the latest Spark but it is pretty good. > > What specific concerns do you have in mind? > > HTH > > > Dr Mich Talebzadeh > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > http://talebzadehmich.wordpress.com > > > On 13 March 2016 at 23:27, sjayatheertha <sjayatheer...@gmail.com> wrote: > > Just curious if you could share your experience on the performance of > spark in your company? How much data do you process? And what's the latency > you are getting with spark engine? > > Vidya > > > > >