Re: Querying Hive tables from Spark

2016-06-27 Thread Gopal Vijayaraghavan
> I added a compact index to this table as below on 5 columns
No, those are not what I recommend in this scenario. You made a statement that the table was sorted and it wasn't.
>> Table is sorted in the order of prod_id, cust_id, time_id, channel_id and
>> promo_id. It has 22 million rows.
>> No

Re: Querying Hive tables from Spark

2016-06-27 Thread Mich Talebzadeh
Thanks Gopal. I added a compact index to this table as below on 5 columns:
hive> show formatted indexes on sales2;
OK
idx_name tab_name col_names idx_tab_name idx_type comment
sales2_idx sales2 prod_id, cust_id

Re: Querying Hive tables from Spark

2016-06-27 Thread Gopal Vijayaraghavan
> It appears to me that Spark does not rely on statistics that are
> collected by Hive on say ORC tables.
> It seems that Spark uses its own optimization to query the Hive tables
> irrespective of Hive has collected by way of statistics etc?
Spark does not have a cost based optimizer yet - please fo

RE: Querying Hive tables from Spark

2016-06-27 Thread Markovitz, Dudu
Hi Mich,
I could not figure out what is the point you are trying to make. Could you please clarify?
Thanks
Dudu
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Monday, June 27, 2016 12:20 PM
To: user @spark ; user
Subject: Querying Hive tables from Spark
Hi, I have done some e