Re: Querying Hive tables from Spark

2016-06-27 Thread Gopal Vijayaraghavan
> I added a compact index to this table as below on 5 columns

No, those are not what I recommend in this scenario. You made a statement that the table was sorted and it wasn't.

>> Table is sorted in the order of prod_id, cust_id, time_id, channel_id and
>> promo_id. It has 22 million rows.

>> No

Re: Querying Hive tables from Spark

2016-06-27 Thread Mich Talebzadeh
Thanks Gopal. I added a compact index to this table as below on 5 columns

hive> show formatted indexes on sales2;
OK
idx_name    tab_name    col_names    idx_tab_name    idx_type    comment
sales2_idx  sales2      prod_id, cust_id

Re: Querying Hive tables from Spark

2016-06-27 Thread Gopal Vijayaraghavan
> It appears to me that Spark does not rely on statistics that are
> collected by Hive on, say, ORC tables.

> It seems that Spark uses its own optimization to query the Hive tables
> irrespective of what Hive has collected by way of statistics etc?

Spark does not have a cost-based optimizer yet - please fo

RE: Querying Hive tables from Spark

2016-06-27 Thread Markovitz, Dudu
Hi Mich,

I could not figure out what point you are trying to make. Could you please clarify?

Thanks
Dudu

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Monday, June 27, 2016 12:20 PM
To: user @spark ; user
Subject: Querying Hive tables from Spark

Hi, I have done some

Querying Hive tables from Spark

2016-06-27 Thread Mich Talebzadeh
Hi, I have done some extensive tests with Spark querying Hive tables. It appears to me that Spark does not rely on statistics that are collected by Hive on, say, ORC tables. It seems that Spark uses its own optimization to query the Hive tables irrespective of what Hive has collected by way of statistic