> I added a compact index to this table as below on 5 columns
No, those are not what I recommend in this scenario.
You made a statement that the table was sorted and it wasn't.
>>Table is sorted in the order of prod_id, cust_id,time_id, channel_id and
>> promo_id. It has 22 million rows.
>> No
Thanks Gopal.
I added a compact index to this table as below on 5 columns
hive> show formatted indexes on sales2;
OK
idx_name tab_name col_names idx_tab_name idx_type comment
sales2_idx sales2 prod_id, cust_id
> It appears to me that Spark does not rely on statistics that are
>collected by Hive on say ORC tables.
> It seems that Spark uses its own optimization to query the Hive tables
>irrespective of what Hive has collected by way of statistics etc?
Spark does not have a cost based optimizer yet - please fo
Hi Mich
I could not figure out what point you are trying to make.
Could you please clarify?
Thanks
Dudu
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Monday, June 27, 2016 12:20 PM
To: user @spark ; user
Subject: Querying Hive tables from Spark
Hi,
I have done some e