> I added a compact index on 5 columns to this table, as shown below
No, those are not what I recommend in this scenario.
You made a statement that the table was sorted and it wasn't.
>> Table is sorted in the order of prod_id, cust_id, time_id, channel_id and
>> promo_id. It has 22 million rows.
>> No
Thanks Gopal.
I added a compact index on 5 columns to this table, as shown below:
hive> show formatted indexes on sales2;
OK
idx_name    tab_name    col_names    idx_tab_name    idx_type    comment
sales2_idx sales2 prod_id, cust_id
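
For readers following along, here is a rough sketch of how a compact index like the one listed above might be created and rebuilt. The index and table names (sales2_idx, sales2) come from the output above. The column list is an assumption: the output is cut off after the first two columns, so the remaining three are taken from the sort order quoted earlier in the thread. Hive removed index support in 3.0, so this applies only to older releases.

```sql
-- Sketch only: the five-column list is assumed, not confirmed by the
-- truncated output above.
CREATE INDEX sales2_idx
ON TABLE sales2 (prod_id, cust_id, time_id, channel_id, promo_id)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- A deferred index stays empty until it is rebuilt explicitly.
ALTER INDEX sales2_idx ON sales2 REBUILD;

-- List the indexes defined on the table.
SHOW FORMATTED INDEXES ON sales2;
```

WITH DEFERRED REBUILD keeps the CREATE statement cheap and defers the actual scan of the 22 million rows to the ALTER INDEX ... REBUILD step.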
> It appears to me that Spark does not rely on statistics that are
> collected by Hive on, say, ORC tables.
> It seems that Spark uses its own optimization to query the Hive tables
> irrespective of what Hive has collected by way of statistics, etc.?
Spark does not have a cost based optimizer yet - please fo
Hi Mich
I could not figure out what point you are trying to make.
Could you please clarify?
Thanks
Dudu
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Monday, June 27, 2016 12:20 PM
To: user @spark ; user
Subject: Querying Hive tables from Spark
Hi,
I have done some extensive tests with Spark querying Hive tables.
It appears to me that Spark does not rely on statistics that are collected
by Hive on, say, ORC tables. It seems that Spark uses its own optimization to
query the Hive tables irrespective of what Hive has collected by way of
statistic