From: Takeshi Yamamuro
Sent: Thursday, August 4, 2016 8:18 AM
To: Marco Colombo
Cc: user
Subject: Re: Spark SQL and number of task
Seems the performance difference comes from `CassandraSourceRelation`.
I'm not familiar with the implementation, though I guess the `IN` filter is
pushed down into the datasource and the other one is not.
You'd be better off checking the performance metrics in the webUI.
// maropu
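(For reference, a minimal spark-shell sketch of how to check this: predicates the connector can push down typically show up under "PushedFilters" in the physical scan node printed by `explain()`. The keyspace/table names `ks`/`points` below are placeholders, not taken from this thread.)

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes a spark-shell with the spark-cassandra-connector on the classpath;
// "ks" and "points" are placeholder keyspace/table names.
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

val points = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "points"))
  .load()

// If the connector can push the predicate down, it is listed under
// "PushedFilters" in the physical scan node of the explain() output.
points.filter($"id".isin(90, 91)).explain()
points.filter($"id" === 90).explain()
```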
On Thu, Aug 4, 2016 at 8:41 PM, Marco Colombo wrote:
Ok, thanks.
The 2 plans are very similar.
With the IN condition:
+------+
| plan |
+------+
Hi,
Please type `sqlCtx.sql("select * ").explain` to show execution plans.
Also, you can kill jobs from the webUI.
// maropu
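(For example, with the query from this thread; the IN variant below is only a guess at what was being compared, and `sqlCtx` is assumed to be the HiveContext mentioned further down.)

```scala
// Physical plan only:
sqlCtx.sql("select d.id, avg(d.avg) from v_points d where id = 90 group by id").explain()
sqlCtx.sql("select d.id, avg(d.avg) from v_points d where id in (90, 91) group by id").explain()

// Parsed, analyzed and optimized logical plans plus the physical plan:
sqlCtx.sql("select d.id, avg(d.avg) from v_points d where id = 90 group by id").explain(true)
```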
On Thu, Aug 4, 2016 at 4:58 PM, Marco Colombo wrote:
Hi all, I have a question on how Hive+Spark handle data.
I've started a new HiveContext and I'm extracting data from Cassandra.
I've configured spark.sql.shuffle.partitions=10.
Now, I have the following query:

select d.id, avg(d.avg) from v_points d where id=90 group by id;

I see that 10 tasks are created.
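(For completeness, a sketch of a setup matching the description above. The keyspace/table names are placeholders, and registering the Cassandra table as `v_points` is an assumption, since the thread doesn't show how that view was created.)

```scala
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
sqlContext.setConf("spark.sql.shuffle.partitions", "10")

// Expose the Cassandra table to SQL under the name used in the query
// ("ks" and "points" are placeholder names).
sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "points"))
  .load()
  .registerTempTable("v_points")

// The group-by forces a shuffle, so the aggregation side runs with
// spark.sql.shuffle.partitions (= 10) tasks.
sqlContext.sql("select d.id, avg(d.avg) from v_points d where id = 90 group by id").show()
```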