Hey Jerry,
When you ran these queries using different methods, did you see any
discrepancy in the returned results (i.e. the counts)?
On Thu, Jul 10, 2014 at 5:55 PM, Michael Armbrust wrote:
Yeah, sorry. I think you are seeing some weirdness with partitioned tables
that I have also seen elsewhere. I've created a JIRA and assigned someone
at Databricks to investigate.
https://issues.apache.org/jira/browse/SPARK-2443
On Thu, Jul 10, 2014 at 5:33 PM, Jerry Lam wrote:
Hi Michael,
Yes, the table is partitioned on 1 column. There are 11 columns in the table,
and they are all of type String.
I understand that SerDes add some overhead, but using pure Hive
we could run the query about 5 times faster than with SparkSQL. Given that Hive
also has the same SerDes ove
On Thu, Jul 10, 2014 at 2:08 PM, Jerry Lam wrote:
>
> For the curious mind, the dataset is about 200-300GB and we are using 10
> machines for this benchmark. Given that the environment is the same for both
> experiments, why is pure Spark faster than SparkSQL?
>
There is going to be some overhead to parsi
Hi Spark users,
To put the performance issue into perspective, we also ran the query
on Hive. It took about 5 minutes to run.
Best Regards,
Jerry
On Thu, Jul 10, 2014 at 5:10 PM, Jerry Lam wrote:
By the way, I also tried hql("select * from m").count. It is terribly slow
too.
On Thu, Jul 10, 2014 at 5:08 PM, Jerry Lam wrote:
Hi Spark users and developers,
I'm doing some simple benchmarks with my team and we found a potential
performance issue using Hive via SparkSQL. It is very bothersome, so your
help in understanding why it is terribly slow is very important.
First, we have some text files in HDFS which ar