Hi,
I have a Spark SQL performance issue. My code contains a simple JavaBean:
public class Person implements Externalizable {
    private int id;
    private String name;
    private double salary;
    ....................
}

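In case it matters, the elided part of the bean looks roughly like this (the no-arg constructor, getters, and field order in the read/write methods are illustrative, not the exact original code; Externalizable requires the public no-arg constructor and symmetric writeExternal/readExternal implementations):

```java
import java.io.*;

public class Person implements Externalizable {
    private int id;
    private String name;
    private double salary;

    // Externalizable requires a public no-arg constructor for deserialization.
    public Person() {}

    public Person(int id, String name, double salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }

    public int getId() { return id; }
    public String getName() { return name; }
    public double getSalary() { return salary; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Fields must be written and read back in the same order.
        out.writeInt(id);
        out.writeUTF(name);
        out.writeDouble(salary);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
        salary = in.readDouble();
    }
}
```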
I apply a schema to the RDD and register it as a table:
JavaRDD<Person> rdds = ...
rdds.cache();

DataFrame dataFrame = sqlContext.createDataFrame(rdds, Person.class);
dataFrame.registerTempTable("person");

sqlContext.cacheTable("person");

Then I run a SQL query:
sqlContext.sql("SELECT id, name, salary FROM person WHERE salary >= YYY AND salary <= XXX").collectAsList()

I launch a standalone cluster with 4 workers; each node runs on a machine with 8 CPUs and 15 GB of memory. When I run the query against an RDD containing 1,000,000 rows, it takes about 1 minute. Can somebody tell me how to tune the performance?