Re: Compare a column in two different tables/find the distance between column data

2016-03-14 Thread Wail Alkowaileet
> ... from a performance point of view, where I have data volumes in the TB range, I am not
> sure if I can achieve this using the SQL statement. What would be the best
> approach to solving this problem? Should I look at the MLlib APIs?
>
> Spark gurus, any pointers?
>
> Thanks,
> Suniti

--
*Regards,*
Wail Alkowaileet

Re: Dataset throws: Task not serializable

2016-01-11 Thread Wail Alkowaileet
>>> ... the fields of a case class doesn't match with the order of the DataFrame's
>>> schema.
>>
>> We have tests for reordering
>> <https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L97>

Is SparkSQL optimizer aware of the needed data after the query?

2015-03-02 Thread Wail
Dears,

I'm just curious about the complexity of the query optimizer. Can the optimizer evaluate what happens after the SQL? Maybe it's a stupid question, but here is an example to show the case:

From the Spark SQL example:

val teenagers = sqlContext.sql("SELECT * FROM people WHERE age >= 13 AND age <