Guys,
I'm trying to join 2-3 SchemaRDDs of approximately 30,000 rows and it is terribly
slow. I do get the results, but it takes 8s to do the join and return
them.
I'm running standalone Spark on my machine, which has 8 cores and 12 GB RAM, with
4 workers.
I'm not sure why it is taking so long; any input is appreciated.

This is just an example of what I'm trying to do.

RDD1 (30,000 rows)
state,city,amount

RDD2 (50 rows)
state,amount1

join by state
New RDD3 (30,000 rows)
state,city,amount,amount1

Do a select(amount-amount1) from New RDD3.
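
To make the intended result concrete, here is a minimal plain-Python sketch of the same join and subtraction. It is not Spark code and says nothing about where the 8s goes; the function name join_and_subtract and all sample rows are made up for illustration, and it assumes an inner join on state with the small table held in a dict.

```python
# Plain-Python sketch of the join described above; not Spark code.
# The function name and all sample rows are made up for illustration.

def join_and_subtract(rdd1_rows, rdd2_rows):
    """Join (state, city, amount) rows with (state, amount1) rows on state.

    Returns (state, city, amount, amount1, amount - amount1) tuples.
    """
    # The small side (about 50 rows in the post) fits in a dict keyed by state.
    amount1_by_state = {state: amount1 for state, amount1 in rdd2_rows}

    joined = []
    for state, city, amount in rdd1_rows:
        # Assumed inner join: rows with no matching state are skipped.
        if state in amount1_by_state:
            amount1 = amount1_by_state[state]
            joined.append((state, city, amount, amount1, amount - amount1))
    return joined


# Made-up stand-ins for RDD1 (state, city, amount) and RDD2 (state, amount1).
rdd1 = [("CA", "Los Angeles", 100), ("CA", "San Diego", 80), ("NY", "Albany", 50)]
rdd2 = [("CA", 30), ("NY", 20)]

result = join_and_subtract(rdd1, rdd2)
for row in result:
    print(row)
```

The last element of each tuple corresponds to the select(amount-amount1) step, and the first four correspond to the columns of New RDD3.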








