Guys, I'm trying to join 2-3 SchemaRDDs of approximately 30,000 rows, and it is terribly slow. I do get the results, but the join takes 8 s to run and return them. I'm running standalone Spark on my machine, which has 8 cores and 12 GB RAM, with 4 workers. I'm not sure why it takes so long; any input is appreciated.
This is just an example of what I'm trying to do.
RDD1 (30,000 rows): state, city, amount
RDD2 (50 rows): state, amount1
Join by state.
New RDD3 (30,000 rows): state, city, amount, amount1
Then do a select(amount - amount1) from the new RDD3.
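To make the example concrete, here is a minimal sketch of the same computation in plain Python. It is not Spark code and not my actual job; it only shows the join-by-state and subtraction logic. The function name join_and_subtract and the sample values are made up for illustration, and it assumes each state appears only once in RDD2.

```python
# Plain-Python sketch (not Spark) of the join described above.
# Column names follow the post; the sample values are hypothetical.

def join_and_subtract(rdd1_rows, rdd2_rows):
    """Inner-join rows on state, then compute amount - amount1 per joined row."""
    # Build a lookup from the small table (about 50 rows in the post).
    # Assumption: each state appears at most once in the small table.
    amount1_by_state = {state: amount1 for state, amount1 in rdd2_rows}

    results = []
    for state, city, amount in rdd1_rows:
        # Inner-join semantics: rows whose state has no match are dropped.
        if state in amount1_by_state:
            amount1 = amount1_by_state[state]
            results.append((state, city, amount, amount1, amount - amount1))
    return results


# Hypothetical sample data standing in for RDD1 and RDD2.
rdd1 = [
    ("CA", "San Jose", 120.0),
    ("CA", "Fresno", 80.0),
    ("NY", "Albany", 50.0),
    ("TX", "Austin", 70.0),
]
rdd2 = [("CA", 100.0), ("NY", 60.0)]

for row in join_and_subtract(rdd1, rdd2):
    print(row)
```

The small table is held as an in-memory lookup and the large table is scanned once, which is the general idea behind a hash join with a small side.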