As mentioned in a previous post, I have an application which relies on a quick response. The application matches a client's image against a set of stored images. Image features are stored in a SequenceFile and passed over JNI, along with the features for the client's image, to be matched in OpenCV. The id of the matched image is returned.
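For context, the matching step works conceptually like the sketch below: each stored image id maps to a set of feature descriptors, and the id whose descriptors are closest to the client's wins. In the real application this comparison happens natively in OpenCV via JNI; the data layout, the Euclidean metric, and the function names here are illustrative assumptions, not the actual implementation.

```python
import math

def descriptor_distance(a, b):
    """Euclidean distance between two feature descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_image(client_features, stored_features):
    """Return (id, score) for the stored image whose descriptors are
    closest to the client's; lower score means a better match.

    stored_features: dict mapping image id -> list of descriptors,
    standing in for the feature data read from the SequenceFile.
    """
    best_id, best_score = None, float("inf")
    for image_id, features in stored_features.items():
        # For each client descriptor, take its nearest stored descriptor
        # and sum the distances -- a stand-in for the descriptor matching
        # OpenCV would perform on the native side.
        score = sum(min(descriptor_distance(c, s) for s in features)
                    for c in client_features)
        if score < best_score:
            best_id, best_score = image_id, score
    return best_id, best_score
```

The point of the sketch is only to show the shape of the work each executor does once it has the stored features in hand.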
I was using Hadoop 1.2.1 and achieved some pretty good results, but job initialization was taking about 15 seconds, and we'd hoped for a response in ~5 seconds. So we moved to Hadoop 2.2, YARN & Spark. Sadly, job initialization still takes over 10 seconds (on a cluster of 10 EC2 m1.large instances). Any suggestions on how I can bring this initialization time down? Once the executors begin work, performance is quite good, but any general performance optimization tips are also welcome! Thanks.

- Dan