As mentioned in a previous post, I have an application which relies on a quick response. The application matches a client's image against a set of stored images. Image features are stored in a SequenceFile and passed over JNI, along with the features for the client's image, to be matched in OpenCV. The id of the matched image is returned.
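For context, the matching step works conceptually like the sketch below: each stored image id maps to a set of feature descriptors, and the id whose descriptors are closest to the client's wins. In the real application this comparison happens natively in OpenCV via JNI; the data layout, the Euclidean metric, and the function names here are illustrative assumptions, not the actual implementation.

```python
import math

def descriptor_distance(a, b):
    """Euclidean distance between two feature descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_image(client_features, stored_features):
    """Return (id, score) for the stored image whose descriptors are
    closest to the client's; lower score means a better match.

    stored_features: dict mapping image id -> list of descriptors,
    standing in for the feature data read from the SequenceFile.
    """
    best_id, best_score = None, float("inf")
    for image_id, features in stored_features.items():
        # For each client descriptor, take its nearest stored descriptor
        # and sum the distances -- a stand-in for the descriptor matching
        # OpenCV would perform on the native side.
        score = sum(min(descriptor_distance(c, s) for s in features)
                    for c in client_features)
        if score < best_score:
            best_id, best_score = image_id, score
    return best_id, best_score
```

The point of the sketch is only to show the shape of the work each executor does once it has the stored features in hand.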
I was using Hadoop 1.2.1 and achieved some pretty good results, but job initialization was taking about 15 seconds, and we'd hoped for a response in ~5 seconds. So we moved to Hadoop 2.2, YARN & Spark. Sadly, job initialization still takes over 10 seconds (on a cluster of 10 EC2 m1.large instances). Any suggestions on how I can bring this initialization time down? Once the executors begin work, performance is quite good, but any general performance optimization tips are also welcome! Thanks.

- Dan