I used to do 1) but couldn't get it to work on YARN, and the trend seemed to be towards 2) using spark-submit, so I gave in.
The main promise of 2) is that you can ship an application that runs against multiple Hadoop and Spark versions. However, for that to become true, Spark needs to address the issue of user-classpath-first being broken. For example, if I want to use a recent version of Avro I am out of luck, even if I bundle it with my app, because an older version can already be on Spark's classpath (it is on CDH, for example) and I currently have no way to override it. (The usual workaround today is to shade the conflicting dependency; see the sketch after the quoted message below.)

On Sun, Jul 27, 2014 at 9:32 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> Based on some discussions with my application users, I have been trying to
> come up with a standard way to deploy applications built on Spark:
>
> 1. Bundle the version of Spark with your application and ask users to store
> it in HDFS before referring to it in YARN to boot your application.
> 2. Provide ways to manage dependencies in your app across the various
> versions of Spark bundled with Hadoop distributions.
>
> 1 provides greater control and reliability, as I am only working against
> YARN versions and dependencies; I assume 2 gives me some of the benefits of
> the distribution's version of Spark (easier management, common sysops
> tools?).
> I was wondering if anyone has thoughts on both, and any reasons to prefer
> one over the other.
>
> Sent from my iPad
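For reference, this is roughly what the shading workaround looks like with sbt-assembly (assuming a plugin version that supports shade rules). The relocated package prefix "myshaded.avro" and the exact library versions are only illustrative, not anything Spark or CDH defines:

// build.sbt -- sketch of relocating Avro inside the assembly jar so the copy
// bundled with the application does not clash with the (older) Avro already
// on Spark's classpath. Assumes the sbt-assembly plugin is on the build
// classpath; Spark itself stays "provided" as usual.

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.2" % "provided",
  "org.apache.avro"  %  "avro"       % "1.7.7"   // the newer Avro the app needs
)

// Rewrite org.apache.avro.* to a private package in the assembled jar, both in
// our own classes and in the bundled Avro classes, so our references resolve to
// the bundled copy instead of whatever version the cluster happens to carry.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.avro.**" -> "myshaded.avro.@1").inAll
)

Of course this only covers classes the application references directly; if Spark hands your code Avro objects created from its own classloader, shading alone does not solve the conflict, which is why a working user-classpath-first option would still be the cleaner fix.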