I used to do 1) but couldn't get it to work on YARN, and the trend seemed
to be towards 2) using spark-submit, so I gave in.
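
For what it's worth, in build terms 2) usually comes down to compiling
against Spark but marking it "provided", so the assembly jar runs against
whatever version the distribution ships. A minimal build.sbt sketch (the
project name and version numbers are just illustrative):

    // build.sbt -- option 2): depend on spark-core as "provided" so the
    // cluster's own Spark jars are used at runtime rather than bundled ones.
    name := "my-spark-app"        // hypothetical project name

    scalaVersion := "2.10.4"      // illustrative; match the cluster's Scala

    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "1.0.1" % "provided"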

The main promise of 2) is that you can provide an application that can run
on multiple Hadoop and Spark versions. However, for that to become true,
Spark needs to address the issue of user-classpath-first being broken. For
example, if I want to use a recent version of Avro, I am out of luck, even
if I bundle it with my app, because an old version can be on Spark's
classpath (it is, for example, on CDH), and currently I cannot override
the classpath.
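
For reference, these are the knobs involved; a minimal sketch assuming a
Spark 1.x SparkConf-based setup. spark.files.userClassPathFirst is the
documented experimental executor-side setting; the YARN-side
spark.yarn.user.classpath.first is, to my knowledge, only in some
releases. As described above, neither reliably takes effect:

    import org.apache.spark.{SparkConf, SparkContext}

    // Experimental: prefer jars bundled with the application over Spark's
    // own when loading classes in executors. Broken in practice, per above.
    val conf = new SparkConf()
      .setAppName("avro-app")  // hypothetical app name
      .set("spark.files.userClassPathFirst", "true")
      // YARN-side variant (availability depends on the release):
      .set("spark.yarn.user.classpath.first", "true")
    val sc = new SparkContext(conf)

Until that is fixed, the usual workaround is to relocate ("shade") the
conflicting packages, Avro in this example, into a private namespace at
build time so they can never collide with the versions Spark or the
distro puts on the classpath.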


On Sun, Jul 27, 2014 at 9:32 PM, Mayur Rustagi <mayur.rust...@gmail.com>
wrote:

> Based on some discussions with my application users, I have been trying to
> come up with a standard way to deploy applications built on Spark:
>
> 1. Bundle the version of Spark with your application and ask users to store
> it in HDFS before referring to it in YARN to boot your application
> 2. Provide ways to manage dependencies in your app across the various
> versions of Spark bundled with Hadoop distributions
>
> 1 provides greater control and reliability, as I am only working against
> YARN versions and dependencies; I assume 2 gives me some of the benefits of
> the distribution versions of Spark (easier management, common sysops
> tools?).
> I was wondering if anyone has thoughts on both, and any reasons to
> prefer one over the other.
>
> Sent from my iPad
