Based on some discussions with my application users, I have been trying to come 
up with a standard way to deploy applications built on Spark. I see two options:

1. Bundle a specific version of Spark with your application and ask users to 
store it in HDFS, then point YARN at that copy when booting the application 
(see the first sketch after this list).
2. Manage your application's dependencies so that it can run against whichever 
version of Spark is bundled with the various Hadoop distributions (see the 
build sketch after this list).

Option 1 gives me greater control and reliability, since I am only working 
against YARN versions and dependencies. I assume option 2 gives me some of the 
benefits of the distribution's own Spark (easier management, common sysops 
tools?). I was wondering if anyone has thoughts on both, and any reasons to 
prefer one over the other.

