Looks like going with cluster mode is not a good idea: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-administer-use-management-portal/
Seems like a non-HDInsight VM might be needed to act as the Spark master node.

Marco

On Mon, Jul 14, 2014 at 12:43 PM, Marco Shaw <marco.s...@gmail.com> wrote:

> I'm a Spark and HDInsight novice, so I could be wrong...
>
> HDInsight is based on HDP2, so my guess here is that you have the option
> of installing/configuring Spark in cluster mode (YARN) or in standalone
> mode and packaging the Spark binaries with your job.
>
> Everything I seem to look at is related to UNIX shell scripts. So, one
> might need to pull apart some of these scripts to pick out how to run this
> on Windows.
>
> Interesting project...
>
> Marco
>
> On Mon, Jul 14, 2014 at 8:00 AM, Niek Tax <niek...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> Currently I am working on parallelizing a machine learning algorithm
>> using a Microsoft HDInsight cluster. I tried running my algorithm on Hadoop
>> MapReduce, but since my algorithm is iterative, the job scheduling overhead
>> and data loading overhead severely limit the performance of my algorithm
>> in terms of training time.
>>
>> HDInsight recently started supporting Hadoop 2 with YARN, which I thought
>> would allow me to run Spark jobs, which seem more fitting for my task. So
>> far, however, I have not been able to find out how I can run Apache Spark
>> jobs on an HDInsight cluster.
>>
>> It seems like remote job submission (which would have my preference) is
>> not possible for Spark on HDInsight, as the REST endpoints for Oozie and
>> Templeton do not seem to support submission of Spark jobs. I also tried to
>> RDP to the headnode for job submission from the headnode. On the headnode
>> drives I can find other new YARN computation models like Tez, and I also
>> managed to run Tez jobs on it through YARN. However, Spark seems to be
>> missing. Does this mean that HDInsight currently does not support Spark,
>> even though it supports Hadoop versions with YARN? Or do I need to install
>> Spark on the HDInsight cluster first, in some way? Or is there maybe
>> something else that I'm missing and can I run Spark jobs on HDInsight some
>> other way?
>>
>> Many thanks in advance!
>>
>> Kind regards,
>>
>> Niek Tax