Looks like going with cluster mode is not a good idea:
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-administer-use-management-portal/

It seems a non-HDInsight VM might be needed to serve as the Spark master
node.

Marco



On Mon, Jul 14, 2014 at 12:43 PM, Marco Shaw <marco.s...@gmail.com> wrote:

> I'm a Spark and HDInsight novice, so I could be wrong...
>
> HDInsight is based on HDP2, so my guess here is that you have the option
> of installing/configuring Spark in cluster mode (YARN), or running it in
> standalone mode and packaging the Spark binaries with your job.
>
> Everything I have looked at seems to rely on UNIX shell scripts.  So, one
> might need to pull apart some of these scripts to work out how to run this
> on Windows.
>
> Interesting project...
>
> Marco
>
>
>
> On Mon, Jul 14, 2014 at 8:00 AM, Niek Tax <niek...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I am currently working on parallelizing a machine learning algorithm
>> using a Microsoft HDInsight cluster. I tried running my algorithm on Hadoop
>> MapReduce, but since my algorithm is iterative, the job scheduling overhead
>> and data loading overhead severely limit its performance in terms of
>> training time.
>>
>> HDInsight recently added support for Hadoop 2 with YARN, which I thought
>> would allow me to run Spark jobs, which seem a better fit for my task. So
>> far, however, I have not been able to find out how to run Apache Spark jobs
>> on an HDInsight cluster.
>>
>> It seems that remote job submission (which I would prefer) is not
>> possible for Spark on HDInsight, as the REST endpoints for Oozie and
>> Templeton do not seem to support submission of Spark jobs. I also tried to
>> RDP into the headnode to submit jobs from there. On the headnode
>> drives I can find other new YARN computation models like Tez, and I also
>> managed to run Tez jobs on it through YARN. However, Spark seems to be
>> missing. Does this mean that HDInsight currently does not support Spark,
>> even though it supports Hadoop versions with YARN? Or do I need to install
>> Spark on the HDInsight cluster first, in some way? Or is there maybe
>> something else that I'm missing, and I can run Spark jobs on HDInsight some
>> other way?
>>
>> Many thanks in advance!
>>
>>
>> Kind regards,
>>
>> Niek Tax
>>
>
>
