@Daniel, there are at least 3 things that EMR cannot solve yet:
- HA support
- AWS provides an auto scaling feature, but scaling an EMR cluster up or
down still requires manual operations (a rough sketch follows below the
list)
- security concerns when running in a public VPC
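
For illustration, the manual resize looks roughly like this with boto3; the
cluster ID and target instance count are placeholders you would fill in
yourself:

    import boto3

    # Rough sketch: resizing an EMR core instance group by hand via the
    # API. The cluster ID and target count below are placeholders.
    emr = boto3.client("emr", region_name="eu-west-1")

    groups = emr.list_instance_groups(
        ClusterId="j-XXXXXXXXXXXXX")["InstanceGroups"]
    core = next(g for g in groups if g["InstanceGroupType"] == "CORE")

    # Request the new size; EMR adds or removes nodes asynchronously.
    emr.modify_instance_groups(
        InstanceGroups=[{"InstanceGroupId": core["Id"], "InstanceCount": 8}]
    )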

EMR is basically designed for short-lived use cases with some pre-defined
bootstrap actions and steps, so it is mainly suited to scheduled query
processing, and not so good as a permanently running cluster for ad-hoc
queries and analytical work.
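
To make that concrete, the typical EMR pattern is roughly the following
with boto3: launch a transient cluster with a bootstrap action and one
Spark step, and let it terminate when the step finishes. The paths, names
and sizes below are just placeholders:

    import boto3

    emr = boto3.client("emr", region_name="eu-west-1")

    # Sketch of the "transient cluster" pattern EMR is built around:
    # bootstrap, run one Spark step, terminate when the steps are done.
    response = emr.run_job_flow(
        Name="nightly-report",
        ReleaseLabel="emr-4.3.0",
        Applications=[{"Name": "Spark"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m4.large",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m4.2xlarge",
                 "InstanceCount": 4},
            ],
            # terminate the cluster once the steps have finished
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        BootstrapActions=[
            {"Name": "setup",
             "ScriptBootstrapAction": {
                 "Path": "s3://my-bucket/bootstrap.sh"}},
        ],
        Steps=[
            {"Name": "spark-job",
             "ActionOnFailure": "TERMINATE_CLUSTER",
             "HadoopJarStep": {
                 "Jar": "command-runner.jar",
                 "Args": ["spark-submit", "--class", "com.example.Job",
                          "s3://my-bucket/jars/job.jar"]}},
        ],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print(response["JobFlowId"])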

Therefore in our organization (an e-commerce company in Europe that most of
you may never have heard of :p but we have more than 1000 techies and 10k
employees now...), we built a solution for this:
https://github.com/zalando/spark-appliance

It provides HA via ZooKeeper, the nodes run in an auto scaling group inside
private subnets, it exposes a REST API secured with OAuth, and it is even
integrated with Jupyter notebooks :)
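
As a rough illustration of how an OAuth-secured REST API like this can be
used from a client (the endpoint and payload below are made up for the
example, they are not the actual spark-appliance API):

    import requests

    # Hypothetical call to an OAuth-secured Spark REST endpoint. The URL,
    # path and payload are invented for illustration, not the real
    # spark-appliance API.
    token = "my-oauth-access-token"   # obtained from your OAuth provider
    resp = requests.post(
        "https://spark.example.org/jobs",
        headers={"Authorization": "Bearer {}".format(token)},
        json={"sql": "SELECT count(*) FROM events"},
    )
    resp.raise_for_status()
    print(resp.json())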


On Saturday, 20 February 2016, Sabarish Sasidharan wrote:
> EMR does cost more than vanilla EC2. Using spark-ec2 can result in
> savings with large clusters, though that is not everybody's cup of tea.
>
> Regards
> Sab
>
> On 19-Feb-2016 7:55 pm, "Daniel Siegmann" <daniel.siegm...@teamaol.com>
> wrote:
>>
>> With EMR supporting Spark, I don't see much reason to use the spark-ec2
>> script unless it is important for you to be able to launch clusters using
>> the bleeding edge version of Spark. EMR does seem to do a pretty decent
>> job of keeping up to date - the latest version (4.3.0) supports the latest
>> Spark version (1.6.0).
>>
>> So I'd flip the question around and ask: is there any reason to continue
>> using the spark-ec2 script rather than EMR?
>>
>> On Thu, Feb 18, 2016 at 11:39 AM, James Hammerton <ja...@gluru.co> wrote:
>>>
>>> I have now... So far I think the issues I've had are not related to
>>> this, but I wanted to be sure in case it should be something that needs
>>> to be patched. I've had some jobs run successfully but this warning
>>> appears in the logs.
>>> Regards,
>>> James
>>>
>>> On 18 February 2016 at 12:23, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>> Have you seen this ?
>>>> HADOOP-10988
>>>>
>>>> Cheers
>>>> On Thu, Feb 18, 2016 at 3:39 AM, James Hammerton <ja...@gluru.co>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>> I am seeing warnings like this in the logs when I run Spark jobs:
>>>>>
>>>>> OpenJDK 64-Bit Server VM warning: You have loaded library
>>>>> /root/ephemeral-hdfs/lib/native/libhadoop.so.1.0.0 which might have
>>>>> disabled stack guard. The VM will try to fix the stack guard now.
>>>>> It's highly recommended that you fix the library with 'execstack -c
>>>>> <libfile>', or link it with '-z noexecstack'.
>>>>>
>>>>> I used spark-ec2 to launch the cluster with the default AMI, Spark
>>>>> 1.5.2, Hadoop major version 2.4. I changed the JDK to OpenJDK 8 as I'd
>>>>> written some jobs in Java 8. The 6 worker nodes are m4.2xlarge and the
>>>>> master is m4.large.
>>>>> Could this contribute to any problems running the jobs?
>>>>> Regards,
>>>>> James
>>>
>>
>
