[GSoC][Proposal] Integration project to deploy and use Mesos on a CloudStack based cloud

Dharmesh Kakadia Tue, 30 Apr 2013 02:35:29 -0700

Hi,

I am Dharmesh Kakadia, and I am interested in the project "Integration
project to deploy and use Mesos on a CloudStack based cloud"
(https://issues.apache.org/jira/browse/CLOUDSTACK-1784).


I am working on my proposal and would like feedback. Please share any
suggestions :)


Abstract:

The project aims to bring a CloudFormation-like [1] service to CloudStack. One
of the prime use cases is running cluster computing frameworks on CloudStack.
A CloudFormation-style service will give CloudStack users and administrators
the ability to manage and control a set of resources easily. The service will
boot and configure a set of VMs so that they form a cluster. A simple example
would be a LAMP stack. More complex clusters, such as a Mesos or Hadoop
cluster, require somewhat more advanced configuration. Chiradeep Vittal has
already done some work on this front [5] using ruote and Sinatra. In this
project, I will implement a CloudFormation-like service and demonstrate how to
run a Mesos cluster using it.

Mesos:

Mesos is a resource management platform for clusters [2]. It aims to
increase the resource utilization of clusters by sharing cluster resources
among multiple processing frameworks (such as MapReduce, MPI and graph
processing) or among multiple instances of the same framework. It provides
efficient resource isolation through the use of containers, and it uses
ZooKeeper for state maintenance and fault tolerance.

What can run on Mesos?

Spark: A cluster computing framework based on the Resilient Distributed
Datasets (RDDs) abstraction. RDDs are more general than MapReduce and can
support iterative and interactive computation while retaining fault
tolerance, scalability, data locality, etc.

Hadoop: A fault-tolerant and scalable distributed computing framework based
on the MapReduce abstraction.

Bagel: A graph processing framework based on Pregel.

Other frameworks, such as MPI and Hypertable, can also run on Mesos.

How to deploy Mesos

Mesos provides cluster installation scripts [7] for cluster deployment.
There are also scripts available to deploy a cluster on Amazon EC2 [8].

Deliverables:

1. CloudFormation service implementation on CloudStack.

2. Integration of the CloudFormation service with cloudmonkey, the CLI tool.

3. Proof of concept of running Mesos on top of CloudStack using the service.

4. Related documentation.

Architecture and Tools:

The high level architecture I propose is as follows:

It includes the following components:

1. CloudFormation ReST server:

This acts as the point of contact and exposes the CloudFormation
functionality as a ReST service. It can be accessed directly or through
cloudmonkey, to which I will add the corresponding commands. I plan to use
dropwizard [3] to start with. Later, the API server could perhaps be merged
with the management server. I plan to use MySQL for storing cluster details.

2. Provisioning:

The provisioning module is responsible for handling the boot process of the
VMs through CloudStack. It uses the CloudStack APIs to launch VMs. I plan to
use preconfigured templates/images with the required dependencies installed,
which will make cluster creation much faster, even for large clusters. Error
handling is a very important part of this module. For example, what should
happen if a few VMs in a cluster fail to boot?
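
One possible answer to the failed-boot question is to retry each failed VM a
bounded number of times and give up on the cluster only if too few nodes come
up. The following is a minimal sketch of that policy, not the proposed
implementation. The names provision_cluster, boot_vm, max_retries and
min_nodes are hypothetical; boot_vm stands in for whatever call launches a VM
through the CloudStack API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProvisionResult:
    booted: List[str] = field(default_factory=list)  # ids of VMs that came up
    failed: int = 0                                   # slots that never booted


def provision_cluster(size: int,
                      boot_vm: Callable[[], str],
                      max_retries: int = 2,
                      min_nodes: int = 1) -> ProvisionResult:
    """Boot `size` VMs, retrying each failed slot up to `max_retries` times.

    `boot_vm` is a hypothetical callable (an assumption of this sketch) that
    returns a VM id on success and raises RuntimeError on failure.
    """
    result = ProvisionResult()
    for _ in range(size):
        for attempt in range(max_retries + 1):
            try:
                result.booted.append(boot_vm())
                break  # this slot is done, move on to the next VM
            except RuntimeError:
                if attempt == max_retries:
                    result.failed += 1  # retries exhausted for this slot
    if len(result.booted) < min_nodes:
        raise RuntimeError(
            "only %d of %d VMs booted, need at least %d"
            % (len(result.booted), size, min_nodes))
    return result
```

With this policy a cluster can still be created in a degraded state (fewer
nodes than requested), and the caller decides through min_nodes how much
degradation is acceptable.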

3. Configuration:

This module deals with configuring the VMs to form a cluster. This can be
done via hand-written scripts/code or via configuration management tools
like Chef. I plan to use a workflow automation tool such as rundeck [4].
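
To illustrate what the configuration step has to compute, here is a small
sketch that derives per-node settings for a Mesos cluster from the IP
addresses of the booted VMs. Mesos locates its masters through a ZooKeeper
URL of the form zk://host:port,host:port/path. The function names and the
dictionary keys ("role", "zk_url") are assumptions made for this example;
they are not Mesos flags or an existing API.

```python
from typing import Dict, List


def zookeeper_url(zk_hosts: List[str], port: int = 2181,
                  path: str = "/mesos") -> str:
    """Build a zk:// URL listing every ZooKeeper endpoint."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    endpoints = ",".join("%s:%d" % (host, port) for host in zk_hosts)
    return "zk://%s%s" % (endpoints, path)


def node_settings(masters: List[str], slaves: List[str],
                  zk_hosts: List[str]) -> Dict[str, Dict[str, str]]:
    """Map each node IP to the (hypothetical) settings it should receive."""
    url = zookeeper_url(zk_hosts)
    settings = {}
    for ip in masters:
        settings[ip] = {"role": "master", "zk_url": url}
    for ip in slaves:
        settings[ip] = {"role": "slave", "zk_url": url}
    return settings
```

The resulting dictionary could then be handed to whichever tool pushes
configuration to the VMs (scripts, Chef or a rundeck job).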

In general, I want to use Java-based tools as much as possible, since
CloudStack is mostly written in Java. This will make the project easier to
develop and maintain.

Why ReST?

I believe the decoupling provided by the ReST architecture makes the service
easy to extend in the future. For example, one might later want to extend
the CloudFormation service with features like auto-scaling of clusters based
on user-defined criteria (rule-based, monitoring-based, etc.).

Services:

1. POST : create a cluster
   - accepts : cluster configuration JSON
   - produces : clusterId

2. GET : get the current status of a cluster
   - accepts : clusterId
   - produces : JSON describing the current status of the cluster

3. DELETE : remove a cluster
   - accepts : clusterId
   - produces : result (success/failure)

4. PUT : add a node to a cluster
   - accepts : cluster configuration JSON and clusterId
   - produces : result (success/failure)
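
The four operations listed above can be sketched as a small in-memory
service. This is only an illustration of the request/response contract, not
the proposed dropwizard implementation: the class name ClusterService, the
status string and the "nodes" / "add_nodes" configuration fields are all
assumptions made for this example.

```python
import uuid
from typing import Any, Dict


class ClusterService:
    """In-memory stand-in for the cluster store behind the ReST endpoints."""

    def __init__(self) -> None:
        self._clusters: Dict[str, Dict[str, Any]] = {}

    def create(self, config: Dict[str, Any]) -> str:
        """Create: accepts a cluster configuration, produces a clusterId."""
        cluster_id = uuid.uuid4().hex
        self._clusters[cluster_id] = {
            "status": "CREATING",                  # hypothetical status value
            "nodes": int(config.get("nodes", 1)),  # hypothetical config field
        }
        return cluster_id

    def status(self, cluster_id: str) -> Dict[str, Any]:
        """Status: accepts a clusterId, produces a JSON-style description.

        Raises KeyError for an unknown clusterId.
        """
        cluster = self._clusters[cluster_id]
        return {"clusterId": cluster_id,
                "status": cluster["status"],
                "nodes": cluster["nodes"]}

    def delete(self, cluster_id: str) -> bool:
        """Delete: accepts a clusterId, produces success (True) or failure."""
        return self._clusters.pop(cluster_id, None) is not None

    def update(self, cluster_id: str, config: Dict[str, Any]) -> bool:
        """Update: adds nodes to an existing cluster, produces success/failure."""
        cluster = self._clusters.get(cluster_id)
        extra = int(config.get("add_nodes", 1))  # hypothetical config field
        if cluster is None or extra < 1:
            return False
        cluster["nodes"] += extra
        return True
```

A typical session would call create with a configuration, poll status with
the returned clusterId, optionally call update to grow the cluster, and
finally call delete.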


Timeline:

1-1.5 weeks : project design (architecture, tool selection, API design)

1-1.5 weeks : getting familiar with the CloudStack codebase and architecture
details

1-1.5 weeks : getting familiar with Mesos internals

1-1.5 weeks : setting up the dev environment

2-3 weeks : build the provisioning and configuration modules

Midterm evaluation: provisioning module, configuration module

1-2 weeks : develop the ReST server

2-3 weeks : test and integrate

About me:

I am an MS by Research student at the International Institute of Information
Technology Hyderabad (IIIT-H), Hyderabad, India. I operate our small lab
cluster, which runs on OpenStack, and I am working on a similar project,
HadoopStack [6], which aims to bring data processing to a multi-cloud
environment (work in progress). My area of research is scheduling in
large-scale distributed systems. I have experience with related tools such as
Hadoop, Mesos, OpenStack, Chef, Ironfan and jclouds.

Email-contact : [email protected]

More info: http://researchweb.iiit.ac.in/~dharmesh.kakadia/

Why me?

I love open-source projects. I am fascinated by distributed computing and
interested in building and optimizing large scale systems and data
processing frameworks.

References

[1] http://aws.amazon.com/cloudformation/

[2] http://incubator.apache.org/mesos/

[3] http://dropwizard.codahale.com/

[4] http://rundeck.org/

[5] https://github.com/chiradeep/stackmate

[6] http://siel-iiith.github.io/HadoopStack/

[7] https://github.com/apache/mesos/blob/trunk/docs/Deploy-Scripts.textile

[8] https://github.com/apache/mesos/blob/trunk/docs/EC2-Scripts.textile

In case you have trouble reading this, the proposal is also available as a
Google Doc:

https://docs.google.com/document/d/1ocoBmyHDtOVnBhCELVt1QcgkubSzCyksls2MCTuDPL0
