Hi, I am Dharmesh Kakdia and I am interested in the project "Integration project to deploy and use Mesos on a CloudStack based cloud" (https://issues.apache.org/jira/browse/CLOUDSTACK-1784).
I am working on a proposal and would like to get feedback. Please provide suggestions :)

Abstract:

The project aims to bring a CloudFormation-like [1] service to CloudStack. One of the prime use cases is running cluster computing frameworks on CloudStack. A CloudFormation service will give users and administrators of CloudStack the ability to manage and control a set of resources easily. It will allow booting and configuring a set of VMs so that they form a cluster. A simple example would be a LAMP stack. More complex clusters, such as a Mesos or Hadoop cluster, require somewhat more advanced configuration. Chiradeep Vittal has already done some work on this front [5] using ruote and Sinatra. In this project, I will implement a CloudFormation service and demonstrate how to run a Mesos cluster using it.

Mesos:

Mesos is a resource management platform for clusters [2]. It aims to increase the resource utilization of clusters by sharing cluster resources among multiple processing frameworks (like MapReduce, MPI, graph processing) or multiple instances of the same framework. It provides efficient resource isolation through the use of containers, and it uses ZooKeeper for state maintenance and fault tolerance.

What can run on Mesos?

Spark: A cluster computing framework based on the Resilient Distributed Datasets (RDDs) abstraction. RDDs are more general than MapReduce and can support iterative and interactive computation while retaining fault tolerance, scalability, data locality, etc.
Hadoop: A fault-tolerant and scalable distributed computing framework based on the MapReduce abstraction.
Bagel: A graph processing framework based on Pregel.
Other frameworks, such as MPI and Hypertable.

How to deploy Mesos

Mesos provides cluster installation scripts [7] for cluster deployment. There are also scripts available to deploy a cluster on Amazon EC2 [8].

Deliverables:

1. CloudFormation service implementation on CloudStack.
2. Integration of CloudFormation with cloudmonkey, the CLI tool.
3. Proof of concept of running Mesos on top of CloudStack using the service.
4. Related documentation.

Architecture and Tools:

The high-level architecture I propose includes the following components:

1. CloudFormation ReST server: This acts as the point of contact and exposes the CloudFormation functionality as a ReST service. It can be accessed directly or through cloudmonkey; I will add the required functionality to cloudmonkey. I plan to use Dropwizard [3] to start with. Later, the API server could perhaps be merged with the management server. I plan to use MySQL for storing the details of clusters.

2. Provisioning: The provisioning module is responsible for handling the booting process of the VMs through CloudStack. It uses the CloudStack APIs for launching VMs. I plan to use preconfigured templates/images with the required dependencies installed, which will make cluster creation much faster even for large clusters. Error handling is a very important part of this module. For example, what should happen if a few VMs in a cluster fail to boot?

3. Configuration: This module deals with configuring the VMs to form a cluster. This can be done via manual scripts/code or via configuration management tools like Chef. I plan to use a workflow automation tool like Rundeck [4].

In general, I want to use Java-based tools as much as possible, since CloudStack is mostly written in Java. This will make the project easier to develop and maintain.

Why ReST?

I believe the decoupling provided by the ReST architecture makes the service easy to extend in the future. For example, one might want to extend the CloudFormation service with features like auto-scaling of clusters based on user criteria (rule-based, monitoring, etc.).

Services:

1. POST: create a cluster
   - accepts: cluster configuration JSON
   - produces: clusterId
2. GET: get the current status of a request
   - accepts: clusterId
   - produces: JSON describing the current status of the cluster
3. DELETE: remove a cluster
   - accepts: clusterId
   - produces: result (success/failure)
4. PUT (update): add a node to a cluster
   - accepts: cluster configuration JSON and clusterId
   - produces: result (success/failure)

Timeline:

1-1.5 weeks: project design (architecture, tools selection, API design)
1-1.5 weeks: getting familiar with the CloudStack codebase and architecture details
1-1.5 weeks: getting familiar with Mesos internals
1-1.5 weeks: setting up the dev environment
2-3 weeks: build the provisioning and configuration modules
Midterm evaluation: provisioning module, configuration module
1-2 weeks: develop the ReST server
2-3 weeks: test and integrate

About me:

I am an MS by Research student at the International Institute of Information Technology Hyderabad (IIIT-H), Hyderabad, India. I operate our small lab cluster, which runs on OpenStack, and I am working on a similar project, HadoopStack [6], which aims to bring data processing to a multi-cloud environment (work in progress). My area of research is scheduling in large-scale distributed systems. I have experience with related tools like Hadoop, Mesos, OpenStack, Chef, Ironfan and jClouds.

Email contact: dhkaka...@gmail.com
More info: http://researchweb.iiit.ac.in/~dharmesh.kakadia/

Why me?

I love open-source projects. I am fascinated by distributed computing and interested in building and optimizing large-scale systems and data processing frameworks.

References

[1] http://aws.amazon.com/cloudformation/
[2] http://incubator.apache.org/mesos/
[3] http://dropwizard.codahale.com/
[4] http://rundeck.org/
[5] https://github.com/chiradeep/stackmate
[6] http://siel-iiith.github.io/HadoopStack/
[7] https://github.com/apache/mesos/blob/trunk/docs/Deploy-Scripts.textile
[8] https://github.com/apache/mesos/blob/trunk/docs/EC2-Scripts.textile

In case you have trouble reading this, a Google Docs version of the above is here: https://docs.google.com/document/d/1ocoBmyHDtOVnBhCELVt1QcgkubSzCyksls2MCTuDPL0
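
P.S. To make the four operations listed under "Services" a bit more concrete, here is a minimal in-memory sketch in Python (the language cloudmonkey is written in). It is only an illustration under stated assumptions: the class name ClusterStore, the "cluster-N" id format, the "PROVISIONING" state and the JSON fields "name" and "nodes" are all hypothetical and not part of any CloudStack or CloudFormation API. The real service would sit behind ReST endpoints, persist to MySQL and call CloudStack to boot VMs.

```python
import itertools
import json


class ClusterStore:
    """Hypothetical in-memory stand-in for the proposed cluster service."""

    def __init__(self):
        self._clusters = {}
        self._ids = itertools.count(1)

    def create(self, config_json):
        """POST: accept a cluster configuration JSON, return a clusterId."""
        config = json.loads(config_json)
        cluster_id = "cluster-%d" % next(self._ids)
        self._clusters[cluster_id] = {
            "name": config["name"],          # assumed field
            "nodes": list(config["nodes"]),  # assumed field
            "state": "PROVISIONING",         # assumed state name
        }
        return cluster_id

    def status(self, cluster_id):
        """GET: return JSON describing the cluster (KeyError if unknown)."""
        cluster = self._clusters[cluster_id]
        return json.dumps({
            "clusterId": cluster_id,
            "state": cluster["state"],
            "nodeCount": len(cluster["nodes"]),
        })

    def add_nodes(self, cluster_id, config_json):
        """Update: add the nodes listed in the JSON to an existing cluster."""
        cluster = self._clusters.get(cluster_id)
        if cluster is None:
            return "failure"
        cluster["nodes"].extend(json.loads(config_json)["nodes"])
        return "success"

    def delete(self, cluster_id):
        """DELETE: remove a cluster, report success or failure."""
        removed = self._clusters.pop(cluster_id, None)
        return "success" if removed is not None else "failure"


if __name__ == "__main__":
    store = ClusterStore()
    cid = store.create('{"name": "mesos-demo", "nodes": ["master", "slave-1"]}')
    store.add_nodes(cid, '{"nodes": ["slave-2"]}')
    print(store.status(cid))
    print(store.delete(cid))
```

Keeping each operation as a plain method that takes and returns JSON or a result string mirrors the accepts/produces pairs in the list, so the same logic could later be wrapped by a ReST resource or called from a cloudmonkey command without changes.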