> On Sep 18, 2013, at 4:53 PM, Sergey Lukjanov <slukja...@mirantis.com> wrote:
>
> Hi folks,
>
> I have a few comments on Hadoop cluster provisioning in Savanna.
>
> Now Savanna provisions instances, installs a management console (like Apache
> Ambari) on one of them and communicates with it using the REST API of the
> installed console to prepare and run all requested services on all instances.
> So, the only provisioning that we're doing in Savanna is the instance and
> volume creation and their initial configuration, like /etc/hosts generation
> for all instances. Most of these operations, or even all of them, should
> eventually be removed by Heat integration during the potential incubation in
> the Icehouse cycle, so after that we'll be concentrating on EDP (Elastic Data
> Processing) operations.
>
> I was surprised how much time was spent on the clustering discussion at the
> last TC meeting and that there were so few other questions. So, I think
> that it'll be better to separate the clustering discussion, which is a
> long-term activity with plans to be discussed during the design summit, from
> the Savanna incubation request, which should be finally discussed at the next
> TC meeting. Of course, I think that it's right for Savanna to participate in
> the clustering discussions. From our perspective, clustering should be
> implemented as additional functionality in underlying services like Nova,
> Cinder, Heat and libraries - Oslo, Taskflow - that will help projects like
> Savanna, Trove, etc. to provision resources for clusters, scale and
> terminate them. So, our role in it is to collaborate on implementing such
> features. One more interesting idea is clustering API standardization;
> it sounds interesting, but it looks like such APIs could be very
> different, for example, our current working API [0] and Trove's draft for
> Cluster API [1].
Draft APIs are subject to change :) Until we put the code in place, I would
be OK with modifying the API. +1 to working together at the summit to bring
the API differences together. We have a Trove clustering session and I'd LOVE
to have Savanna folk at it. Let's unify ideas!!

>
> I also would like to ensure that the Savanna team is 100% behind the idea of
> doing full integration with all applicable OpenStack projects during
> incubation.
>
> Thanks.
>
> [0]
> https://savanna.readthedocs.org/en/latest/userdoc/rest_api_v1.0.html#node-group-templates
> [1] https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API
>
> Sincerely yours,
> Sergey Lukjanov
> Savanna Technical Lead
> Mirantis Inc.
>
>> On Sep 13, 2013, at 22:35, Clint Byrum <cl...@fewbar.com> wrote:
>>
>> Excerpts from Michael Basnight's message of 2013-09-13 08:26:07 -0700:
>>>> On Sep 13, 2013, at 6:56 AM, Alexander Kuznetsov wrote:
>>>> On Thu, Sep 12, 2013 at 7:30 PM, Michael Basnight <mbasni...@gmail.com>
>>>> wrote:
>>>> On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
>>>>
>>>>> Sergey Lukjanov wrote:
>>>>>
>>>>>> [...]
>>>>>> As you can see, resource provisioning is just one of the features and
>>>>>> the implementation details are not critical for the overall
>>>>>> architecture. It performs only the first step of the cluster setup.
>>>>>> We’ve been considering Heat for a while, but ended up with direct API
>>>>>> calls in favor of speed and simplicity. Going forward, Heat integration
>>>>>> will be done by implementing the extension mechanism [3] and [4] as part
>>>>>> of the Icehouse release.
>>>>>>
>>>>>> The next part, Hadoop cluster configuration, is already extensible and
>>>>>> we have several plugins - Vanilla, Hortonworks Data Platform, and a
>>>>>> Cloudera plugin has been started too. This allows unifying management of
>>>>>> different Hadoop distributions under a single control plane.
>>>>>> The plugins are responsible
>>>>>> for correct Hadoop ecosystem configuration on already provisioned
>>>>>> resources and use different Hadoop management tools like Ambari to set up
>>>>>> and configure all cluster services, so there are no actual
>>>>>> provisioning configs on the Savanna side in this case. Savanna and its
>>>>>> plugins encapsulate the knowledge of Hadoop internals and default
>>>>>> configuration for Hadoop services.
>>>>>
>>>>> My main gripe with Savanna is that it combines (in its upcoming release)
>>>>> what sounds to me like two very different services: a Hadoop cluster
>>>>> provisioning service (like what Trove does for databases) and a
>>>>> MapReduce+ data API service (like what Marconi does for queues).
>>>>>
>>>>> Making it part of the same project (rather than two separate projects,
>>>>> potentially sharing the same program) makes discussions about shifting
>>>>> some of its clustering ability to another library/project more complex
>>>>> than they should be (see below).
>>>>>
>>>>> Could you explain the benefit of having them within the same service,
>>>>> rather than two services with one consuming the other?
>>>>
>>>> And for the record, I don't think that Trove is the perfect fit for it
>>>> today. We are still working on a clustering API. But when we create it, I
>>>> would love the Savanna team's input, so we can try to make a pluggable API
>>>> that's usable for people who want MySQL or Cassandra or even Hadoop. I'm
>>>> less a fan of a clustering library, because in the end, we will both have
>>>> API calls like POST /clusters, GET /clusters, and there will be API
>>>> duplication between the projects.
>>>>
>>>> I think that a Cluster API (if it were created) would be helpful not only
>>>> for Trove and Savanna. NoSQL, RDBMS and Hadoop are not the only software
>>>> that can be clustered.
>>>> What about different kinds of messaging solutions
>>>> like RabbitMQ, ActiveMQ or J2EE containers like JBoss, Weblogic and
>>>> WebSphere, which often are installed in clustered mode? Messaging,
>>>> databases, J2EE containers and Hadoop have their own management cycles. It
>>>> would be confusing to make a Cluster API part of Trove, which has a
>>>> different mission - database management and provisioning.
>>>
>>> Are you suggesting a 3rd program, cluster as a service? Trove is trying to
>>> target a generic enough™ API to tackle different technologies with plugins
>>> or some sort of extensions. This will include a scheduler to determine rack
>>> awareness. Even if we decide that both Savanna and Trove need their own API
>>> for building clusters, I still want to understand what makes the Savanna
>>> API and implementation different, and how Trove can build an API/system
>>> that can encompass multiple datastore technologies. So regardless of how
>>> this shakes out, I would urge you to go to the Trove clustering summit
>>> session [1] so we can share ideas.
>>
>> Kudos to Trove for pushing forward on their Heat implementation. I'd
>> like to see Savanna go in the same direction. I read the "why not heat"
>> and it is all a bug list for Heat. Let's fix those bugs so that the next
>> clusterable solution that needs a simplified API can just grab Heat and
>> get it done without a special domain-specific orchestration backend.
>>
>> If the backend were shared, would we care so much that there is no common
>> "clustering" imperative API for users?
>>
>> This way Savanna's API is focused on helping users solve their "data
>> processing" problems, and Trove is focused on helping users solve their
>> "data storage" problems. And if users need to build a cluster of things
>> that don't exist yet as a handy simplified API, Heat is there for them
>> as a general purpose tool for building clusters.
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev