> On Sep 18, 2013, at 4:53 PM, Sergey Lukjanov <slukja...@mirantis.com> wrote:
> 
> Hi folks,
> 
> I have a few comments on Hadoop cluster provisioning in Savanna.
> 
> Currently Savanna provisions instances, installs a management console (like
> Apache Ambari) on one of them, and communicates with it through the
> console's REST API to prepare and run all requested services on all
> instances. So the only provisioning that we're doing in Savanna is instance
> and volume creation and their initial configuration, such as /etc/hosts
> generation for all instances. Most or even all of these operations should
> eventually be removed through Heat integration during the potential
> incubation in the Icehouse cycle, after which we'll concentrate on EDP
> (Elastic Data Processing) operations.
> 
> I was surprised by how much time was spent on the clustering discussion at
> the last TC meeting and by how few other questions there were. So I think it
> would be better to separate the clustering discussion, which is a long-term
> activity with plans to be discussed during the design summit, from the
> Savanna incubation request, which should be finally discussed at the next TC
> meeting. Of course, I think it is right for Savanna to participate in the
> clustering discussions. From our perspective, clustering should be
> implemented as additional functionality in underlying services like Nova,
> Cinder and Heat, and in libraries like Oslo and Taskflow, which will help
> projects like Savanna and Trove provision resources for clusters, scale
> them and terminate them. So our role is to collaborate on implementing such
> features. One more interesting idea is clustering API standardization, but
> such APIs could turn out to be very different; compare, for example, our
> current working API [0] and Trove's draft Cluster API [1].

Draft APIs are subject to change :) Until we put the code in place, I would be
OK with modifying the API. +1 to working together at the summit to reconcile
the API differences. We have a Trove clustering session and I'd LOVE to have
Savanna folks at it. Let's unify ideas!!

> 
> I would also like to confirm that the Savanna team is 100% behind the idea
> of doing full integration with all applicable OpenStack projects during
> incubation.
> 
> Thanks.
> 
> [0] 
> https://savanna.readthedocs.org/en/latest/userdoc/rest_api_v1.0.html#node-group-templates
> [1] https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API
> 
> Sincerely yours,
> Sergey Lukjanov
> Savanna Technical Lead
> Mirantis Inc.
> 
>> On Sep 13, 2013, at 22:35, Clint Byrum <cl...@fewbar.com> wrote:
>> 
>> Excerpts from Michael Basnight's message of 2013-09-13 08:26:07 -0700:
>>>> On Sep 13, 2013, at 6:56 AM, Alexander Kuznetsov wrote:
>>>> On Thu, Sep 12, 2013 at 7:30 PM, Michael Basnight <mbasni...@gmail.com> 
>>>> wrote:
>>>> On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
>>>> 
>>>>> Sergey Lukjanov wrote:
>>>>> 
>>>>>> [...]
>>>>>> As you can see, resource provisioning is just one of the features, and
>>>>>> the implementation details are not critical for the overall
>>>>>> architecture. It performs only the first step of the cluster setup.
>>>>>> We’ve been considering Heat for a while, but ended up using direct API
>>>>>> calls in favor of speed and simplicity. Going forward, Heat integration
>>>>>> will be done by implementing the extension mechanism [3] and [4] as
>>>>>> part of the Icehouse release.
>>>>>> 
>>>>>> The next part, Hadoop cluster configuration, is already extensible, and
>>>>>> we have several plugins: Vanilla and Hortonworks Data Platform, with a
>>>>>> Cloudera plugin started too. This allows us to unify management of
>>>>>> different Hadoop distributions under a single control plane. The plugins
>>>>>> are responsible for correct Hadoop ecosystem configuration on already
>>>>>> provisioned resources and use different Hadoop management tools like
>>>>>> Ambari to set up and configure all cluster services, so there are no
>>>>>> actual provisioning configs on the Savanna side in this case. Savanna
>>>>>> and its plugins encapsulate the knowledge of Hadoop internals and
>>>>>> default configuration for Hadoop services.
>>>>> 
>>>>> My main gripe with Savanna is that it combines (in its upcoming release)
>>>>> what sound to me like two very different services: a Hadoop cluster
>>>>> provisioning service (like what Trove does for databases) and a
>>>>> MapReduce+ data API service (like what Marconi does for queues).
>>>>> 
>>>>> Making them part of the same project (rather than two separate projects,
>>>>> potentially sharing the same program) makes discussions about shifting
>>>>> some of its clustering ability to another library/project more complex
>>>>> than they should be (see below).
>>>>> 
>>>>> Could you explain the benefit of having them within the same service,
>>>>> rather than two services with one consuming the other?
>>>> 
>>>> And for the record, I don't think that Trove is the perfect fit for it
>>>> today. We are still working on a clustering API. But when we create it, I
>>>> would love the Savanna team's input, so we can try to make a pluggable API
>>>> that's usable for people who want MySQL or Cassandra or even Hadoop. I'm
>>>> less a fan of a clustering library, because in the end, we will both have
>>>> API calls like POST /clusters, GET /clusters, and there will be API
>>>> duplication between the projects.
>>>> 
>>>> I think that a Cluster API (if it is created) will be helpful not only
>>>> for Trove and Savanna. NoSQL, RDBMS and Hadoop are not the only software
>>>> that can be clustered. What about the various messaging solutions like
>>>> RabbitMQ and ActiveMQ, or J2EE containers like JBoss, WebLogic and
>>>> WebSphere, which are often installed in clustered mode? Messaging,
>>>> databases, J2EE containers and Hadoop each have their own management
>>>> cycle. It would be confusing to make the Cluster API a part of Trove,
>>>> which has a different mission: database management and provisioning.
>>> 
>>> Are you suggesting a 3rd program, cluster as a service? Trove is trying to 
>>> target a generic enough™ API to tackle different technologies with plugins 
>>> or some sort of extensions. This will include a scheduler to determine rack 
>>> awareness. Even if we decide that both Savanna and Trove need their own API 
>>> for building clusters, I still want to understand what makes the Savanna 
>>> API and implementation different, and how Trove can build an API/system 
>>> that can encompass multiple datastore technologies. So regardless of how 
>>> this shakes out, I would urge you to go to the Trove clustering summit 
>>> session [1] so we can share ideas.
>> 
>> Kudos to Trove for pushing forward on their Heat implementation. I'd
>> like to see Savanna go in the same direction. I read the "why not heat"
>> and it is all a bug list for Heat. Let's fix those bugs so that the next
>> clusterable solution that needs a simplified API can just grab Heat and
>> get it done without a special domain-specific orchestration backend.
>> 
>> If the backend were shared, would we care so much that there is no common
>> "clustering" imperative API for users?
>> 
>> This way Savanna's API is focused on helping users solve their "data
>> processing" problems, and Trove is focused on helping users solve their
>> "data storage" problems. And if users need to build a cluster of things
>> that don't exist yet as a handy simplified API, Heat is there for them
>> as a general purpose tool for building clusters.
>> 
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
