The Hadoop ecosystem is not only datastore technologies. Hadoop has other components: the MapReduce framework, a distributed coordinator (ZooKeeper), workflow management (Oozie), runtimes for query and scripting languages (Hive and Pig), and a scalable machine learning library (Apache Mahout). All these components are tightly coupled, and the datastore part can't be considered separately from the other components. This is the main reason why Hadoop installation and management require a separate solution, distinct from a generic enough™ datastore API. Otherwise, that API would contain a huge part that has nothing to do with datastore technologies.
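To make this concrete, here is a rough sketch of how much of a Hadoop control plane's surface falls outside datastore operations. All names are hypothetical, in the spirit of the discussion - this is not actual Savanna or Trove code:

    # Purely hypothetical sketch, not actual Savanna or Trove code.
    # A datastore-centric API covers the data lifecycle:

    class DatastoreClusterAPI(object):
        def create_cluster(self, name, node_count, flavor):
            pass  # provision instances and the datastore service

        def resize_cluster(self, cluster_id, new_node_count):
            pass  # add or remove nodes

        def backup(self, cluster_id):
            pass  # snapshot the data

        def restore(self, cluster_id, backup_id):
            pass  # bring the data back

    # A Hadoop control plane needs all of the above plus calls that have
    # nothing to do with storing data: job and workflow management for
    # MapReduce and Oozie, script execution for Hive and Pig, and so on.

    class HadoopClusterAPI(DatastoreClusterAPI):
        def submit_mapreduce_job(self, cluster_id, jar_url, args):
            pass  # hand the job to the JobTracker

        def schedule_workflow(self, cluster_id, workflow_xml):
            pass  # register a workflow/coordinator with Oozie

        def run_script(self, cluster_id, engine, script):
            pass  # engine is "hive" or "pig"

If a generic enough™ datastore API had to absorb the second half of that surface, most of it would have nothing to do with datastores.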
On Fri, Sep 13, 2013 at 8:17 PM, Michael Basnight <mbasni...@gmail.com> wrote:
>
> On Sep 13, 2013, at 9:05 AM, Alexander Kuznetsov wrote:
> >
> > On Fri, Sep 13, 2013 at 7:26 PM, Michael Basnight <mbasni...@gmail.com> wrote:
> > On Sep 13, 2013, at 6:56 AM, Alexander Kuznetsov wrote:
> > > On Thu, Sep 12, 2013 at 7:30 PM, Michael Basnight <mbasni...@gmail.com> wrote:
> > > On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
> > > >
> > > > Sergey Lukjanov wrote:
> > > >
> > > >> [...]
> > > >> As you can see, resource provisioning is just one of the features, and the implementation details are not critical for the overall architecture. It performs only the first step of the cluster setup. We've been considering Heat for a while, but ended up with direct API calls in favor of speed and simplicity. Going forward, Heat integration will be done by implementing the extension mechanism [3] and [4] as part of the Icehouse release.
> > > >>
> > > >> The next part, Hadoop cluster configuration, is already extensible, and we have several plugins - Vanilla and Hortonworks Data Platform, with a Cloudera plugin started too. This allows unifying the management of different Hadoop distributions under a single control plane. The plugins are responsible for correct Hadoop ecosystem configuration on already provisioned resources, and they use different Hadoop management tools like Ambari to set up and configure all cluster services, so there are no actual provisioning configs on the Savanna side in this case. Savanna and its plugins encapsulate the knowledge of Hadoop internals and the default configuration for Hadoop services.
> > > >
> > > > My main gripe with Savanna is that it combines (in its upcoming release) what sounds to me like two very different services: a Hadoop cluster provisioning service (like what Trove does for databases) and a MapReduce+ data API service (like what Marconi does for queues).
> > > >
> > > > Making it part of the same project (rather than two separate projects, potentially sharing the same program) makes discussions about shifting some of its clustering ability to another library/project more complex than they should be (see below).
> > > >
> > > > Could you explain the benefit of having them within the same service, rather than two services with one consuming the other?
> > >
> > > And for the record, I don't think that Trove is the perfect fit for it today. We are still working on a clustering API. But when we create it, I would love the Savanna team's input, so we can try to make a pluggable API that's usable for people who want MySQL or Cassandra or even Hadoop. I'm less a fan of a clustering library, because in the end we will both have API calls like POST /clusters and GET /clusters, and there will be API duplication between the projects.
> >
> > I think that a Cluster API (if it were created) would be helpful not only for Trove and Savanna. NoSQL, RDBMS, and Hadoop are not the only kinds of software that can be clustered. What about messaging solutions like RabbitMQ and ActiveMQ, or J2EE containers like JBoss, WebLogic, and WebSphere, which are often installed in clustered mode? Messaging, databases, J2EE containers, and Hadoop each have their own management cycle. It would be confusing to make the Cluster API part of Trove, which has a different mission - database management and provisioning.
> >
> > Are you suggesting a 3rd program, cluster as a service? Trove is trying to target a generic enough™ API to tackle different technologies with plugins or some sort of extensions. This will include a scheduler to determine rack awareness. Even if we decide that both Savanna and Trove need their own API for building clusters, I still want to understand what makes the Savanna API and implementation different, and how Trove can build an API/system that can encompass multiple datastore technologies. So regardless of how this shakes out, I would urge you to go to the Trove clustering summit session [1] so we can share ideas.
> >
> > A generic enough™ API shouldn't contain database-specific calls like backup and restore (already in Trove). Why would we need backup and restore operations for J2EE or messaging solutions?
>
> I don't mean to encompass J2EE or messaging solutions. Let me amend my email to say "to tackle different datastore technologies". But going with this point… Do you not need to back up things in a J2EE container? I'd assume a backup is needed by all clusters, personally. I would not like a system that didn't have a way to back up and restore "things" in my cluster.
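For what it's worth, one way these two positions could be reconciled - a hypothetical sketch only, not something either project has agreed to - is to make backup/restore an optional capability that each plugin declares, so a generic cluster API can expose the same endpoints for every technology while rejecting calls a given plugin doesn't support:

    # Hypothetical sketch: backup/restore as an optional, plugin-declared
    # capability instead of a core API call. Names are illustrative only.

    class ClusterPlugin(object):
        capabilities = set()  # e.g. {"backup", "restore"}

        def provision(self, cluster_spec):
            raise NotImplementedError

    class MySQLPlugin(ClusterPlugin):
        capabilities = {"backup", "restore"}

        def provision(self, cluster_spec):
            pass  # create instances, configure replication, ...

        def backup(self, cluster_id):
            pass  # e.g. snapshot the data volumes

    class HadoopPlugin(ClusterPlugin):
        # A Hadoop plugin might declare no backup capability at all;
        # HDFS replication and tools like distcp could play that role.
        def provision(self, cluster_spec):
            pass  # install and configure the Hadoop services

    def handle_backup_request(plugin, cluster_id):
        """What a generic POST /clusters/{id}/backup handler might do."""
        if "backup" not in plugin.capabilities:
            return 501, "backup is not supported for this datastore"
        plugin.backup(cluster_id)
        return 202, "backup started"

That would let a generic API keep backup and restore for the datastores that need them without forcing those operations on every kind of cluster.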
_______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev