Hi Joshua,
On 01/10/14 16:00, Joshua Dunham wrote:
It looks like there are quite a few options to configure the cluster.
Yes, you have the details at: http://marmotta.apache.org/platform/cloud
Can someone answer,
1. First let me clarify, the clustering options in Marmotta > Core > Settings >
clustering.{address,backend,enabled,mode} need to be configured when using Zookeeper?
ZooKeeper complements the regular configuration for cloud-based
installations, allowing several nodes to read the same configuration.
Marmotta expects the global configuration at /marmotta/config/* in
ZooKeeper, although per-cluster configurations can be specified
at /marmotta/clusters/:name/config/*. More details in the link provided
above.
For managing configuration in ZooKeeper you may find this tool
written by Thomas useful: https://bitbucket.org/srfgkmt/zoomanager
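As a rough sketch, seeding that layout from the ZooKeeper command-line
shell could look like this (the server address, the cluster name "demo",
and the concrete property values are only placeholders for illustration):

```shell
# open a session against your ZooKeeper ensemble
bin/zkCli.sh -server zk1.example.org:2181

# global configuration read by every Marmotta node
create /marmotta ""
create /marmotta/config ""
create /marmotta/config/clustering.enabled "true"

# overrides for a particular cluster named "demo"
create /marmotta/clusters ""
create /marmotta/clusters/demo ""
create /marmotta/clusters/demo/config ""
create /marmotta/clusters/demo/config/clustering.backend "HAZELCAST"
```

Thomas' zoomanager tool mentioned above saves you from creating these
nodes by hand.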
2. Which is the preferable backend? I'm not familiar with the pros/cons of the
options, but looking around at some docs I think Hazelcast is a 'safe' good
bet?
We currently support Guava and Ehcache for local caches, and Hazelcast
and Infinispan for clusters. AFAIK Hazelcast is currently the most
stable and tested one, and it is already used in production.
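For reference, selecting a backend comes down to two keys in Marmotta's
configuration; a minimal sketch (the exact spelling of the value may
differ between Marmotta versions, so double-check in the admin UI):

```
clustering.enabled = true
clustering.backend = HAZELCAST
```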
3. There are three options for mode. Based on the description I would say that
distributed is what I want but there is a third option ‘Replicated’ which is
not described. What exactly does this do?
Yes, it accepts those three values:
* In LOCAL cache mode, the cache is not shared among the servers in a
cluster. Each machine keeps a local cache. This allows quick startups
and eliminates network traffic in the cluster, but subsequent requests
to different cluster members cannot benefit from the cached data.
* In DISTRIBUTED cache mode, the cluster forms one big hash table used
as a cache. This makes efficient use of the large amount of memory
available across the cluster.
* In REPLICATED cache mode, every node of the cluster holds a complete
copy of the cache, which is automatically replicated. This makes
operations that traverse the whole graph, such as SPARQL querying,
more efficient.
I think the decision about the mode depends more on your concrete needs
and the backend used.
4. What is the best way to set the address? I think this would depend on the
backend mostly and also the network the server is in but I’m not sure what the
rules are.
That setting is the port used by the cluster; basically it's a mechanism
to avoid address clashes. Just be sure it is available when you configure
a new cluster. A value <= 0 will use the default port.
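Putting the four keys together, a minimal clustered configuration might
look like the following sketch (the multicast address is a placeholder;
pick one that does not clash with other clusters on your network):

```
clustering.enabled = true
clustering.backend = HAZELCAST
clustering.mode    = DISTRIBUTED
# must be free on your network; a value <= 0 uses the default (see above)
clustering.address = 228.6.7.8
```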
My datasets are too large to run on one instance I think and I would like to
become familiar with the clustering options Marmotta offers. If I wanted to
have N instances running, each holding a portion of the total dataset, is
this possible? Ideally there is some sort of master that I query and it will
collect the triples regardless of the server the data is on. I’ve seen the
walkthrough at the Marmotta site but wanted to see if that will get me where
I’d like to be. :)
That's exactly the idea. Just provide sufficient resources for the database.
I also found the Apache Giraph project which claims to offer native node/edge
processing for graph databases. Has anyone used this? I would be *very*
interested to play around if it could connect to Marmotta.
We have an experimental backend that uses Titan DB. It would be great if
someone could evolve Marmotta in that direction!
Lastly, what are people using to manage their ontologies? I found Protege a
while back and installed WebProtege to manage ontologies. Is it possible that
it connects to Marmotta to keep the ontology synchronized? Are there any cool
things WebProtege (or any ontology manager) can do with Marmotta?
Sorry, I'm not familiar with WebProtege. It just needs to implement a
writing method compatible with Marmotta (file, REST, SPARQL or LDP), and
then you can use it.
If you just need SKOS, this other tool may be relevant for you:
https://github.com/tkurz/skosjs . It just needs a SPARQL 1.1 endpoint to
edit your thesauri. More or less the same workflow would need to be in
place if you want to use WebProtege.
Hope that helps.
Cheers,
--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: sergio.fernan...@redlink.co
w: http://redlink.co