Hi Joshua,

On 01/10/14 16:00, Joshua Dunham wrote:
 It looks like there are quite a few options to configure the cluster.

Yes, you have the details at: http://marmotta.apache.org/platform/cloud

Can someone answer,
1. First let me clarify, the clustering options in Marmotta > Core > Settings > 
clustering.{address,backend,enabled,mode} need to be configured when using Zookeeper?

ZooKeeper complements the regular configuration for cloud-based installations, allowing several nodes to read the same configuration.

Marmotta expects the global configuration at /marmotta/config/* in ZooKeeper, although particular cluster configurations can be specified at /marmotta/clusters/:name/config/*. More details are in the link provided above.

For managing the configuration in ZooKeeper you may find this tool written by Thomas useful: https://bitbucket.org/srfgkmt/zoomanager
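
For instance, here is a minimal sketch of seeding that layout with the zkCli.sh shell that ships with ZooKeeper; the ensemble address (localhost:2181), the cluster name ("mycluster") and the example values are assumptions for illustration, not values from Marmotta's docs:

    # seed the global Marmotta configuration (ensemble address assumed)
    bin/zkCli.sh -server localhost:2181 create /marmotta ""
    bin/zkCli.sh -server localhost:2181 create /marmotta/config ""
    bin/zkCli.sh -server localhost:2181 create /marmotta/config/clustering.enabled "true"

    # per-cluster overrides live under /marmotta/clusters/:name/config/*
    # ("mycluster" is a made-up cluster name)
    bin/zkCli.sh -server localhost:2181 create /marmotta/clusters ""
    bin/zkCli.sh -server localhost:2181 create /marmotta/clusters/mycluster ""
    bin/zkCli.sh -server localhost:2181 create /marmotta/clusters/mycluster/config ""
    bin/zkCli.sh -server localhost:2181 create /marmotta/clusters/mycluster/config/clustering.mode "DISTRIBUTED"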

2. Which is the preferable backend? I’m not familiar with the pros/cons of the 
options but, looking around at some docs, I think Hazelcast is a ‘safe’ good 
bet?

We currently support Guava and Ehcache for local caches, and Hazelcast and Infinispan for clusters. AFAIK Hazelcast is the most stable and tested one, and it is already used in production.

3. There are three options for mode. Based on the description I would say that 
distributed is what I want but there is a third option ‘Replicated’ which is 
not described. What exactly does this do?

Yes, it accepts those three values:

* In LOCAL cache mode, the cache is not shared among the servers in a cluster. Each machine keeps a local cache. This allows quick startups and eliminates network traffic in the cluster, but subsequent requests to different cluster members cannot benefit from the cached data.

* In DISTRIBUTED cache mode, the cluster forms a big hash table used as a cache. This makes efficient use of the large amount of memory available across the cluster.

* In REPLICATED cache mode, all nodes of the cluster hold a complete copy of the cache, which is automatically replicated. This makes operations that require a traversal of the whole graph, such as SPARQL querying, more efficient.

I think the decision about the mode depends more on your concrete needs and the backend used.

4. What is the best way to set the address? I think this would depend on the 
backend mostly and also the network the server is in but I’m not sure what the 
rules are.

That's the port used for the cluster. Basically it's a mechanism to avoid address clashes. Just be sure it is available when you configure a new cluster. A value <= 0 will use the default port.
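
Putting the pieces together, a sketch of what those settings could look like; the concrete values below (backend identifier, address and port) are just illustrative assumptions, so check the cloud documentation linked above for what your setup actually needs:

    # hypothetical example values for the clustering.* keys
    clustering.enabled = true
    clustering.backend = hazelcast      # assumed identifier; Guava/Ehcache are local-only
    clustering.mode    = DISTRIBUTED    # LOCAL | DISTRIBUTED | REPLICATED
    clustering.address = 192.168.100.1  # assumed; depends on your network
    clustering.port    = 46655          # assumed; a value <= 0 uses the default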

My datasets are too large to run on one instance, I think, and I would like to 
become familiar with the clustering options Marmotta offers. If I wanted to 
have N instances running, each with a portion of the total dataset, is this 
possible? Ideally there is some sort of master that I query and it will 
collect the triples regardless of the server the data is on. I’ve seen the 
walkthrough at the Marmotta site but wanted to see if that will get me where 
I’d like to be. :)

That's exactly the idea. Just provide sufficient resources for the database.
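
For example, once the cluster is up you can point your SPARQL queries at a node of the cluster; a minimal sketch with curl, where the host, port and endpoint path are assumptions about your deployment:

    # query one node of the cluster; the endpoint URL is an assumption
    curl -G \
         -H "Accept: application/sparql-results+json" \
         --data-urlencode "query=SELECT * WHERE { ?s ?p ?o } LIMIT 10" \
         http://node1.example.org:8080/marmotta/sparql/select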

I also found the Apache Giraph project which claims to offer native node/edge 
processing for graph databases. Has anyone used this? I would be *very* 
interested to play around if it could connect to Marmotta.

We have an experimental backend that uses Titan DB. It would be great if someone could evolve Marmotta in that direction!

Lastly, what are people using to manage their ontologies? I found Protege a 
while back and installed WebProtege to manage ontologies. Is it possible that 
it connects to Marmotta to keep the ontology synchronized? Are there any cool 
things WebProtege (or any ontology manager) can do with Marmotta?

Sorry, I'm not familiar with WebProtege. It just needs to implement a writing method compatible with Marmotta (file, REST, SPARQL or LDP), and then you are all set.

If you just need SKOS, this other tool can be relevant for you: https://github.com/tkurz/skosjs . It just needs a SPARQL 1.1 endpoint to edit your thesauri. More or less the same workflow would need to be in place if you want to use WebProtege.
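
As a sketch of that workflow, an external editor would just push its changes through SPARQL 1.1 Update; the endpoint URL and the triple below are illustrative assumptions:

    # write a triple into Marmotta via SPARQL Update (endpoint URL assumed)
    curl -X POST \
         -H "Content-Type: application/sparql-update" \
         --data "INSERT DATA { <http://example.org/Concept1> a <http://www.w3.org/2004/02/skos/core#Concept> }" \
         http://localhost:8080/marmotta/sparql/update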

Hope that helps.

Cheers,

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 660 2747 925
e: sergio.fernan...@redlink.co
w: http://redlink.co
