Others on this list may know better, but I would not have every client create
the schema on initialization. All clients can use the schema once it exists,
but only a single entity should create it. If several clients change the
schema at the same time, the Cassandra instances can end up in schema
disagreement. You can read more about schema disagreement and how to fix it
here:

http://wiki.apache.org/cassandra/FAQ#schema_disagreement
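
A rough sketch of what I mean by a single creator, in Python. This is only an
illustration: the names initialize, wait_for_schema, is_schema_owner,
create_schema and schema_exists are made up here and are not part of any
Cassandra client. You would pass in callables that wrap whatever client calls
your application already uses.

```python
import time


def wait_for_schema(schema_exists, timeout_s=60.0, poll_interval_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until schema_exists() returns True or the timeout expires.

    schema_exists is a hypothetical callable supplied by the application.
    Returns True if the schema showed up in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if schema_exists():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def initialize(is_schema_owner, create_schema, schema_exists, **wait_kwargs):
    """Option 1: exactly one node creates the schema, the rest only wait.

    is_schema_owner must be True on a single node (chosen by configuration,
    for example). create_schema and schema_exists are hypothetical callables
    supplied by the application.
    """
    if is_schema_owner:
        # Only the designated node ever issues schema changes.
        if not schema_exists():
            create_schema()
        return True
    # Every other node never touches the schema; it just waits for it.
    return wait_for_schema(schema_exists, **wait_kwargs)
```

The point of the split is that the 100+ ordinary nodes never issue a schema
change at all, so there are no concurrent schema modifications to disagree
about. How you pick the owner (a config flag, a deployment step run before the
application starts, etc.) is up to you.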

Is option 1 particularly difficult to do in your application?

Faraaz

On Wed, May 22, 2013 at 05:31:08PM -0700, Emalayan Vairavanathan wrote:
> Hi all,
> 
> I am implementing a distributed application which runs on 100s of machines
> concurrently. This application is going to use Cassandra as the underlying
> storage.
> 
> The application creates the schema (namespace and column families) during
> the initialization phase. It seems I have two options to create the schema.
> 
> Option - 1: Using a single node for schema creation.
> Option - 2: Having all the nodes (> 100) run the same schema creation logic
> (first, nodes will check whether the schema is already available and then
> try to create the schema if it is not already available).
> 
> To keep the initialization phase simple, I prefer to go for Option - 2.
> However, I am not sure how Cassandra is going to behave if multiple nodes try
> to create the same schema (namespace and column families) concurrently. It
> would be nice if someone can tell me about the implications of Option - 2
> with Cassandra version 1.2.2.
> 
> Please let me know if you have any questions.
> 
> Thank you
> VE
