Yeah, I was talking about creating a ColumnFamily definition via the API, not inserting data into an already defined column family.

The recommended approach to creating your schema is via the built-in bin/cassandra-cli command line tool. It has loads of built-in help, and here is an example of how to create a keyspace: http://www.mail-archive.com/user@cassandra.apache.org/msg09146.html

Let me know how you get on. 
Aaron

On 26 Jan, 2011, at 02:28 AM, David McNelis <dmcne...@agentisenergy.com> wrote:

I'm fairly certain Aaron is referring to named families like BlogEntries, not named columns (i-got-a-new-guitar).  

On Tue, Jan 25, 2011 at 4:37 AM, Andy Burgess <andy.burg...@rbsworldpay.com> wrote:
Aaron,

A question about one of your general points, "do not create CF's on the fly" - what, exactly, does this mean? Do you mean named column families, like "BlogEntries" from Sam's example, or do you mean column family keys, like "i-got-a-new-guitar"? If it's the latter, then could you please explain why not to do this? My application is based around creating row keys on the fly, so I'd like to know ahead of time if I'm creating potential trouble for myself.

To be honest, if you do mean specifically column families and not column family keys, then I don't even understand how you would go about creating those on-the-fly anyway. Don't they have to be pre-configured in storage-conf.xml?

Thanks,
Andy.



On 25/01/11 00:39, Aaron Morton wrote:
Sam, 
The best advice is to jump in and try any schema. If you are just starting out, start simple; you're going to rewrite it several times. Worry about scale later; in most cases it's going to work.

Some general points:

- do not create CF's on the fly. 
- work out your common read requests and denormalise to support these; the writes will be fast enough.
- try to get each read request to be resolved by reading from a single CF (not a rule, just a guideline)
- avoid big super columns. 
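
To make the denormalise and single-CF points a bit more concrete, here is a rough sketch in plain Python. It is only an in-memory model using dicts, not a Cassandra client API. The posts_by_tag index, the function names, and the second post's title, body and pubDate are all made up for illustration; only BlogEntries and its columns come from the article's example.

```python
import bisect

# Hypothetical in-memory stand-ins for two column families (plain dicts,
# not a Cassandra client): row key -> columns.
blog_entries = {}   # row key: slug -> {column name: value}
posts_by_tag = {}   # row key: tag  -> list of (pubDate, slug), kept sorted


def save_post(slug, title, body, author, tags, pub_date):
    """Write the post once, then once more per tag (denormalised)."""
    blog_entries[slug] = {
        "title": title,
        "body": body,
        "author": author,
        "tags": ",".join(tags),
        "pubDate": pub_date,
        "slug": slug,
    }
    for tag in tags:
        row = posts_by_tag.setdefault(tag, [])
        # Keep the row ordered by publish date, much as a column
        # comparator keeps columns ordered within a row.
        bisect.insort(row, (pub_date, slug))


def latest_slugs_for_tag(tag, limit=10):
    """Answer the 'posts for a tag' read from a single row in one CF."""
    row = posts_by_tag.get(tag, [])
    return [slug for _, slug in reversed(row[-limit:])]


save_post("i-got-a-new-guitar", "My new guitar", "this is a cool entry",
          "Arin Sarkissian", ["life", "guitar", "music"], 1250558004)
# Sample values below are invented for the sketch.
save_post("another-cool-guitar", "Another guitar", "...",
          "Arin Sarkissian", ["guitar"], 1250644404)

print(latest_slugs_for_tag("guitar"))
```

The idea is that each tag (or category) becomes a row key in one index CF rather than a CF of its own, so the number of column families stays fixed however many tags you end up with.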


If you are happy with the one in the article, start with that and see how it works with your app. See how it works for your read activities.

Hope that helps. 
Aaron


On 25 Jan, 2011, at 12:47 PM, Sam Hodgson <hodgson_...@hotmail.com> wrote:

Hi all,

I'm brand new to Cassandra. I'm migrating from MySQL for a large forum site and would be grateful if anyone can give me some basic pointers on schema design, or any recommended documentation.

The example used in http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model is very close if not exactly what I need for my main CF:
<!--
    ColumnFamily: BlogEntries
    This is where all the blog entries will go:

    Row Key => post's slug (the seo friendly portion of the uri)
    Column Name: an attribute for the entry (title, body, etc)
    Column Value: value of the associated attribute

    Access: grab an entry by slug (always fetch all Columns for Row)

    fyi: tags is a denormalization... its a comma separated list of tags.
    im not using json in order to not interfere with our
    notation but obviously you could use anything as long as your app
    knows how to deal w/ it

    BlogEntries : { // CF
        i-got-a-new-guitar : { // row key - the unique "slug" of the entry.
            title: This is a blog entry about my new, awesome guitar,
            body: this is a cool entry. etc etc yada yada
            author: Arin Sarkissian  // a row key into the Authors CF
            tags: life,guitar,music  // comma sep list of tags (basic denormalization)
            pubDate: 1250558004      // unixtime for publish date
            slug: i-got-a-new-guitar
        },
        // all other entries
        another-cool-guitar : {
            ...
            tags: guitar,
            slug: another-cool-guitar
        },
        scream-is-the-best-movie-ever : {
            ..
            tags: movie,horror,
            slug: scream-is-the-best-movie-ever
        }
    }
-->
<ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>

How well would this scale? Say you are storing 5 million posts and looking to scale that up:
would it be better to segment them into several column families, and if so, to what extent?

I could create column families to store posts for each category; however, I'd end up with thousands of CFs.
That said, the data would then be stored in a very sorted manner for querying/presenting.

My db is very write heavy and growing fast, so Cassandra sounds like the best solution.
Any advice is greatly appreciated!! 

Thanks

Sam


-- 
Andy Burgess
Principal Development Engineer
Application Delivery
WorldPay Ltd.
270-289 Science Park, Milton Road
Cambridge, CB4 0WE, United Kingdom (Depot Code: 024)
Office: +44 (0)1223 706 779| Mobile: +44 (0)7909 534 940
andy.burg...@worldpay.com

--
David McNelis
Lead Software Engineer
Agentis Energy
o: 630.359.6395
c: 219.384.5143

A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.

