Each row can have a maximum of 2 billion columns, which a logging system will probably hit eventually.
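To put the 2-billion figure in perspective, here is a back-of-the-envelope calculation. It assumes a single row receives exactly one new column per log message. The logging rates are hypothetical, and `days_until_limit` is an illustrative helper, not part of any Cassandra API.

```python
# Per-row column limit as stated in the reply above (approximately 2 billion).
COLUMN_LIMIT = 2_000_000_000
SECONDS_PER_DAY = 86_400


def days_until_limit(messages_per_second: float) -> float:
    """Days until one row that gets one new column per log message reaches COLUMN_LIMIT."""
    return COLUMN_LIMIT / messages_per_second / SECONDS_PER_DAY


# Hypothetical sustained logging rates for a single system.
for rate in (10, 100, 1_000):
    print(f"{rate:>5} msg/s -> {days_until_limit(rate):,.1f} days")
```

At a hypothetical 100 messages per second the row reaches the limit in about 231 days, and at 1,000 per second in about 23.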
More importantly, you'll only have 1 row per set of system logs. Each row is stored in its entirety on the same machine(s), which means you'll definitely not be able to distribute your load very well.

________________________________________
From: Bill Speirs [bill.spe...@gmail.com]
Sent: Wednesday, January 26, 2011 1:23 PM
To: user@cassandra.apache.org
Subject: Re: Schema Design

I like this approach, but I have 2 questions:

1) What are the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time?

2) Maybe it's just a restriction of the CLI, but how do I issue a slice request? Also, what if the start (or end) columns don't exist? I'm guessing it's smart enough to get the columns in that range.

Thanks!

Bill-

On Wed, Jan 26, 2011 at 4:12 PM, David McNelis <dmcne...@agentisenergy.com> wrote:
> I would say in that case you might want to try a single column family
> where the row key is the system name.
> Then, you could name your columns as the timestamp. Then when retrieving
> information from the data store you can, in your slice request, specify
> your start column as X and end column as Y.
> Then you can use the stored column name to know when an event occurred.
>
> On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs <bill.spe...@gmail.com> wrote:
>>
>> I'm looking to use Cassandra to store log messages from various
>> systems. A log message only has a message (UTF8Type) and a date/time.
>> My thought is to create a column family for each system. The row key
>> will be a TimeUUIDType. Each row will have 7 columns: year, month,
>> day, hour, minute, second, and message. I then have indexes set up for
>> each of the date/time columns.
>>
>> I was hoping this would allow me to answer queries like: "What are all
>> the log messages that were generated between X & Y?"
>> The problem is that I can ONLY use the equals operator on these column
>> values. For example, issuing
>>
>>     get system_x where month > 1;
>>
>> gives me this error: "No indexed columns present in index clause with
>> operator EQ." The equals operator works as expected, though:
>>
>>     get system_x where month = 1;
>>
>> What schema would allow me to get date ranges?
>>
>> Thanks in advance...
>>
>> Bill-
>>
>> * ColumnFamily description *
>>     ColumnFamily: system_x_msg
>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>       Row cache size / save period: 0.0/0
>>       Key cache size / save period: 200000.0/3600
>>       Memtable thresholds: 1.1671875/249/60
>>       GC grace seconds: 864000
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 1.0
>>       Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
>>         proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
>>         proj_1_msg.7365636f6e64, proj_1_msg.79656172]
>>       Column Metadata:
>>         Column Name: year (year)
>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>           Index Type: KEYS
>>         Column Name: month (month)
>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>           Index Type: KEYS
>>         Column Name: second (second)
>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>           Index Type: KEYS
>>         Column Name: minute (minute)
>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>           Index Type: KEYS
>>         Column Name: hour (hour)
>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>           Index Type: KEYS
>>         Column Name: day (day)
>>           Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>           Index Type: KEYS
>
> --
> David McNelis
> Lead Software Engineer
> Agentis Energy
> www.agentisenergy.com
> o: 630.359.6395
> c: 219.384.5143
> A Smart Grid technology company focused on helping consumers of energy
> control an often under-managed resource.
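David's suggested layout (row key = system name, one column per log message named by its timestamp, then a slice from X to Y) also bears on Bill's second question about start or end columns that don't exist. The sketch below is a hypothetical in-memory model, not Cassandra's client API: `WideRow`, `insert`, and `slice` are made-up names. It assumes column names are integer epoch-millisecond timestamps and that the comparator sorts them numerically (for example a LongType comparator rather than the UTF8Type shown in the description above). It treats both slice bounds as inclusive.

```python
import bisect


class WideRow:
    """Hypothetical in-memory model of one row: column name -> value, kept sorted by name."""

    def __init__(self) -> None:
        self._names: list[int] = []        # column names (epoch-ms timestamps), kept sorted
        self._values: dict[int, str] = {}  # column name -> log message

    def insert(self, name: int, message: str) -> None:
        # Writing an existing column name overwrites its value, so two messages
        # logged in the same millisecond would collide under this naming scheme.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = message

    def slice(self, start: int, finish: int) -> list[tuple[int, str]]:
        # Return every column whose name falls in [start, finish], in sorted order.
        # start and finish need not be existing column names; the binary search
        # simply finds where they would fall among the sorted names.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


# Usage: one row per system, one column per log message.
row = WideRow()
row.insert(1_296_000_000_000, "service started")
row.insert(1_296_000_060_000, "disk usage at 91%")
row.insert(1_296_000_120_000, "service stopped")

# Neither bound exists as a column; the slice still returns what lies between them.
print(row.slice(1_296_000_030_000, 1_296_000_090_000))
# → [(1296000060000, 'disk usage at 91%')]
```

Because the columns are kept sorted, a range query is two binary searches plus a contiguous read, which is why naming columns by timestamp supports date ranges where the equality-only secondary indexes did not.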