RE: Schema Design

Shu Zhang Wed, 26 Jan 2011 15:54:51 -0800

Each row can have a maximum of 2 billion columns, which a logging system will 
probably hit eventually.


More importantly, you'll only have 1 row per set of system logs. Each row is 
stored whole on the same machine(s), which means you'll definitely not be able 
to distribute your load very well.
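
To put rough numbers on both points, here is a minimal Python sketch. The
write rates, the system names, and the 50-node cluster are hypothetical, and
node_for_row is a toy stand-in for hash-based placement, not Cassandra's
actual token ring (replicas are not modeled either). It assumes one column
per log message, as in the single-row-per-system design being discussed.

```python
import hashlib

COLUMN_LIMIT = 2_000_000_000  # per-row column limit cited above


def days_until_column_limit(messages_per_second, limit=COLUMN_LIMIT):
    """Days until one row reaches `limit` columns at a steady write rate."""
    seconds = limit / messages_per_second
    return seconds / 86_400  # seconds per day


def node_for_row(row_key, node_count):
    """Toy placement: hash the row key and map it onto one of node_count nodes.

    This is a simplification for illustration only; it is not how Cassandra
    assigns token ranges.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


if __name__ == "__main__":
    # Hypothetical steady write rates, in log messages per second.
    for rate in (10, 100, 1000):
        print(rate, "msg/s ->", round(days_until_column_limit(rate), 1), "days")

    # With one row per system, the number of rows caps the number of nodes
    # that can ever receive writes, however large the cluster is.
    systems = ["system_x", "system_y", "system_z"]  # hypothetical names
    used_nodes = {node_for_row(name, 50) for name in systems}
    print(len(used_nodes), "of 50 nodes receive writes")
```

In this model, three systems can never touch more than three nodes, so adding
machines does not spread the write load.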
________________________________________
From: Bill Speirs [bill.spe...@gmail.com]
Sent: Wednesday, January 26, 2011 1:23 PM
To: user@cassandra.apache.org
Subject: Re: Schema Design

I like this approach, but I have 2 questions:

1) what are the implications of continually adding columns to a single
row? I'm unsure how Cassandra is able to grow. I realize you can have
a virtually infinite number of columns, but what are the implications
of growing the number of columns over time?

2) maybe it's just a restriction of the CLI, but how do I issue a
slice request? Also, what if start (or end) columns don't exist? I'm
guessing it's smart enough to get the columns in that range.

Thanks!

Bill-
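
On question 2: columns within a row are kept sorted by column name, so a slice
behaves like a range scan over that ordering, and the start and end names act
only as bounds. The sketch below models this in plain Python with a sorted
list. It is not the Cassandra API; slice_columns and the sample timestamps are
hypothetical, and treating both bounds as inclusive is an assumption of the
sketch.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, start, end):
    """Return every column name n with start <= n <= end.

    `sorted_names` must already be sorted, mirroring how columns are kept
    ordered within a row. Neither bound has to be present in the list.
    """
    lo = bisect_left(sorted_names, start)   # first position with name >= start
    hi = bisect_right(sorted_names, end)    # first position with name > end
    return sorted_names[lo:hi]


# Hypothetical column names: event timestamps for one system's row.
timestamps = [100, 105, 110, 120, 135]

# Neither 103 nor 125 exists as a column name, yet the range still resolves.
print(slice_columns(timestamps, 103, 125))  # [105, 110, 120]
```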

On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
<dmcne...@agentisenergy.com> wrote:
> I would say in that case you might want to try a single column family
> where the row key is the system name.
> Then, you could name your columns as the timestamp. Then when retrieving
> information from the data store you can, in your slice request, specify
> your start column as X and end column as Y.
> Then you can use the stored column name to know when an event occurred.
>
> On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs <bill.spe...@gmail.com> wrote:
>>
>> I'm looking to use Cassandra to store log messages from various
>> systems. A log message only has a message (UTF8Type) and a date/time.
>> My thought is to create a column family for each system. The row key
>> will be a TimeUUIDType. Each row will have 7 columns: year, month,
>> day, hour, minute, second, and message. I then have indexes set up for
>> each of the date/time columns.
>>
>> I was hoping this would allow me to answer queries like: "What are all
>> the log messages that were generated between X & Y?" The problem is
>> that I can ONLY use the equals operator on these column values. For
>> example, issuing: get system_x where month > 1; gives me this
>> error: "No indexed columns present in index clause with operator EQ."
>> The equals operator works as expected though: get system_x where month
>> = 1;
>>
>> What schema would allow me to get date ranges?
>>
>> Thanks in advance...
>>
>> Bill-
>>
>> * ColumnFamily description *
>>    ColumnFamily: system_x_msg
>>      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>      Row cache size / save period: 0.0/0
>>      Key cache size / save period: 200000.0/3600
>>      Memtable thresholds: 1.1671875/249/60
>>      GC grace seconds: 864000
>>      Compaction min/max thresholds: 4/32
>>      Read repair chance: 1.0
>>      Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
>> proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
>> proj_1_msg.7365636f6e64, proj_1_msg.79656172]
>>      Column Metadata:
>>        Column Name: year (year)
>>          Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>          Index Type: KEYS
>>        Column Name: month (month)
>>          Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>          Index Type: KEYS
>>        Column Name: second (second)
>>          Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>          Index Type: KEYS
>>        Column Name: minute (minute)
>>          Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>          Index Type: KEYS
>>        Column Name: hour (hour)
>>          Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>          Index Type: KEYS
>>        Column Name: day (day)
>>          Validation Class: org.apache.cassandra.db.marshal.IntegerType
>>          Index Type: KEYS
>
>
>
> --
> David McNelis
> Lead Software Engineer
> Agentis Energy
> www.agentisenergy.com
> o: 630.359.6395
> c: 219.384.5143
> A Smart Grid technology company focused on helping consumers of energy
> control an often under-managed resource.
>
>
