batch mutation : how to delete whole row?

2010-05-26 Thread gabriele renzi
Hi everyone,

in our test code we perform a dummy "clear" by reading all the rows
and deleting them (while waiting for cassandra 0.7 & CASSANDRA-531).
A couple of days ago I updated our code to perform this operation
using batchMutate, but there seems to be no way to perform a deletion
of a whole row, only of individual columns.


The org.apache.cassandra.thrift.Deletion object can be used with a
slice predicate, but if I use an empty SlicePredicate I get the obvious
validation error about missing either a range or a list of column
names.

Is it correct that I cannot perform a row delete via batchMutation, or
is there another way (apart from reading all the column names and
adding multiple deletions per row)?
Would it make sense to allow a Deletion object to refer to a row, or
even better a RowDeletion(key) class?


AFAICT there is no underlying technical blocker, but I may be wrong as usual :)


-- 
blog en: http://www.riffraff.info
blog it: http://riffraff.blogsome.com


Re: batch mutation : how to delete whole row?

2010-05-26 Thread Sylvain Lebresne
This has been fixed in 0.7
(https://issues.apache.org/jira/browse/CASSANDRA-1027).
Not sure this has been merged in 0.6 though.
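If I read the change right, with 1027 a Deletion that carries only a timestamp
(no super column and no SlicePredicate) removes the whole row, so a rough,
untested sketch would look like the following (0.6-style method signatures shown
for readability; trunk drops the keyspace argument and is moving to binary keys,
and deleteRows is just a made-up helper name):

import java.util.*;
import org.apache.cassandra.thrift.*;

// Sketch: delete whole rows through batch_mutate using a Deletion with no predicate.
public static void deleteRows(Cassandra.Client client, String keyspace,
                              String columnFamily, List<String> keys,
                              long timestamp) throws Exception {
    Map<String, Map<String, List<Mutation>>> mutationMap =
            new HashMap<String, Map<String, List<Mutation>>>();
    for (String key : keys) {
        Deletion deletion = new Deletion(timestamp);   // no SlicePredicate => whole row
        Mutation mutation = new Mutation();
        mutation.setDeletion(deletion);
        Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
        byCf.put(columnFamily, Collections.singletonList(mutation));
        mutationMap.put(key, byCf);
    }
    client.batch_mutate(keyspace, mutationMap, ConsistencyLevel.QUORUM);
}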

On Wed, May 26, 2010 at 9:05 AM, gabriele renzi  wrote:
> Hi everyone,
>
> in our test code we perform a dummy "clear" by reading all the rows
> and deleting them (while waiting for cassandra 0.7 & CASSANDRA-531).
> A couple of days ago I updated our code to perform this operation
> using batchMutate, but there seem to be no way to perform a deletion
> of the whole row, only columns.
>
>
> The org.apache.cassandra.thrift.Deletion object can be used with a
> slice predicate but if I use an empty SlicePredicate there is the
> obvious validation error of missing either a range or a list of column
> names.
>
> Is it correct that I cannot perform a row delete via batchMutation, or
> is there another way (apart from reading all the column names and
> adding multiple deletions per row)?
> Would it make sense to allow a Deletion object to refer to a row, or
> even better a RowDeletion(key) class?
>
>
> AFAICT there is no underlying technical blocker, but I may be wrong as usual 
> :)
>
>
> --
> blog en: http://www.riffraff.info
> blog it: http://riffraff.blogsome.com
>


Re: batch mutation : how to delete whole row?

2010-05-26 Thread Mishail
You could either use one remove(keyspace, key, column_path, timestamp,
consistency_level) call per key, or wait until
https://issues.apache.org/jira/browse/CASSANDRA-494 is fixed (to use a
SliceRange in the Deletion).
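i.e. roughly the following with the 0.6 Thrift API, one round trip per key
(untested sketch; "Contacts", keysToClear, keyspace and client are placeholders).
A ColumnPath that names only the column family removes the whole row:

long timestamp = System.currentTimeMillis() * 1000;   // microseconds, as most clients use
ColumnPath wholeRow = new ColumnPath("Contacts");      // no column / super column => whole row
for (String key : keysToClear) {
    client.remove(keyspace, key, wholeRow, timestamp, ConsistencyLevel.QUORUM);
}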

gabriele renzi wrote:
> 
> Is it correct that I cannot perform a row delete via batchMutation, or
> is there another way (apart from reading all the column names and
> adding multiple deletions per row)?
> Would it make sense to allow a Deletion object to refer to a row, or
> even better a RowDeletion(key) class?
> 
> 
> AFAICT there is no underlying technical blocker, but I may be wrong as usual 
> :)
> 
> 




Re: batch mutation : how to delete whole row?

2010-05-26 Thread gabriele renzi
On Wed, May 26, 2010 at 9:54 AM, Mishail  wrote:
> You could either use 1 remove(keyspace, key, column_path, timestamp,
> consistency_level) per each key, or wait till
> https://issues.apache.org/jira/browse/CASSANDRA-494 fixed (to use
> SliceRange in the Deletion)

thanks, I'm already doing that but sending 2k commands when one would
suffice is clearly a bit wasteful :)

CASSANDRA-1027 seems good enough, and pretty trivial.

Thanks both for your answers.


Re: Order Preserving Partitioner

2010-05-26 Thread David Boxenhorn
Just in case you don't know: You can do range searches on keys even with
Random Partitioner, you just won't get the results in order. If this is good
enough for you (e.g. if you can order the results on the client, or if you
just need to get the right answer, but not the right order), then you should
use Random Partitioner.

(I bring this up because it confused me until recently.)
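For example (0.6 Thrift API, simplified and untested; "MyCF", keyspace and client
are placeholders), the same get_range_slices call works under either partitioner,
and only the ordering of the returned keys differs:

SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));

KeyRange range = new KeyRange();
range.setStart_key("");        // empty start/end = scan the whole range
range.setEnd_key("");
range.setCount(1000);

List<KeySlice> rows = client.get_range_slices(
        keyspace, new ColumnParent("MyCF"), predicate, range, ConsistencyLevel.QUORUM);
// Under OrderPreservingPartitioner the keys come back in key order; under
// RandomPartitioner they come back in token (hash) order, so sort or filter
// them on the client if you need an order.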

On Wed, May 26, 2010 at 5:14 AM, Steve Lihn  wrote:

> I have a question on using Order Preserving Partitioner.
>
> Many rowKeys in my system will be related to dates, so it seems natural to
> use Order Preserving Partitioner instead of the default Random Partitioner.
> However, I have been warned that special attention has to be applied for
> Order Preserving Partitioner to work properly (basically to ensure a good
> key distribution and avoid "hot spot") and reverting it back to Random may
> not be easy. Also not every rowKey is related to dates, for these, using
> Random Partitioner is okay, but there is only one place to set Partitioner.
>
> (Note: The intention of this warning is actually to discredit Cassandra and
> persuade me not to use it.)
>
> It seems the choice of Partitioner is defined in the storage-conf.xml and
> is a global property. My question why does it have to be a global property?
> Is there a future plan to make it customizable per KeySpace (just like you
> would choose hash or range partition for different table/data in RDBMS) ?
>
> Thanks,
> Steve
>


Questions regarding batch mutates and transactions

2010-05-26 Thread Todd Nine
Hey guys,
  I originally asked this on the Hector group, but no one was sure of the
answer.  Can I get some feedback on this?  I'd prefer to avoid having to use
something like Cages if I can for most of our use cases.  Long term I can
see we'll need to use something like Cages, especially when it comes to
complex operations such as billing.  However, for a majority of our uses, I
think it's a bit overkill.  I've used transactions heavily in the workplace
on SQL-based app development.  To be honest, a majority of applications
I've built utilize optimistic locking, and only the atomic, consistent, and
durable properties of ACID transactions.

To encapsulate all 3, I essentially need all writes to cassandra for a given
business invocation to occur in a single write.  With Spring, I would
implement my own transaction manager which simply adds all mutates and
delete ops to a batch mutate.  When my transaction commits, I would execute
the mutation on the given keyspace.  Now this would only work if the
following semantics apply.  I've tried searching for details in Cassandra's
batch mutate, but I'm not finding what I need.  Here are 2 use cases as an
example.

Case 1: Successful update : User adds new contact

Transaction Start.
Biz op 1. Row is created in  "contacts" and all data is added via batch
mutation
Biz op 2. Row is created for an SMS message to be queued through
the SMS gateway
return op 2
return op 1
Transaction Commit (batch mutate executed)

Case 2. Failed update: User adds new contact

Biz op 1. Row is created in "contacts"
Biz op 2. Row is created for SMS message queuing.  Fails due to invalid
international phone number format
return op 2
return op 1
Transaction is rolled back (batch mutate never executed)
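For reference, a rough sketch of the transaction manager idea (hypothetical class
and method names, raw 0.6 Thrift types, and none of the Spring or Hector plumbing),
just to make the semantics question concrete: mutations accumulate in memory per
business invocation and a single batch_mutate is issued at commit.

import java.util.*;
import org.apache.cassandra.thrift.*;

// Hypothetical accumulate-then-commit wrapper; nothing is sent until commit().
public class BatchTransaction {
    private final Map<String, Map<String, List<Mutation>>> pending =
            new HashMap<String, Map<String, List<Mutation>>>();

    // Business ops register their writes here instead of hitting Cassandra directly.
    public void addInsert(String key, String columnFamily, Column column) {
        ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
        cosc.setColumn(column);
        Mutation m = new Mutation();
        m.setColumn_or_supercolumn(cosc);
        mutationsFor(key, columnFamily).add(m);
    }

    // Commit = one batch_mutate; rollback = simply discard 'pending'.
    public void commit(Cassandra.Client client, String keyspace) throws Exception {
        client.batch_mutate(keyspace, pending, ConsistencyLevel.QUORUM);
    }

    private List<Mutation> mutationsFor(String key, String columnFamily) {
        Map<String, List<Mutation>> byCf = pending.get(key);
        if (byCf == null) {
            byCf = new HashMap<String, List<Mutation>>();
            pending.put(key, byCf);
        }
        List<Mutation> mutations = byCf.get(columnFamily);
        if (mutations == null) {
            mutations = new ArrayList<Mutation>();
            byCf.put(columnFamily, mutations);
        }
        return mutations;
    }
}

Whether that single batch_mutate is all-or-nothing across keys is exactly the
question below.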


Now, here is where I can't find what I need in the doc.  In case 1, if my
mutation from biz op 2 were to fail during a batch mutate operation
encapsulating all mutations, does the batch mutation as a whole not get
executed, or would I still have the mutation from op 1 written to cassandra
while the op 2 write fails?

Thanks,


nodetool move looks stuck

2010-05-26 Thread Ran Tavory
I ran nodetool move on one of the nodes and it seems stuck for a few hours
now.

I've been able to run it successfully in the past, but this time it looks
stuck.

Streams shows as if there's work in progress, but the same files have been
at the same position for a few hours.
I've also checked the compaction manager through jmx and it's not compacting
anything.

Anything I can do to continue the move operation? Is this a bug?
thanks


$ nodetool -h 192.168.254.58 -p 9004 streams
Mode: Normal
 Nothing streaming to /192.168.252.124
Streaming to: /192.168.254.57
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvImpressions-170-Filter.db
0/45587965
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvImpressions-170-Data.db
0/18369406636
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvRatings-142-Index.db
0/6163411
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvRatings-142-Filter.db
0/85645
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvRatings-142-Data.db
0/37985032
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvAds-108-Index.db
0/77307418
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvAds-108-Filter.db
0/1436005
   /outbrain/cassandra/data/outbrain_kvdb/stream/KvAds-108-Data.db
0/704736670
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-40-Index.db 0/230280
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-40-Filter.db 0/20605
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-40-Data.db
0/202808000
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-41-Index.db 0/23912
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-41-Filter.db 0/20605
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-41-Data.db 0/21434973
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-42-Index.db 0/965028
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-42-Filter.db 0/20605
   /outbrain/cassandra/data/Keyspace1/stream/Standard1-42-Data.db
0/865380205
Not receiving any streams.

$ nodetool -h 192.168.254.57 -p 9004 streams
Mode: Normal
 Nothing streaming to /192.168.254.58
Streaming from: /192.168.254.58
   outbrain_kvdb:
/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-tmp-1081-Data.db
0/18369406636
   outbrain_kvdb:
/outbrain/cassandra/data/outbrain_kvdb/KvRatings-tmp-167-Index.db 0/6163411
   outbrain_kvdb:
/outbrain/cassandra/data/outbrain_kvdb/KvRatings-tmp-167-Filter.db 0/85645
   outbrain_kvdb:
/outbrain/cassandra/data/outbrain_kvdb/KvRatings-tmp-167-Data.db 0/37985032
   outbrain_kvdb:
/outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-123-Index.db 0/77307418
   outbrain_kvdb:
/outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-123-Filter.db 0/1436005
   outbrain_kvdb:
/outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-123-Data.db 0/704736670
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-21-Index.db
0/230280
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-21-Filter.db
0/20605
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-21-Data.db
0/202808000
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-22-Index.db
0/23912
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-22-Filter.db
0/20605
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-22-Data.db
0/21434973
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-23-Index.db
0/965028
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-23-Filter.db
0/20605
   Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-23-Data.db
0/865380205


Re: Questions regarding batch mutates and transactions

2010-05-26 Thread Ran Tavory
The summary of your question is: is batch_mutate atomic in the general
sense, meaning when used with multiple keys, multiple column families, etc.,
correct?

On Wed, May 26, 2010 at 12:45 PM, Todd Nine  wrote:

> Hey guys,
>   I originally asked this on the Hector group, but no one was sure of the
> answer.  Can I get some feedback on this.  I'd prefer to avoid having to use
> something like Cages if I can for most of our use cases.  Long term I can
> see we'll need to use something like Cages, especially when it comes to
> complex operations such as billing.  However for a majority of our uses, I
> think it's a bit overkill.  I've used transactions heavily in the workplace
> on SQL based app developments.  To be honest, a majority of application's
> I've built utilize optimistic locking, and only the atomic, consistent, and
> durable functionality of transactional ACID properties.
>
> To encapsulate all 3, I essentially need all writes to cassandra for a
> given business invocation to occur in a single write.  With Spring, I would
> implement my own transaction manager which simply adds all mutates and
> delete ops to a batch mutate.  When my transaction commits, I would execute
> the mutation on the given keyspace.  Now this would only work if the
> following semantics apply.  I've tried searching for details in Cassandra's
> batch mutate, but I'm not finding what I need.  Here are 2 use cases as an
> example.
>
> Case 1: Successful update : User adds new contact
>
> Transaction Start.
> Biz op 1. Row is created in  "contacts" and all data is added via batch
> mutation
>  Biz op 2. Row is created for an SMS message to be queued
>  through the SMS gateway
> return op 2
> return op 1
> Transaction Commit (batch mutate executed)
>
> Case 2. Failed update: User adds new contact
>
> Biz op 1. Row is created in "contacts"
> Biz op 2. Row is created for SMS message queuing.  Fails due to invalid
> international phone number format
> return op 2
> return op 1
> Transaction is rolled back (batch mutate never executed)
>
>
> Now, here is where I can't find what I need in the doc.  In case 1, if my
> mutation from biz op 2 were to fail during a batch mutate operation
> encapsulating all mutations, does the batch mutation as a whole not get
> executed, or would I still have the mutation from op 1 written to cassandra
> while the op 2 write fails?
>
> Thanks,
>


Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Utku Can Topçu
Hey All,

Assume I have two ColumnFamilies in the same keyspace and I want to move or
copy a range of columns (defined by a keyrange) into another columnfamily.

Do you think it's somehow possible and doable with the current API, and if
so, how?

Best Regards,
Utku


RE: Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Dop Sun
There is no single API call to achieve this.

 

It’s read and write, plus a delete (if move) API calls I guess.

 

From: Utku Can Topçu [mailto:u...@topcu.gen.tr] 
Sent: Wednesday, May 26, 2010 9:09 PM
To: user@cassandra.apache.org
Subject: Moving/copying columns in between ColumnFamilies

 

Hey All,

Assume I have two ColumnFamilies in the same keyspace and I want to move or 
copy a range of columns (defined by a keyrange) into another columnfamily.

Do you think it's somehow possible and doable with the current support of the 
API, if so how?

Best Regards,
Utku



Re: Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Utku Can Topçu
Sorry, I now realize that I used the wrong terminology.

What I really meant was moving or copying the ROWS defined by a KeyRange
between ColumnFamilies.
Do you think it's doable in an efficient way?

On Wed, May 26, 2010 at 3:14 PM, Dop Sun  wrote:

>  There are no single API call to achieve this.
>
>
>
> It’s read and write, plus a delete (if move) API calls I guess.
>
>
>
> *From:* Utku Can Topçu [mailto:u...@topcu.gen.tr]
> *Sent:* Wednesday, May 26, 2010 9:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Moving/copying columns in between ColumnFamilies
>
>
>
> Hey All,
>
> Assume I have two ColumnFamilies in the same keyspace and I want to move or
> copy a range of columns (defined by a keyrange) into another columnfamily.
>
> Do you think it's somehow possible and doable with the current support of
> the API, if so how?
>
> Best Regards,
> Utku
>


RE: Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Dop Sun
In the Thrift API, I guess you need to use read/insert and then delete to 
implement the move.

 

If you can shut Cassandra down, maybe you can try sstable2json to export the 
data, and json2sstable to import it back into a different column family file? I 
did not do it before, but I guess it may work if they have the same schema 
definition.
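For the Thrift route, it would look roughly like this (an untested sketch with
placeholder names such as "SourceCF", "TargetCF", keyspace and client; paging and
error handling omitted, and the literal start/end keys assume an order-preserving
partitioner; under RandomPartitioner you would page through the whole ring instead):

// Copy the rows in a key range from SourceCF to TargetCF, then (for a move) delete.
SlicePredicate allColumns = new SlicePredicate();
allColumns.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10000));

KeyRange range = new KeyRange();
range.setStart_key("startKey");
range.setEnd_key("endKey");
range.setCount(1000);              // page size; loop from the last key seen for more

List<KeySlice> rows = client.get_range_slices(
        keyspace, new ColumnParent("SourceCF"), allColumns, range, ConsistencyLevel.QUORUM);

Map<String, Map<String, List<Mutation>>> mutationMap =
        new HashMap<String, Map<String, List<Mutation>>>();
for (KeySlice row : rows) {
    List<Mutation> mutations = new ArrayList<Mutation>();
    for (ColumnOrSuperColumn cosc : row.getColumns()) {
        Mutation m = new Mutation();
        m.setColumn_or_supercolumn(cosc);   // re-insert the same column (same timestamp)
        mutations.add(m);
    }
    Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
    byCf.put("TargetCF", mutations);
    mutationMap.put(row.getKey(), byCf);
}
client.batch_mutate(keyspace, mutationMap, ConsistencyLevel.QUORUM);
// For a move, follow up with remove() per source key using a ColumnPath("SourceCF").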

 

From: Utku Can Topçu [mailto:u...@topcu.gen.tr] 
Sent: Wednesday, May 26, 2010 9:18 PM
To: user@cassandra.apache.org
Subject: Re: Moving/copying columns in between ColumnFamilies

 

Sorry I now realized that I used the wrong terminology.

What I really meant was, moving or copying the ROWS defined by a KeyRange in 
between ColumnFamilies.
Do you think it's doable with an efficient way?

On Wed, May 26, 2010 at 3:14 PM, Dop Sun  wrote:

There are no single API call to achieve this.

 

It’s read and write, plus a delete (if move) API calls I guess.

 

From: Utku Can Topçu [mailto:u...@topcu.gen.tr] 
Sent: Wednesday, May 26, 2010 9:09 PM
To: user@cassandra.apache.org
Subject: Moving/copying columns in between ColumnFamilies

 

Hey All,

Assume I have two ColumnFamilies in the same keyspace and I want to move or 
copy a range of columns (defined by a keyrange) into another columnfamily.

Do you think it's somehow possible and doable with the current support of the 
API, if so how?

Best Regards,
Utku

 



Re: Avro Example Code

2010-05-26 Thread Jeff Hammerbacher
I've got a mostly working Avro server and client for HBase at
http://github.com/hammer/hbase-trunk-with-avro and
http://github.com/hammer/pyhbase. If you replace "scan" with "slice", it
shouldn't be too much different for Cassandra...

On Mon, May 17, 2010 at 10:31 AM, Wellman, David  wrote:

> I spent the weekend working with avro and some java junit tests.  I still
> have a lot of learning to do, but if others would like to use, add to or
> improve upon the tests then I would appreciate the feedback and help.
>
> David Wellman
>
>
> On May 17, 2010, at 10:16 AM, "Eric Evans"  wrote:
>
>  On Fri, 2010-05-14 at 14:52 -0600, David Wellman wrote:
>>
>>> Does anyone have a good link or example code that we can use to spike on
>>> Avro with Cassandra?
>>>
>>
>> If you're using Python, the best place to look is the functional tests
>> (see test/system), otherwise, Patrick's quick start
>> (http://bit.ly/32T6Mk).
>>
>> As Gary mentioned already, it's still very rough. After all the recent
>> changes in trunk (dropping the keyspace arg, binary keys, etc), it's
>> just barely back to a state where you can read/write, and only then via
>> get/insert (i.e. no slicing, batch_mutate, etc).
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>>


Re: Avro Example Code

2010-05-26 Thread David Wellman
Fantastic! Thank you.

On May 26, 2010, at 8:38 AM, Jeff Hammerbacher wrote:

> I've got a mostly working Avro server and client for HBase at 
> http://github.com/hammer/hbase-trunk-with-avro and 
> http://github.com/hammer/pyhbase. If you replace "scan" with "slice", it 
> shouldn't be too much different for Cassandra...
> 
> On Mon, May 17, 2010 at 10:31 AM, Wellman, David  wrote:
> I spent the weekend working with avro and some java junit tests.  I still 
> have a lot of learning to do, but if others would like to use, add to or 
> improve upon the tests then I would appreciate the feedback and help.
> 
> David Wellman
> 
> 
> On May 17, 2010, at 10:16 AM, "Eric Evans"  wrote:
> 
> On Fri, 2010-05-14 at 14:52 -0600, David Wellman wrote:
> Does anyone have a good link or example code that we can use to spike on Avro 
> with Cassandra?
> 
> If you're using Python, the best place to look is the functional tests
> (see test/system), otherwise, Patrick's quick start
> (http://bit.ly/32T6Mk).
> 
> As Gary mentioned already, it's still very rough. After all the recent
> changes in trunk (dropping the keyspace arg, binary keys, etc), it's
> just barely back to a state where you can read/write, and only then via
> get/insert (i.e. no slicing, batch_mutate, etc).
> 
> -- 
> Eric Evans
> eev...@rackspace.com
> 
> 



using more than 50% of disk space

2010-05-26 Thread Sean Bridges
We're investigating Cassandra, and we are looking for a way to get Cassandra
to use more than 50% of its data disks.  Is this possible?

For major compactions, it looks like we can use more than 50% of the disk if
we use multiple similarly sized column families.  If we had 10 column
families of the same size, we could use 90% of the disk, since a major
compaction would only need as much free space as the largest column family
(in reality we would use less).  Is that right?

For bootstrapping new nodes, it looks like adding a new node will require
that an existing node does anti-compaction.  This anti-compaction could take
nearly 50% of the disk.  Is there a way around this?

Is there anything else that would prevent us from using more than 50% of the
data disk?

Thanks,

Sean


RE: using more than 50% of disk space

2010-05-26 Thread Stu Hood
See https://issues.apache.org/jira/browse/CASSANDRA-579 for some background 
here: I was just about to start working on this one, but it won't make it in 
until 0.7.


-Original Message-
From: "Sean Bridges" 
Sent: Wednesday, May 26, 2010 11:50am
To: user@cassandra.apache.org
Subject: using more than 50% of disk space

We're investigating Cassandra, and we are looking for a way to get Cassandra
to use more than 50% of its data disks.  Is this possible?

For major compactions, it looks like we can use more than 50% of the disk if
we use multiple similarly sized column families.  If we had 10 column
families of the same size, we could use 90% of the disk, since a major
compaction would only need as much free space as the largest column family
(in reality we would use less).  Is that right?

For bootstrapping new nodes, it looks like adding a new node will require
that an existing node does anti-compaction.  This anti-compaction could take
nearly 50% of the disk.  Is there a way around this?

Is there anything else that would prevent us from using more than 50% of the
data disk?

Thanks,

Sean




Two threads inserting columns into same key followed by read gets unexpected results

2010-05-26 Thread Scott McCarty
Hi,

I'm seeing a problem with inserting columns into one key using multiple
threads and I'm not sure if it's a bug or if it's my misunderstanding of how
insert/get_slice should work.

My setup is that I have two separate client processes, each with a single
thread, writing concurrently to Cassandra and each process is doing the same
thing:  read columns from one key and then use one of the column names as
the basis for another row key, and then insert into that new row key a
column whose name is unique to the process.  Each process then immediately
reads back the columns on that row key.
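In raw Thrift terms (rather than Hector, and with placeholder names like "MyCF",
derivedRowKey, keyspace and client; a simplified, untested sketch), each process
does roughly this:

long ts = System.currentTimeMillis() * 1000;               // microsecond timestamp
byte[] myColumnName = "unique-to-this-process".getBytes(); // each process uses its own name

// 1. Insert the process-specific column into the derived row key, at CL.ALL.
client.insert(keyspace, derivedRowKey,
        new ColumnPath("MyCF").setColumn(myColumnName),
        new byte[0], ts, ConsistencyLevel.ALL);

// 2. Immediately read back all columns of that row, at CL.ONE.
SlicePredicate all = new SlicePredicate();
all.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
List<ColumnOrSuperColumn> columns = client.get_slice(
        keyspace, derivedRowKey, new ColumnParent("MyCF"), all, ConsistencyLevel.ONE);
// Expectation: 'columns' always contains myColumnName; occasionally it does not.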

It's the reading back part that's showing inconsistent behavior:  most of
the time if Thread1 writes Column1 and Thread2 writes Column2 at about the
same time, things are consistent:  Thread1 sees either Column1 or it sees
Column1 and Column 2 (same for Thread2).  Both of those cases are expected
and indeed comprise 99% of the results.

Sometimes, however, Thread1 sees NO columns, and sometimes it sees Column2
(instead of Column1, as I'd expect).

Debug logs show that in failing cases, Thread1 completes its
insert/get_slice sequence BEFORE Thread2 starts its insert/get_slice, but
(in this example) Thread2 would get back NO columns.

I turned on debug level of logging on Cassandra and I saw that in at least
one case the two client requests are coming in at the exact same millisecond
so I'm wondering if there's some concurrency issue on the server.

I'm running linux with Java build 1.6.0_20-b02.  Cassandra is version 0.6.1
and I have one node in the cluster.  The consistency level for writing is
ALL and for reading it's ONE.  The clients are using the Java Hector
interface.  Both of the two client processes are on the same machine as the
Cassandra server (running on a dual-core processor).

This fails with 0.6.2 code also.

Am I wrong in thinking that an insert on a column with consistency level ALL
followed immediately by a get_slice should include that column?

Thanks

Scott McCarty


Re: Order Preserving Partitioner

2010-05-26 Thread Peter Hsu
Correct me if I'm wrong here.  Even though you can get your results with Random 
Partitioner, it's a lot less efficient if you're going across different 
machines to get your results.  If you're doing a lot of range queries, it makes 
sense to have things ordered sequentially so that if you do need to go to disk, 
the reads will be faster, rather than lots of random reads across your system.

It's also my understanding that if you go with the OPP, you could hash your key 
yourself using md5 or sha-1 to effectively get random partitioning.  So it's a 
bit of a pain, but not impossible to do a split between OPP and RP for your 
different columnfamily/keyspaces.
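For example, something like this on the client (a sketch only; the hex MD5 prefix
and the ":" separator are just one arbitrary choice):

import java.security.MessageDigest;

// Emulate random partitioning under OPP by prefixing every key with a hash of itself.
// Reads and writes must both go through the same transformation.
public static String hashedKey(String naturalKey) throws Exception {
    byte[] digest = MessageDigest.getInstance("MD5").digest(naturalKey.getBytes("UTF-8"));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
    }
    return hex + ":" + naturalKey;   // keep the natural key as a suffix so it stays readable
}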

On May 26, 2010, at 2:32 AM, David Boxenhorn wrote:

> Just in case you don't know: You can do range searches on keys even with 
> Random Partitioner, you just won't get the results in order. If this is good 
> enough for you (e.g. if you can order the results on the client, or if you 
> just need to get the right answer, but not the right order), then you should 
> use Random Partitioner. 
> 
> (I bring this up because it confused me until recently.) 
> 
> On Wed, May 26, 2010 at 5:14 AM, Steve Lihn  wrote:
> I have a question on using Order Preserving Partitioner. 
> 
> Many rowKeys in my system will be related to dates, so it seems natural to 
> use Order Preserving Partitioner instead of the default Random Partitioner. 
> However, I have been warned that special attention has to be applied for 
> Order Preserving Partitioner to work properly (basically to ensure a good key 
> distribution and avoid "hot spot") and reverting it back to Random may not be 
> easy. Also not every rowKey is related to dates, for these, using Random 
> Partitioner is okay, but there is only one place to set Partitioner.
> 
> (Note: The intention of this warning is actually to discredit Cassandra and 
> persuade me not to use it.)
> 
> It seems the choice of Partitioner is defined in the storage-conf.xml and is 
> a global property. My question why does it have to be a global property? Is 
> there a future plan to make it customizable per KeySpace (just like you would 
> choose hash or range partition for different table/data in RDBMS) ?  
> 
> Thanks,
> Steve 
> 



Re: using more than 50% of disk space

2010-05-26 Thread Sean Bridges
So after CASSANDRA-579, anti-compaction won't be done on the source node,
and we can use more than 50% of the disk space if we use multiple column
families?

Thanks,

Sean

On Wed, May 26, 2010 at 10:01 AM, Stu Hood  wrote:

> See https://issues.apache.org/jira/browse/CASSANDRA-579 for some
> background here: I was just about to start working on this one, but it won't
> make it in until 0.7.
>
>
> -Original Message-
> From: "Sean Bridges" 
> Sent: Wednesday, May 26, 2010 11:50am
> To: user@cassandra.apache.org
> Subject: using more than 50% of disk space
>
> We're investigating Cassandra, and we are looking for a way to get
> Cassandra
> to use more than 50% of its data disks.  Is this possible?
>
> For major compactions, it looks like we can use more than 50% of the disk
> if
> we use multiple similarly sized column families.  If we had 10 column
> families of the same size, we could use 90% of the disk, since a major
> compaction would only need as much free space as the largest column family
> (in reality we would use less).  Is that right?
>
> For bootstrapping new nodes, it looks like adding a new node will require
> that an existing node does anti-compaction.  This anti-compaction could
> take
> nearly 50% of the disk.  Is there a way around this?
>
> Is there anything else that would prevent us from using more than 50% of
> the
> data disk?
>
> Thanks,
>
> Sean
>
>
>


Subscribe

2010-05-26 Thread Nazario Parsacala



Sent from my iPhone


Doing joins between column families

2010-05-26 Thread Dodong Juan


So I am not sure if you guys are familiar with OCM. Basically it is
an ORM for Cassandra. I've been testing it.


So I have created a model that has the following object relationship.
OCM generates code from this that allows me to do easy
programmatic queries from Java to Cassandra.


Object1-(Many2Many)->Object2-(Many2Many)->Object3-(Many2Many)->Object4-(Many2Many)->Node


So my app gets the NODE and tries to query the dependency relationship  
from Node->Object4->Object3->Object2->Object1.


I have compared the performance of Cassandra (with OCM) vs. DB2.
The result is not very encouraging, since DB2 is showing at least
3x better performance than Cassandra. DB2 basically does it in a
single call with a number of inner joins.


Looking at the code, I think we might get better performance if
somehow we could do the joins between objects within the Cassandra
server rather than on the client side. Right now, I am basically doing
the following.


Node node = new Node(connection, "nodeidentifier");
node.loadInfo();                // going to the wire ...?
node.loadObject4();             // this goes to the wire too
object4Keys = node.getObject4().getColumns().keys();
while (object4Keys.hasMoreElements())
{
    object4Key = object4Keys.nextElement();
    object4 = node.getObject4().get(object4Key);
    object4.loadInfo();         // this goes to the wire too

    object4.loadObject3();      // this goes to the wire too
    object3Keys = object4.getObject3().getColumns().keys();
    while (object3Keys.hasMoreElements())
    {
        object3Key = object3Keys.nextElement();
        object3 = object4.getObject3().get(object3Key);
        object3.loadInfo();     // this goes to the wire too
        object3.loadObject2();  // this goes to the wire too
        // ... and so on, until you get down to Object1
    }
}

I think there is a lot of going back and forth between the client and
Cassandra, and if we could move the relationship joins to the Cassandra
server I think we could minimize the latency and improve the overall
performance of the query.


Is there a way to do a join across ColumnFamilies in Cassandra?


 

Re: nodetool move looks stuck

2010-05-26 Thread Jonathan Ellis
Are there any exceptions in the log like the one in
https://issues.apache.org/jira/browse/CASSANDRA-1019 ?

If so you'll need to restart the moving node and try again.

On Wed, May 26, 2010 at 3:54 AM, Ran Tavory  wrote:
> I ran nodetool move on one of the nodes and it seems stuck for a few hours
> now.
> I've been able to run it successfully in the past, but this time it looks
> stuck.
> Streams shows as if there's work in progress, but the same files have been
> at the same position for a few hours.
> I've also checked the compaction manager through jmx and it's not compacting
> anything.
> Anything I can do to continue the move operation? Is this a bug?
> thanks
>
> $ nodetool -h 192.168.254.58 -p 9004 streams
> Mode: Normal
>  Nothing streaming to /192.168.252.124
> Streaming to: /192.168.254.57
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvImpressions-170-Filter.db
> 0/45587965
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvImpressions-170-Data.db
> 0/18369406636
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvRatings-142-Index.db
> 0/6163411
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvRatings-142-Filter.db
> 0/85645
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvRatings-142-Data.db
> 0/37985032
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvAds-108-Index.db
> 0/77307418
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvAds-108-Filter.db
> 0/1436005
>    /outbrain/cassandra/data/outbrain_kvdb/stream/KvAds-108-Data.db
> 0/704736670
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-40-Index.db 0/230280
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-40-Filter.db 0/20605
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-40-Data.db
> 0/202808000
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-41-Index.db 0/23912
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-41-Filter.db 0/20605
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-41-Data.db 0/21434973
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-42-Index.db 0/965028
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-42-Filter.db 0/20605
>    /outbrain/cassandra/data/Keyspace1/stream/Standard1-42-Data.db
> 0/865380205
> Not receiving any streams.
> $ nodetool -h 192.168.254.57 -p 9004 streams
> Mode: Normal
>  Nothing streaming to /192.168.254.58
> Streaming from: /192.168.254.58
>    outbrain_kvdb:
> /outbrain/cassandra/data/outbrain_kvdb/KvImpressions-tmp-1081-Data.db
> 0/18369406636
>    outbrain_kvdb:
> /outbrain/cassandra/data/outbrain_kvdb/KvRatings-tmp-167-Index.db 0/6163411
>    outbrain_kvdb:
> /outbrain/cassandra/data/outbrain_kvdb/KvRatings-tmp-167-Filter.db 0/85645
>    outbrain_kvdb:
> /outbrain/cassandra/data/outbrain_kvdb/KvRatings-tmp-167-Data.db 0/37985032
>    outbrain_kvdb:
> /outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-123-Index.db 0/77307418
>    outbrain_kvdb:
> /outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-123-Filter.db 0/1436005
>    outbrain_kvdb:
> /outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-123-Data.db 0/704736670
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-21-Index.db
> 0/230280
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-21-Filter.db
> 0/20605
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-21-Data.db
> 0/202808000
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-22-Index.db
> 0/23912
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-22-Filter.db
> 0/20605
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-22-Data.db
> 0/21434973
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-23-Index.db
> 0/965028
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-23-Filter.db
> 0/20605
>    Keyspace1: /outbrain/cassandra/data/Keyspace1/Standard1-tmp-23-Data.db
> 0/865380205
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Order Preserving Partitioner

2010-05-26 Thread Jonathan Shook
I don't think that queries on a key range are valid unless you are using OPP.
As far as hashing the key for OPP goes, I take it to be the same as not
using OPP. It's really a matter of where it gets done, but it has much
the same effect.
(I think)

Jonathan

On Wed, May 26, 2010 at 12:51 PM, Peter Hsu  wrote:
> Correct me if I'm wrong here.  Even though you can get your results with
> Random Partitioner, it's a lot less efficient if you're going across
> different machines to get your results.  If you're doing a lot of range
> queries, it makes sense to have things ordered sequentially so that if you
> do need to go to disk, the reads will be faster, rather than lots of random
> reads across your system.
> It's also my understanding that if you go with the OPP, you could hash your
> key yourself using md5 or sha-1 to effectively get random partitioning.  So
> it's a bit of a pain, but not impossible to do a split between OPP and RP
> for your different columnfamily/keyspaces.
> On May 26, 2010, at 2:32 AM, David Boxenhorn wrote:
>
> Just in case you don't know: You can do range searches on keys even with
> Random Partitioner, you just won't get the results in order. If this is good
> enough for you (e.g. if you can order the results on the client, or if you
> just need to get the right answer, but not the right order), then you should
> use Random Partitioner.
>
> (I bring this up because it confused me until recently.)
>
> On Wed, May 26, 2010 at 5:14 AM, Steve Lihn  wrote:
>>
>> I have a question on using Order Preserving Partitioner.
>>
>> Many rowKeys in my system will be related to dates, so it seems natural to
>> use Order Preserving Partitioner instead of the default Random Partitioner.
>> However, I have been warned that special attention has to be applied for
>> Order Preserving Partitioner to work properly (basically to ensure a good
>> key distribution and avoid "hot spot") and reverting it back to Random may
>> not be easy. Also not every rowKey is related to dates, for these, using
>> Random Partitioner is okay, but there is only one place to set Partitioner.
>>
>> (Note: The intention of this warning is actually to discredit Cassandra
>> and persuade me not to use it.)
>>
>> It seems the choice of Partitioner is defined in the storage-conf.xml and
>> is a global property. My question why does it have to be a global property?
>> Is there a future plan to make it customizable per KeySpace (just like you
>> would choose hash or range partition for different table/data in RDBMS) ?
>>
>> Thanks,
>> Steve
>
>
>


Re: Doing joins between column families

2010-05-26 Thread Charlie Mason
On Wed, May 26, 2010 at 7:45 PM, Dodong Juan  wrote:
>
> So I am not sure if you guys are familiar with OCM . Basically it is an ORM
> for Cassandra. Been testing it
>

In case anyone is interested I have posted a reply on the OCM issue
tracker where this was also raised.

http://github.com/charliem/OCM/issues/closed#issue/5/comment/254717


Charlie M


Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-26 Thread Ran Tavory
If I disable row cache the numbers look good - key cache hit rate is > 0, so
it seems to be related to row cache.

Interestingly, after running for a really long time and with both row and
keys caches I do start to see Key cache hit rate > 0 but the numbers are so
small that it doesn't make sense.
I have capacity for 10M keys and 10M rows, the number of cached keys is ~5M
and very similarly the number of cached rows is also ~5M, however the hit
rates are very different, 0.7 for rows and 0.006 for keys. I'd expect the
key cache hit rate to be identical since neither of them has reached the limit yet.

Key cache capacity: 1000
Key cache size: 5044097
Key cache hit rate: 0.0062089764058896576
Row cache capacity: 1000
Row cache size: 5057231
Row cache hit rate: 0.7361241352465543



On Tue, May 25, 2010 at 3:43 PM, Jonathan Ellis  wrote:

> What happens if you disable row cache?
>
> On Tue, May 25, 2010 at 4:53 AM, Ran Tavory  wrote:
> > It seems there's an error reporting the Key cache hit rate. The value is
> > always 0.0 and I have a feeling it's incorrect. This is seen both by
> using
> > notetool cfstats as well as accessing JMX directly
> >
> (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache
> > RecentHitRate)
> >> RowsCached="1000"
> > KeysCached="1000"/>
> > Column Family: KvAds
> > SSTable count: 7
> > Space used (live): 1288942061
> > Space used (total): 1559831566
> > Memtable Columns Count: 73698
> > Memtable Data Size: 17121092
> > Memtable Switch Count: 33
> > Read Count: 3614433
> > Read Latency: 0.068 ms.
> > Write Count: 3503269
> > Write Latency: 0.024 ms.
> > Pending Tasks: 0
> > Key cache capacity: 1000
> > Key cache size: 619624
> > Key cache hit rate: 0.0
> > Row cache capacity: 1000
> > Row cache size: 447154
> > Row cache hit rate: 0.8460295730014572
> > Compacted row minimum size: 387
> > Compacted row maximum size: 31430
> > Compacted row mean size: 631
> > The Row cache hit rate looks good, 0.8 but Key cache hit rate always
> seems
> > to be 0.0 while the number of unique keys stays about 619624 for quite a
> > while.
> > Is it a real caching problem or just a reporting glitch?
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Doing joins between column families

2010-05-26 Thread Jonathan Shook
I wrote some Iterable<*> methods to do this for column families that
share key structure with OPP.
It is on the hector examples page. Caveat emptor.

It does iterative chunking of the working set for each column family,
so that you can set the nominal transfer size when you construct the
Iterator/Iterable. I've been very happy with the performance of it,
even over large ranges of keys. This is with
OrderPreservingPartitioner because of other requirements, so it may
not be a good example for comparison with a random partitioner, which
is preferred.

Doing joins as such on the server works against the basic design of
Cassandra. The server does a few things very well only because it
isn't overloaded with extra faucets and kitchen sinks. However, I'd
like to be able to load auxiliary classes into the server runtime in a
modular way, just for things like this. Maybe we'll get that someday.

My impression is that there is much more common key structure in a
workable Cassandra storage layout than in a conventional ER model.
This is the nature of the beast when you are organizing your
information more according to access patterns than fully normal
relationships. That is one of the fundamental design trade-offs of
using a hash structure over a schema.

Having something that lets you deploy a fully normal schema on a hash
store can be handy, but it can also obscure the way that your
application indirectly exercises the storage layer. The end-result may
be that the layout is less friendly to the underlying mechanisms of
Cassandra. I'm not saying that it is bad to have a tool to do this,
only that it can make it easy to avoid thinking about Cassandra
storage in terms of what it really is.

There may be ways to optimize the OCM queries, but that takes you down
the road of query optimization, which can be quite nebulous. My gut
instinct is to focus more on the layout, using aggregate keys and
common key structure where you can, so that you can take advantage of
the parallel queries more of the time.

On Wed, May 26, 2010 at 3:13 PM, Charlie Mason  wrote:
> On Wed, May 26, 2010 at 7:45 PM, Dodong Juan  wrote:
>>
>> So I am not sure if you guys are familiar with OCM . Basically it is an ORM
>> for Cassandra. Been testing it
>>
>
> In case anyone is interested I have posted a reply on the OCM issue
> tracker where this was also raised.
>
> http://github.com/charliem/OCM/issues/closed#issue/5/comment/254717
>
>
> Charlie M
>


Best Timestamp?

2010-05-26 Thread Steven Haar
What is the best timestamp to use while using Cassandra with C#? I have been
using DateTime.Now.Ticks, but I have seen others using different things.

Thanks.


Re: Best Timestamp?

2010-05-26 Thread Mark Robson
On 26 May 2010 22:42, Steven Haar  wrote:

> What is the best timestamp to use while using Cassandra with C#? I have
> been using DateTime.Now.Ticks, but I have seen others using different
> things.
>

The standard that most clients seem to use is epoch-microseconds, or
microseconds since midnight GMT 1/1/1970

The trick is to get all clients to use the same value, so it only ever
increases :)

Mark


Re: Best Timestamp?

2010-05-26 Thread Miguel Verde
Right, in C# this would be (not the most efficient way, but you get the
idea):
long timestamp = (DateTime.Now.Ticks - new DateTime(1970, 1, 1).Ticks)/10;

On Wed, May 26, 2010 at 4:50 PM, Mark Robson  wrote:

>  On 26 May 2010 22:42, Steven Haar  wrote:
>
>> What is the best timestamp to use while using Cassandra with C#? I have
>> been using DateTime.Now.Ticks, but I have seen others using different
>> things.
>>
>
> The standard that most clients seem to use is epoch-microseconds, or
> microseconds since midnight GMT 1/1/1970
>
> The trick is to get all clients to use the same value, so it only ever
> increases :)
>
> Mark
>


Re: Best Timestamp?

2010-05-26 Thread Mark Robson
On 26 May 2010 22:56, Miguel Verde  wrote:

> Right, in C# this would be (not the most efficient way, but you get the
> idea):
> long timestamp = (DateTime.Now.Ticks - new DateTime(1970, 1, 1).Ticks)/10;
>
>
Yeah, you're fine provided:

a) All your client applications (which perform writes) are consistent and
b) Your infrastructure's clocks are fairly in sync.

The latter point is important to note, as a single app server node whose
clock is accidentally far in the future could wreck your database.

Mark


Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-26 Thread Jonathan Ellis
It sure sounds like you're seeing the "my row cache contains the
entire hot data set, so the key cache only gets the cold reads"
effect.

On Wed, May 26, 2010 at 2:54 PM, Ran Tavory  wrote:
> If I disable row cache the numbers look good - key cache hit rate is > 0, so
> it seems to be related to row cache.
> Interestingly, after running for a really long time and with both row and
> keys caches I do start to see Key cache hit rate > 0 but the numbers are so
> small that it doesn't make sense.
> I have capacity for 10M keys and 10M rows, the number of cached keys is ~5M
> and very similarly the number of cached rows is also ~5M, however the hit
> rates are very different, 0.7 for rows and 0.006 for keys. I'd expect the
> keys hit rate to be identical since none of them reached the limit yet.
>                 Key cache capacity: 1000
>                 Key cache size: 5044097
>                 Key cache hit rate: 0.0062089764058896576
>                 Row cache capacity: 1000
>                 Row cache size: 5057231
>                 Row cache hit rate: 0.7361241352465543
>
>
> On Tue, May 25, 2010 at 3:43 PM, Jonathan Ellis  wrote:
>>
>> What happens if you disable row cache?
>>
>> On Tue, May 25, 2010 at 4:53 AM, Ran Tavory  wrote:
>> > It seems there's an error reporting the Key cache hit rate. The value is
>> > always 0.0 and I have a feeling it's incorrect. This is seen both by
>> > using
>> > notetool cfstats as well as accessing JMX directly
>> >
>> > (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache
>> > RecentHitRate)
>> >       > >                     RowsCached="1000"
>> >                     KeysCached="1000"/>
>> >                 Column Family: KvAds
>> >                 SSTable count: 7
>> >                 Space used (live): 1288942061
>> >                 Space used (total): 1559831566
>> >                 Memtable Columns Count: 73698
>> >                 Memtable Data Size: 17121092
>> >                 Memtable Switch Count: 33
>> >                 Read Count: 3614433
>> >                 Read Latency: 0.068 ms.
>> >                 Write Count: 3503269
>> >                 Write Latency: 0.024 ms.
>> >                 Pending Tasks: 0
>> >                 Key cache capacity: 1000
>> >                 Key cache size: 619624
>> >                 Key cache hit rate: 0.0
>> >                 Row cache capacity: 1000
>> >                 Row cache size: 447154
>> >                 Row cache hit rate: 0.8460295730014572
>> >                 Compacted row minimum size: 387
>> >                 Compacted row maximum size: 31430
>> >                 Compacted row mean size: 631
>> > The Row cache hit rate looks good, 0.8 but Key cache hit rate always
>> > seems
>> > to be 0.0 while the number of unique keys stays about 619624 for quite a
>> > while.
>> > Is it a real caching problem or just a reporting glitch?
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Thoughts on adding complex queries to Cassandra

2010-05-26 Thread Jeremy Davis
Are there any thoughts on adding a more complex query to Cassandra?

At a high level, what I'm wondering is: would it be possible/desirable/in
keeping with the Cassandra plan to add something like a JavaScript blob to a
get_range_slice etc. that does some further filtering on the results before
returning them? The goal being to trade off some CPU on Cassandra for
network bandwidth.

-JD


RE: Thoughts on adding complex queries to Cassandra

2010-05-26 Thread Nicholas Sun
I'm very curious about this topic as well.  Mainly, I'd like to know: is this
functionality handled through Hadoop Map/Reduce operations?

 

Nick 

 

From: Jeremy Davis [mailto:jerdavis.cassan...@gmail.com] 
Sent: Wednesday, May 26, 2010 3:31 PM
To: user@cassandra.apache.org
Subject: Thoughts on adding complex queries to Cassandra

 


Are there any thoughts on adding a more complex query to Cassandra?

At a high level what I'm wondering is: Would it be possible/desirable/in
keeping with the Cassandra plan, to add something like a javascript blob on
to a get range slice etc, that does some further filtering on the results
before returning them. The goal being to trade off some CPU on Cassandra for
network bandwidth. 

-JD



Cassandra's 2GB row limit and indexing

2010-05-26 Thread Richard West
Hi all,

I'm currently looking at new database options for a URL shortener in order
to scale well with increased traffic as we add new features. Cassandra seems
to be a good fit for many of our requirements, but I'm struggling a bit to
find ways of designing certain indexes in Cassandra due to its 2GB row
limit.

The easiest example of this is that I'd like to create an index by the
domain that shortened URLs are linking to, mostly for spam control so it's
easy to grab all the links to any given domain. As far as I can tell the
typical way to do this in Cassandra is something like: -

DOMAIN = { //columnfamily
    thing.com { //row key
        timestamp: "shorturl567", //column name: value
        timestamp: "shorturl144",
        timestamp: "shorturl112",
        ...
    }
    somethingelse.com {
        timestamp: "shorturl817",
        ...
    }
}

The values here are keys for another columnfamily containing various data on
shortened URLs.

The problem with this approach is that a popular domain (e.g. blogspot.com)
could be used in many millions of shortened URLs, so would have that many
columns and hit the row size limit mentioned at
http://wiki.apache.org/cassandra/CassandraLimitations.

Does anyone know an effective way to design this type of one-to-many index
around this limitation (could be something obvious I'm missing)? If not, are
the changes proposed for
https://issues.apache.org/jira/browse/CASSANDRA-16 likely to make this
type of design workable?

Thanks in advance for any advice,

Richard


Re: Cassandra's 2GB row limit and indexing

2010-05-26 Thread Jonathan Shook
The example is a little confusing.
.. but ..

1) "sharding"
You can square the capacity by having a 2-level map.
 CF1->row->value->CF2->row->value
 This means finding some natural subgrouping or hash that provides a
good distribution.
2)  "hashing"
You can also use some additional key hashing to spread the rows over a
wider space:
 Find a delimiter that works for you and identify the row that owns it
by "domain" + "delimiter" + hash(domain) modulo some divisor, for
example.
3) "overflow"
You can implement some overflow logic to create overflow rows which
act like (2), but is less sparse
 while count(columns) for candidate row > some threshold, try row +
"delimiter" + subrow++
 This is much easier when you are streaming data in, as opposed to
poking the random value here and there

Just some ideas. I'd go with 2, and find a way to adjust the modulo to
minimize the row spread. 2) isn't guaranteed to provide uniformity,
but 3) isn't guaranteed to provide very good performance. Perhaps a
combination of them both? The count is readily accessible, so it may
provide for some informed choices at run time. I'm assuming your
column sizes are fairly predictable.
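A minimal sketch of the bucketing in (2)/(3), under the assumption that it's the
short URL (the column) being hashed to pick a bucket within the domain; the
bucket count, key format and method names are made up:

import java.util.*;

// Split one domain's index across N rows by hashing the short URL into a bucket.
private static final int BUCKETS = 64;

static String indexRowKey(String domain, String shortUrl) {
    int bucket = Math.abs(shortUrl.hashCode() % BUCKETS);
    return domain + "#" + bucket;              // e.g. "blogspot.com#17"
}

// To read everything for a domain, multiget (or loop) over all of its buckets.
static List<String> allRowKeysFor(String domain) {
    List<String> keys = new ArrayList<String>();
    for (int b = 0; b < BUCKETS; b++) {
        keys.add(domain + "#" + b);
    }
    return keys;
}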

Has anybody else tackled this before?


On Wed, May 26, 2010 at 8:52 PM, Richard West  wrote:
> Hi all,
>
> I'm currently looking at new database options for a URL shortener in order
> to scale well with increased traffic as we add new features. Cassandra seems
> to be a good fit for many of our requirements, but I'm struggling a bit to
> find ways of designing certain indexes in Cassandra due to its 2GB row
> limit.
>
> The easiest example of this is that I'd like to create an index by the
> domain that shortened URLs are linking to, mostly for spam control so it's
> easy to grab all the links to any given domain. As far as I can tell the
> typical way to do this in Cassandra is something like: -
>
> DOMAIN = { //columnfamily
>     thing.com { //row key
>         timestamp: "shorturl567", //column name: value
>         timestamp: "shorturl144",
>         timestamp: "shorturl112",
>         ...
>     }
>     somethingelse.com {
>         timestamp: "shorturl817",
>         ...
>     }
> }
>
> The values here are keys for another columnfamily containing various data on
> shortened URLs.
>
> The problem with this approach is that a popular domain (e.g. blogspot.com)
> could be used in many millions of shortened URLs, so would have that many
> columns and hit the row size limit mentioned at
> http://wiki.apache.org/cassandra/CassandraLimitations.
>
> Does anyone know an effective way to design this type of one-to-many index
> around this limitation (could be something obvious I'm missing)? If not, are
> the changes proposed for https://issues.apache.org/jira/browse/CASSANDRA-16
> likely to make this type of design workable?
>
> Thanks in advance for any advice,
>
> Richard
>


Re: Anyone using hadoop/MapReduce integration currently?

2010-05-26 Thread 朱蓝天
2010/5/26 Utku Can Topçu 

> Hi Jeremy,
>
>
> > Why are you using Cassandra versus using data stored in HDFS or HBase?
> - I'm thinking of using it for realtime streaming of user data. While
> streaming the requests, I'm also using Lucandra for indexing the data in
> realtime. It's a better option when you compare it with HBase or the native
> HDFS flat files, because of low latency in writes.


I'm interested in realtime indexing with Lucandra, but how do you intersect
posting lists from multiple terms with Cassandra? If it goes through the
network, I think it would be very inefficient.

>
>
> > Is there anything holding you back from using it (if you would like to
> use it but currently cannot)?
>
> My answer to this would be:
> - The current integration only supports the whole range of the CF to be
> input for the map phase, it would be way much better if the InputFormat had
> means of support for a KeyRange.
>
> Best Regards,
> Utku
>
>
> On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna 
> wrote:
>
>> I'll be doing a presentation on Cassandra's (0.6+) hadoop integration next
>> week. Is anyone currently using MapReduce or the initial Pig integration?
>>
>> (If you're unaware of such integration, see
>> http://wiki.apache.org/cassandra/HadoopSupport)
>>
>> If so, could you post to this thread on how you're using it or planning on
>> using it (if not covered by the shroud of secrecy)?
>>
>> e.g.
>> What is the use case?
>>
>> Why are you using Cassandra versus using data stored in HDFS or HBase?
>>
>> Are you using a separate Hadoop cluster to run the MR jobs on, or perhaps
>> are you running the Job Tracker and Task Trackers on Cassandra nodes?
>>
>> Is there anything holding you back from using it (if you would like to use
>> it but currently cannot)?
>>
>> Thanks!
>
>
>


Re: Error reporting Key cache hit rate with cfstats or with JMX

2010-05-26 Thread Ran Tavory
So the row cache contains both rows and keys, and if I have a large enough row
cache (in particular, if the row cache size equals the key cache size) then it's
just wasteful to keep another key cache and I should eliminate the key cache,
correct?

On Thu, May 27, 2010 at 1:21 AM, Jonathan Ellis  wrote:

> It sure sounds like you're seeing the "my row cache contains the
> entire hot data set, so the key cache only gets the cold reads"
> effect.
>
> On Wed, May 26, 2010 at 2:54 PM, Ran Tavory  wrote:
> > If I disable row cache the numbers look good - key cache hit rate is > 0,
> so
> > it seems to be related to row cache.
> > Interestingly, after running for a really long time and with both row and
> > keys caches I do start to see Key cache hit rate > 0 but the numbers are
> so
> > small that it doesn't make sense.
> > I have capacity for 10M keys and 10M rows, the number of cached keys is
> ~5M
> > and very similarly the number of cached rows is also ~5M, however the hit
> > rates are very different, 0.7 for rows and 0.006 for keys. I'd expect the
> > keys hit rate to be identical since none of them reached the limit yet.
> > Key cache capacity: 1000
> > Key cache size: 5044097
> > Key cache hit rate: 0.0062089764058896576
> > Row cache capacity: 1000
> > Row cache size: 5057231
> > Row cache hit rate: 0.7361241352465543
> >
> >
> > On Tue, May 25, 2010 at 3:43 PM, Jonathan Ellis 
> wrote:
> >>
> >> What happens if you disable row cache?
> >>
> >> On Tue, May 25, 2010 at 4:53 AM, Ran Tavory  wrote:
> >> > It seems there's an error reporting the Key cache hit rate. The value
> is
> >> > always 0.0 and I have a feeling it's incorrect. This is seen both by
> >> > using
> >> > notetool cfstats as well as accessing JMX directly
> >> >
> >> >
> (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache
> >> > RecentHitRate)
> >> >>> > RowsCached="1000"
> >> > KeysCached="1000"/>
> >> > Column Family: KvAds
> >> > SSTable count: 7
> >> > Space used (live): 1288942061
> >> > Space used (total): 1559831566
> >> > Memtable Columns Count: 73698
> >> > Memtable Data Size: 17121092
> >> > Memtable Switch Count: 33
> >> > Read Count: 3614433
> >> > Read Latency: 0.068 ms.
> >> > Write Count: 3503269
> >> > Write Latency: 0.024 ms.
> >> > Pending Tasks: 0
> >> > Key cache capacity: 1000
> >> > Key cache size: 619624
> >> > Key cache hit rate: 0.0
> >> > Row cache capacity: 1000
> >> > Row cache size: 447154
> >> > Row cache hit rate: 0.8460295730014572
> >> > Compacted row minimum size: 387
> >> > Compacted row maximum size: 31430
> >> > Compacted row mean size: 631
> >> > The Row cache hit rate looks good, 0.8 but Key cache hit rate always
> >> > seems
> >> > to be 0.0 while the number of unique keys stays about 619624 for quite
> a
> >> > while.
> >> > Is it a real caching problem or just a reporting glitch?
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of Riptano, the source for professional Cassandra support
> >> http://riptano.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Continuously increasing RAM usage

2010-05-26 Thread James Golick
We're seeing RAM usage continually climb until eventually, cassandra becomes
unresponsive.

The JVM isn't OOM'ing. It has only committed 14/24GB of memory. So, I am
assuming that the memory usage is related to mmap'd IO. Fair assumption?

I tried setting the IO mode to standard, but it seemed to be a little slower
and I couldn't get the machine to come back online with adequate read
performance, so I set it back. I'll have to write a solid cache warming
script if I'm going to try that again.

Any other ideas for what might be causing the issue? Is there something I
should monitor or look at next time it happens?

Thanks


Re: Anyone using hadoop/MapReduce integration currently?

2010-05-26 Thread gabriele renzi
On Tue, May 25, 2010 at 6:35 PM, Jeremy Hanna
 wrote:


> What is the use case?

We end up with messed-up data in the database, so we run a MapReduce job
to find irregular data from time to time.


> Why are you using Cassandra versus using data stored in HDFS or HBase?

as of now our MapReduce task is only used for "fixing" Cassandra, so
the question is moot :)


> Are you using a separate Hadoop cluster to run the MR jobs on, or perhaps are 
> you running the Job Tracker and Task Trackers on Cassandra nodes?

separate

> Is there anything holding you back from using it (if you would like to use it 
> but currently cannot)?

It would be nice if the output of the MapReduce job was a
MutationOutputFormat to which we could write inserts/deletes, but I
recall there is something on JIRA already, although I'm not sure if it was
merged.