Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun

Hello,

I'm running Cassandra 0.6.0 on a cluster and have an application that 
needs to read all rows from a column family using the Cassandra Thrift 
API. Ideally, I'd like to be able to do this by having all nodes in the 
cluster read in parallel (i.e., each node reads a disjoint set of rows 
that are stored locally). I should also mention that I'm using the 
RandomPartitioner.


Here's what I was thinking:

  1. Have one node invoke describe_ring to find the token range on the 
ring that each node is responsible for.


  2. For each token range, have the node that owns that portion of the 
ring read the rows in that range using a sequence of get_range_slices 
calls (using start/end tokens, not keys).
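The per-range scans in step 2 are independent, so they parallelize trivially. A skeletal driver under that assumption (the Range type and scanner callback here are illustrative stand-ins, not the actual Thrift client classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelRanges {
    // Stand-in for one (startToken, endToken] range from describe_ring.
    record Range(String start, String end) {}

    // Farms each token range out to its own worker; the scanner callback
    // stands in for a get_range_slices paging loop over that range.
    static List<String> scanAll(List<Range> ring,
                                Function<Range, List<String>> scanner)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(ring.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (Range r : ring) futures.add(pool.submit(() -> scanner.apply(r)));
        List<String> rows = new ArrayList<>();
        for (Future<List<String>> f : futures) rows.addAll(f.get());
        pool.shutdown();
        return rows;
    }

    public static void main(String[] args) throws Exception {
        List<Range> ring = List.of(new Range("0", "5"), new Range("5", "0"));
        System.out.println(scanAll(ring, r -> List.of("row@" + r.start())));
    }
}
```

In practice each worker would be the node that owns the range, reading its local rows.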


This type of functionality seems to already be there in the tree with 
the recent Cassandra/Hadoop integration.


...
KeyRange keyRange = new KeyRange(batchRowCount)
        .setStart_token(startToken)
        .setEnd_token(split.getEndToken());
try
{
    rows = client.get_range_slices(new ColumnParent(cfName),
                                   predicate,
                                   keyRange,
                                   ConsistencyLevel.ONE);
    ...

    // prepare for the next slice to be read
    KeySlice lastRow = rows.get(rows.size() - 1);
    IPartitioner p = DatabaseDescriptor.getPartitioner();
    byte[] rowkey = lastRow.getKey();
    startToken = p.getTokenFactory().toString(p.getToken(rowkey));
...

The above snippet from ColumnFamilyRecordReader.java seems to suggest it 
is possible to scan an entire column family by reading disjoint sets of 
rows using token-based range queries (as opposed to key-based range 
queries). Is this possible in 0.6.0? (Note: for the next startToken, I 
was just planning on computing the MD5 digest of the last key directly 
since I'm accessing Cassandra through Thrift.)
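A minimal sketch of that MD5-token computation (my reading of RandomPartitioner in 0.6, where the token appears to be the absolute value of the MD5 digest interpreted as a BigInteger; worth verifying against FBUtilities.hash in your tree):

```java
import java.math.BigInteger;
import java.security.MessageDigest;

public class NextStartToken {
    // Mirrors (as far as I can tell) what RandomPartitioner does:
    // token = abs(MD5(key)) as a BigInteger, rendered in decimal.
    static String tokenOf(byte[] key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return new BigInteger(md5.digest(key)).abs().toString();
    }

    public static void main(String[] args) throws Exception {
        // the last key returned by get_range_slices becomes the next start token
        System.out.println(tokenOf("lastRowKey".getBytes("UTF-8")));
    }
}
```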


Thoughts?

bnc


Question about hinted handoff

2010-07-08 Thread ChingShen
Hi all,

  Please consider this case: (RF=1, CL=ONE)

  1. I have nodes A, B, and C.
  2. Node A is the coordinator; it sends a write request to node B.
  3. Node B goes down during the write operation, so a failure message is
returned to the client, and a hint is written to node C.
  4. Node B comes back up, and node C forwards the data to it.
  5. Node B now owns the data, even though the write operation failed.

  Is that correct?

Thanks.

Shen


Re: Query on delete a column inside a super column

2010-07-08 Thread Moses Dinakaran
As far as I can tell, phpCassa doesn't offer an option to remove a single
column from a super column; its remove method removes the whole super
column from the key. I will check with the Thrift API.

Through a mutation object, inserts and updates work, but removing a
column doesn't.

Thank you all.

Regards
Moses.


On Wed, Jul 7, 2010 at 7:41 PM, Jonathan Ellis  wrote:
>
> the thrift api allows you to optionally specify column and subcolumn
> as well.  no idea how or if phpCassa exposes this though.
>
> On Wed, Jul 7, 2010 at 1:51 AM, Moses Dinakaran
>  wrote:
> > Hi,
> >
> > Thanks for the reply,
> >
> > The remove method
> > $cassandraInstance->remove('cache_pages_key_hash', 'hash_1' )
> >
> > which will remove the whole key, But I don't want to do that, I need to
> > remove one column inside that key
> >
> > Can you please tell me how to use the remove method in this case.
> >
> >
> > Regards,
> > Moses.
> >
> >
> > On Wed, Jul 7, 2010 at 12:16 AM, Jonathan Ellis  wrote:
> >>
> >> insert is insert-or-update.  leaving out a column from an update
> >> doesn't delete it, you need to use the remove method for that.
> >>
> >> On Tue, Jul 6, 2010 at 7:41 AM, Moses Dinakaran
> >>  wrote:
> >> > Hi All,
> >> >
> >> > I have a query related to deleting a column inside a super column
> >> >
> >> > The following is my cassandra schema
> >> >
> >> > [cache_pages_key_hash] => Array
> >> >        (
> >> >            [hash_1] => Array
> >> >                (
> >> >                    [1] => 4c330e95195f9
> >> >                    [2] => 4c330e951f18b
> >> >                    [3] => 4c330e9521f3d
> >> >                )
> >> >
> >> >        )
> >> >
> >> >
> >> > No I wanted to remove the index [1] => 4c330e95195f9 from the
> >> > supercolumn [hash_1]
> >> >
> >> > Through phpCassa I am doing the following
> >> >
> >> > $updatedRecord   =  array("hash_1" => Array
> >> >                                (
> >> >                                    2 => "4c330e951f18b",
> >> >                                    3 => "4c330e9521f3d"
> >> >                               )
> >> >                             );
> >> >
> >> >
> >> > $cassandraInstance->insert('cache_pages_key_hash',$updateRecord );
> >> >
> >> > But while I fetch the record again
> >> >
> >> > I was getting the original records ie the column 1 is not removed from
> >> >
> >> >
> >> > [cache_pages_key_hash] => Array
> >> >        (
> >> >            [hash_1] => Array
> >> >                (
> >> >                    [1] => 4c330e95195f9
> >> >                    [2] => 4c330e951f18b
> >> >                    [3] => 4c330e9521f3d
> >> >                )
> >> >
> >> >        )
> >> >
> >> >
> >> > But at the same time If I am updating the index 1
> >> >
> >> > ie
> >> > $updateRecord   =  array("hash_1" => Array
> >> >                                (
> >> >                                   1  => 'able to update',
> >> >                                    2 => "4c330e951f18b",
> >> >                                    3 => "4c330e9521f3d"
> >> >                               )
> >> >                             );
> >> >
> >> > $cassandraInstance->insert('cache_pages_key_hash',$updateRecord );
> >> >
> >> > The records is being updated, Only problem is that  deleting dosent
> >> > happens.
> >> >
> >> > My question is that is this behavior is expected as explained in the
> >> > article
> >> > Distributed deletes in the Cassandra database
> >> > http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
> >> >
> >> > or I am doing wrong.
> >> >
> >> >
> >> > Thanks,
> >> > Moses.
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of Riptano, the source for professional Cassandra support
> >> http://riptano.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
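To make the Thrift-level point above concrete, here is a toy model (plain Java maps, not the Thrift API itself) of how a ColumnPath-style remove narrows from a whole row down to a single subcolumn depending on which parts are set:

```java
import java.util.HashMap;
import java.util.Map;

public class SuperColumnRemoveDemo {
    // row key -> super column name -> subcolumn name -> value
    static Map<String, Map<String, Map<String, String>>> cf = new HashMap<>();

    // Models a ColumnPath-style remove: leaving superColumn or column null
    // widens the scope of the deletion, as in Thrift's remove call.
    static void remove(String key, String superColumn, String column) {
        Map<String, Map<String, String>> row = cf.get(key);
        if (row == null) return;
        if (superColumn == null) { cf.remove(key); return; }      // whole row
        Map<String, String> sc = row.get(superColumn);
        if (sc == null) return;
        if (column == null) { row.remove(superColumn); return; }  // whole super column
        sc.remove(column);                                        // single subcolumn
    }
}
```

Setting both the super column and the subcolumn is the case Moses wants: only that one subcolumn is deleted.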


How to stop Cassandra running in embedded mode

2010-07-08 Thread Andriy Kopachevsky
Hi, we are trying to set up integration testing for Cassandra, so we need
to run and stop it as an embedded service. Starting Cassandra is no
problem:

import org.apache.cassandra.contrib.utils.service.CassandraServiceDataCleaner;

class SomeTestClass {

    @Before
    public void setup() throws TTransportException, IOException,
            InterruptedException {

        // make a tmp dir and copy storage-conf.xml and log4j.properties to it
        copy("/storage-conf.xml", TMP);
        copy("/log4j.properties", TMP);
        System.setProperty("storage-config", TMP);

        cassandra = new EmbeddedCassandraService();
        cassandra.init();
        t = new Thread(cassandra);
        t.setDaemon(true);
        t.start();
    }
}

But stopping it is a real problem: even after you execute t.stop(), all
the other threads started internally are still alive. Is there any way to
force Cassandra to stop? Maybe by invoking some internal Cassandra API
function? Thanks.
Andrey.
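For what it's worth: 0.6 exposes no public shutdown API that I know of, so a common workaround is to lean on the daemon flag already set above and give each test class a fresh forked JVM (e.g. surefire's forkMode=always); daemon threads do not keep a JVM alive, as this tiny demo illustrates:

```java
public class DaemonExitDemo {
    public static void main(String[] args) {
        // stands in for the embedded Cassandra service loop (hypothetical)
        Thread service = new Thread(() -> {
            try {
                while (true) Thread.sleep(100);
            } catch (InterruptedException e) {
                // ignore; the thread dies with the JVM anyway
            }
        });
        service.setDaemon(true); // daemon threads do not block JVM exit
        service.start();
        // main returns here; with only daemon threads left, the JVM exits
        System.out.println("main done");
    }
}
```

So rather than stopping Cassandra in-process, the JVM exit at the end of each forked test run does the cleanup.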


Re: Why so many commitlogs ?

2010-07-08 Thread Anty
Hi Jonathan,
I have found out what was going wrong.
I had changed the configuration value to 1440, which prevented the
memtables of LocationInfo and HintsColumnFamily from flushing when only a
few hint records had been written across many commitlog segments.
On Thu, Jul 8, 2010 at 9:43 AM, Anty  wrote:

>
>
> On Thu, Jul 8, 2010 at 9:21 AM, Jonathan Ellis  wrote:
>
>> you're not out of disk space, are you?
>>
>> No.
>
>> if not you could try restarting, that should clear them out if nothing
>> else does
>>
> Yes. I restarted the node, then the commitlogs were removed.
> But recover so many commitlog take so much time.
>
>>
>> On Wed, Jul 7, 2010 at 8:07 PM, Anty  wrote:
>> > Thx Jonathan.
>> >
>> >
>> > On Wed, Jul 7, 2010 at 11:58 PM, Jonathan Ellis 
>> wrote:
>> >>
>> >> number of memtables waiting to flush has a pretty low bound (# of data
>> >> file directories in 0.6.3)
>> >>
>> > O ,I seen
>> >>
>> >> did you check your log for exceptions?
>> >
>> > Yes ,but no exceptions.
>> >
>> >
>> >
>> >>
>> >> On Wed, Jul 7, 2010 at 10:35 AM, Anty  wrote:
>> >> > yes, i know. I only insert records into one CF.
>> >> >
>> >> > when a memtable flush complete, commitlog  will check if there are
>> some
>> >> > obsolete commitlog segments.
>> >> > I don't known why there are so many commitlog file out there.
>> >> > is there a possibility that too many memtables is waiting for
>> flushing,
>> >> > which prevent many commitlog files from  being removed.
>> >> >
>> >> > On Wed, Jul 7, 2010 at 10:13 PM, Jonathan Ellis 
>> >> > wrote:
>> >> >>
>> >> >> commitlogs can be removed after _all_ the CFs they have data for
>> have
>> >> >> been flushed.
>> >> >>
>> >> >> On Wed, Jul 7, 2010 at 5:21 AM, Anty  wrote:
>> >> >> > Hi:all
>> >> >> > In my little cluter ,after i insert many many records into
>> cassandra,
>> >> >> > there
>> >> >> > are hundreds of commit log files in commitlog log directory.
>> >> >> > is it normal ?
>> >> >> > I read the source code of commitlog , there shouldn't be so many
>> >> >> > commitlog
>> >> >> > log files .
>> >> >> > any clue will be appreciate.
>> >> >> >
>> >> >> > --
>> >> >> > Best Regards
>> >> >> > Anty Rao
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Jonathan Ellis
>> >> >> Project Chair, Apache Cassandra
>> >> >> co-founder of Riptano, the source for professional Cassandra support
>> >> >> http://riptano.com
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Best Regards
>> >> > Anty Rao
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of Riptano, the source for professional Cassandra support
>> >> http://riptano.com
>> >
>> >
>> >
>> > --
>> > Best Regards
>> > Anty Rao
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>
>
> --
> Best Regards
> Anty Rao
>



-- 
Best Regards
Anty Rao


Re: Question about hinted handoff

2010-07-08 Thread Anty
On Thu, Jul 8, 2010 at 4:11 PM, ChingShen  wrote:

> Hi all,
>
>   Please consider this case: (RF=1, CL=ONE)
>
>   1. I have A, B and C nodes.
>   2. A node is a coordinator node, it sends a request to B node to do write
> operation.
>
No, it will not choose B; with RF=1 the coordinator writes the data
locally on node A.
If RF=2, it may choose C as the hint/replica node.

>   3. B node is down during write operation, so return failure message to
> client, and write a hint to C node.
>   4. B node comes back up, then C node forwards the data to it.
>   5. B node own data right now, although the write operation is failure.
>
>   Correctly?
>
> Thanks.
>
> Shen
>



-- 
Best Regards
Anty Rao


Gossip round time

2010-07-08 Thread ChingShen
Hi,

  I found the http://www.slideshare.net/adorepump/cassandra-nosql slides,
which mention on page 11 that gossip "State disseminated in *O(logN)*
rounds where N is the number of nodes in the cluster." Is the drawing on
page 15 wrong? Doesn't it need a round 4?

Thanks.

Shen
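As a back-of-envelope model of the O(logN) claim (my own sketch, not the slides' math): if each informed node informs one new node per round, the informed set doubles each round, so ceil(log2 N) rounds suffice in the idealized case. Real gossip picks peers randomly, so needing an extra round (e.g. a round 4) is plausible:

```java
public class GossipRounds {
    // Idealized doubling model: each round, every informed node informs
    // exactly one uninformed node, so the informed set doubles in size.
    static int roundsToInformAll(int nodes) {
        int informed = 1, rounds = 0;
        while (informed < nodes) {
            informed *= 2;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(roundsToInformAll(8)); // → 3
        System.out.println(roundsToInformAll(9)); // → 4
    }
}
```

So even in the best case, 9 or more nodes need a fourth round under this model.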


Re: Question about hinted handoff

2010-07-08 Thread Anty
Sorry, I was wrong; I missed the (RF=1, CL=ONE) setting.

On Thu, Jul 8, 2010 at 5:27 PM, Anty  wrote:

>
>
> On Thu, Jul 8, 2010 at 4:11 PM, ChingShen  wrote:
>
>> Hi all,
>>
>>   Please consider this case: (RF=1, CL=ONE)
>>
>>   1. I have A, B and C nodes.
>>   2. A node is a coordinator node, it sends a request to B node to do
>> write operation.
>>
> No ,will not choose B , write the data locally in Node A.
> if RF=2
> may choose C as hint  and replica node.
>
>>   3. B node is down during write operation, so return failure message to
>> client, and write a hint to C node.
>>   4. B node comes back up, then C node forwards the data to it.
>>   5. B node own data right now, although the write operation is failure.
>>
>>   Correctly?
>>
>> Thanks.
>>
>> Shen
>>
>
>
>
> --
> Best Regards
> Anty Rao
>



-- 
Best Regards
Anty Rao


Re: Question about hinted handoff

2010-07-08 Thread ChingShen
So, am I correct?

Shen

On Thu, Jul 8, 2010 at 5:33 PM, Anty  wrote:

> Sorry I am wrong .Miss the CF=one.
>
>
> On Thu, Jul 8, 2010 at 5:27 PM, Anty  wrote:
>
>>
>>
>> On Thu, Jul 8, 2010 at 4:11 PM, ChingShen wrote:
>>
>>> Hi all,
>>>
>>>   Please consider this case: (RF=1, CL=ONE)
>>>
>>>   1. I have A, B and C nodes.
>>>   2. A node is a coordinator node, it sends a request to B node to do
>>> write operation.
>>>
>> No ,will not choose B , write the data locally in Node A.
>> if RF=2
>> may choose C as hint  and replica node.
>>
>>>   3. B node is down during write operation, so return failure message to
>>> client, and write a hint to C node.
>>>   4. B node comes back up, then C node forwards the data to it.
>>>   5. B node own data right now, although the write operation is failure.
>>>
>>>   Correctly?
>>>
>>> Thanks.
>>>
>>> Shen
>>>
>>
>>
>>
>> --
>> Best Regards
>> Anty Rao
>>
>
>
>
> --
> Best Regards
> Anty Rao
>


Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-08 Thread Julie
Jonathan Ellis  gmail.com> writes:

> "SSTables that are obsoleted by a compaction are deleted
> asynchronously when the JVM performs a GC. You can force a GC from
> jconsole if necessary, but Cassandra will force one itself if it
> detects that it is low on space. A compaction marker is also added to
> obsolete sstables so they can be deleted on startup if the server does
> not perform a GC before being restarted.
> 
> "CFStoreMBean exposes sstable space used as getLiveDiskSpaceUsed (only
> includes size of non-obsolete files) and getTotalDiskSpaceUsed
> (includes everything)."
> 

Thank you so much for your help.
If I'm reading this right, it sounds like the extra 76 GB of disk space being
used could be due to SSTables that are obsolete due to compaction but not yet
deleted.  But would I be able to see a big difference in cfstats then between
Space used (live) and Space used (total)?  Here's what is being reported for
this node:
Space used (live): 113946099884
Space used (total): 113946099884






Re: High CPU usage on all nodes without any read or write

2010-07-08 Thread Olivier Rosello
Hi,

Thank you for your help.

I don't know if data is being written to the cluster too fast, but I don't
think so (the nodes are beefy: big CPUs, 12 GB RAM...) and there is not
that much data (2,000 inserts/sec, about 300 KB/sec of raw data).


I trashed all the data yesterday at 6pm (GMT+2) and launched everything again.

All was fine until now: one node (out of 4) has begun returning timeouts
on writes (see cfstats and tpstats below).

CPU is between 100% and 250% (writes to the cluster continue).


r...@cassandra-2:~# iostat -x -k 5
Linux 2.6.31-22-server (cassandra-2)07/08/2010  _x86_64_(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   3.350.000.600.020.00   96.03

Device:  rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  svctm  %util
sda      0.92    96.85   0.34  2.35   44.31  396.77  328.83    0.08      28.18  1.09   0.29

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  17.950.001.270.050.00   80.73

Device:  rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  svctm  %util
sda      0.00    76.00   0.60  10.60  2.40   346.40  62.29     0.00      0.36   0.36   0.40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  17.400.000.150.000.00   82.45

Device:  rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  svctm  %util
sda      0.00    23.00   0.00  1.20   0.00   96.80   161.33    0.00      0.00   0.00   0.00



But in the Cassandra output log:
r...@cassandra-2:~#  tail -f /var/log/cassandra/output.log 
 INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed 
leaving 1684169392 used; max is 6563430400
 INFO 15:32:09,875 GC for ConcurrentMarkSweep: 1363 ms, 4296991416 reclaimed 
leaving 1684201560 used; max is 6563430400
 INFO 15:32:14,370 GC for ConcurrentMarkSweep: 1341 ms, 4295467880 reclaimed 
leaving 1684879440 used; max is 6563430400
 INFO 15:32:18,906 GC for ConcurrentMarkSweep: 1343 ms, 4296386408 reclaimed 
leaving 1685489208 used; max is 6563430400
 INFO 15:32:23,564 GC for ConcurrentMarkSweep: 1511 ms, 4296407088 reclaimed 
leaving 1685488744 used; max is 6563430400
 INFO 15:32:28,068 GC for ConcurrentMarkSweep: 1347 ms, 4295383216 reclaimed 
leaving 1686469448 used; max is 6563430400
 INFO 15:32:32,617 GC for ConcurrentMarkSweep: 1376 ms, 4295689192 reclaimed 
leaving 1687908304 used; max is 6563430400
 INFO 15:32:37,283 GC for ConcurrentMarkSweep: 1468 ms, 4296056176 reclaimed 
leaving 1687916880 used; max is 6563430400
 INFO 15:32:41,811 GC for ConcurrentMarkSweep: 1358 ms, 4296412232 reclaimed 
leaving 1688437064 used; max is 6563430400
 INFO 15:32:46,436 GC for ConcurrentMarkSweep: 1368 ms, 4296105472 reclaimed 
leaving 1691050032 used; max is 6563430400
 INFO 15:32:51,180 GC for ConcurrentMarkSweep: 1545 ms, 4297439832 reclaimed 
leaving 1691033816 used; max is 6563430400
 INFO 15:32:55,703 GC for ConcurrentMarkSweep: 1379 ms, 4295491928 reclaimed 
leaving 1692891456 used; max is 6563430400
 INFO 15:33:00,328 GC for ConcurrentMarkSweep: 1378 ms, 4296657208 reclaimed 
leaving 1694981528 used; max is 6563430400

(this doesn't appear on the other nodes, which are currently OK)


Could this value be linked to the problem?
Compacted row maximum size: 1202492950

I suppose that uncompacted, the row may be bigger than 2^31 bytes, as
described here:
http://wiki.apache.org/cassandra/CassandraLimitations?highlight=(related%20limitation)

Keyspace: system
Read Count: 184
Read Latency: 156.3704456521739 ms.
Write Count: 591571
Write Latency: 0.8834233777517829 ms.
Pending Tasks: 0
Column Family: HintsColumnFamily
SSTable count: 13
Space used (live): 4974145
Space used (total): 4974145
Memtable Columns Count: 30430
Memtable Data Size: 289085
Memtable Switch Count: 55
Read Count: 181
Read Latency: 158,899 ms.
Write Count: 591564
Write Latency: 0,883 ms.
Pending Tasks: 0
Key cache capacity: 16
Key cache size: 2
Key cache hit rate: 0.75
Row cache: disabled
Compacted row minimum size: 1630
Compacted row maximum size: 336507
Compacted row mean size: 155823

Column Family: LocationInfo
SSTable count: 2
Space used (live): 1225
Space used (total): 1225
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 2
Read Count: 3
Read Latency: 3,844 ms.
Write Count: 7
Write Latency: 0,222 ms.
Pending Tasks: 0
  

Re: Question about hinted handoff

2010-07-08 Thread Anty
On Thu, Jul 8, 2010 at 4:11 PM, ChingShen  wrote:

> Hi all,
>
>   Please consider this case: (RF=1, CL=ONE)
>
>   1. I have A, B and C nodes.
>   2. A node is a coordinator node, it sends a request to B node to do write
> operation.
>   3. B node is down during write operation, so return failure message to
> client, and write a hint to C node.
>
I think node A will return a failure message to the client, and will not
write a hint to node C.


>   4. B node comes back up, then C node forwards the data to it.
>   5. B node own data right now, although the write operation is failure.
>
>   Correctly?
>
> Thanks.
>
> Shen
>



-- 
Best Regards
Anty Rao


Re: Question about hinted handoff

2010-07-08 Thread ChingShen
If so, when does hinted handoff work?

On Thu, Jul 8, 2010 at 9:55 PM, Anty  wrote:

>
>
> On Thu, Jul 8, 2010 at 4:11 PM, ChingShen  wrote:
>
>> Hi all,
>>
>>   Please consider this case: (RF=1, CL=ONE)
>>
>>   1. I have A, B and C nodes.
>>   2. A node is a coordinator node, it sends a request to B node to do
>> write operation.
>>   3. B node is down during write operation, so return failure message to
>> client, and write a hint to C node.
>>
> I think node A will return failure message to client.
> and will not write a hint to C node.
>
>
>>   4. B node comes back up, then C node forwards the data to it.
>>   5. B node own data right now, although the write operation is failure.
>>
>>   Correctly?
>>
>> Thanks.
>>
>> Shen
>>
>
>
>
> --
> Best Regards
> Anty Rao
>


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 12:45 AM, ChingShen  wrote:
> hmm... I'm really confused.
> The http://wiki.apache.org/cassandra/API document mentioned that if write
> ConsistencyLevel=ANY that "Ensure the write has been written to at least 1
> node, including hinted recipients.", I couldn't imagine this case. :(
>
> If I have A,B,C and D nodes(RF=1), and write ConsistencyLevel=ANY, so A
> coordinator node sends a write request to another node(e.g. B node), but B
> node is down during write operation, what happend? return failure message to
> client immediately? or write a hint to another node(e.g. C node)

It will write a hint and report success.

But if you were writing at CL.ONE it would fail the write because a
hinted write isn't readable until it can be forwarded to the "right"
node (here, B).

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Reading all rows in a column family in parallel

2010-07-08 Thread Jonathan Ellis
"CFRR does this.  Is this possible?"

I guess I don't understand the question. :)

On Thu, Jul 8, 2010 at 2:21 AM, Brent N. Chun  wrote:
> Hello,
>
> I'm running Cassandra 0.6.0 on a cluster and have an application that needs
> to read all rows from a column family using the Cassandra Thrift API.
> Ideally, I'd like to be able to do this by having all nodes in the cluster
> read in parallel (i.e., each node reads a disjoint set of rows that are
> stored locally). I should also mention that I'm using the RandomPartitioner.
>
> Here's what I was thinking:
>
>  1. Have one node invoke describe_ring to find the token range on the ring
> that each node is responsible for.
>
>  2. For each token range, have the node that owns that portion of the ring
> read the rows in that range using a sequence of get_range_slices calls
> (using start/end tokens, not keys).
>
> This type of functionality seems to already be there in the tree with the
> recent Cassandra/Hadoop integration.
>
> ...
> KeyRange keyRange = new KeyRange(batchRowCount)
>        .setStart_token(startToken)
>        .setEnd_token(split.getEndToken());
> try
> {
>    rows = client.get_range_slices(new ColumnParent(cfName),
>           predicate,
>           keyRange,
>           ConsistencyLevel.ONE);
>     ...
>
>    // prepare for the next slice to be read
>    KeySlice lastRow = rows.get(rows.size() - 1);
>    IPartitioner p = DatabaseDescriptor.getPartitioner();
>    byte[] rowkey = lastRow.getKey();
>    startToken = p.getTokenFactory().toString(p.getToken(rowkey));
> ...
>
> The above snippet from ColumnFamilyRecordReader.java seems to suggest it is
> possible to scan an entire column family by reading disjoint sets of rows
> using token-based range queries (as opposed to key-based range queries). Is
> this possible in 0.6.0? (Note: for the next startToken, I was just planning
> on computing the MD5 digest of the last key directly since I'm accessing
> Cassandra through Thrift.)
>
> Thoughts?
>
> bnc
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread ChingShen
Thanks Jonathan Ellis,

  I want to make sure: after A returns a failure message to the client at
CL.ONE, *does A write a hint to C?* If so, even though the write operation
failed, is the data still stored on C? And if B comes back up, does C then
forward it to B?

Shen

On Thu, Jul 8, 2010 at 10:08 PM, Jonathan Ellis  wrote:

> On Thu, Jul 8, 2010 at 12:45 AM, ChingShen 
> wrote:
> > hmm... I'm really confused.
> > The http://wiki.apache.org/cassandra/API document mentioned that if
> write
> > ConsistencyLevel=ANY that "Ensure the write has been written to at least
> 1
> > node, including hinted recipients.", I couldn't imagine this case. :(
> >
> > If I have A,B,C and D nodes(RF=1), and write ConsistencyLevel=ANY, so A
> > coordinator node sends a write request to another node(e.g. B node), but
> B
> > node is down during write operation, what happend? return failure message
> to
> > client immediately? or write a hint to another node(e.g. C node)
>
> It will write a hint and report success.
>
> But if you were writing at CL.ONE it would fail the write because a
> hinted write isn't readable until it can be forwarded to the "right"
> node (here, B).
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 10:23 AM, ChingShen  wrote:
> Thanks Jonathan Ellis,
>
>   I want to make sure that after A return failure message to client at
> CL.ONE, does A write a hint to C?

No.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Use of multiple Keyspaces

2010-07-08 Thread Dwight Smith
Hi

 

I am new to Cassandra and am preparing a data model for use in a
production environment, and need to decide if using multiple keyspaces
has any benefit.  

 

There are basically two types of data. The first: large numbers (~1750K)
of entries which are written, read very rarely, and removed after several
seconds to several days; the keys are MD5 hashes generated from the
content being written. The second: ~60K entries which are written,
accessed with get_range_slices, and then, based on the time indicated in
the content, an action is performed and the specific entry is deleted
from Cassandra. The second type has three columns: time-to-action key
(MD5 of the action information) - column TimeToScheduleAction; action key
to time - column ScheduledActionToTime; and action key to action
information - column ActionToScheduledAction.

 

Currently these are members of two separate keyspaces. Separate keyspaces
were chosen since the data volumes are significantly different, and as I
understand it, the memtables depend on the data volume if KeysCached is
not zero. Separate keyspaces would speed up memtable access for both; in
addition, it seems compaction would benefit.
 
Comments please
 
Thanks much
 
Dwight 

 






Re: Backing up the data stored in cassandra

2010-07-08 Thread Jonathan Ellis
see http://wiki.apache.org/cassandra/Operations

On Thu, Jul 8, 2010 at 12:50 AM, Dave Viner  wrote:
> Hi all,
> What is the recommended strategy for backing up the data stored inside
> cassandra?
> I realized that Cass. is a distributed database, and with a decent
> replication factor, backups are "already done" in some sense.  But, as a
> relatively new user, I'm always concerned that the data is only within the
> system and not stored *anywhere* else.
> In an earlier email in the list, the recommendation was:
>
> Until tickets 193 and 520 are done, the easiest thing is to copy all
> the sstables from the other nodes that have replicas for the ranges it
> is responsible for (e.g. for replication factor of 3 on rack unaware
> partitioner, the nodes before it and the node after it on the right
> would suffice), and then run nodeprobe cleanup to clear out the
> excess.
>
> Is this still the recommended approach?  If I backed up the files in
> DataDirectories/*, is it possible to restore a node using those files?
> (That is, bring up a new node, copy the backed up files from the crashed
> node onto the new node, then have the new node join the cluster?)
>
> Thanks
>
> Dave Viner
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Reading all rows in a column family in parallel

2010-07-08 Thread Thomas Heller
Hey,

>  Is
> this possible in 0.6.0? (Note: for the next startToken, I was just planning
> on computing the MD5 digest of the last key directly since I'm accessing
> Cassandra through Thrift.)

Can't speak for 0.6.0 but it works for 0.6.3.

Just implemented this in ruby (minus the parallel part).

Cheers,
/thomas


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread ChingShen
Hmm... as you mentioned, it will *write a hint* and report success at
CL.ANY. Does hinted handoff only work at CL.ANY?

Thanks.

On Thu, Jul 8, 2010 at 11:29 PM, Jonathan Ellis  wrote:

> On Thu, Jul 8, 2010 at 10:23 AM, ChingShen 
> wrote:
> > Thanks Jonathan Ellis,
> >
> >   I want to make sure that after A return failure message to client at
> > CL.ONE, does A write a hint to C?
>
> No.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Understanding atomicity in Cassandra

2010-07-08 Thread Stuart Langridge
Hi, Cassandra people!

We're looking at Cassandra as a possible replacement for some parts of
our database structures, and on an early look I'm a bit confused about
atomicity guarantees and rollbacks and such, so I wanted to ask what
standard practice is for dealing with the sorts of situation I outline
below.

Imagine that we're storing information about files. Each file has a path
and a uuid, and sometimes we need to look up stuff about a file by its
path and sometimes by its uuid. The best way to do this, as I understand
it, is to store the data in Cassandra twice: once indexed by uuid and
once by path. So, I have two ColumnFamilies, one indexed by uuid:

{
  "some-uuid-1": {
"path": "/a/b/c",
"size": 10
  },
  "some-uuid-2" {
...
  },
  ...
}

and one indexed by path

{
  "/a/b/c": {
"uuid": "some-uuid-1",
"size": 10
  },
  "/d/e/f" {
...
  },
  ...
}

So, first, do please correct me if I've misunderstood the terminology
here (and I've shown a "short form" of ColumnFamily here, as per
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).

The thing I don't quite get is: what happens when I want to add a new
file? I need to add it to both these ColumnFamilies, but there's no "add
it to both" atomic operation. What's the way that people handle the
situation where I add to the first CF and then my program crashes, so I
never added to the second? (Assume that there is lots more data than
I've outlined above, so that "put it all in one SuperColumnFamily,
because that can be updated atomically" won't work because it would end
up with our entire database in one SCF). Should we add to one, and then
if we fail to add to the other for some reason continually retry until
it works? Have a "garbage collection" procedure which finds
discrepancies between indexes like this and fixes them up and run it
from cron? We'd love to hear some advice on how to do this, or if we're
modelling the data in the wrong way and there's a better way which
avoids these problems!

sil
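One common answer is the "write both, repair later" pattern: treat one CF as the source of truth, write it first, and have a periodic job rebuild the second from it. A toy sketch with in-memory maps standing in for the two ColumnFamilies (hypothetical structure, not a Cassandra client):

```java
import java.util.HashMap;
import java.util.Map;

public class DualIndex {
    // Stand-ins for the two column families; in practice these would be
    // writes against Cassandra, which are idempotent per column.
    final Map<String, String> pathByUuid = new HashMap<>();
    final Map<String, String> uuidByPath = new HashMap<>();

    // Write the row treated as the source of truth first; if the second
    // write is lost (crash in between), a repair pass can rebuild it.
    void addFile(String uuid, String path) {
        pathByUuid.put(uuid, path); // step 1: authoritative row
        uuidByPath.put(path, uuid); // step 2: secondary index, repairable
    }

    // Cron-style repair: re-derive the path index from the uuid index.
    void repair() {
        for (Map.Entry<String, String> e : pathByUuid.entrySet())
            uuidByPath.putIfAbsent(e.getValue(), e.getKey());
    }

    public static void main(String[] args) {
        DualIndex d = new DualIndex();
        d.pathByUuid.put("u1", "/a/b/c"); // simulate a crash after step 1
        d.repair();                        // the path index is rebuilt
        System.out.println(d.uuidByPath.get("/a/b/c"));
    }
}
```

Because overwriting a column with the same value is harmless, the repair pass can run as often as needed.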




Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Benjamin Black
On Thu, Jul 8, 2010 at 9:02 AM, ChingShen  wrote:
> Hmm.. as you mentioned that it will write a hint and report success at
> CL.ANY, does the hinted handoff only work at CL.ANY?
>

Still no.  Hints are written when nodes are down, regardless of CL,
unless HH is disabled.  CL does not influence whether hints are
written, it influences whether success is reported to the client.  For
CL.ANY a hint is a success, for CL.ONE it is a failure.

b


Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
There is a memtable per CF, regardless of how many keyspaces you have.
I'd pay more attention to the delete/compaction side of things if you
are going to be doing that many deletions.

Also, your mail client's formatting is broken.


b

On Thu, Jul 8, 2010 at 8:45 AM, Dwight Smith
 wrote:
> Hi
>
>
>
> I am new to Cassandra and am preparing a data model for use in a production
> environment, and need to decide if using multiple keyspaces has any
> benefit.
>
>
>
> There are basically two types of data; the first,  large numbers ( ~1750K)
> of entries which are written, very few reads, and then removed after several
> seconds to several days. The keys are MD5 generated from the content being
> written.  The second type, ~ 60K, entries written, accessed with
> get_range_slices, then based on the time indicated in the content, perform
> an action, then delete the specific entry from Cassandra.  There are three
> columns for the second type, time to action Key ( MD5 of action information
> ) – column TimeToScheduleAction, action key to time – column
> ScheduledActionToTime, and finally action key to action information -
> ActionToScheduledAction.
>
>
>
> Currently these are members of two separate keyspaces.  Separate keyspaces
> were chosen since the data volume was significantly different, and as I
> understand, the memtables are dependent upon the data volume, if KeysCached
> is not zero. Separate keyspaces would speed up the memtable access for
> both.  In addition, it seems the compaction would benefit.
>
>
>
> Comments please
>
>
>
> Thanks much
>
>
>
> Dwight
>
>
>
>
>
> CONFIDENTIALITY NOTICE: This e-mail and any files attached may contain
> confidential and proprietary information of Alcatel-Lucent and/or its
> affiliated entities. Access by the intended recipient only is authorized.
> Any liability arising from any party acting, or refraining from acting, on
> any information contained in this e-mail is hereby excluded. If you are not
> the intended recipient, please notify the sender immediately, destroy the
> original transmission and its attachments and do not disclose the contents
> to any other person, use it for any purpose, or store or copy the
> information in any medium. Copyright in this e-mail and any attachments
> belongs to Alcatel-Lucent and/or its affiliated entities.


Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
(and I'm sure someone will correct me if I am wrong on that)

On Thu, Jul 8, 2010 at 11:24 AM, Benjamin Black  wrote:
> There is a memtable per CF, regardless of how many keyspaces you have.


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 1:19 PM, Benjamin Black  wrote:
> On Thu, Jul 8, 2010 at 9:02 AM, ChingShen  wrote:
>> Hmm.. as you mentioned that it will write a hint and report success at
>> CL.ANY, does the hinted handoff only work at CL.ANY?
>>
>
> Still no.  Hints are written when nodes are down, regardless of CL,
> unless HH is disabled.  CL does not influence whether hints are
> written, it influences whether success is reported to the client.  For
> CL.ANY a hint is a success, for CL.ONE it is a failure.

If the coordinator knows it can't achieve the requested CL it won't do
any writes, hinted or otherwise, and will immediately report
UnavailableException to the client.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


RE: Use of multiple Keyspaces

2010-07-08 Thread Dwight Smith
Thanks - I found on Wiki that the memtables and sstables are on a per CF
basis. 

Sorry about the mail client formatting - I have no choice - corporate
controlled:)

Now I am concerned about the deletions - what areas should I investigate
to understand the concerns you raise?

Thanks again

-Original Message-
From: Benjamin Black [mailto:b...@b3k.us] 
Sent: Thursday, July 08, 2010 11:28 AM
To: user@cassandra.apache.org
Subject: Re: Use of multiple Keyspaces

(and I'm sure someone will correct me if I am wrong on that)

On Thu, Jul 8, 2010 at 11:24 AM, Benjamin Black  wrote:
> There is a memtable per CF, regardless of how many keyspaces you have.





Re: Use of multiple Keyspaces

2010-07-08 Thread Benjamin Black
as rcoli just reminded me, i should be more clear that it is 1
_active_ memtable per CF, but there may be several pending flush.

space from deletions is only reclaimed after GCGraceSeconds has
elapsed AND a major compaction is run.  default for the former is 10
days.  the latter is not automatic.

On Thu, Jul 8, 2010 at 11:32 AM, Dwight Smith
 wrote:
> Thanks - I found on Wiki that the memtables and sstables are on a per CF
> basis.
>
> Sorry about the mail client formatting - I have no choice - corporate
> controlled:)
>
> Now I am concerned about the deletions - what areas should I investigate
> to understand the concerns you raise?
>
> Thanks again
>
> -Original Message-
> From: Benjamin Black [mailto:b...@b3k.us]
> Sent: Thursday, July 08, 2010 11:28 AM
> To: user@cassandra.apache.org
> Subject: Re: Use of multiple Keyspaces
>
> (and I'm sure someone will correct me if I am wrong on that)
>
> On Thu, Jul 8, 2010 at 11:24 AM, Benjamin Black  wrote:
>> There is a memtable per CF, regardless of how many keyspaces you have.
>
>


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Benjamin Black
Important safety tip, I did not know that.

On Thu, Jul 8, 2010 at 11:31 AM, Jonathan Ellis  wrote:
> On Thu, Jul 8, 2010 at 1:19 PM, Benjamin Black  wrote:
>> On Thu, Jul 8, 2010 at 9:02 AM, ChingShen  wrote:
>>> Hmm.. as you mentioned that it will write a hint and report success at
>>> CL.ANY, does the hinted handoff only work at CL.ANY?
>>>
>>
>> Still no.  Hints are written when nodes are down, regardless of CL,
>> unless HH is disabled.  CL does not influence whether hints are
>> written, it influences whether success is reported to the client.  For
>> CL.ANY a hint is a success, for CL.ONE it is a failure.
>
> If the coordinator knows it can't achieve the requested CL it won't do
> any writes, hinted or otherwise, and will immediately report
> UnavailableException to the client.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Benjamin Black
To clarify, this requires the coordinator know nodes are down.  If the
nodes are marked UP, but do not confirm the writes, this behavior does
not seem possible.

On Thu, Jul 8, 2010 at 11:31 AM, Jonathan Ellis  wrote:
> On Thu, Jul 8, 2010 at 1:19 PM, Benjamin Black  wrote:
>> On Thu, Jul 8, 2010 at 9:02 AM, ChingShen  wrote:
>>> Hmm.. as you mentioned that it will write a hint and report success at
>>> CL.ANY, does the hinted handoff only work at CL.ANY?
>>>
>>
>> Still no.  Hints are written when nodes are down, regardless of CL,
>> unless HH is disabled.  CL does not influence whether hints are
>> written, it influences whether success is reported to the client.  For
>> CL.ANY a hint is a success, for CL.ONE it is a failure.
>
> If the coordinator knows it can't achieve the requested CL it won't do
> any writes, hinted or otherwise, and will immediately report
> UnavailableException to the client.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Coke Products at Digg?

2010-07-08 Thread malcolm smith
I thought it was NoCola solutions or NotOnlyCola rather than UnCola.


On Wed, Jul 7, 2010 at 11:55 AM, Miguel Verde wrote:

> Dr. Pepper has recently been picked up by Coca Cola as well.  I wonder if
> the UnCola solutions like 7Up and Fanta are just a fad?
>
>
> On Wed, Jul 7, 2010 at 10:50 AM, Mike Malone  wrote:
>
>> On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans  wrote:
>>
>>>
>>> I heard a rumor that Digg was moving away from Coca-Cola products in all
>>> of its vending machines and break rooms. Can anyone from Digg comment on
>>> this?
>>>
>>> My near-term beverage consumption strategy is based largely on my
>>> understanding of Digg's, so if there has been a change, I may need to
>>> reevaluate.
>>>
>>
>> Not sure about Digg, but I heard Twitter is switching over to Fanta. It's
>> been adopted by Coke so it must be fairly stable. There's not as much
>> flexibility in the product lineup, but what they do offer is extremely
>> delicious. Just my $0.02.
>>
>> Mike
>>
>
>


Re: Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun

Hi Jonathan,

The code snippet below was from the repository. I mentioned 0.6.0 
specifically just to confirm that reading a CF using token-based range 
queries with the RandomPartitioner should (or shouldn't) also work in 
that version. I've seen discussions about whether range queries are now 
supported with the RandomPartitioner for example. Moreover, those 
discussions mostly seem to involve key-based range queries, though, not 
token-based range queries like CFRR uses. If you're saying that this 
functionality essentially works for everyone but me in 0.6.0, then that 
implies I have a bug in my code which would be great news for me. What 
I'm essentially seeing is either all rows, all rows + duplicate rows, or 
missing rows even when using a single node. Which of these I get is 
entirely deterministic. If I delete all the data and insert the same 
rows, the ranges returned by describe_ring change but the end result of 
reading the CF is then one of those three cases.


Thanks,
bnc

Jonathan Ellis wrote:

"CFRR does this.  Is this possible?"

I guess I don't understand the question. :)


http://scale.metaoptimize.com/

2010-07-08 Thread Ran Tavory
Just found this site and thought it might be interesting to folks on this
list.
http://scale.metaoptimize.com/
It's a Stack Overflow-style Q&A site; in their words:

> A community interested in scalability, high availability, data stores,
> NoSQL, distributed computing, parallel computing, cloud computing, elastic
> computing, HPC, grid computing, AWS, crawling, failover, redundancy, and
> concurrency.


Visual Tools for Cassandra

2010-07-08 Thread Torla, William
Does anybody know of any recently developed UI based tools for Cassandra? 
Ideally a tool capable of seeing nodes across a cluster would be preferred.


The information contained in this communication may be CONFIDENTIAL and is 
intended only for the use of the recipient(s) named above. If you are not the 
intended recipient, you are hereby notified that any dissemination, 
distribution, or copying of this communication, or any of its contents, is 
strictly prohibited. If you have received this communication in error, please 
notify the sender and delete/destroy the original message and any copy of it 
from your computer or paper files. 


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
Right, if the nodes are marked up but do not confirm the writes, it
will result in a TimedOutException.  (It still won't generate hinted
writes).

To summarize: hinted writes are only generated when Cassandra (a)
knows a target is down ahead of time and (b) still has enough UP
targets to satisfy the requested CL.
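That decision logic, as I read this thread, boils down to something like the
following toy function (made-up names, not Cassandra's actual code; the CL is
expressed as the number of responses required, with ANY as 0):

```python
def write_outcome(replicas_up, rf, cl_required, hh_enabled=True):
    """Fate of a write given how many replicas the coordinator knows are UP.

    replicas_up: replicas believed alive (failure-detector state known
    *before* the write starts).
    cl_required: responses needed (ONE=1, QUORUM=rf//2+1, ANY=0 here).
    Returns (result, hints_written).
    """
    down = rf - replicas_up
    hints = down if hh_enabled else 0
    if cl_required == 0:  # CL.ANY: a stored hint itself counts as success
        if replicas_up + hints > 0:
            return ("success", hints)
        return ("UnavailableException", 0)
    if replicas_up >= cl_required:
        return ("success", hints)  # hints cover the known-down replicas
    # Not enough live targets: fail fast, write nothing (no hints either).
    return ("UnavailableException", 0)
```

E.g. RF=3 at QUORUM with one node known down gives success plus one hint;
with two nodes down it is an immediate UnavailableException with no hint
written.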

On Thu, Jul 8, 2010 at 1:48 PM, Benjamin Black  wrote:
> To clarify, this requires the coordinator know nodes are down.  If the
> nodes are marked UP, but do not confirm the writes, this behavior does
> not seem possible.
>
> On Thu, Jul 8, 2010 at 11:31 AM, Jonathan Ellis  wrote:
>> On Thu, Jul 8, 2010 at 1:19 PM, Benjamin Black  wrote:
>>> On Thu, Jul 8, 2010 at 9:02 AM, ChingShen  wrote:
>>>> Hmm.. as you mentioned that it will write a hint and report success at
>>>> CL.ANY, does the hinted handoff only work at CL.ANY?
>>>>
>>>
>>> Still no.  Hints are written when nodes are down, regardless of CL,
>>> unless HH is disabled.  CL does not influence whether hints are
>>> written, it influences whether success is reported to the client.  For
>>> CL.ANY a hint is a success, for CL.ONE it is a failure.
>>
>> If the coordinator knows it can't achieve the requested CL it won't do
>> any writes, hinted or otherwise, and will immediately report
>> UnavailableException to the client.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Reading all rows in a column family in parallel

2010-07-08 Thread Jonathan Ellis
There have been a number of bug fixes to this since 0.6.0 -- as Thomas
said, it works in 0.6.3.  (Although there is one related bug scheduled
to be fixed in 0.6.4,
https://issues.apache.org/jira/browse/CASSANDRA-1042)

On Thu, Jul 8, 2010 at 2:06 PM, Brent N. Chun  wrote:
> Hi Jonathan,
>
> The code snippet below was from the repository. I mentioned 0.6.0
> specifically just to confirm that reading a CF using token-based range
> queries with the RandomPartitioner should (or shouldn't) also work in that
> version. I've seen discussions about whether range queries are now supported
> with the RandomPartitioner for example. Moreover, those discussions mostly
> seem to involve key-based range queries, though, not token-based range
> queries like CFRR uses. If you're saying that this functionality essentially
> works for everyone but me in 0.6.0, then that implies I have a bug in my
> code which would be great news for me. What I'm essentially seeing is either
> all rows, all rows + duplicate rows, or missing rows even when using a
> single node. Which of these I get is entirely deterministic. If I delete all
> the data and insert the same rows, the ranges returned by describe_ring
> changes but the end result of reading the CF is then one of those three
> cases.
>
> Thanks,
> bnc
>
> Jonathan Ellis wrote:
>>
>> "CFRR does this.  Is this possible?"
>>
>> I guess I don't understand the question. :)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Visual Tools for Cassandra

2010-07-08 Thread Eben Hewitt
Suguru Namura's Web Console may have some of what you need:

http://github.com/suguru/cassandra-webconsole

Eben

On Thu, Jul 8, 2010 at 1:00 PM, Torla, William wrote:

>  Does anybody know of any recently developed UI based tools for Cassandra?
> Ideally a tool capable of seeing nodes across a cluster would be preferred.
>


-- 
"In science there are no 'depths'; there is surface everywhere."
--Rudolph Carnap


Re: Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun

Jonathan Ellis wrote:

There have been a number of bug fixes to this since 0.6.0 -- as Thomas
said, it works in 0.6.3.  (Although there is one related bug scheduled
to be fixed in 0.6.4,
https://issues.apache.org/jira/browse/CASSANDRA-1042)


Ah, this is exactly one of the cases I've been seeing! Thanks, Jonathan.

bnc


Re: Coke Products at Digg?

2010-07-08 Thread Daniel Jue
We've developed a beverage API called Koozie which allows drinkers to
remain soda agnostic.
It supports all popular canned liquids and Drink Injection through its
integrated Inversion Of Can container.

On Thu, Jul 8, 2010 at 2:55 PM, malcolm smith
 wrote:
> I thought it was NoCola solutions or NotOnlyCola rather than UnCola.
>
> On Wed, Jul 7, 2010 at 11:55 AM, Miguel Verde 
> wrote:
>>
>> Dr. Pepper has recently been picked up by Coca Cola as well.  I wonder if
>> the UnCola solutions like 7Up and Fanta are just a fad?
>>
>> On Wed, Jul 7, 2010 at 10:50 AM, Mike Malone  wrote:
>>>
>>> On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans  wrote:

>>>> I heard a rumor that Digg was moving away from Coca-Cola products in all
>>>> of its vending machines and break rooms. Can anyone from Digg comment on
>>>> this?
>>>>
>>>> My near-term beverage consumption strategy is based largely on my
>>>> understanding of Digg's, so if there has been a change, I may need to
>>>> reevaluate.
>>>
>>> Not sure about Digg, but I heard Twitter is switching over to Fanta. It's
>>> been adopted by Coke so it must be fairly stable. There's not as much
>>> flexibility in the product lineup, but what they do offer is extremely
>>> delicious. Just my $0.02.
>>> Mike
>
>


Re: Reading all rows in a column family in parallel

2010-07-08 Thread Brent N. Chun

Thomas Heller wrote:

Hey,


> Is this possible in 0.6.0? (Note: for the next startToken, I was just
> planning on computing the MD5 digest of the last key directly since I'm
> accessing Cassandra through Thrift.)


Can't speak for 0.6.0 but it works for 0.6.3.

Just implemented this in ruby (minus the parallel part).

Cheers,
/thomas


Hm, I must be doing something fundamentally wrong then. I just tried 0.6.3, same 
result. In this example, I have a 1 node system and have 100 rows in a single 
CF. When trying to read it back using token-based range queries and a 
RandomPartitioner, I get the following below (only 33/100 rows returned).


Now the 100 rows have keys that hash to random points on the ring. In the 
example below, I'm reading rows in chunks of 20.


In the first range query, the initial range is the entire ring. The 20 rows 
returned have MD5 hashes in no particular order it seems and could be anywhere 
on the ring. Taking the MD5 hash of the last row's key, I start the second range 
query.


In the second range query ( 292996472659622939455744264432842142924, 
34571752641348786448680284622901156834 ], what's being returned below seems like 
exactly what it suggests: return rows in the above range of MD5 hashes. But some 
of the remaining 80 rows we want may be outside that range. Hence, only 33 rows 
below.


If the rows returned by the token-based range queries were in MD5 hash order 
(with wraparound handled), then it seems like this interface could work. But 
others seem to be using this functionality successfully, which suggests that 
is somehow unnecessary. Can someone help me out here?
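For reference, here is the next-startToken computation I had in mind (a
Python sketch; the assumption, from my reading of RandomPartitioner and
FBUtilities, is that the token is the key's MD5 digest interpreted as a
signed big-endian 128-bit integer with the absolute value taken):

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    """Token of a row key under RandomPartitioner, as I understand it:
    abs() of the MD5 digest read as a signed big-endian 128-bit integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

def next_start_token(last_key: bytes) -> str:
    """Start token for the next get_range_slices chunk; Thrift's
    KeyRange.start_token takes the token as a decimal string."""
    return str(random_partitioner_token(last_key))
```

Of course, resuming a scan from the last row's token only works if each chunk
comes back sorted by token, which is exactly the property in question here.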


Thanks,
bnc



Scanning range 0 ( 34571752641348786448680284622901156834, 
34571752641348786448680284622901156834 ]
Scanning chunk ( 34571752641348786448680284622901156834, 
34571752641348786448680284622901156834 ] in range 0

Read 20 rows
Read row 0, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 1, token 5919946189209861803345840641668714978, key G_my_key16
Read row 2, token 6676056754427192599913432294390467082, key N_my_key85
Read row 3, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 4, token 9595097897929687061907189837471352784, key E_my_key14
Read row 5, token 16575788966172751729835323651471549632, key a_my_key98
Read row 6, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 7, token 28411545431179372696834683157677733478, key B_my_key73
Read row 8, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 9, token 31186550159320208451777665196866508345, key j_my_key45
Read row 10, token 309081729348188654502493750295907191249, key D_my_key75
Read row 11, token 308480936859450293438865473928962136114, key W_my_key32
Read row 12, token 33060929359846763792204741553927689627, key Q_my_key88
Read row 13, token 36834373239213294576855495985365240744, key D_my_key13
Read row 14, token 302818545694924710056493830778421143168, key C_my_key12
Read row 15, token 39723252966237722984897584840501933181, key I_my_key18
Read row 16, token 297899763604776667052026292305780186395, key 2_my_key2
Read row 17, token 45994786947573748381278100108617428931, key U_my_key92
Read row 18, token 294076607175826631726358986726954934589, key T_my_key29
Read row 19, token 292996472659622939455744264432842142924, key M_my_key84
Scanning chunk ( 292996472659622939455744264432842142924, 
34571752641348786448680284622901156834 ] in range 0

Read 13 rows
Read row 20, token 336932469034906281211924193433194809371, key 0_my_key62
Read row 21, token 5919946189209861803345840641668714978, key G_my_key16
Read row 22, token 6676056754427192599913432294390467082, key N_my_key85
Read row 23, token 330974738873996707017206868970060026330, key 6_my_key6
Read row 24, token 9595097897929687061907189837471352784, key E_my_key14
Read row 25, token 16575788966172751729835323651471549632, key a_my_key98
Read row 26, token 20927090112620661198733690835293074593, key 5_my_key67
Read row 27, token 28411545431179372696834683157677733478, key B_my_key73
Read row 28, token 29636277939148773659952116897998650776, key Q_my_key26
Read row 29, token 31186550159320208451777665196866508345, key j_my_key45
Read row 30, token 309081729348188654502493750295907191249, key D_my_key75
Read row 31, token 308480936859450293438865473928962136114, key W_my_key32
Read row 32, token 33060929359846763792204741553927689627, key Q_my_key88
Scanning chunk ( 33060929359846763792204741553927689627, 
34571752641348786448680284622901156834 ] in range 0

Read 0 rows




get_range_slices

2010-07-08 Thread Jonathan Shook
Should I ever expect multiples of the same key (with non-empty column
sets) from the same get_range_slices call?
I've verified that the column data is identical byte-for-byte as well,
including column timestamps.


Why is cassandra named cassandra?

2010-07-08 Thread ChingShen
Hi,

  Why is cassandra named cassandra?

Thanks.

Shen


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread ChingShen
On Fri, Jul 9, 2010 at 4:32 AM, Jonathan Ellis  wrote:
> If the coordinator knows it can't achieve the requested CL it won't do
> any writes, hinted or otherwise, and will immediately report
> UnavailableException to the client.

> To summarize: hinted writes are only generated when Cassandra (a)
> knows a target is down ahead of time and (b) still has enough UP
> targets to satisfy the requested CL.

Ok, if so, I suppose that A sends requests to B, C and D nodes (RF=3) at
*CL.QUORUM*, but D is down; does it then return a success message to the
client, and does *A write a hint to E node*? Then when D comes back up, does
E forward the data to D?

Thanks.

Shen


Re: get_range_slices

2010-07-08 Thread Mike Malone
I think the answer to your question is no, you shouldn't.

I'm feeling far too lazy to do even light research on the topic, but I
remember there being a bug where replicas weren't consolidated and you'd get
a result set that included data from each replica that was consulted for a
query. That could be what you're seeing. Are you running the most recent
release? Try dropping to CL.ONE and see if you only get one copy. If that
fixes it, I'd suggest searching JIRA.

Mike

On Thu, Jul 8, 2010 at 6:40 PM, Jonathan Shook  wrote:

> Should I ever expect multiples of the same key (with non-empty column
> sets) from the same get_range_slices call?
> I've verified that the column data is identical byte-for-byte, as
> well, including column timestamps?
>


Re: Digg 4 Preview on TWiT

2010-07-08 Thread Jeremy Davis
That is an interesting statistic. 1 TB per node?
Care to share any more info on the specs of this cluster? Drive types/Cores
per node/etc...
-JD


On Tue, Jul 6, 2010 at 12:01 PM, Prashant Malik  wrote:

> This is a ridiculous statement by some newbie, I guess. We today have a 150
> node Cassandra cluster running Inbox search supporting close to 500M users
> and over 150TB of data, growing rapidly every day.
>
> I am on pager for this monster :) so its pretty funny to hear this
> statement.
>
> - Prashant
>
>
> On Tue, Jul 6, 2010 at 6:21 AM, Avinash Lakshman <
> avinash.laksh...@gmail.com> wrote:
>
>> FB Inbox Search still runs on Cassandra and will continue to do so. I
>> should know since I maintain it :).
>>
>> Cheers
>> Avinash
>>
>> On Tue, Jul 6, 2010 at 3:34 AM, David Strauss wrote:
>>
>>> On 2010-07-05 15:40, Eric Evans wrote:
>>> > On Sun, 2010-07-04 at 13:14 +0100, Bill de hÓra wrote:
>>> >> This person's understanding is that Facebook 'no longer contributes to
>>> >> nor uses Cassandra.':
>>> >>
>>> >> http://redmonk.com/sogrady/2010/05/17/beyond-cassandra/
>>> >
>>> > Last I heard, Facebook was still using Cassandra for what they had
>>> > always used it for, Inbox Search. Last I heard, there were no plans in
>>> > place to change that.
>>>
>>> I had the opportunity to talk with some Facebook infrastructure
>>> engineers in San Francisco over the past few weeks. They are no longer
>>> using Cassandra, even for inbox search.
>>>
>>> Inbox search was intended to be an initial push for using Cassandra more
>>> broadly, not the primary target of the Cassandra design. Unfortunately,
>>> Facebook's engineers later decided that Cassandra wasn't the right
>>> answer to the right question for Facebook's purposes.
>>>
>>> That decision isn't an indictment of Cassandra's capability; it's
>>> confirmation that Cassandra isn't everything to everyone. But we already
>>> knew that. :-)
>>>
>>> --
>>> David Strauss
>>>   | da...@fourkitchens.com
>>>   | +1 512 577 5827 [mobile]
>>> Four Kitchens
>>>   | http://fourkitchens.com
>>>   | +1 512 454 6659 [office]
>>>   | +1 512 870 8453 [direct]
>>>
>>>
>>
>


Re: Some questions about the write operation and hinted handoff

2010-07-08 Thread Jonathan Ellis
On Thu, Jul 8, 2010 at 10:45 PM, ChingShen  wrote:
> Ok, If so, I suppose that A sends requests to B, C and D nodes(RF=3) at
> CL.QUORUM, but D is down, then return success message to the client, and A
> write a hint to E node? until D comes back up then E forwards the data to D?

If it knows that D is down before it starts, then it will tag the
write to B or C with a "replay this row to D" hint.  Otherwise, it
will attempt to write to D, which will fail.  Either way a QUORUM
write would succeed.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: High CPU usage on all nodes without any read or write

2010-07-08 Thread Peter Schuller
> But in Cassandra output log :
> r...@cassandra-2:~#  tail -f /var/log/cassandra/output.log
>  INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed 
> leaving 1684169392 used; max is 6563430400
>  INFO 15:32:09,875 GC for ConcurrentMarkSweep: 1363 ms, 4296991416 reclaimed 
> leaving 1684201560 used; max is 6563430400
>  INFO 15:32:14,370 GC for ConcurrentMarkSweep: 1341 ms, 4295467880 reclaimed 
> leaving 1684879440 used; max is 6563430400
>  INFO 15:32:18,906 GC for ConcurrentMarkSweep: 1343 ms, 4296386408 reclaimed 
> leaving 1685489208 used; max is 6563430400
>  INFO 15:32:23,564 GC for ConcurrentMarkSweep: 1511 ms, 4296407088 reclaimed 
> leaving 1685488744 used; max is 6563430400
>  INFO 15:32:28,068 GC for ConcurrentMarkSweep: 1347 ms, 4295383216 reclaimed 
> leaving 1686469448 used; max is 6563430400
>  INFO 15:32:32,617 GC for ConcurrentMarkSweep: 1376 ms, 4295689192 reclaimed 
> leaving 1687908304 used; max is 6563430400
>  INFO 15:32:37,283 GC for ConcurrentMarkSweep: 1468 ms, 4296056176 reclaimed 
> leaving 1687916880 used; max is 6563430400
>  INFO 15:32:41,811 GC for ConcurrentMarkSweep: 1358 ms, 4296412232 reclaimed 
> leaving 1688437064 used; max is 6563430400
>  INFO 15:32:46,436 GC for ConcurrentMarkSweep: 1368 ms, 4296105472 reclaimed 
> leaving 1691050032 used; max is 6563430400
>  INFO 15:32:51,180 GC for ConcurrentMarkSweep: 1545 ms, 4297439832 reclaimed 
> leaving 1691033816 used; max is 6563430400
>  INFO 15:32:55,703 GC for ConcurrentMarkSweep: 1379 ms, 4295491928 reclaimed 
> leaving 1692891456 used; max is 6563430400
>  INFO 15:33:00,328 GC for ConcurrentMarkSweep: 1378 ms, 4296657208 reclaimed 
> leaving 1694981528 used; max is 6563430400

Note that those are ConcurrentMarkSweep GCs rather than ParNew GCs, so they
should be running concurrently with the application and should not
correspond to 1.3-second pauses in the application.

> (this don't appears to other nodes, which are currently ok)

As for the discrepancy between nodes, are all nodes handling a similar
amount of traffic? I briefly checked your original post and you said
you're doing TimeUUID insertions. I don't remember off hand, and a
quick google didn't tell me, whether there is something special about
the TimeUUID type that would prevent it - but normally if you're using
an OrderedPartitioner you may simply be writing all your data to a
single node for token space division reasons and the fact that
timestamps are highly ordered.

I'm sure someone can comment here.
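To illustrate the hotspotting point with a toy model (ring layout and node
names entirely made up): under an order-preserving partitioner the token is
essentially the key itself, so time-ordered keys all land in one replica's
token range, while a hashing partitioner spreads the same keys around.

```python
import bisect
import hashlib

def owner(token, ring):
    """Node owning `token`: first ring entry with token >= it, wrapping."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token) % len(ring)
    return ring[i][1]

# Order-preserving partitioner: token == key. Toy 3-node ring.
ordered_ring = [("i", "node1"), ("r", "node2"), ("z", "node3")]
keys = ["t0001", "t0002", "t0003", "t0004"]  # timestamp-style, sorted keys
ordered_owners = {owner(k, ordered_ring) for k in keys}  # all one node

# Hashing partitioner: token derived from MD5, scattering the keys.
hash_ring = [(2**125, "node1"), (2**126, "node2"), (2**127, "node3")]

def hash_token(key):
    digest = hashlib.md5(key.encode()).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

hashed_owners = {owner(hash_token(k), hash_ring) for k in keys}
```

In the ordered case every key sorts into the same node's range; whether the
TimeUUID comparator changes this for the OP's setup is the open question.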

How big a latency are we talking about in the cases where you're
timing out (i.e., what's the timeout)? Were the timeouts on reads,
writes or both?

-- 
/ Peter Schuller