Re: Does anybody know why Twitter stopped integrating Cassandra as the Twitter store?

2011-10-05 Thread ruslan usifov
Big thanks for all your replies


Re: CQL select not working for CF defined programmatically with Hector API

2011-10-05 Thread Alexandru Sicoe
Perfectly right. Sorry for not paying attention!
Thanks Eric,
Alex

On Tue, Oct 4, 2011 at 4:19 AM, Eric Evans  wrote:

> On Mon, Oct 3, 2011 at 12:02 PM, Alexandru Sicoe 
> wrote:
> > Hi,
> >  I am using Cassandra 0.8.5, Hector 0.8.0-2 and cqlsh (cql 1.0.3). If I
> > define a CF with comparator LongType like this:
> >
> > BasicColumnFamilyDefinition columnFamilyDefinition = new
> > BasicColumnFamilyDefinition();
> > columnFamilyDefinition.setKeyspaceName("XXX");
> > columnFamilyDefinition.setName("YYY");
> > columnFamilyDefinition.setDefaultValidationClass(_BYTESTYPE);
> > columnFamilyDefinition.setMemtableOperationsInMillions(0.1);
> > columnFamilyDefinition.setMemtableThroughputInMb(40);
> >
> columnFamilyDefinition.setComparatorType(ComparatorType.LONGTYPE);
> > try {
> > cluster.addColumnFamily(new
> > ThriftCfDef(columnFamilyDefinition));
> > } catch(HectorException e) {
> > throw e;
> > }
> >
> > Then I put some data in the CF.
> >
> > Then I try to run the following queries in cqlsh:
> >
> >   use XXX;
> >   select * from YYY where KEY='aaa';
> >
> > nothing is returned!
> >
> > If I however do:
> >   select * from YYY;
> >
> > all the results are returned properly!
> >
> > So I have 2 questions:
> > 1) Can I read with CQL if CFs were defined using the basic API? (the fact
> > that select * from YYY; works suggests that this is possible)
> > 2) If yes, what is the correct query to use to read data with CQL? (I
> > suspect KEY is wrong...is there a default?)
>
> I suspect that you did not select a key validation class, and ended up
> with a default of BytesType.  CQL requires that your terms be hex
> encoded when using BytesType.
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>
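For anyone hitting the same thing: with a BytesType key validator the key term in CQL has to be given in hex, so the 'aaa' key from the example above becomes '616161'. A minimal, standalone Java sketch of that encoding (the class and method names here are illustrative only, not part of Hector or Cassandra; the exact quoting of the hex literal may vary slightly between CQL versions):

import java.io.UnsupportedEncodingException;

// Illustrative helper only: hex-encode a string key so it can be used as a
// CQL term against a BytesType key validator, e.g. "aaa" -> "616161", so the
// query from the thread becomes: select * from YYY where KEY = '616161';
public final class HexKey {
    public static String toHex(String key) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (byte b : key.getBytes("UTF-8")) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHex("aaa")); // prints 616161
    }
}

Alternatively, giving the column family an explicit key validation class (UTF8Type, for example) when it is created should let plain string keys work in cqlsh directly.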


Why is mutation stage increasing ??

2011-10-05 Thread Philippe
Hello,
I have my 3-node, RF=3 cluster acting strangely. Can someone shed some light on
what is going on?
It was stuck for a couple of hours (all clients timed out). nodetool tpstats
showed a huge, increasing MutationStage backlog (in the hundreds of thousands).
I restarted one node and it took a while to replay GBs of commitlog. I've
shut down all clients that write to the cluster and it's still just weird.

All nodes are still showing huge MutationStages including the new one and
it's either increasing or stable. The pending count is stuck at 32.
Compactionstats shows no compaction on 2 nodes and dozens of Scrub
compactions (all at 100%) on the 3rd one. This is a scrub I did last week
when I encountered assertion errors.
Netstats shows no streams being exchanged on any node, but each one is
expecting a few Responses.

Any ideas ?
Thanks

For example (increased to 567062 while I was writing this email)
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0    18372664517         0                 0
RequestResponseStage              0         0    10731370183         0                 0
MutationStage                    32    565879      295492216         0                 0
ReadRepairStage                   0         0          23654         0                 0
ReplicateOnWriteStage             0         0        7733659         0                 0
GossipStage                       0         0        3502922         0                 0
AntiEntropyStage                  0         0           1631         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlusher               0         0           5716         0                 0
StreamStage                       0         0             10         0                 0
FlushWriter                       0         0           5714         0               499
FILEUTILS-DELETE-POOL             0         0            773         0                 0
MiscStage                         0         0           1266         0                 0
FlushSorter                       0         0              0         0                 0
AntiEntropySessions               0         0             18         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0           1798         0                 0


Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0     1223769753
Responses                       n/a         4     1627481305
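If it helps to watch the MutationStage backlog over time without re-running nodetool, the same numbers tpstats prints are exposed over JMX. A rough sketch (the MBean and attribute names are what the 0.8 request stages appear to expose, and the JMX port is assumed to be the default 7199; adjust both for your setup):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MutationStagePending {
    public static void main(String[] args) throws Exception {
        // Default Cassandra JMX port; pass a host name as the first argument if needed.
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName mutationStage =
                    new ObjectName("org.apache.cassandra.request:type=MutationStage");
            // The same columns nodetool tpstats reports for this pool.
            System.out.println("Active:    " + mbs.getAttribute(mutationStage, "ActiveCount"));
            System.out.println("Pending:   " + mbs.getAttribute(mutationStage, "PendingTasks"));
            System.out.println("Completed: " + mbs.getAttribute(mutationStage, "CompletedTasks"));
        } finally {
            connector.close();
        }
    }
}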


Re: Shrinking cluster with counters ...

2011-10-05 Thread aaron morton
Is the cluster still in use ? 

The safe way would be to reduce the RF to 1, and then nodetool decommission the 
nodes one at a time. This will cause them to stream data to the remaining 
node(s), and at the same time the node that takes ownership will be receiving 
writes for the new token range. It does mean some of the data transfers are 
wasted. 

If the cluster is not in use, a faster way would be to ensure repair completes 
on the node and then use nodetool removetoken (from a node other than the one you 
want to remove) to remove it from the ring without streaming data. 

I would go with the first option. 

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2011, at 11:56 AM, Ian Danforth wrote:

> All,
> 
>  If I have a 3 node cluster storing counters and RF3, is it possible to 
> shrink back down to a single node cluster? If so should I change replication 
> factor, disable a node, wait for streaming to complete, and repeat for the 
> other node? Should I assume that the cluster will be unavailable during this 
> process?
> 
> Thanks in advance!
> 
> Ian
> 



Re: Weird problem with empty CF

2011-10-05 Thread aaron morton
No. 

It's generally only an issue with heavy delete workloads, and it's sometimes 
possible to design around it. 

cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2011, at 1:18 PM, Daning wrote:

> Thanks.  Do you have plans to improve this? I think tombstones should be 
> separated from live data, since they serve a different purpose: kept in a separate 
> SSTable or indexed differently. It is pretty costly to do the filtering while 
> reading.
> 
> Daning
> 
> On 10/04/2011 01:34 PM, aaron morton wrote:
>> 
>> I would not set gc_grace_seconds to 0; set it to something small. 
>> 
>> gc_grace_seconds or ttl is only the minimum amount of time the column will 
>> stay in the data files. The columns are only purged when compaction runs 
>> some time after that timespan has ended. 
>> 
>> If you are seeing issues where a heavy delete workload is having an 
>> noticeably adverse effect on read performance then you should look at the 
>> data model. Consider ways to spread the write / read / delete workload over 
>> multiple rows.
>> 
>> If you cannot get away from it then experiment with reducing the 
>> min_compaction_threshold of the CFs so that compaction kicks in quicker, 
>> and (potentially) tombstones are purged faster. 
>> 
>> Cheers
>> 
>>  
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 5/10/2011, at 6:03 AM, Daning wrote:
>> 
>>> Thanks Aaron.  How about I set gc_grace_seconds to 0, or to something like 2 hours? I'd 
>>> like to clean up tombstones sooner; I don't mind losing some data, and all 
>>> my columns have a TTL. 
>>> 
>>> If one node is down longer than gc_grace_seconds and the tombstones have been 
>>> removed, then once the node is back up, from my understanding the deleted data will be 
>>> synced back. In that case my data will be processed twice, which will not 
>>> be a big deal to me.
>>> 
>>> Thanks,
>>> 
>>> Daning
>>> 
>>> 
>>> On 10/04/2011 01:27 AM, aaron morton wrote:
 
 Yes that's the slice query skipping past the tombstone columns. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 4/10/2011, at 4:24 PM, Daning Wang wrote:
 
> Lots of SliceQueryFilter in the log, is that handling tombstone?
> 
> DEBUG [ReadStage:49] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317582939743663:true:4@1317582939933000
> DEBUG [ReadStage:50] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317573253148778:true:4@1317573253354000
> DEBUG [ReadStage:43] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317669552951428:true:4@1317669553018000
> DEBUG [ReadStage:33] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317581886709261:true:4@1317581886957000
> DEBUG [ReadStage:52] 2011-10-03 20:15:07,942 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317568165152246:true:4@1317568165482000
> DEBUG [ReadStage:36] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317567265089211:true:4@1317567265405000
> DEBUG [ReadStage:53] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317674324843122:true:4@1317674324946000
> DEBUG [ReadStage:38] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317571990078721:true:4@1317571990141000
> DEBUG [ReadStage:57] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317671855234221:true:4@1317671855239000
> DEBUG [ReadStage:54] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317558305262954:true:4@1317558305337000
> DEBUG [RequestResponseStage:11] 2011-10-03 20:15:07,941 
> ResponseVerbHandler.java (line 48) Processing response on a callback from 
> 12347@/10.210.101.104
> DEBUG [RequestResponseStage:9] 2011-10-03 20:15:07,941 
> AbstractRowResolver.java (line 66) Preprocessed data response
> DEBUG [RequestResponseStage:13] 2011-10-03 20:15:07,941 
> AbstractRowResolver.java (line 66) Preprocessed digest response
> DEBUG [ReadStage:58] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317581337972739:true:4@1317581338044000
> DEBUG [ReadStage:64] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317582656796332:true:4@131758265697
> DEBUG [ReadStage:55] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317569432886284:true:4@1317569432984000
> DEBUG [ReadStage:45] 2011-10-03 20:15:07,941 SliceQueryFilter.java (line 
> 123) collecting 0 of 1: 1317572658687019:true:4@1317572658718000
> DEBUG [ReadStage:47] 2011-10-03 20:15
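Picking up Aaron's suggestion earlier in this thread, this is roughly what lowering gc_grace_seconds and min_compaction_threshold looks like with the same Hector 0.8 column-family-definition style used earlier in this digest. It is only a sketch: the keyspace and CF names are placeholders, and the package and setter names are assumed to be what your Hector build exposes (they map to the Thrift gc_grace_seconds and min_compaction_threshold fields).

import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
import me.prettyprint.cassandra.service.ThriftCfDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class TombstoneFriendlyCf {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

        BasicColumnFamilyDefinition cfDef = new BasicColumnFamilyDefinition();
        cfDef.setKeyspaceName("XXX");          // placeholder keyspace
        cfDef.setName("Events");               // placeholder CF name
        // A small (but non-zero) gc_grace_seconds: tombstones become eligible
        // for purging two hours after the delete instead of the default 10 days.
        cfDef.setGcGraceSeconds(2 * 60 * 60);
        // Let minor compactions kick in with fewer SSTables so tombstones are
        // actually purged sooner.
        cfDef.setMinCompactionThreshold(2);

        cluster.addColumnFamily(new ThriftCfDef(cfDef));
    }
}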

Re: Why is mutation stage increasing ??

2011-10-05 Thread Yi Yang
Well, what client are you using? And can you give a hint about your node hardware?

Sent from my BlackBerry® wireless device

-Original Message-
From: Philippe 
Date: Wed, 5 Oct 2011 10:33:21 
To: user
Reply-To: user@cassandra.apache.org
Subject: Why is mutation stage increasing ??

Hello,
I have my 3-node, RF=3 cluster acting strangely. Can someone shed a light as
to what is going on ?
It was stuck for a couple of hours (all clients TimedOut). nodetool tpstats
showed huge increasing MutationStages (in the hundreds of thousands).
I restarted one node and it took a while to reply GB of commitlog. I've
shutdown all clients that write to the cluster and it's just weird

All nodes are still showing huge MutationStages including the new one and
it's either increasing or stable. The pending count is stuck at 32.
Compactionstats shows no compaction on 2 nodes and dozens of Scrub
compactions (all at 100%) on the 3rd one. This is a scrub I did last week
when I encountered assertion errors.
Netstats shows no streams being exchanged at any node but each on is
expecting a few Responses.

Any ideas ?
Thanks

For example (increased to 567062 while I was writing this email)
Pool NameActive   Pending  Completed   Blocked  All
time blocked
ReadStage 0 018372664517
0 0
RequestResponseStage  0 010731370183
0 0
MutationStage32565879  295492216
0 0
ReadRepairStage   0 0  23654
0 0
ReplicateOnWriteStage 0 07733659
0 0
GossipStage   0 03502922
0 0
AntiEntropyStage  0 0   1631
0 0
MigrationStage0 0  0
0 0
MemtablePostFlusher   0 0   5716
0 0
StreamStage   0 0 10
0 0
FlushWriter   0 0   5714
0   499
FILEUTILS-DELETE-POOL 0 0773
0 0
MiscStage 0 0   1266
0 0
FlushSorter   0 0  0
0 0
AntiEntropySessions   0 0 18
0 0
InternalResponseStage 0 0  0
0 0
HintedHandoff 0 0   1798
0 0


Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool NameActive   Pending  Completed
Commandsn/a 0 1223769753
Responses   n/a 4 1627481305



Re: ByteOrderedPartitioner token generation

2011-10-05 Thread aaron morton
When considering such an expedition it is important to quantify the relative 
terms "large" and "short". The modern gentleman may also find 
respite through the application of the Bulk Loader 
http://www.datastax.com/dev/blog/bulk-loading

I would avoid using the BOP unless you are sure you want to manually shard and 
balance your data. Also, every write is sent to all RF replicas anyway, and the handy 
thing is that even if some of the replicas fail to apply the write the cluster may 
still consider it successful.

Writes can be *very* fast, have you tested your setup to see if there are any 
issues ? Even if you have a local workstation have a look at the CF Write 
Latency stats from nodetool cfstats. If you are having issues with write speed 
and you cannot add more nodes, or more clients, I would look at disabling the 
commit log (or increasing the periodic sync) if your problem domain permits it. 

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2011, at 7:30 PM, Masoud Moshref Javadi wrote:

> I need to insert a large amount of data into a Cassandra cluster in a short time, 
> so I want the interaction among the Cassandra servers to be minimal. I think that 
> the best way to do this is to use the ByteOrderedPartitioner, generate the IDs of 
> new data based on the InitialToken of the servers, and send the data to the 
> corresponding server from the webserver. Am I right?
> 
> Now my question is: if I have some data ranging from 1-100 and want to put 
> 1-25 in server 1, 26-50 in server 2 and so on, what should be the Initial 
> Tokens of the servers?
> 
> Thanks in advance
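For what it's worth, with the ByteOrderedPartitioner a node's initial_token is just a hex-encoded byte string, so the answer depends entirely on how the keys are serialized. Below is a small sketch under the assumption (mine, not the poster's) that keys are written as fixed-width 8-byte big-endian longs; it prints the hex tokens the range boundaries would correspond to. Remember ranges are (previous token, token], so shift the boundaries depending on which end you want each node to own.

import java.nio.ByteBuffer;

public class BopTokenSketch {
    // Hex form of an 8-byte big-endian long, which is what a BOP token looks
    // like when keys are serialized that way (an assumption for this sketch).
    static String hexToken(long boundary) {
        byte[] bytes = ByteBuffer.allocate(8).putLong(boundary).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical boundaries for "1-25 on server 1, 26-50 on server 2, ...".
        long[] boundaries = {25, 50, 75, 100};
        for (long b : boundaries) {
            System.out.println("keys up to " + b + " -> initial_token " + hexToken(b));
        }
    }
}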



Re: Why is mutation stage increasing ??

2011-10-05 Thread aaron morton
Lots of hinted handoff can give you mutations…

> HintedHandoff 0 0   1798 0
>  0


1798 is somewhat high. This is the HH tasks on this node though, can you see HH 
running on other nodes in the cluster? What has been happening on this node ? 

HH is throttled to avoid this sort of thing, what version are you on ? 

Also looks like the disk IO could not keep up with the flushing….

FlushWriter   0 0   5714 0  
 499
 
You need to provide some more info on what was happening to the nodes beforehand, 
and check the logs on all machines for errors etc. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2011, at 9:52 PM, Yi Yang wrote:

> Well what client are you using? And can you give a hint to your node hardware?
> Sent from my BlackBerry® wireless device
> 
> From: Philippe 
> Date: Wed, 5 Oct 2011 10:33:21 +0200
> To: user
> ReplyTo: user@cassandra.apache.org
> Subject: Why is mutation stage increasing ??
> 
> Hello,
> I have my 3-node, RF=3 cluster acting strangely. Can someone shed a light as 
> to what is going on ?
> It was stuck for a couple of hours (all clients TimedOut). nodetool tpstats 
> showed huge increasing MutationStages (in the hundreds of thousands).
> I restarted one node and it took a while to reply GB of commitlog. I've 
> shutdown all clients that write to the cluster and it's just weird
> 
> All nodes are still showing huge MutationStages including the new one and 
> it's either increasing or stable. The pending count is stuck at 32.
> Compactionstats shows no compaction on 2 nodes and dozens of Scrub 
> compactions (all at 100%) on the 3rd one. This is a scrub I did last week 
> when I encountered assertion errors.
> Netstats shows no streams being exchanged at any node but each on is 
> expecting a few Responses.
> 
> Any ideas ?
> Thanks
> 
> For example (increased to 567062 while I was writing this email)
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> ReadStage 0 018372664517 0
>  0
> RequestResponseStage  0 010731370183 0
>  0
> MutationStage32565879  295492216 0
>  0
> ReadRepairStage   0 0  23654 0
>  0
> ReplicateOnWriteStage 0 07733659 0
>  0
> GossipStage   0 03502922 0
>  0
> AntiEntropyStage  0 0   1631 0
>  0
> MigrationStage0 0  0 0
>  0
> MemtablePostFlusher   0 0   5716 0
>  0
> StreamStage   0 0 10 0
>  0
> FlushWriter   0 0   5714 0
>499
> FILEUTILS-DELETE-POOL 0 0773 0
>  0
> MiscStage 0 0   1266 0
>  0
> FlushSorter   0 0  0 0
>  0
> AntiEntropySessions   0 0 18 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> HintedHandoff 0 0   1798 0
>  0
> 
> 
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool NameActive   Pending  Completed
> Commandsn/a 0 1223769753
> Responses   n/a 4 1627481305
> 



Re: Why is mutation stage increasing ??

2011-10-05 Thread Philippe
Thanks for the quick responses.

@Yi
Using Hector 0.8.0-1
Hardware is :

   - AMD Opteron 4174 6x 2.30+ GHz
   - 32 GB DDR3
   - 1 Gbps Lossless


@aaron
I'm running 0.8.6 on all nodes, straight from the debian packages.
I get hinted handoffs from time to time because of flapping; I've been planning
to increase the phi threshold as per another thread but haven't yet.
Here are the HH counts per node:
HintedHandoff                     0         0            437         0                 0
HintedHandoff                     0         0              2         0                 0
HintedHandoff                     0         0           1798         0                 0
Not seeing any iowait on my Munin CPU graph, at least not more than over the
past couple of weeks. There is a little more on the 2nd node because it's
also holding a MySQL database that gets hit hard.
The Munin iostat graph shows an average of 10-20 thousand blocks/s read/write.

Nothing special was happening besides a weekly repair on node two starting
yesterday at 4am. That one failed with
ERROR [AntiEntropySessions:5] 2011-10-04 04:03:56,676
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[AntiEntropySessions:5,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Problem during repair
session manual-repair-0d4125f0-aab7-403b-a083-7c19ef6579b1, endpoint
/xx.xx.xx.180 died
Then the next planned repairs failed before starting
INFO [AntiEntropySessions:8] 2011-10-04 04:03:57,137 AntiEntropyService.java
(line 658) Could not proceed on repair because a neighbor (/xx.xx.xx.180) is
dead: manual-repair-5dc4c221-3a15-4031-9aa8-0931e41816cd failed

Looking at the logs on that node shows no Exception. And I was about to say
"nothing special happening at that time", except that it looks like at 4am
the GC started working hard and got the heap down to 9GB, and then it shot
straight up to almost 16GB, so I guess ParNew couldn't keep up and
ConcurrentMarkSweep had to step in and basically hang the server? It took
another 2 minutes until I got the "Heap is 0.75 full" message, and I got a lot
of StatusLogger messages before that.
So it looks like computing the Merkle tree was very expensive this time... I
wonder why? Anything I can do to handle this?

INFO [ScheduledTasks:1] 2011-10-04 04:03:12,874 GCInspector.java (line 122)
GC for ParNew: 427 ms for 2 collections, 16227154488 used; max is
16838033408
 INFO [GossipTasks:1] 2011-10-04 04:04:24,092 Gossiper.java (line 697)
InetAddress /xx.xx.xx.97 is now dead.
 INFO [ScheduledTasks:1] 2011-10-04 04:04:24,093 GCInspector.java (line 122)
GC for ParNew: 26376 ms for 2 collections, 8832125416 used; max is
16838033408
(no GC logs until)
 INFO [ScheduledTasks:1] 2011-10-04 04:04:24,251 GCInspector.java (line 122)
GC for ConcurrentMarkSweep: 16250 ms for 3 collections, 9076209720 used; max
is 16838033408
 WARN [ScheduledTasks:1] 2011-10-04 04:06:52,777 GCInspector.java (line 143)
Heap is 0.752707974197173 full.  You may need to reduce memtable and/or
cache sizes.  Cassandra will now flush up to the two largest memtables to
free up memory.  Adjust flush_largest_memtables_at threshold in
cassandra.yaml if you don't want Cassandra to do this automatically



2011/10/5 aaron morton 

> Lots of hinted handoff can give you mutations…
>
> HintedHandoff 0 0   1798
> 0 0
>
>
> 1798 is somewhat high. This is the HH tasks on this node though, can you
> see HH running on other nodes in the cluster? What has been happening on
> this node ?
>
> HH is throttled to avoid this sort of thing, what version are you on ?
>
> Also looks like the disk IO could not keep up with the flushing….
>
> FlushWriter   0 0   5714
> 0   499
>
> You need to provide some more info on what was happening to nodes before
> hand. And check the logs on all machines for errors etc.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/10/2011, at 9:52 PM, Yi Yang wrote:
>
> Well what client are you using? And can you give a hint to your node
> hardware?
>
> Sent from my BlackBerry® wireless device
> --
> *From: * Philippe 
> *Date: *Wed, 5 Oct 2011 10:33:21 +0200
> *To: *user
> *ReplyTo: * user@cassandra.apache.org
> *Subject: *Why is mutation stage increasing ??
>
> Hello,
> I have my 3-node, RF=3 cluster acting strangely. Can someone shed a light
> as to what is going on ?
> It was stuck for a couple of hours (all clients TimedOut). nodetool tpstats
> showed huge increasing MutationStages (in the hundreds of thousands).
> I restarted one node and it took a while to reply GB of commitlog. I've
> shutdown all clients that write to the cluster and it's just weird
>
> All nodes are still showing huge MutationStages including the new one and
> it's either increasing or stable. The pending count is stuck at 32.
> Compactionstats shows no compaction on 2 nodes and dozens of Scrub
> compactions (all at 100%)

Re: Token != DecoratedKey assertion

2011-10-05 Thread Philippe
A little feedback:
I ran scrub on each server and I haven't seen this error again. The load on
each server seems to be correct.
nodetool compactionstats shows a boat-load of "Scrub" entries at 100% on my 3rd node
but not on the other 2.
I left it that way and haven't restarted yet.

2011/9/26 aaron morton 

> Looks like a mismatch between the key the index says should be at a certain
> position in the data file and the key that is actually there.
>
> I've not checked, but scrub *may* fix this. Try it and see.
>
> (repair is for repairing consistency between nodes, scrub fixes local
> issues with data. )
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/09/2011, at 12:53 PM, Philippe wrote:
>
> Just did
> Could there be data corruption or will repairs do this?
>
> Thanks
> On 25 Sept 2011, at 15:30, "Jonathan Ellis"  wrote:
> > Assertion errors are bugs, so that should worry you.
> >
> > However, I'd upgrade before filing a ticket. There were a lot of
> > fixes in 0.8.5.
> >
> > On Sun, Sep 25, 2011 at 2:27 AM, Philippe  wrote:
> >> Hello,
> >> I've seen a couple of these in my logs, running 0.8.4.
> >> This is a RF=3, 3-node cluster. 2 nodes including this one are on 0.8.4
> and
> >> one is on 0.8.5
> >>
> >> The node is still functionning hours later. Should I be worried ?
> >>
> >> Thanks
> >>
> >> ERROR [ReadStage:94911] 2011-09-24 22:40:30,043
> AbstractCassandraDaemon.java
> >> (line 134) Fatal exception in thread Thread[ReadStage:94911,5,main]
> >> java.lang.AssertionError:
> >>
> DecoratedKey(Token(bytes[224ceb80b5fb11e0848783ceb9bf0002ff33]),
> >> 224ceb80b5fb11e0848783ceb9bf0002ff33) !=
> >> DecoratedKey(Token(bytes[038453154cb0005f14]),
> 038453154cb0005f14)
> >> in /var/lib/cassandra/data/X/PUBLIC_MONTHLY_20-g-10634-Data.db
> >> at
> >>
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:59)
> >> at
> >>
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
> >> at
> >>
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1315)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1182)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1222)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
> >> at org.apache.cassandra.db.Table.getRow(Table.java:385)
> >> at
> >>
> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
> >> at
> >>
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:642)
> >> at
> >>
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1107)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> at java.lang.Thread.run(Thread.java:662)
> >> ERROR [ReadStage:94936] 2011-09-24 22:40:30,042
> AbstractCassandraDaemon.java
> >> (line 134) Fatal exception in thread Thread[ReadStage:94936,5,main]
> >> java.lang.AssertionError: DecoratedKey(Token(bytes[]), ) !=
> >> DecoratedKey(Token(bytes[038453154c90005f14]),
> 038453154c90005f14)
> >> in /var/lib/cassandra/data/X/PUBLIC_MONTHLY_20-g-10634-Data.db
> >> at
> >>
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.(SSTableSliceIterator.java:59)
> >> at
> >>
> org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
> >> at
> >>
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1315)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1182)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1222)
> >> at
> >>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
> >> at org.apache.cassandra.db.Table.getRow(Table.java:385)
> >> at
> >>
> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:58)
> >> at
> >>
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:642)
> >> at
> >>
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1107)
> >> at
> >>
> java.util.concurrent.T

Re: Why is mutation stage increasing ??

2011-10-05 Thread Philippe
Followup,
I was mistaken in saying there weren't any writes to the cluster. There's a
process that's doing a couple of mutations per second.

I just restarted node #3 and found this message on node #1
 INFO [HintedHandoff:1] 2011-10-05 12:25:08,173 HintedHandOffManager.java
(line 314) Endpoint /xx.xx.xx.180 died before hint delivery, aborting

Could HH have gotten the nodes on the receiving end stuck? Is there any way to
throttle it?
If it can't be throttled and you confirm HH is a suspect, I may simply
disable it; as I'm running a repair 3 times a week (once per node), I guess
my cluster won't get too far out of sync.

Thanks

2011/10/5 Philippe 

> Thanks for the quick responses.
>
> @Yi
> Using Hector 0.8.0-1
> Hardware is :
>
>- AMD Opteron 4174 6x 2.30+ GHz
>- 32 Go DDR3
>- 1 Gbps Lossless
>
>
> @aaron
> I'm running 0.8.6 on all nodes, straight from the debian packages.
> I get hinted handoffs from time to time because of flapping, I've planning
> to increase the phi as per another thread but haven't yet.
> Here are the HH per node::
> HintedHandoff 0 0437
> 0 0
> HintedHandoff 0 0  2
> 0 0
>
> HintedHandoff 0 0   1798
> 0 0
> Not seeing any iowait from my munin CPU graph, at least not more than the
> past couple of weeks. There is a little more on the 2nd node because it's
> also holding a mysql database that gets hit hard.
> Munin iostats graph shows an average 10-20kilo blocks/s read/write.
>
> Nothing special was happening besides a weekly repair on node two starting
> yesterday at 4am. That one failed with
> ERROR [AntiEntropySessions:5] 2011-10-04 04:03:56,676
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
> Thread[AntiEntropySessions:5,5,RMI Runtime]
> java.lang.RuntimeException: java.io.IOException: Problem during repair
> session manual-repair-0d4125f0-aab7-403b-a083-7c19ef6579b1, endpoint
> /xx.xx.xx.180 died
> Then the next planned repairs failed before starting
> INFO [AntiEntropySessions:8] 2011-10-04 04:03:57,137
> AntiEntropyService.java (line 658) Could not proceed on repair because a
> neighbor (/xx.xx.xx.180) is dead:
> manual-repair-5dc4c221-3a15-4031-9aa8-0931e41816cd failed
>
> Looking at the logs on that node shows no Exception. And I was about to
> say, "nothing special happening at that time" except that it looks like at
> 4am, the GC started working hard and got the heap down to 9GB and then it
> shot straight up to almost 16GB so I guess ParNew couldn't keep up and
> ConcurrentMarkSweep had to step in and basically hang the server ? It took
> another 2 minutes until I get the "Heap is 0.75 full" message, I get a lot
> of StatusLogger messages before that.
> So it looks like computing the Merkle tree was very expensive this time...
> I wonder why ? Anything I can do to handle this ?
>
> INFO [ScheduledTasks:1] 2011-10-04 04:03:12,874 GCInspector.java (line 122)
> GC for ParNew: 427 ms for 2 collections, 16227154488 used; max is
> 16838033408
>  INFO [GossipTasks:1] 2011-10-04 04:04:24,092 Gossiper.java (line 697)
> InetAddress /xx.xx.xx.97 is now dead.
>  INFO [ScheduledTasks:1] 2011-10-04 04:04:24,093 GCInspector.java (line
> 122) GC for ParNew: 26376 ms for 2 collections, 8832125416 used; max is
> 16838033408
> (not GC logs until)
>  INFO [ScheduledTasks:1] 2011-10-04 04:04:24,251 GCInspector.java (line
> 122) GC for ConcurrentMarkSweep: 16250 ms for 3 collections, 9076209720used; 
> max is 16838033408
>  WARN [ScheduledTasks:1] 2011-10-04 04:06:52,777 GCInspector.java (line
> 143) Heap is 0.752707974197173 full.  You may need to reduce memtable and/or
> cache sizes.  Cassandra will now flush up to the two largest memtables to
> free up memory.  Adjust flush_largest_memtables_at threshold in
> cassandra.yaml if you don't want Cassandra to do this automatically
>
>
>
> 2011/10/5 aaron morton 
>
>> Lots of hinted handoff can give you mutations…
>>
>> HintedHandoff 0 0   1798
>> 0 0
>>
>>
>> 1798 is somewhat high. This is the HH tasks on this node though, can you
>> see HH running on other nodes in the cluster? What has been happening on
>> this node ?
>>
>>   HH is throttled to avoid this sort of thing, what version are you on ?
>>
>> Also looks like the disk IO could not keep up with the flushing….
>>
>> FlushWriter   0 0   5714
>> 0   499
>>
>> You need to provide some more info on what was happening to nodes before
>> hand. And check the logs on all machines for errors etc.
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/10/2011, at 9:52 PM, Yi Yang wrote:
>>
>> Well what client are you using? And can you give a hint to your node
>> hardware?
>>
>> Sent from my BlackBerry® wireless device
>> --

cluster repair script

2011-10-05 Thread Radim Kolar

A simple script for running cluster-wide repairs:

#! /bin/sh
# Run "nodetool repair" on every node in the ring, one node at a time.
set -e
if test $# -eq 0; then
    echo "usage: $0 <host>"
    exit 1;
fi
# Ask one node for the ring, take the first field of each line and keep only
# the lines that start with a digit, i.e. the node addresses.
for i in `nodetool -h $1 ring | cut -d ' ' -f 1 | grep -e '^[0-9]'`; do
    nodetool -h $i repair
done



Re: dedicated gossip lan

2011-10-05 Thread Radim Kolar

On 4.10.2011 22:05, Sorin Julean wrote:

Sorry for not being clear.
Indeed I mean a separate LAN and interfaces for "listen_address".

It needs to be a 1 Gbit LAN; 100 Mbit Ethernet is way too slow for Cassandra.


RE: invalid column name length 0

2011-10-05 Thread Desimpel, Ignace
Did the test again, with an empty database, replication factor 3, and Cassandra 
running in its own JVM.
All data is now stored using a separate program that connects to the database 
using Thrift.
This at least results in a lot fewer Dead/Up messages (I guess the GC had too 
much work handling the non-Cassandra memory objects), but they are still there.

Also the exception 'invalid column name length 0' is there again. Below is a 
log of machine x.x.x.59 starting after 00:00. One hour before 00:00 I 
stopped all writes, so that the machines had nothing else to do besides 
compacting and cleaning up and ... (still compacting and discarding obsolete 
commit logs).

Checked the log files on all machines, and no exception nor assert related to 
column names could be found.

2011-10-05 00:06:25.172 InetAddress /x.x.x.60 is now dead.
2011-10-05 00:06:25.179 InetAddress /x.x.x.60 is now UP
2011-10-05 00:46:47.091 Saved KsFullIdx-ForwardStringValues-KeyCache (94 items) 
in 19 ms
2011-10-05 00:46:47.334 Saved KsFullIdx-ReverseLongValues-KeyCache (98732 
items) in 117 ms
2011-10-05 00:46:47.797 Saved KsFullIdx-ReverseLabelValues-KeyCache (273425 
items) in 259 ms
2011-10-05 00:46:48.645 Saved KsFullIdx-ReverseStringValues-KeyCache (50 
items) in 472 ms
2011-10-05 01:00:52.691 ColumnFamilyStore(table='system', 
columnFamily='HintsColumnFamily') liveRatio is 28.375375375375377 (just-counted 
was 28.375375375375377).  calculation took 4ms for 56 columns
2011-10-05 01:07:02.052 InetAddress /x.x.x.60 is now dead.
2011-10-05 01:07:02.058 InetAddress /x.x.x.60 is now UP
2011-10-05 01:07:02.060 InetAddress /x.x.x.61 is now dead.
2011-10-05 01:07:02.060 InetAddress /x.x.x.61 is now UP
2011-10-05 02:07:33.785 InetAddress /x.x.x.60 is now dead.
2011-10-05 02:07:33.791 InetAddress /x.x.x.60 is now UP
2011-10-05 02:41:12.528 Fatal exception in thread Thread[HintedHandoff:1,5,main]
java.io.IOError: 
org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column 
name length 0
at 
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265)
 ~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281) 
~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236) 
~[apache-cassandra-0.8.6.jar:0.8.6]
at 
java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
 ~[na:1.6.0_24]
at 
java.util.concurrent.ConcurrentSkipListMap.(ConcurrentSkipListMap.java:1443)
 ~[na:1.6.0_24]
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:445) 
~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:428) 
~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:418) 
~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:380) 
~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:179)
 ~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:121)
 ~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:49)
 ~[apache-cassandra-0.8.6.jar:0.8.6]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 ~[guava-r08.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) 
~[guava-r08.jar:na]
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
 ~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
 ~[commons-collections-3.2.1.jar:3.2.1]
at 
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 ~[commons-collections-3.2.1.jar:3.2.1]
at 
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 ~[commons-collections-3.2.1.jar:3.2.1]
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
 ~[apache-cassandra-0.8.6.jar:0.8.6]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 ~[guava-r08.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) 
~[guava-r08.jar:na]
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
 ~[apache-cassandra-0.8.6.jar:0.8.6]
at 
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilt

Thrift transport error

2011-10-05 Thread M Vieira
I'm using Thrift 0.7 with Cassandra 0.8.6 and "Cassandra Cluster Admin"
to work with my single-node [testing] cluster.

All seems to work fine, but I'm getting a constant error message:
"CustomTThreadPoolServer.java (line 197) Thrift transport error occurred
during processing of message."


Could anyone help shed some light on the issue?


Below is an example of the error message in context

[root@merlot /]# cat /var/log/cassandra/system.log
[...showing the error only...]
DEBUG [ScheduledTasks:1] 2011-10-05 11:59:00,170 StorageLoadBalancer.java
(line 336) Disseminating load info ...
DEBUG [ScheduledTasks:1] 2011-10-05 12:00:00,172 StorageLoadBalancer.java
(line 336) Disseminating load info ...
DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,364 CassandraServer.java (line
1060) checking schema agreement
DEBUG [MigrationStage:1] 2011-10-05 12:00:49,365 SchemaCheckVerbHandler.java
(line 36) Received schema check request.
DEBUG [InternalResponseStage:7] 2011-10-05 12:00:49,367
ResponseVerbHandler.java (line 48) Processing response on a callback from 2@
/192.168.100.30
DEBUG [InternalResponseStage:7] 2011-10-05 12:00:49,367 StorageProxy.java
(line 789) Received schema check response from 192.168.100.30
DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,368 StorageProxy.java (line 820)
My version is --1000--
DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,368 StorageProxy.java (line 850)
Schemas are in agreement.
DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,374 CustomTThreadPoolServer.java
(line 197) Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,376 ClientState.java (line 94)
logged out: #
DEBUG [ScheduledTasks:1] 2011-10-05 12:01:00,174 StorageLoadBalancer.java
(line 336) Disseminating load info ...
[...showing the error only...]
[root@merlot /]#


Re: invalid column name length 0

2011-10-05 Thread Sylvain Lebresne
OK. Another quick question then: did you issue deletions and/or use TTLs
for that test?

Also, it's probably worth creating a ticket on
https://issues.apache.org/jira/browse/CASSANDRA
if you don't mind.

--
Sylvain

On Wed, Oct 5, 2011 at 2:42 PM, Desimpel, Ignace
 wrote:
> Did the test again, empty database, with replication factor 3, Cassandra 
> running in it's own jvm.
> All data is now stored using a separate program that connects to the database 
> using THRIFT.
> At least this results in a lot less Dead/Up messages (I guess the GC had too 
> much work handling the non-cassandra memory objects), but it is still there.
>
> Also the exception 'invalid column name length 0' is there again. Below is a 
> log of machine x.x.x.59 starting after 00:00 hour. One hour before 00:00 I 
> stopped all storing, so that the machines had nothing else to do besides 
> compacting and cleaning up and ... (still compacting and discarding obsolete 
> commit logs).
>
> Checked the log files on all machines, and no exception nor assert related to 
> column names could be found.
>
> 2011-10-05 00:06:25.172 InetAddress /x.x.x.60 is now dead.
> 2011-10-05 00:06:25.179 InetAddress /x.x.x.60 is now UP
> 2011-10-05 00:46:47.091 Saved KsFullIdx-ForwardStringValues-KeyCache (94 
> items) in 19 ms
> 2011-10-05 00:46:47.334 Saved KsFullIdx-ReverseLongValues-KeyCache (98732 
> items) in 117 ms
> 2011-10-05 00:46:47.797 Saved KsFullIdx-ReverseLabelValues-KeyCache (273425 
> items) in 259 ms
> 2011-10-05 00:46:48.645 Saved KsFullIdx-ReverseStringValues-KeyCache (50 
> items) in 472 ms
> 2011-10-05 01:00:52.691 ColumnFamilyStore(table='system', 
> columnFamily='HintsColumnFamily') liveRatio is 28.375375375375377 
> (just-counted was 28.375375375375377).  calculation took 4ms for 56 columns
> 2011-10-05 01:07:02.052 InetAddress /x.x.x.60 is now dead.
> 2011-10-05 01:07:02.058 InetAddress /x.x.x.60 is now UP
> 2011-10-05 01:07:02.060 InetAddress /x.x.x.61 is now dead.
> 2011-10-05 01:07:02.060 InetAddress /x.x.x.61 is now UP
> 2011-10-05 02:07:33.785 InetAddress /x.x.x.60 is now dead.
> 2011-10-05 02:07:33.791 InetAddress /x.x.x.60 is now UP
> 2011-10-05 02:41:12.528 Fatal exception in thread 
> Thread[HintedHandoff:1,5,main]
> java.io.IOError: 
> org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid 
> column name length 0
>        at 
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281) 
> ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236) 
> ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>  ~[na:1.6.0_24]
>        at 
> java.util.concurrent.ConcurrentSkipListMap.(ConcurrentSkipListMap.java:1443)
>  ~[na:1.6.0_24]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:445)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:428)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:418)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:380)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:179)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:121)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:49)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>  ~[guava-r08.jar:na]
>        at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135) 
> ~[guava-r08.jar:na]
>        at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
>  ~[commons-collections-3.2.1.jar:3.2.1]
>        at 
> org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
>  ~[commons-collections-3.2.1.jar:3.2.1]
>        at 
> org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>  ~[commons-collections-3.2.1.jar:3.2.1]
>        at 
> org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
>  ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> com.

Re: Thrift transport error

2011-10-05 Thread Sylvain Lebresne
Cassandra uses Thrift 0.6, so without being a specialist in Thrift internals,
that could be the source of the problem you're seeing (for info, we'll update
to Thrift 0.7 at some point, but not in the very near future; see
https://issues.apache.org/jira/browse/CASSANDRA-3213).

--
Sylvain

On Wed, Oct 5, 2011 at 3:29 PM, M Vieira  wrote:
>
> I'm using Thrift 0.7 with Cassandra 0.8.6 and "Cassandra Cluster Admin" to
> work around my single node [testing] cluster.
>
> All seams to work fine, but I'm getting a contant error message
> "CustomTThreadPoolServer.java (line 197) Thrift transport error occurred
> during processing of message."
>
>
> Could anyone help shed some light on the issue?
>
>
> Below is an example of the error message in context
>
> [root@merlot /]# cat /var/log/cassandra/system.log
> [...showing the error only...]
> DEBUG [ScheduledTasks:1] 2011-10-05 11:59:00,170 StorageLoadBalancer.java
> (line 336) Disseminating load info ...
> DEBUG [ScheduledTasks:1] 2011-10-05 12:00:00,172 StorageLoadBalancer.java
> (line 336) Disseminating load info ...
> DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,364 CassandraServer.java (line
> 1060) checking schema agreement
> DEBUG [MigrationStage:1] 2011-10-05 12:00:49,365 SchemaCheckVerbHandler.java
> (line 36) Received schema check request.
> DEBUG [InternalResponseStage:7] 2011-10-05 12:00:49,367
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 2@/192.168.100.30
> DEBUG [InternalResponseStage:7] 2011-10-05 12:00:49,367 StorageProxy.java
> (line 789) Received schema check response from 192.168.100.30
> DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,368 StorageProxy.java (line 820)
> My version is --1000--
> DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,368 StorageProxy.java (line 850)
> Schemas are in agreement.
> DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,374 CustomTThreadPoolServer.java
> (line 197) Thrift transport error occurred during processing of message.
> org.apache.thrift.transport.TTransportException
>     at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>     at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>     at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>     at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>     at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>     at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>     at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
>     at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:636)
> DEBUG [pool-2-thread-2] 2011-10-05 12:00:49,376 ClientState.java (line 94)
> logged out: #
> DEBUG [ScheduledTasks:1] 2011-10-05 12:01:00,174 StorageLoadBalancer.java
> (line 336) Disseminating load info ...
> [...showing the error only...]
> [root@merlot /]#
>
>


RE: invalid column name length 0

2011-10-05 Thread Desimpel, Ignace
TTLs: no.
Deletions: yes; but I think I can avoid them and thus run the same test 
without deletions, just to eliminate possibilities.

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: woensdag 5 oktober 2011 15:34
To: user@cassandra.apache.org
Subject: Re: invalid column name length 0

Ok. Quick other question then. Did you issue deletion and/or used TTLs for that 
test ?

Also, it's probably worth creating a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA
if you don't mind.

--
Sylvain

On Wed, Oct 5, 2011 at 2:42 PM, Desimpel, Ignace  
wrote:
> Did the test again, empty database, with replication factor 3, Cassandra 
> running in it's own jvm.
> All data is now stored using a separate program that connects to the database 
> using THRIFT.
> At least this results in a lot less Dead/Up messages (I guess the GC had too 
> much work handling the non-cassandra memory objects), but it is still there.
>
> Also the exception 'invalid column name length 0' is there again. Below is a 
> log of machine x.x.x.59 starting after 00:00 hour. One hour before 00:00 I 
> stopped all storing, so that the machines had nothing else to do besides 
> compacting and cleaning up and ... (still compacting and discarding obsolete 
> commit logs).
>
> Checked the log files on all machines, and no exception nor assert related to 
> column names could be found.
>
> 2011-10-05 00:06:25.172 InetAddress /x.x.x.60 is now dead.
> 2011-10-05 00:06:25.179 InetAddress /x.x.x.60 is now UP
> 2011-10-05 00:46:47.091 Saved KsFullIdx-ForwardStringValues-KeyCache 
> (94 items) in 19 ms
> 2011-10-05 00:46:47.334 Saved KsFullIdx-ReverseLongValues-KeyCache 
> (98732 items) in 117 ms
> 2011-10-05 00:46:47.797 Saved KsFullIdx-ReverseLabelValues-KeyCache 
> (273425 items) in 259 ms
> 2011-10-05 00:46:48.645 Saved KsFullIdx-ReverseStringValues-KeyCache 
> (50 items) in 472 ms
> 2011-10-05 01:00:52.691 ColumnFamilyStore(table='system', 
> columnFamily='HintsColumnFamily') liveRatio is 28.375375375375377 
> (just-counted was 28.375375375375377).  calculation took 4ms for 56 
> columns
> 2011-10-05 01:07:02.052 InetAddress /x.x.x.60 is now dead.
> 2011-10-05 01:07:02.058 InetAddress /x.x.x.60 is now UP
> 2011-10-05 01:07:02.060 InetAddress /x.x.x.61 is now dead.
> 2011-10-05 01:07:02.060 InetAddress /x.x.x.61 is now UP
> 2011-10-05 02:07:33.785 InetAddress /x.x.x.60 is now dead.
> 2011-10-05 02:07:33.791 InetAddress /x.x.x.60 is now UP
> 2011-10-05 02:41:12.528 Fatal exception in thread 
> Thread[HintedHandoff:1,5,main]
> java.io.IOError: 
> org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: 
> invalid column name length 0
>        at 
> org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSort
> edMap.java:265) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:
> 281) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:
> 236) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentS
> kipListMap.java:1493) ~[na:1.6.0_24]
>        at 
> java.util.concurrent.ConcurrentSkipListMap.(ConcurrentSkipListMa
> p.java:1443) ~[na:1.6.0_24]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.
> java:445) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.
> java:428) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.
> java:418) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.
> java:380) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlock
> Fetcher.getNextBlock(IndexedSliceReader.java:179) 
> ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(
> IndexedSliceReader.java:121) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(
> IndexedSliceReader.java:49) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIt
> erator.java:140) ~[guava-r08.jar:na]
>        at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.ja
> va:135) ~[guava-r08.jar:na]
>        at 
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SS
> TableSliceIterator.java:108) ~[apache-cassandra-0.8.6.jar:0.8.6]
>        at 
> org.apache.commons.collections.iterators.CollatingIterator.set(Collati
> ngIterator.java:283) ~[commons-collections-3.2.1.jar:3.2.1]
>        at 
> org.apache.commons.collections.iterators.CollatingIterator.least(Colla
> tingIterator.java:326) ~[co

Re: Can't connect to MX4J endpoint on Ubuntu

2011-10-05 Thread Bart Swedrowski

On 23/09/2011 23:55, Iwona Bialynicka-Birula wrote:

I am trying to monitor Cassandra 0.8.0 using MX4J


I was going through this stuff recently, as well.

Have a look at Jolokia[1] and Jmx4Perl[2].  They're quite trivial to 
install and will give you access to all of the same stuff MX4J does.


Jmx4Perl in particular has Cacti and Nagios plugins which work 
surprisingly well ;-)


[1]: http://www.jolokia.org/
[2]: http://search.cpan.org/dist/jmx4perl/


Thrift transport error

2011-10-05 Thread M Vieira
@Sylvain, thanks for the eye-opening hint on CASSANDRA-3213.
There are some critical issues with Thrift 0.6 that were fixed in 0.7

Thrift 0.6 critical issues
https://issues.apache.org/jira/browse/THRIFT-788
https://issues.apache.org/jira/browse/THRIFT-1067


@Jonathan, you're right, the error message is related to the connection being closed.
I'm testing Cassandra+Thrift+PHP and added some debugging messages
around the code, and found that the error message comes up after the
end of PHP execution, when the PHP core is wrapping up and closing the
connection.
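To make the sequence concrete, here is a minimal Java Thrift connection sketch (assuming the Cassandra 0.8 Thrift bindings on the classpath; the keyspace name is just an example). The server logs the "Thrift transport error" at DEBUG when a client simply drops or closes its socket, which is consistent with what is described above, so a clean close on the client side is normal and the message is harmless.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class ThriftClientSketch {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("localhost", 9160);
        TFramedTransport transport = new TFramedTransport(socket);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("XXX");   // example keyspace name
        // ... do work ...
        transport.close();            // the server sees EOF here and logs the DEBUG message
    }
}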


Consistency level and ReadRepair

2011-10-05 Thread Ramesh Natarajan
I have a 12-node Cassandra cluster running with RF=3. I have several
clients (all running on a single node) connecting to the cluster (fixed
client-node mapping) that do inserts, updates, selects and deletes. Each
client has a fixed mapping of the row keys and always connects to the same
node. The timestamp on the client node is used for all operations. All
operations are done at CL QUORUM.

When I run tpstats I see the ReadRepair count consistently increasing. I
need to figure out why ReadRepair is happening.

One scenario I can think of is that it could happen when there is a delay
in updating the nodes to reach eventual consistency.

Let's say I have 3 nodes (RF=3): A, B, C. I insert <key, value> with timestamp
<t1> to A, and the call will return as soon as it inserts the record
on A and B. At some later point this information is sent to C...

A while later A, B and C have the same data with the same timestamp:

A <key, value, t1>
B <key, value, t1> and
C <key, value, t1>

When I update <key> on A with timestamp <t2>, the call will
return as soon as it inserts the record on A and B.
Now the data is:

A <key, value2, t2>
B <key, value2, t2>
C <key, value, t1>

Assuming I query for <key>: A and C respond, and since there is no QUORUM
(their values differ), it waits for B to respond; when A and B match, the
response is returned to the client and a ReadRepair is sent to C.

This could happen only when C is running behind in catching up with the
updates to A and B. Are there any stats that would let me know whether the
system is in a consistent state?

thanks
Ramesh


tpstats_2011-10-05_12:50:01:ReadRepairStage   0   0   43569781   0   0
tpstats_2011-10-05_12:55:01:ReadRepairStage   0   0   43646420   0   0
tpstats_2011-10-05_13:00:02:ReadRepairStage   0   0   43725850   0   0
tpstats_2011-10-05_13:05:01:ReadRepairStage   0   0   43790047   0   0
tpstats_2011-10-05_13:10:02:ReadRepairStage   0   0   43869704   0   0
tpstats_2011-10-05_13:15:01:ReadRepairStage   0   0   43945635   0   0
tpstats_2011-10-05_13:20:01:ReadRepairStage   0   0   44020406   0   0
tpstats_2011-10-05_13:25:02:ReadRepairStage   0   0   44093227   0   0
tpstats_2011-10-05_13:30:01:ReadRepairStage   0   0   44167455   0   0
tpstats_2011-10-05_13:35:02:ReadRepairStage   0   0   44247519   0   0
tpstats_2011-10-05_13:40:01:ReadRepairStage   0   0   44312726   0   0
tpstats_2011-10-05_13:45:01:ReadRepairStage   0   0   44387633   0   0
tpstats_2011-10-05_13:50:01:ReadRepairStage   0   0   3683   0   0
tpstats_2011-10-05_13:55:02:ReadRepairStage   0   0   44499487   0   0
tpstats_2011-10-05_14:00:01:ReadRepairStage   0   0   44578656   0   0
tpstats_2011-10-05_14:05:01:ReadRepairStage   0   0   44647555   0   0
tpstats_2011-10-05_14:10:02:ReadRepairStage   0   0   44716730   0   0
tpstats_2011-10-05_14:15:01:ReadRepairStage   0   0   44776644   0   0
tpstats_2011-10-05_14:20:01:ReadRepairStage   0   0   44840237   0   0
tpstats_2011-10-05_14:25:01:ReadRepairStage   0   0   44891444   0   0
tpstats_2011-10-05_14:30:01:ReadRepairStage   0   0   44931105   0   0
tpstats_2011-10-05_14:35:02:ReadRepairStage   0   0   44976801   0   0
tpstats_2011-10-05_14:40:01:ReadRepairStage   0   0   45042220   0   0
tpstats_2011-10-05_14:45:01:ReadRepairStage   0   0   45112141   0   0
tpstats_2011-10-05_14:50:02:ReadRepairStage   0   0   45177816   0   0
tpstats_2011-10-05_14:55:02:ReadRepairStage   0   0   45246675   0   0
tpstats_2011-10-05_15:00:01:ReadRepairStage   0   0   45309533   0   0
tpstats_2011-10-05_15:05:01:ReadRepairStage   0   0   45357575   0   0
tpstats_2011-10-05_15:10:01:ReadRepairStage   0   0   45405943   0   0
tpstats_2011-10-05_15:15:01:ReadRepairStage   0   0   45458435   0   0
tpstats_2011-10-05_15:20:01:ReadRepairStage   0   2   45508253   0   0
tpstats_2011-10-05_15:25:01:ReadRepairStage   0   0   45570375   0   0
tpstats_2011-10-05_15:30:01:ReadRepairStage  
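
For reference, a minimal Hector 0.8-style sketch of issuing every read and
write at QUORUM, using a ConfigurableConsistencyLevel policy, is shown below.
The cluster, keyspace, column family and column names are placeholders, not
the poster's actual code:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class QuorumExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "localhost:9160");

        // Apply QUORUM to every read and write issued through this Keyspace.
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, ccl);

        // With RF=3, this insert returns once 2 replicas have acknowledged it.
        HFactory.createMutator(keyspace, StringSerializer.get())
                .insert("row-key", "MyCF", HFactory.createStringColumn("col", "value"));
    }
}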

Re: nodetool cfstats on 1.0.0-rc1 throws an exception

2011-10-05 Thread Ramesh Natarajan
I don't have access to the test system anymore. We did move to a lower
number of CFs and don't see this problem any more.
I remember that when I noticed the size in system.log it was a little more
than UINT_MAX (4294967295). I was able to recreate it multiple times.
So I am wondering if there are any stats counters in the system which
are set to unsigned int instead of unsigned long?

thanks
Ramesh

On Tue, Oct 4, 2011 at 3:20 AM, aaron morton  wrote:
> That row has a size of 819 peta bytes, so something is odd there. The error
> is a result of that value being so huge. When you ran the same script on
> 0.8.6 what was the max size of the Migrations CF ?
> As Jonathan says, it's unlikely anyone would have tested creating 5000 CF's.
> Most people only create a few 10's of CF's at most.
> either use fewer CF's or…
> * dump the Migrations CF using sstable2json to take a look around
> * work out steps to reproduce and report it on Jira
> Hope that helps.
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 4/10/2011, at 11:30 AM, Ramesh Natarajan wrote:
>
> We recreated the schema using the same input file on both clusters and they
> are running identical load.
> Isn't the exception thrown in the system CF?
> this line looks strange:
> Compacted row maximum size: 9223372036854775807
> thanks
> Ramesh
>
> On Mon, Oct 3, 2011 at 5:26 PM, Jonathan Ellis  wrote:
>>
>> Looks like you have unexpectedly large rows in your 1.0 cluster but
>> not 0.8.  I guess you could use sstable2json to manually check your
>> row sizes.
>>
>> On Mon, Oct 3, 2011 at 5:20 PM, Ramesh Natarajan 
>> wrote:
>> > It happens all the time on 1.0. It doesn't happen on 0.8.6.  Is there
>> > any
>> > thing I can do to check?
>> > thanks
>> > Ramesh
>> >
>> > On Mon, Oct 3, 2011 at 5:15 PM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> My suspicion would be that it has more to do with "rare case when
>> >> running with 5000 CFs" than "1.0 regression."
>> >>
>> >> On Mon, Oct 3, 2011 at 5:00 PM, Ramesh Natarajan 
>> >> wrote:
>> >> > We have about 5000 column family and when we run the nodetool cfstats
>> >> > it
>> >> > throws out this exception...  this is running 1.0.0-rc1
>> >> > This seems to work on 0.8.6.  Is this a bug in 1.0.0?
>> >> >
>> >> > thanks
>> >> > Ramesh
>> >> > Keyspace: system
>> >> >         Read Count: 28
>> >> >         Read Latency: 5.8675 ms.
>> >> >         Write Count: 3
>> >> >         Write Latency: 0.166 ms.
>> >> >         Pending Tasks: 0
>> >> >                 Column Family: Schema
>> >> >                 SSTable count: 4
>> >> >                 Space used (live): 4293758276
>> >> >                 Space used (total): 4293758276
>> >> >                 Number of Keys (estimate): 5376
>> >> >                 Memtable Columns Count: 0
>> >> >                 Memtable Data Size: 0
>> >> >                 Memtable Switch Count: 0
>> >> >                 Read Count: 3
>> >> >                 Read Latency: NaN ms.
>> >> >                 Write Count: 0
>> >> >                 Write Latency: NaN ms.
>> >> >                 Pending Tasks: 0
>> >> >                 Key cache capacity: 53
>> >> >                 Key cache size: 2
>> >> >                 Key cache hit rate: NaN
>> >> >                 Row cache: disabled
>> >> >                 Compacted row minimum size: 104
>> >> >                 Compacted row maximum size: 1955666
>> >> >                 Compacted row mean size: 1508515
>> >> >                 Column Family: HintsColumnFamily
>> >> >                 SSTable count: 0
>> >> >                 Space used (live): 0
>> >> >                 Space used (total): 0
>> >> >                 Number of Keys (estimate): 0
>> >> >                 Memtable Columns Count: 0
>> >> >                 Memtable Data Size: 0
>> >> >                 Memtable Switch Count: 0
>> >> >                 Read Count: 5
>> >> >                 Read Latency: NaN ms.
>> >> >                 Write Count: 0
>> >> >                 Write Latency: NaN ms.
>> >> >                 Pending Tasks: 0
>> >> >                 Key cache capacity: 1
>> >> >                 Key cache size: 0
>> >> >                 Key cache hit rate: NaN
>> >> >                 Row cache: disabled
>> >> >                 Compacted row minimum size: 0
>> >> >                 Compacted row maximum size: 0
>> >> >                 Compacted row mean size: 0
>> >> >                 Column Family: LocationInfo
>> >> >                 SSTable count: 1
>> >> >                 Space used (live): 6947
>> >> >                 Space used (total): 6947
>> >> >                 Number of Keys (estimate): 128
>> >> >                 Memtable Columns Count: 0
>> >> >                 Memtable Data Size: 0
>> >> >                 Memtable Switch Count: 2
>> >> >                 Read Count: 20
>> >> >                 Read Latency: NaN ms.
>> >> >                 Write Count: 3
>> >> >                 Write Latency: NaN ms.
>> >> >          

0.7.9 RejectedExecutionException

2011-10-05 Thread Ashley Martens
I'm getting the following exception on a 0.7.9 node before the node crashes.
I don't have this problem with the other nodes running 0.7.8. Does anyone
know what the problem is?

ERROR [Thread-47] 2011-10-05 05:07:03,840 AbstractCassandraDaemon.java (line
133) Fatal exception in thread Thread[Thread-47,5,main]
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
down
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
at
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)


Memtable Switch Count

2011-10-05 Thread Ramesh Natarajan
What is Memtable Switch Count in the cfstats output?

thanks
Ramesh


Question about sharding of rows and atomicity

2011-10-05 Thread Don Smith
Does Cassandra shard the columns of a single row across multiple nodes,
so that to read the columns of a row it may need access to multiple
nodes?   I'd say "no."   Will a read from a given node ever return
partial results, or is the write of a row to a node atomic?


 Thanks, Don




Re: Consistency level and ReadRepair

2011-10-05 Thread Jonathan Ellis
Start with http://wiki.apache.org/cassandra/ReadRepair.  Read repair
count increasing just means you were doing reads at < CL.ALL, and had
the CF configured to perform RR.

On Wed, Oct 5, 2011 at 12:37 PM, Ramesh Natarajan  wrote:
> I have a 12 node cassandra cluster running with RF=3.  I have severl
> clients ( all running on a single node ) connecting to the cluster (
> fixed client - node mapping ) and try to do a insert, update , select
> and delete. Each client has a fixed mapping of the row-keys and always
> connect to the same node. The timestamp on the client node is used for
> all operations.  All operations are done using CL QUORUM.
>
> When  I run a tpstats I see the ReadRepair count consistently
> increasing. i need to figure out why ReadRepair is happening..
>
> One scenario I can think of is, it could happen when there is a delay
> in updating the nodes to reach eventual consistency..
>
> Let's say I have 3 nodes (RF=3)  A,B,C. I insert   with timestamp
>  to A and the call will return as soon as it inserts the record
> to A and B. At some later point this information is sent to C...
>
> A while later A,B,C have the same data with the same timestamp.
>
> A 
> B  and
> C 
>
> When I update  on A with timestamp  to A, the call will
> return as soon as it inserts the record to A and B.
> Now the data is
>
> A 
> B 
> C 
>
> Assuming I query for   A,C respond and since there is no QUORUM,
> it waits for B to respond and when A,B match, the response is returned
> to the client and ReadRepair is sent to C.
>
> This could happen only when C is running behind in catching up the
> updates to A,B.  Are there any stats that would let me know if the
> system is in a consistent state?
>
> thanks
> Ramesh
>
>
> tpstats_2011-10-05_12:50:01:ReadRepairStage                   0
>  0       43569781         0                 0
> tpstats_2011-10-05_12:55:01:ReadRepairStage                   0
>  0       43646420         0                 0
> tpstats_2011-10-05_13:00:02:ReadRepairStage                   0
>  0       43725850         0                 0
> tpstats_2011-10-05_13:05:01:ReadRepairStage                   0
>  0       43790047         0                 0
> tpstats_2011-10-05_13:10:02:ReadRepairStage                   0
>  0       43869704         0                 0
> tpstats_2011-10-05_13:15:01:ReadRepairStage                   0
>  0       43945635         0                 0
> tpstats_2011-10-05_13:20:01:ReadRepairStage                   0
>  0       44020406         0                 0
> tpstats_2011-10-05_13:25:02:ReadRepairStage                   0
>  0       44093227         0                 0
> tpstats_2011-10-05_13:30:01:ReadRepairStage                   0
>  0       44167455         0                 0
> tpstats_2011-10-05_13:35:02:ReadRepairStage                   0
>  0       44247519         0                 0
> tpstats_2011-10-05_13:40:01:ReadRepairStage                   0
>  0       44312726         0                 0
> tpstats_2011-10-05_13:45:01:ReadRepairStage                   0
>  0       44387633         0                 0
> tpstats_2011-10-05_13:50:01:ReadRepairStage                   0
>  0       3683         0                 0
> tpstats_2011-10-05_13:55:02:ReadRepairStage                   0
>  0       44499487         0                 0
> tpstats_2011-10-05_14:00:01:ReadRepairStage                   0
>  0       44578656         0                 0
> tpstats_2011-10-05_14:05:01:ReadRepairStage                   0
>  0       44647555         0                 0
> tpstats_2011-10-05_14:10:02:ReadRepairStage                   0
>  0       44716730         0                 0
> tpstats_2011-10-05_14:15:01:ReadRepairStage                   0
>  0       44776644         0                 0
> tpstats_2011-10-05_14:20:01:ReadRepairStage                   0
>  0       44840237         0                 0
> tpstats_2011-10-05_14:25:01:ReadRepairStage                   0
>  0       44891444         0                 0
> tpstats_2011-10-05_14:30:01:ReadRepairStage                   0
>  0       44931105         0                 0
> tpstats_2011-10-05_14:35:02:ReadRepairStage                   0
>  0       44976801         0                 0
> tpstats_2011-10-05_14:40:01:ReadRepairStage                   0
>  0       45042220         0                 0
> tpstats_2011-10-05_14:45:01:ReadRepairStage                   0
>  0       45112141         0                 0
> tpstats_2011-10-05_14:50:02:ReadRepairStage                   0
>  0       45177816         0                 0
> tpstats_2011-10-05_14:55:02:ReadRepairStage                   0
>  0       45246675         0                 0
> tpstats_2011-10-05_15:00:01:ReadRepairStage                   0
>  0       45309533         0                 0
> tpstats_2011-10-05_15:05:01:ReadRepairStage                   0
>  0       45357575         0                 0
> tpstats_2011-10-05_15:10:01:ReadRepairStage                   0
> 
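
The background check Jonathan refers to is controlled per column family by
read_repair_chance. A rough sketch of setting it when creating a CF through
Hector follows; it assumes your Hector version exposes setReadRepairChance,
and the cluster/keyspace/CF names and the 0.1 value are placeholders, not a
recommendation:

import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
import me.prettyprint.cassandra.service.ThriftCfDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class ReadRepairChanceExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "localhost:9160");

        BasicColumnFamilyDefinition cfDef = new BasicColumnFamilyDefinition();
        cfDef.setKeyspaceName("MyKeyspace");
        cfDef.setName("MyCF");
        // Ask the coordinator to check the other replicas in the
        // background on roughly 10% of reads.
        cfDef.setReadRepairChance(0.1);

        cluster.addColumnFamily(new ThriftCfDef(cfDef));
    }
}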

Re: 0.7.9 RejectedExecutionException

2011-10-05 Thread Jonathan Ellis
"I can't schedule this task because I'm shutting down" is a symptom of
your node crashing, not a cause.  Is it being OOMkilled, perhaps?

On Wed, Oct 5, 2011 at 12:42 PM, Ashley Martens  wrote:
> I'm getting the following exception on a 0.7.9 node before the node crashes.
> I don't have this problem with the other nodes running 0.7.8. Does anyone
> know what the problem is?
>
> ERROR [Thread-47] 2011-10-05 05:07:03,840 AbstractCassandraDaemon.java (line
> 133) Fatal exception in thread Thread[Thread-47,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut
> down
>     at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:76)
>     at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
>     at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
>     at
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:385)
>     at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Question about sharding of rows and atomicity

2011-10-05 Thread Jonathan Ellis
On Wed, Oct 5, 2011 at 1:09 PM, Don Smith  wrote:
> Does Cassandra shard the columns of a single row across multiple nodes so
> that to read the columns of the row it may need access to multiple nodes?
> I'd say "no."

Correct.

>   Will a read from a given node ever return partial results or
> is the write to a node of a row atomic?

http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Question about sharding of rows and atomicity

2011-10-05 Thread Konstantin Naryshkin
Cassandra does not break apart a row. All of the columns of a row are kept on 
the same nodes.

I believe that writing multiple columns of the same row is atomic, but 
not isolated. By which I mean that if one column is written all the other ones 
will be written as well, but if a read happens while the write is being done it 
is possible that only some of the columns will have the new values.

- Original Message -
From: "Don Smith" 
To: user@cassandra.apache.org
Sent: Wednesday, October 5, 2011 2:09:36 PM
Subject: Question about sharding of rows and atomicity

Does Cassandra shard the columns of a single row across multiple nodes 
so that to read the columns of the row it may need access to multiple 
nodes?   I'd say "no."   Will a read from a given node ever return 
partial results or is the write to a node of a row atomic?

  Thanks, Don
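
As the FAQ entry linked above describes, a batch of mutations under one row
key is applied atomically, but readers are not isolated from it. A minimal
Hector-style sketch of such a single-row batch follows; cluster, keyspace,
CF, row key and column names are placeholders:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class SingleRowBatchExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        // Both columns target the same row key, so the batch is applied
        // atomically on each replica; a concurrent reader may still see
        // only one of the two new values.
        m.addInsertion("user:42", "MyCF", HFactory.createStringColumn("first", "Don"));
        m.addInsertion("user:42", "MyCF", HFactory.createStringColumn("last", "Smith"));
        m.execute();
    }
}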




Could not reach schema agreement

2011-10-05 Thread Ben Ashton
Hi Guys,

How would I go about fixing this? (running 0.8.4)

[default@unknown] connect 10.58.135.19/9160;
Connected to: "Test Cluster" on 10.58.135.19/9160
[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
66bd76c0-ee97-11e0--242d50cf1fbf: [10.234.119.110]
777ae000-cfd5-11e0--242d50cf1fbf: [10.58.135.19,
10.48.234.31, 10.224.55.162]

ERROR [HintedHandoff:2] 2011-10-05 18:39:36,896
AbstractCassandraDaemon.java (line 134) Fatal exception in thread
Thread[HintedHandoff:2,1,main]
java.lang.RuntimeException: java.lang.RuntimeException: Could not
reach schema agreement with /10.234.119.110 in 6ms
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.RuntimeException: Could not reach schema
agreement with /10.234.119.110 in 6ms
at 
org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293)
at 
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)
at 
org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)
at 
org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more
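
The describe cluster output above already shows the two competing schema
versions. For what it's worth, the same information is available
programmatically through the Thrift describe_schema_versions call; a rough
sketch (host and port taken from the post, purely for illustration):

import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SchemaAgreementCheck {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("10.58.135.19", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        // Maps each schema version UUID to the endpoints reporting it;
        // more than one entry means the cluster disagrees on the schema.
        Map<String, List<String>> versions = client.describe_schema_versions();
        System.out.println(versions);
        transport.close();
    }
}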


Re: Could not reach schema agreement

2011-10-05 Thread Jonathan Ellis
Did you try wiki.apache.org/cassandra/FAQ#schema_disagreement ?

On Wed, Oct 5, 2011 at 1:47 PM, Ben Ashton  wrote:
> Hi Guys,
>
> How would I go about fixing this? (running 0.8.4)
>
> [default@unknown] connect 10.58.135.19/9160;
> Connected to: "Test Cluster" on 10.58.135.19/9160
> [default@unknown] describe cluster;
> Cluster Information:
>   Snitch: org.apache.cassandra.locator.SimpleSnitch
>   Partitioner: org.apache.cassandra.dht.RandomPartitioner
>   Schema versions:
>        66bd76c0-ee97-11e0--242d50cf1fbf: [10.234.119.110]
>        777ae000-cfd5-11e0--242d50cf1fbf: [10.58.135.19,
> 10.48.234.31, 10.224.55.162]
>
> ERROR [HintedHandoff:2] 2011-10-05 18:39:36,896
> AbstractCassandraDaemon.java (line 134) Fatal exception in thread
> Thread[HintedHandoff:2,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not
> reach schema agreement with /10.234.119.110 in 6ms
>        at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>        at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:636)
> Caused by: java.lang.RuntimeException: Could not reach schema
> agreement with /10.234.119.110 in 6ms
>        at 
> org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293)
>        at 
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)
>        at 
> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)
>        at 
> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)
>        at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        ... 3 more
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Consistency level and ReadRepair

2011-10-05 Thread Ramesh Natarajan
Let's assume we have 3 nodes, all up and running at all times, with no
failures or communication problems.
1. If I have RF=3 and write with QUORUM, the change gets
committed on 2 nodes; what delay should we expect before the 3rd replica
gets written?
2. In this scenario (no failures etc.), if we do a read at
QUORUM, what situation can lead to read repair? I didn't expect
any ReadRepair because all 3 must have the same value.


On Wed, Oct 5, 2011 at 1:11 PM, Jonathan Ellis  wrote:
> Start with http://wiki.apache.org/cassandra/ReadRepair.  Read repair
> count increasing just means you were doing reads at < CL.ALL, and had
> the CF configured to perform RR.
>
> On Wed, Oct 5, 2011 at 12:37 PM, Ramesh Natarajan  wrote:
>> I have a 12 node cassandra cluster running with RF=3.  I have severl
>> clients ( all running on a single node ) connecting to the cluster (
>> fixed client - node mapping ) and try to do a insert, update , select
>> and delete. Each client has a fixed mapping of the row-keys and always
>> connect to the same node. The timestamp on the client node is used for
>> all operations.  All operations are done using CL QUORUM.
>>
>> When  I run a tpstats I see the ReadRepair count consistently
>> increasing. i need to figure out why ReadRepair is happening..
>>
>> One scenario I can think of is, it could happen when there is a delay
>> in updating the nodes to reach eventual consistency..
>>
>> Let's say I have 3 nodes (RF=3)  A,B,C. I insert   with timestamp
>>  to A and the call will return as soon as it inserts the record
>> to A and B. At some later point this information is sent to C...
>>
>> A while later A,B,C have the same data with the same timestamp.
>>
>> A 
>> B  and
>> C 
>>
>> When I update  on A with timestamp  to A, the call will
>> return as soon as it inserts the record to A and B.
>> Now the data is
>>
>> A 
>> B 
>> C 
>>
>> Assuming I query for   A,C respond and since there is no QUORUM,
>> it waits for B to respond and when A,B match, the response is returned
>> to the client and ReadRepair is sent to C.
>>
>> This could happen only when C is running behind in catching up the
>> updates to A,B.  Are there any stats that would let me know if the
>> system is in a consistent state?
>>
>> thanks
>> Ramesh
>>
>>
>> tpstats_2011-10-05_12:50:01:ReadRepairStage                   0
>>  0       43569781         0                 0
>> tpstats_2011-10-05_12:55:01:ReadRepairStage                   0
>>  0       43646420         0                 0
>> tpstats_2011-10-05_13:00:02:ReadRepairStage                   0
>>  0       43725850         0                 0
>> tpstats_2011-10-05_13:05:01:ReadRepairStage                   0
>>  0       43790047         0                 0
>> tpstats_2011-10-05_13:10:02:ReadRepairStage                   0
>>  0       43869704         0                 0
>> tpstats_2011-10-05_13:15:01:ReadRepairStage                   0
>>  0       43945635         0                 0
>> tpstats_2011-10-05_13:20:01:ReadRepairStage                   0
>>  0       44020406         0                 0
>> tpstats_2011-10-05_13:25:02:ReadRepairStage                   0
>>  0       44093227         0                 0
>> tpstats_2011-10-05_13:30:01:ReadRepairStage                   0
>>  0       44167455         0                 0
>> tpstats_2011-10-05_13:35:02:ReadRepairStage                   0
>>  0       44247519         0                 0
>> tpstats_2011-10-05_13:40:01:ReadRepairStage                   0
>>  0       44312726         0                 0
>> tpstats_2011-10-05_13:45:01:ReadRepairStage                   0
>>  0       44387633         0                 0
>> tpstats_2011-10-05_13:50:01:ReadRepairStage                   0
>>  0       3683         0                 0
>> tpstats_2011-10-05_13:55:02:ReadRepairStage                   0
>>  0       44499487         0                 0
>> tpstats_2011-10-05_14:00:01:ReadRepairStage                   0
>>  0       44578656         0                 0
>> tpstats_2011-10-05_14:05:01:ReadRepairStage                   0
>>  0       44647555         0                 0
>> tpstats_2011-10-05_14:10:02:ReadRepairStage                   0
>>  0       44716730         0                 0
>> tpstats_2011-10-05_14:15:01:ReadRepairStage                   0
>>  0       44776644         0                 0
>> tpstats_2011-10-05_14:20:01:ReadRepairStage                   0
>>  0       44840237         0                 0
>> tpstats_2011-10-05_14:25:01:ReadRepairStage                   0
>>  0       44891444         0                 0
>> tpstats_2011-10-05_14:30:01:ReadRepairStage                   0
>>  0       44931105         0                 0
>> tpstats_2011-10-05_14:35:02:ReadRepairStage                   0
>>  0       44976801         0                 0
>> tpstats_2011-10-05_14:40:01:ReadRepairStage                   0
>>  0       45042220         0                 0
>> tpstats_2011-10-05_14:45:01:ReadRep

Re: Could not reach schema agreement

2011-10-05 Thread Ben Ashton
Ah that's great!

I was rubbing my head for a while as Google only showed mailing list
posts with the same error.

All working now, thanks

On 5 October 2011 19:49, Jonathan Ellis  wrote:
> Did you try wiki.apache.org/cassandra/FAQ#schema_disagreement ?
>
> On Wed, Oct 5, 2011 at 1:47 PM, Ben Ashton  wrote:
>> Hi Guys,
>>
>> How would I go about fixing this? (running 0.8.4)
>>
>> [default@unknown] connect 10.58.135.19/9160;
>> Connected to: "Test Cluster" on 10.58.135.19/9160
>> [default@unknown] describe cluster;
>> Cluster Information:
>>   Snitch: org.apache.cassandra.locator.SimpleSnitch
>>   Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>   Schema versions:
>>        66bd76c0-ee97-11e0--242d50cf1fbf: [10.234.119.110]
>>        777ae000-cfd5-11e0--242d50cf1fbf: [10.58.135.19,
>> 10.48.234.31, 10.224.55.162]
>>
>> ERROR [HintedHandoff:2] 2011-10-05 18:39:36,896
>> AbstractCassandraDaemon.java (line 134) Fatal exception in thread
>> Thread[HintedHandoff:2,1,main]
>> java.lang.RuntimeException: java.lang.RuntimeException: Could not
>> reach schema agreement with /10.234.119.110 in 6ms
>>        at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>        at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>        at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>        at java.lang.Thread.run(Thread.java:636)
>> Caused by: java.lang.RuntimeException: Could not reach schema
>> agreement with /10.234.119.110 in 6ms
>>        at 
>> org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293)
>>        at 
>> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)
>>        at 
>> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89)
>>        at 
>> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397)
>>        at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>        ... 3 more
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Consistency level and ReadRepair

2011-10-05 Thread Mohit Anchlia
Do you see any errors in the logs? Is your HH enabled?

On Wed, Oct 5, 2011 at 12:00 PM, Ramesh Natarajan  wrote:
> Lets assume we have 3 nodes all up and running at all times with no
> failures or communication problems.
> 1. If I have a RF=3 and writing with QUORUM,  2 nodes the change gets
> committed, what is the delay we should expect before the 3rd replica
> gets written
> 2. In this scenario ( no failures e.t.c )  if we do a read with a
> QUORUM read what situation can lead to read repair? I didn't expect
> any ReadRepair because all 3 must have the same value.
>
>
> On Wed, Oct 5, 2011 at 1:11 PM, Jonathan Ellis  wrote:
>> Start with http://wiki.apache.org/cassandra/ReadRepair.  Read repair
>> count increasing just means you were doing reads at < CL.ALL, and had
>> the CF configured to perform RR.
>>
>> On Wed, Oct 5, 2011 at 12:37 PM, Ramesh Natarajan  wrote:
>>> I have a 12 node cassandra cluster running with RF=3.  I have severl
>>> clients ( all running on a single node ) connecting to the cluster (
>>> fixed client - node mapping ) and try to do a insert, update , select
>>> and delete. Each client has a fixed mapping of the row-keys and always
>>> connect to the same node. The timestamp on the client node is used for
>>> all operations.  All operations are done using CL QUORUM.
>>>
>>> When  I run a tpstats I see the ReadRepair count consistently
>>> increasing. i need to figure out why ReadRepair is happening..
>>>
>>> One scenario I can think of is, it could happen when there is a delay
>>> in updating the nodes to reach eventual consistency..
>>>
>>> Let's say I have 3 nodes (RF=3)  A,B,C. I insert   with timestamp
>>>  to A and the call will return as soon as it inserts the record
>>> to A and B. At some later point this information is sent to C...
>>>
>>> A while later A,B,C have the same data with the same timestamp.
>>>
>>> A 
>>> B  and
>>> C 
>>>
>>> When I update  on A with timestamp  to A, the call will
>>> return as soon as it inserts the record to A and B.
>>> Now the data is
>>>
>>> A 
>>> B 
>>> C 
>>>
>>> Assuming I query for   A,C respond and since there is no QUORUM,
>>> it waits for B to respond and when A,B match, the response is returned
>>> to the client and ReadRepair is sent to C.
>>>
>>> This could happen only when C is running behind in catching up the
>>> updates to A,B.  Are there any stats that would let me know if the
>>> system is in a consistent state?
>>>
>>> thanks
>>> Ramesh
>>>
>>>
>>> tpstats_2011-10-05_12:50:01:ReadRepairStage                   0
>>>  0       43569781         0                 0
>>> tpstats_2011-10-05_12:55:01:ReadRepairStage                   0
>>>  0       43646420         0                 0
>>> tpstats_2011-10-05_13:00:02:ReadRepairStage                   0
>>>  0       43725850         0                 0
>>> tpstats_2011-10-05_13:05:01:ReadRepairStage                   0
>>>  0       43790047         0                 0
>>> tpstats_2011-10-05_13:10:02:ReadRepairStage                   0
>>>  0       43869704         0                 0
>>> tpstats_2011-10-05_13:15:01:ReadRepairStage                   0
>>>  0       43945635         0                 0
>>> tpstats_2011-10-05_13:20:01:ReadRepairStage                   0
>>>  0       44020406         0                 0
>>> tpstats_2011-10-05_13:25:02:ReadRepairStage                   0
>>>  0       44093227         0                 0
>>> tpstats_2011-10-05_13:30:01:ReadRepairStage                   0
>>>  0       44167455         0                 0
>>> tpstats_2011-10-05_13:35:02:ReadRepairStage                   0
>>>  0       44247519         0                 0
>>> tpstats_2011-10-05_13:40:01:ReadRepairStage                   0
>>>  0       44312726         0                 0
>>> tpstats_2011-10-05_13:45:01:ReadRepairStage                   0
>>>  0       44387633         0                 0
>>> tpstats_2011-10-05_13:50:01:ReadRepairStage                   0
>>>  0       3683         0                 0
>>> tpstats_2011-10-05_13:55:02:ReadRepairStage                   0
>>>  0       44499487         0                 0
>>> tpstats_2011-10-05_14:00:01:ReadRepairStage                   0
>>>  0       44578656         0                 0
>>> tpstats_2011-10-05_14:05:01:ReadRepairStage                   0
>>>  0       44647555         0                 0
>>> tpstats_2011-10-05_14:10:02:ReadRepairStage                   0
>>>  0       44716730         0                 0
>>> tpstats_2011-10-05_14:15:01:ReadRepairStage                   0
>>>  0       44776644         0                 0
>>> tpstats_2011-10-05_14:20:01:ReadRepairStage                   0
>>>  0       44840237         0                 0
>>> tpstats_2011-10-05_14:25:01:ReadRepairStage                   0
>>>  0       44891444         0                 0
>>> tpstats_2011-10-05_14:30:01:ReadRepairStage                   0
>>>  0       44931105         0                 0
>>> tpstats_2011-10-05_14:35:02:ReadRepairStag

Re: Consistency level and ReadRepair

2011-10-05 Thread Ramesh Natarajan
Yes, Hinted Handoff is enabled. However, I don't see any counters
increasing for HintedHandoff in the tpstats output.

thanks
Ramesh

On Wed, Oct 5, 2011 at 2:10 PM, Mohit Anchlia  wrote:
> Do you see any errors in the logs? Is your HH enabled?
>
> On Wed, Oct 5, 2011 at 12:00 PM, Ramesh Natarajan  wrote:
>> Lets assume we have 3 nodes all up and running at all times with no
>> failures or communication problems.
>> 1. If I have a RF=3 and writing with QUORUM,  2 nodes the change gets
>> committed, what is the delay we should expect before the 3rd replica
>> gets written
>> 2. In this scenario ( no failures e.t.c )  if we do a read with a
>> QUORUM read what situation can lead to read repair? I didn't expect
>> any ReadRepair because all 3 must have the same value.
>>
>>
>> On Wed, Oct 5, 2011 at 1:11 PM, Jonathan Ellis  wrote:
>>> Start with http://wiki.apache.org/cassandra/ReadRepair.  Read repair
>>> count increasing just means you were doing reads at < CL.ALL, and had
>>> the CF configured to perform RR.
>>>
>>> On Wed, Oct 5, 2011 at 12:37 PM, Ramesh Natarajan  
>>> wrote:
 I have a 12 node cassandra cluster running with RF=3.  I have severl
 clients ( all running on a single node ) connecting to the cluster (
 fixed client - node mapping ) and try to do a insert, update , select
 and delete. Each client has a fixed mapping of the row-keys and always
 connect to the same node. The timestamp on the client node is used for
 all operations.  All operations are done using CL QUORUM.

 When  I run a tpstats I see the ReadRepair count consistently
 increasing. i need to figure out why ReadRepair is happening..

 One scenario I can think of is, it could happen when there is a delay
 in updating the nodes to reach eventual consistency..

 Let's say I have 3 nodes (RF=3)  A,B,C. I insert   with timestamp
  to A and the call will return as soon as it inserts the record
 to A and B. At some later point this information is sent to C...

 A while later A,B,C have the same data with the same timestamp.

 A 
 B  and
 C 

 When I update  on A with timestamp  to A, the call will
 return as soon as it inserts the record to A and B.
 Now the data is

 A 
 B 
 C 

 Assuming I query for   A,C respond and since there is no QUORUM,
 it waits for B to respond and when A,B match, the response is returned
 to the client and ReadRepair is sent to C.

 This could happen only when C is running behind in catching up the
 updates to A,B.  Are there any stats that would let me know if the
 system is in a consistent state?

 thanks
 Ramesh


 tpstats_2011-10-05_12:50:01:ReadRepairStage                   0
  0       43569781         0                 0
 tpstats_2011-10-05_12:55:01:ReadRepairStage                   0
  0       43646420         0                 0
 tpstats_2011-10-05_13:00:02:ReadRepairStage                   0
  0       43725850         0                 0
 tpstats_2011-10-05_13:05:01:ReadRepairStage                   0
  0       43790047         0                 0
 tpstats_2011-10-05_13:10:02:ReadRepairStage                   0
  0       43869704         0                 0
 tpstats_2011-10-05_13:15:01:ReadRepairStage                   0
  0       43945635         0                 0
 tpstats_2011-10-05_13:20:01:ReadRepairStage                   0
  0       44020406         0                 0
 tpstats_2011-10-05_13:25:02:ReadRepairStage                   0
  0       44093227         0                 0
 tpstats_2011-10-05_13:30:01:ReadRepairStage                   0
  0       44167455         0                 0
 tpstats_2011-10-05_13:35:02:ReadRepairStage                   0
  0       44247519         0                 0
 tpstats_2011-10-05_13:40:01:ReadRepairStage                   0
  0       44312726         0                 0
 tpstats_2011-10-05_13:45:01:ReadRepairStage                   0
  0       44387633         0                 0
 tpstats_2011-10-05_13:50:01:ReadRepairStage                   0
  0       3683         0                 0
 tpstats_2011-10-05_13:55:02:ReadRepairStage                   0
  0       44499487         0                 0
 tpstats_2011-10-05_14:00:01:ReadRepairStage                   0
  0       44578656         0                 0
 tpstats_2011-10-05_14:05:01:ReadRepairStage                   0
  0       44647555         0                 0
 tpstats_2011-10-05_14:10:02:ReadRepairStage                   0
  0       44716730         0                 0
 tpstats_2011-10-05_14:15:01:ReadRepairStage                   0
  0       44776644         0                 0
 tpstats_2011-10-05_14:20:01:ReadRepairStage                   0
  0       44840237         0          

Re: Consistency level and ReadRepair

2011-10-05 Thread Jonathan Ellis
As explained in the link in my earlier reply, "Read Repair" just means
"a replica was checked in the background," not that it was out of
sync.

On Wed, Oct 5, 2011 at 2:00 PM, Ramesh Natarajan  wrote:
> Lets assume we have 3 nodes all up and running at all times with no
> failures or communication problems.
> 1. If I have a RF=3 and writing with QUORUM,  2 nodes the change gets
> committed, what is the delay we should expect before the 3rd replica
> gets written
> 2. In this scenario ( no failures e.t.c )  if we do a read with a
> QUORUM read what situation can lead to read repair? I didn't expect
> any ReadRepair because all 3 must have the same value.
>
>
> On Wed, Oct 5, 2011 at 1:11 PM, Jonathan Ellis  wrote:
>> Start with http://wiki.apache.org/cassandra/ReadRepair.  Read repair
>> count increasing just means you were doing reads at < CL.ALL, and had
>> the CF configured to perform RR.
>>
>> On Wed, Oct 5, 2011 at 12:37 PM, Ramesh Natarajan  wrote:
>>> I have a 12 node cassandra cluster running with RF=3.  I have severl
>>> clients ( all running on a single node ) connecting to the cluster (
>>> fixed client - node mapping ) and try to do a insert, update , select
>>> and delete. Each client has a fixed mapping of the row-keys and always
>>> connect to the same node. The timestamp on the client node is used for
>>> all operations.  All operations are done using CL QUORUM.
>>>
>>> When  I run a tpstats I see the ReadRepair count consistently
>>> increasing. i need to figure out why ReadRepair is happening..
>>>
>>> One scenario I can think of is, it could happen when there is a delay
>>> in updating the nodes to reach eventual consistency..
>>>
>>> Let's say I have 3 nodes (RF=3)  A,B,C. I insert   with timestamp
>>>  to A and the call will return as soon as it inserts the record
>>> to A and B. At some later point this information is sent to C...
>>>
>>> A while later A,B,C have the same data with the same timestamp.
>>>
>>> A 
>>> B  and
>>> C 
>>>
>>> When I update  on A with timestamp  to A, the call will
>>> return as soon as it inserts the record to A and B.
>>> Now the data is
>>>
>>> A 
>>> B 
>>> C 
>>>
>>> Assuming I query for   A,C respond and since there is no QUORUM,
>>> it waits for B to respond and when A,B match, the response is returned
>>> to the client and ReadRepair is sent to C.
>>>
>>> This could happen only when C is running behind in catching up the
>>> updates to A,B.  Are there any stats that would let me know if the
>>> system is in a consistent state?
>>>
>>> thanks
>>> Ramesh
>>>
>>>
>>> tpstats_2011-10-05_12:50:01:ReadRepairStage                   0
>>>  0       43569781         0                 0
>>> tpstats_2011-10-05_12:55:01:ReadRepairStage                   0
>>>  0       43646420         0                 0
>>> tpstats_2011-10-05_13:00:02:ReadRepairStage                   0
>>>  0       43725850         0                 0
>>> tpstats_2011-10-05_13:05:01:ReadRepairStage                   0
>>>  0       43790047         0                 0
>>> tpstats_2011-10-05_13:10:02:ReadRepairStage                   0
>>>  0       43869704         0                 0
>>> tpstats_2011-10-05_13:15:01:ReadRepairStage                   0
>>>  0       43945635         0                 0
>>> tpstats_2011-10-05_13:20:01:ReadRepairStage                   0
>>>  0       44020406         0                 0
>>> tpstats_2011-10-05_13:25:02:ReadRepairStage                   0
>>>  0       44093227         0                 0
>>> tpstats_2011-10-05_13:30:01:ReadRepairStage                   0
>>>  0       44167455         0                 0
>>> tpstats_2011-10-05_13:35:02:ReadRepairStage                   0
>>>  0       44247519         0                 0
>>> tpstats_2011-10-05_13:40:01:ReadRepairStage                   0
>>>  0       44312726         0                 0
>>> tpstats_2011-10-05_13:45:01:ReadRepairStage                   0
>>>  0       44387633         0                 0
>>> tpstats_2011-10-05_13:50:01:ReadRepairStage                   0
>>>  0       3683         0                 0
>>> tpstats_2011-10-05_13:55:02:ReadRepairStage                   0
>>>  0       44499487         0                 0
>>> tpstats_2011-10-05_14:00:01:ReadRepairStage                   0
>>>  0       44578656         0                 0
>>> tpstats_2011-10-05_14:05:01:ReadRepairStage                   0
>>>  0       44647555         0                 0
>>> tpstats_2011-10-05_14:10:02:ReadRepairStage                   0
>>>  0       44716730         0                 0
>>> tpstats_2011-10-05_14:15:01:ReadRepairStage                   0
>>>  0       44776644         0                 0
>>> tpstats_2011-10-05_14:20:01:ReadRepairStage                   0
>>>  0       44840237         0                 0
>>> tpstats_2011-10-05_14:25:01:ReadRepairStage                   0
>>>  0       44891444         0                 0
>>> tpstats_2011-10-05_14:30:01:ReadRepairStage                   0
>>>  0   

Re: 0.7.9 RejectedExecutionException

2011-10-05 Thread Ashley Martens
No OOM errors appear and the memory used is far below physical and Java max.
I changed the JAR to 0.7.8 to see if that works. If so I'll find a way to
roll out that version instead of 0.7.9.


Re: Could not reach schema agreement

2011-10-05 Thread Ben Ashton
Oh no, spoke too soon...

All my data seems to be gone :(

/opt/apache-cassandra-0.8.4/bin/nodetool -h 10.224.55.162 repair
Exception in thread "main" java.lang.AssertionError: Repairing no
column families seems pointless, doesn't it
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.(AntiEntropyService.java:625)
at 
org.apache.cassandra.service.AntiEntropyService$RepairSession.(AntiEntropyService.java:617)
at 
org.apache.cassandra.service.AntiEntropyService.getRepairSession(AntiEntropyService.java:129)
at 
org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1620)
at 
org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)


On 5 October 2011 20:03, Ben Ashton  wrote:
> Ah thats great!
>
> I was rubbing my head for a while as google only showed mailing lists
> posts with the same error.
>
> All working now, thanks
>
> On 5 October 2011 19:49, Jonathan Ellis  wrote:
>> Did you try wiki.apache.org/cassandra/FAQ#schema_disagreement ?
>>
>> On Wed, Oct 5, 2011 at 1:47 PM, Ben Ashton  wrote:
>>> Hi Guys,
>>>
>>> How would I go about fixing this? (running 0.8.4)
>>>
>>> [default@unknown] connect 10.58.135.19/9160;
>>> Connected to: "Test Cluster" on 10.58.135.19/9160
>>> [default@unknown] describe cluster;
>>> Cluster Information:
>>>   Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>   Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>   Schema versions:
>>>        66bd76c0-ee97-11e0--242d50cf1fbf: [10.234.119.110]
>>>        777ae000-cfd5-11e0--242d50cf1fbf: [10.58.135.19,
>>> 10.48.234.31, 10.224.55.162]
>>>
>>> ERROR [HintedHandoff:2] 2011-10-05 18:39:36,896
>>> AbstractCassandraDaemon.java (line 134) Fatal exception in thread
>>> Thread[HintedHandoff:2,1,main]
>>> java.lang.RuntimeException: java.lang.RuntimeException: Could not
>>> reach schema agreement with /10.234.119.110 in 6ms
>>>        at 
>>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>        at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>        at java.lang.Thread.run(Thread.java:636)
>>> Caused by: java.lang.RuntimeException: Could not reach schema
>>> agreement with /10.234.119.110 in 6ms
>>>     

Re: Consistency level and ReadRepair

2011-10-05 Thread Ramesh Natarajan
Thanks for the explanation. I think I am at a loss trying to understand
the tpstats output. When does the ReadRepair count get incremented?

- When any read is performed with CL < ALL and RF=3, or
- When there is a discrepancy?

I have 2 snapshots from running tpstats, and the counts indicate there
were 1042805 reads and 354774 ReadRepairs.
All reads are done with consistency QUORUM. Per the documentation, should
read repair happen on all of the reads?

ReadStage 1 13533450 0
0
RequestResponseStage  0 07258586 0
0
MutationStage 0 15056119 0
0
ReadRepairStage   0 01210754 0
0


ReadStage 1 14576255 0
0
RequestResponseStage  0 09460969 0
0
MutationStage 0 26638499 0
0
ReadRepairStage   0 01565528 0
0


Read difference: 1042805
ReadRepair difference : 354774

thanks
Ramesh

On Wed, Oct 5, 2011 at 2:21 PM, Jonathan Ellis  wrote:
> As explained in the link in my earlier reply, "Read Repair" just means
> "a replica was checked in the background," not that it was out of
> sync.
>
> On Wed, Oct 5, 2011 at 2:00 PM, Ramesh Natarajan  wrote:
>> Lets assume we have 3 nodes all up and running at all times with no
>> failures or communication problems.
>> 1. If I have a RF=3 and writing with QUORUM,  2 nodes the change gets
>> committed, what is the delay we should expect before the 3rd replica
>> gets written
>> 2. In this scenario ( no failures e.t.c )  if we do a read with a
>> QUORUM read what situation can lead to read repair? I didn't expect
>> any ReadRepair because all 3 must have the same value.
>>
>>
>> On Wed, Oct 5, 2011 at 1:11 PM, Jonathan Ellis  wrote:
>>> Start with http://wiki.apache.org/cassandra/ReadRepair.  Read repair
>>> count increasing just means you were doing reads at < CL.ALL, and had
>>> the CF configured to perform RR.
>>>
>>> On Wed, Oct 5, 2011 at 12:37 PM, Ramesh Natarajan  
>>> wrote:
 I have a 12 node cassandra cluster running with RF=3.  I have severl
 clients ( all running on a single node ) connecting to the cluster (
 fixed client - node mapping ) and try to do a insert, update , select
 and delete. Each client has a fixed mapping of the row-keys and always
 connect to the same node. The timestamp on the client node is used for
 all operations.  All operations are done using CL QUORUM.

 When  I run a tpstats I see the ReadRepair count consistently
 increasing. i need to figure out why ReadRepair is happening..

 One scenario I can think of is, it could happen when there is a delay
 in updating the nodes to reach eventual consistency..

 Let's say I have 3 nodes (RF=3)  A,B,C. I insert   with timestamp
  to A and the call will return as soon as it inserts the record
 to A and B. At some later point this information is sent to C...

 A while later A,B,C have the same data with the same timestamp.

 A 
 B  and
 C 

 When I update  on A with timestamp  to A, the call will
 return as soon as it inserts the record to A and B.
 Now the data is

 A 
 B 
 C 

 Assuming I query for   A,C respond and since there is no QUORUM,
 it waits for B to respond and when A,B match, the response is returned
 to the client and ReadRepair is sent to C.

 This could happen only when C is running behind in catching up the
 updates to A,B.  Are there any stats that would let me know if the
 system is in a consistent state?

 thanks
 Ramesh


 tpstats_2011-10-05_12:50:01:ReadRepairStage                   0
  0       43569781         0                 0
 tpstats_2011-10-05_12:55:01:ReadRepairStage                   0
  0       43646420         0                 0
 tpstats_2011-10-05_13:00:02:ReadRepairStage                   0
  0       43725850         0                 0
 tpstats_2011-10-05_13:05:01:ReadRepairStage                   0
  0       43790047         0                 0
 tpstats_2011-10-05_13:10:02:ReadRepairStage                   0
  0       43869704         0                 0
 tpstats_2011-10-05_13:15:01:ReadRepairStage                   0
  0       43945635         0                 0
 tpstats_2011-10-05_13:20:01:ReadRepairStage                   0
  0       44020406         0                 0
 tpstats_2011-10-05_13:25:02:ReadRepairStage                   0
  0       44093227         0                 0
 tpstats_2011-10-05_13:30:01:ReadRepairStage                   0
  0       44167455         0          

Re: Why is mutation stage increasing ??

2011-10-05 Thread aaron morton
Sounds like there is a lot going on. 

I'm going to assume the order you showed the HH stats in is the order of the 
nodes. I'm guessing node 180 is node 2, but it would be easier if you could 
identify the nodes and identify the stats for them. 

In no particular order:

* Having a heavily used MySQL box on one node is going to make your life *much* 
harder than it needs to be. The HH stats look really suspicious: nodes 1 and 3 
running lots of HH and node 2 not. It looks like node 2 is having a hard time 
keeping up, and I would guess MySQL may have something to do with that. Does 
the cassandra service on node 2 have the same memory allocation as the other 
nodes? Were the TP stats below, that showed the flush stage blocking, from node 
2? What's the %util and queue-sz from iostat on node 2 compared to the other 
nodes? 

* Ignore all the metrics about what is happening on node 2 and get MySQL off 
there.   

* You are running a 16GB heap; that is at the upper range (some would say above 
the upper range) of what is effective for java / cassandra. 8GB seems to be a 
popular sweet spot. Work out why you are using so much memory and then try to 
get the heap size down to 8GB.

* Try to understand what's happening with memory usage. Do you have lots of CFs? 
Lots of row cache? Some very big rows? There are several features in 0.8.6 
(see the yaml) that will kick in when memory usage is high, but you need to 
understand why it's high.

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5/10/2011, at 11:36 PM, Philippe wrote:

> Followup, 
> I was mistaken in saying there weren't writes to the cluster. There's a 
> process that's doing a couple mutations per second.
> 
> I just restarted node #3 and found this message on node #1
>  INFO [HintedHandoff:1] 2011-10-05 12:25:08,173 HintedHandOffManager.java 
> (line 314) Endpoint /xx.xx.xx.180 died before hint delivery, aborting
> 
> Could HH have stuck the nodes on the receiving end ? Is there any way to 
> throttle this ?
> If it can't be throttled and you confirm HH is a suspect I may simply disable 
> it as I'm running a repair 3 times a week (once per node) so I guess my 
> cluster won't be too out of sync.
> 
> Thanks
> 
> 2011/10/5 Philippe 
> Thanks for the quick responses.
> 
> @Yi
> Using Hector 0.8.0-1
> Hardware is : 
> AMD Opteron 4174 6x 2.30+ GHz
> 32 Go DDR3
> 1 Gbps Lossless
> 
> @aaron
> I'm running 0.8.6 on all nodes, straight from the debian packages.
> I get hinted handoffs from time to time because of flapping, I've planning to 
> increase the phi as per another thread but haven't yet.
> Here are the HH per node::
> HintedHandoff 0 0437 0
>  0
> HintedHandoff 0 0  2 0
>  0
> 
> HintedHandoff 0 0   1798 0
>  0
> Not seeing any iowait from my munin CPU graph, at least not more than the 
> past couple of weeks. There is a little more on the 2nd node because it's 
> also holding a mysql database that gets hit hard.
> Munin iostats graph shows an average 10-20kilo blocks/s read/write.
> 
> Nothing special was happening besides a weekly repair on node two starting 
> yesterday at 4am. That one failed with
> ERROR [AntiEntropySessions:5] 2011-10-04 04:03:56,676 
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
> Thread[AntiEntropySessions:5,5,RMI Runtime]
> java.lang.RuntimeException: java.io.IOException: Problem during repair 
> session manual-repair-0d4125f0-aab7-403b-a083-7c19ef6579b1, endpoint 
> /xx.xx.xx.180 died
> Then the next planned repairs failed before starting
> INFO [AntiEntropySessions:8] 2011-10-04 04:03:57,137 AntiEntropyService.java 
> (line 658) Could not proceed on repair because a neighbor (/xx.xx.xx.180) is 
> dead: manual-repair-5dc4c221-3a15-4031-9aa8-0931e41816cd failed
> 
> Looking at the logs on that node shows no Exception. And I was about to say, 
> "nothing special happening at that time" except that it looks like at 4am, 
> the GC started working hard and got the heap down to 9GB and then it shot 
> straight up to almost 16GB so I guess ParNew couldn't keep up and 
> ConcurrentMarkSweep had to step in and basically hang the server ? It took 
> another 2 minutes until I get the "Heap is 0.75 full" message, I get a lot of 
> StatusLogger messages before that.
> So it looks like computing the Merkle tree was very expensive this time... I 
> wonder why ? Anything I can do to handle this ?
> 
> INFO [ScheduledTasks:1] 2011-10-04 04:03:12,874 GCInspector.java (line 122) 
> GC for ParNew: 427 ms for 2 collections, 16227154488 used; max is 16838033408
>  INFO [GossipTasks:1] 2011-10-04 04:04:24,092 Gossiper.java (line 697) 
> InetAddress /xx.xx.xx.97 is now dead.
>  INFO [ScheduledTasks:1] 2011-10-04 04:04:24,093 GCInspector.java (line 122) 
> G

Re: Memtable Switch Count

2011-10-05 Thread aaron morton
How many times a "full" memtable was swapped for an empty one: 
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/10/2011, at 7:04 AM, Ramesh Natarajan wrote:

> What is Memtable Switch Count in the cfstats output?
> 
> thanks
> Ramesh
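
The switch typically happens when one of the per-CF memtable thresholds is
reached and the memtable is flushed (manual flushes and other triggers count
too). For illustration only, a sketch of setting those thresholds through
Hector follows; the names and numbers are placeholders, not a recommendation:

import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
import me.prettyprint.cassandra.service.ThriftCfDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class MemtableThresholdExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "localhost:9160");

        BasicColumnFamilyDefinition cfDef = new BasicColumnFamilyDefinition();
        cfDef.setKeyspaceName("MyKeyspace");
        cfDef.setName("MyCF");
        // Flush (and bump the switch count) after roughly 40 MB of data
        // or ~100k operations, whichever is hit first.
        cfDef.setMemtableThroughputInMb(40);
        cfDef.setMemtableOperationsInMillions(0.1);

        cluster.addColumnFamily(new ThriftCfDef(cfDef));
    }
}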



cassandra-cli: Create column family with composite column name

2011-10-05 Thread Jim Ancona
Using Cassandra 0.8.6, I've been trying to figure out how to use the
CLI to create column families using composite keys and column names.
The documentation on CompositeType seems pretty skimpy. But in the
course of writing this email to ask how to do it, I figured out the
proper syntax. In the hope of making it easier for the next person, I
repurposed this message to document what I figured out. I'll also
update the wiki. Here is the syntax:

create column family MyCF
with key_validation_class = 'CompositeType(UTF8Type, IntegerType)'
and comparator = 'CompositeType(DateType(reversed=true), UTF8Type)'
and default_validation_class='CompositeType(UTF8Type, DateType)'
and column_metadata=[
{ column_name:'0:my Column Name', validation_class:LongType,
index_type:KEYS}
];

One weakness of this syntax is that there doesn't seem to be a way to
escape a ':' in a composite value. There's a FIXME in the code to that
effect.

Jim


Re: Could not reach schema agreement

2011-10-05 Thread aaron morton
Check the data directories, including the snapshot one. Data is not deleted. 

If you create a CF the server will look for existing files and load them. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/10/2011, at 8:35 AM, Ben Ashton wrote:

> oh no spoke to soon..
> 
> All me data are being gone :(
> 
> /opt/apache-cassandra-0.8.4/bin/nodetool -h 10.224.55.162 repair
> Exception in thread "main" java.lang.AssertionError: Repairing no
> column families seems pointless, doesn't it
>at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.<init>(AntiEntropyService.java:625)
>at 
> org.apache.cassandra.service.AntiEntropyService$RepairSession.<init>(AntiEntropyService.java:617)
>at 
> org.apache.cassandra.service.AntiEntropyService.getRepairSession(AntiEntropyService.java:129)
>at 
> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1620)
>at 
> org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1579)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:616)
>at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
>at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
>at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
>at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
>at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
>at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
>at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
>at 
> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
>at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
>at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
>at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:616)
>at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>at sun.rmi.transport.Transport$1.run(Transport.java:177)
>at java.security.AccessController.doPrivileged(Native Method)
>at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
>at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
>at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
>at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>at java.lang.Thread.run(Thread.java:636)
> 
> 
> On 5 October 2011 20:03, Ben Ashton  wrote:
>> Ah thats great!
>> 
>> I was rubbing my head for a while as google only showed mailing lists
>> posts with the same error.
>> 
>> All working now, thanks
>> 
>> On 5 October 2011 19:49, Jonathan Ellis  wrote:
>>> Did you try wiki.apache.org/cassandra/FAQ#schema_disagreement ?
>>> 
>>> On Wed, Oct 5, 2011 at 1:47 PM, Ben Ashton  wrote:
 Hi Guys,
 
 How would I go about fixing this? (running 0.8.4)
 
 [default@unknown] connect 10.58.135.19/9160;
 Connected to: "Test Cluster" on 10.58.135.19/9160
 [default@unknown] describe cluster;
 Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
66bd76c0-ee97-11e0--242d50cf1fbf: [10.234.119.110]
777ae000-cfd5-11e0--242d50cf1fbf: [10.58.135.19,
 10.48.234.31, 10.224.55.162]
 
 ERROR [HintedHandoff:2] 2011-10-05 18:39:36,896
 AbstractCassandraDaemon.java (line 134) Fatal exception in thread
 Thread[HintedHandoff:2,1,main]
 java.lang.RuntimeException: java.lang.RuntimeException: Could not
 reach schema agreement with /10.234.119.110 in 6ms

TimedOutException and UnavailableException from multiGetSliceQuery

2011-10-05 Thread Yuhan Zhang
Hi all,

I have been experiencing the unavailableException and TimedOutException on a
3-node cassandra cluster
during a multiGetSliceQuery with 1000 columns. Since there are many keys
involved in the query, I divided
them into groups of 5000 rows and process each group individually in a for
loop, but it does not seem to help.
Once the TimedOutException appears, further requests to cassandra will cause
UnavailableException.
However, the servers can recover after a while without intervention.

Which settings should I pay attention to in order to fix the problem? This
problem becomes very frequent recently.


Thank you.

Yuhan

The exception looks like:

1/10/05 13:05:31 ERROR connection.HConnectionManager: Could not fullfill
request on this host
CassandraClient
11/10/05 13:05:31 ERROR connection.HConnectionManager: Exception:
me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:32)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:161)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:143)
at
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:155)
...
Caused by: TimedOutException()
at
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)


11/10/05 20:06:05 ERROR connection.HConnectionManager: Could not fullfill
request on this host
CassandraClient
11/10/05 20:06:05 ERROR connection.HConnectionManager: Exception:
me.prettyprint.hector.api.exceptions.HUnavailableException:
UnavailableException()
at
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:50)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383)
at
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)

Caused by: UnavailableException()
at
org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:9620)
at
org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:636)
at
org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:608)
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:388)
... 35 more


Re: 0.7.9 RejectedExecutionException

2011-10-05 Thread Ashley Martens
I could be wrong. I just looked at the amount of memory being used and it's
huge. WTF?


Re: cassandra-cli: Create column family with composite column name

2011-10-05 Thread aaron morton
Hi Jim, 

The best resource I know so far is 
http://www.slideshare.net/edanuff/indexing-in-cassandra  

I just started working on a blog post about them last night, and I hope to 
update the wiki with some information when I am done. Feel free to mail me 
directly if you want to collaborate. 

I'm none the wiser on the ":" issue, one of the things I was hoping to learn 
about. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/10/2011, at 9:08 AM, Jim Ancona wrote:

> Using Cassandra 0.8.6, I've been trying to figure out how to use the
> CLI to create column families using composite keys and column names.
> The documentation on CompositeType seems pretty skimpy. But in the
> course of writing this email to ask how to do it, I figured out the
> proper syntax. In the hope of making it easier for the next person, I
> repurposed this message to document what I figured out. I'll also
> update the wiki. Here is the syntax:
> 
> create column family MyCF
>with key_validation_class = 'CompositeType(UTF8Type, IntegerType)'
>and comparator = 'CompositeType(DateType(reversed=true), UTF8Type)'
>and default_validation_class='CompositeType(UTF8Type, DateType)'
>and column_metadata=[
>{ column_name:'0:my Column Name', validation_class:LongType,
> index_type:KEYS}
>];
> 
> One weakness of this syntax is that there doesn't seem to be a way to
> escape a ':' in a composite value. There's a FIXME in the code to that
> effect.
> 
> Jim



Re: 0.7.9 RejectedExecutionException

2011-10-05 Thread aaron morton
check this http://wiki.apache.org/cassandra/FAQ#mmap

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/10/2011, at 9:25 AM, Ashley Martens wrote:

> I could be wrong. I just looked at the amount of memory being used and it's 
> huge. WTF?



Re: TimedOutException and UnavailableException from multiGetSliceQuery

2011-10-05 Thread aaron morton
5000 rows in a multi get is way, way, way (did I say way?) too many. 

Whenever you get a TimedOutException, check the tp stats on the nodes; you will 
normally see a high pending count. Every row get turns into a message in a 
TP. So if you ask for 5k rows you flood the TP with 5k messages, which will 
often result in the node(s) being temporarily overloaded. 

More is not always more. I would guess 100 as a starting point; I would be 
doubtful that you would see much benefit beyond 1000. pycassa defaults to 1024 
https://github.com/pycassa/pycassa/blob/master/pycassa/columnfamily.py#L63 
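
As a rough sketch of that batching with Hector (an illustration only, assuming
Hector 0.8 with String keys and column names; the column family name, batch size
and slice range below are placeholders to adapt):

import java.util.List;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Rows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSliceQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class BatchedMultiget {
    // Placeholder batch size; start around 100 and measure before going higher.
    private static final int BATCH_SIZE = 100;

    public static void fetchInBatches(Keyspace keyspace, List<String> allKeys) {
        StringSerializer ss = StringSerializer.get();
        for (int i = 0; i < allKeys.size(); i += BATCH_SIZE) {
            List<String> batch = allKeys.subList(i, Math.min(i + BATCH_SIZE, allKeys.size()));
            MultigetSliceQuery<String, String, String> query =
                    HFactory.createMultigetSliceQuery(keyspace, ss, ss, ss);
            query.setColumnFamily("MyCF");                // placeholder CF name
            query.setKeys(batch.toArray(new String[0]));  // only BATCH_SIZE keys per request
            query.setRange(null, null, false, 1000);      // up to 1000 columns per row
            QueryResult<Rows<String, String, String>> result = query.execute();
            // Process result.get() here before issuing the next batch.
        }
    }
}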

 Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/10/2011, at 9:14 AM, Yuhan Zhang wrote:

> Hi all,
> 
> I have been experiencing the unavailableException and TimedOutException on a 
> 3-node cassandra cluster
> during a multiGetSliceQuery with 1000 columns. Since there are many keys 
> involved in the query, I divided
> them into groups of 5000 rows and process each group individually in a for 
> loop, but it does not seem to help.
> Once the TimedOutException appears, further requests to cassandra will cause 
> UnavailableException.
> However, the servers can recover after a while without intervention. 
> 
> Which settings should I pay attention to in order to fix the problem? This 
> problem becomes very frequent recently.
>  
> 
> Thank you.
> 
> Yuhan
> 
> The exception looks like:
> 
> 1/10/05 13:05:31 ERROR connection.HConnectionManager: Could not fullfill 
> request on this host 
> CassandraClient
> 11/10/05 13:05:31 ERROR connection.HConnectionManager: Exception: 
> me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
> at 
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:32)
> at 
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:161)
> at 
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:143)
> at 
> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
> at 
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:155)
> ...
> Caused by: TimedOutException()
> at 
> org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
> 
> 
> 11/10/05 20:06:05 ERROR connection.HConnectionManager: Could not fullfill 
> request on this host 
> CassandraClient
> 11/10/05 20:06:05 ERROR connection.HConnectionManager: Exception: 
> me.prettyprint.hector.api.exceptions.HUnavailableException: 
> UnavailableException()
> at 
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:50)
> at 
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397)
> at 
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383)
> at 
> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
> 
> Caused by: UnavailableException()
> at 
> org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:9620)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:636)
> at 
> org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:608)
> at 
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:388)
> ... 35 more
> 



Re: TimedOutException and UnavailableException from multiGetSliceQuery

2011-10-05 Thread Yuhan Zhang
Hi Aaron,

Thanks for the suggestion. It works again after I cut back the # of rows.

On Wed, Oct 5, 2011 at 1:43 PM, aaron morton wrote:

> 5000 rows in a multi get is way, way, way (did I say way?) too many.
>
> Whenever you get a TimedOutException, check the tp stats on the nodes; you
> will normally see a high pending count. Every row get turns into a
> message in a TP. So if you ask for 5k rows you flood the TP with 5k messages,
> which will often result in the node(s) being temporarily overloaded.
>
> More is not always more. I would guess 100 as a starting point; I would be
> doubtful that you would see much benefit beyond 1000. pycassa defaults to
> 1024
> https://github.com/pycassa/pycassa/blob/master/pycassa/columnfamily.py#L63
>
>
>  Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/10/2011, at 9:14 AM, Yuhan Zhang wrote:
>
> Hi all,
>
> I have been experiencing the unavailableException and TimedOutException on
> a 3-node cassandra cluster
> during a multiGetSliceQuery with 1000 columns. Since there are many keys
> involved in the query, I divided
> them into groups of 5000 rows and process each group individually in a for
> loop, but it does not seem to help.
> Once the TimedOutException appears, further requests to cassandra will
> cause UnavailableException.
> However, the servers can recover after a while without intervention.
>
> Which settings should I pay attention to in order to fix the problem? This
> problem becomes very frequent recently.
>
>
> Thank you.
>
> Yuhan
>
> The exception looks like:
>
> 1/10/05 13:05:31 ERROR connection.HConnectionManager: Could not fullfill
> request on this host
> CassandraClient
> 11/10/05 13:05:31 ERROR connection.HConnectionManager: Exception:
> me.prettyprint.hector.api.exceptions.HTimedOutException:
> TimedOutException()
> at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:32)
> at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:161)
> at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$3.execute(KeyspaceServiceImpl.java:143)
> at
> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
> at
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:155)
> ...
> Caused by: TimedOutException()
> at
> org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
>
>
> 11/10/05 20:06:05 ERROR connection.HConnectionManager: Could not fullfill
> request on this host
> CassandraClient
> 11/10/05 20:06:05 ERROR connection.HConnectionManager: Exception:
> me.prettyprint.hector.api.exceptions.HUnavailableException:
> UnavailableException()
> at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:50)
> at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397)
> at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383)
> at
> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>
> Caused by: UnavailableException()
> at
> org.apache.cassandra.thrift.Cassandra$multiget_slice_result.read(Cassandra.java:9620)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_multiget_slice(Cassandra.java:636)
> at
> org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:608)
> at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:388)
> ... 35 more
>
>
>


Compaction and total disk space used for highly overwritten CF

2011-10-05 Thread Derek Andree
We have a very hot CF which we use essentially as a durable memory cache for 
our application.  It is about 70MBytes in size after being fully populated.  We 
completely overwrite this entire CF every few minutes (not delete).  Our hope 
was that the CF would stay around 70MB in size, but it grows to multiple 
Gigabytes in size rather quickly (less than an hour).  I've heard that doing 
major compactions using nodetool is no longer recommended, but when we force a 
compaction on this CF using nodetool compact, then perform GC, size on disk 
shrinks to the expected 70MB.

I'm wondering if we are doing something wrong here; we thought we were avoiding 
tombstones since we are just overwriting each column using the same keys.  Is 
the fact that we have to do a GC to get the size on disk to shrink 
significantly a smoking gun that we have a bunch of tombstones?

We've row cached the entire CF to make reads really fast, and writes are 
definitely fast enough, it's this growing disk space that has us concerned.

Here's the output from nodetool cfstats for the CF in question (hrm, I just 
noticed that we still have a key cache for this CF which is rather dumb):

Column Family: Test
SSTable count: 4
Space used (live): 309767193
Space used (total): 926926841
Number of Keys (estimate): 275456
Memtable Columns Count: 37510
Memtable Data Size: 15020598
Memtable Switch Count: 22
Read Count: 4827496
Read Latency: 0.010 ms.
Write Count: 1615946
Write Latency: 0.095 ms.
Pending Tasks: 0
Key cache capacity: 15
Key cache size: 55762
Key cache hit rate: 0.030557854052177317
Row cache capacity: 15
Row cache size: 68752
Row cache hit rate: 1.0
Compacted row minimum size: 925
Compacted row maximum size: 1109
Compacted row mean size: 1109


Any insight appreciated.

Thanks,
-Derek