Re: One node out of three not flushing memtables

2013-09-10 Thread Jan Algermissen

On 10.09.2013, at 02:34, "Laing, Michael"  wrote:

> I have seen something similar.
> 
> Of course correlation is not causation...

Thanks for sharing - interesting. However, I still find it confusing that C* 
does not refuse service before it dies. Maybe that is a by-product of the SEDA 
architecture, though.

I switched back from hsha to sync and increased memtable max size and heap. 
That did the trick. Now it flies.

Jan


> 
> Like you, doing testing with heavy writes.
> 
> I was using a python client to drive the writes using the cql module which is 
> thrift based.
> 
> The correlation I eventually tracked down was that whichever node my python 
> client(s) connected to eventually ran out of memory because it could not gain 
> enough back by flushing memtables. It was just a matter of time.
> 
> I switched to the new python-driver client and the problem disappeared.
> 
> I have now been able to return almost all parameters to defaults and get out 
> of the business of manually managing the JVM heap, to my great relief!
> 
> Currently, I have to retool my test harness as I have been unable to drive 
> C* 2.0.0 to destruction (yet).
> 
> Michael
> 
> 
> On Mon, Sep 9, 2013 at 8:11 PM, Jan Algermissen  
> wrote:
I have a strange pattern: In a cluster with three equally dimensioned and 
configured nodes I keep losing one because apparently it fails to flush its 
memtables:
> 
> http://twitpic.com/dcrtel
> 
> 
> It is a different node every time.
> 
So far I understand that I should expect to see the chain-saw graph when 
memtables build up and then get flushed. But what about that third node? 
Has anyone seen something similar?
> 
> Jan
> 
C* DSC 2.0, 3x 4GB, 2-CPU nodes with heavy writes of 70-column rows (approx. 10 
of those rows per wide row)

I have turned off caches, reduced the overall memtable size, set flush writers to 
2, and rpc reader and writer threads to 1.
> 
> 
> 



Throughput and RAM

2013-09-10 Thread Jan Algermissen
Based on my tuning work with C* over the last few days, I believe I have reached the 
following insights.

Maybe someone can confirm whether they make sense:

The more heap I give to Cassandra (up to the GC tipping point of ~8GB) the more 
writes it can accumulate in memtables before doing IO.

The more writes are accumulated in memtables, the closer the IO gets towards 
the maximum possible IO throughput (because there will be fewer writes of 
larger sstables).

So in a sense, C* is designed to maximize IO write efficiency by pre-organizing 
write queries in memory. The more memory, the better the organization works 
(caveat GC).
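
For reference, the cassandra.yaml knobs that bound this in-memory accumulation 
look roughly like the sketch below (names as of 1.2/2.0; the values are 
illustrative, not recommendations, and the defaults vary by version):

memtable_total_space_in_mb: 2048     # total memory allowed for all memtables
memtable_flush_writers: 1            # threads that flush memtables to sstables
flush_largest_memtables_at: 0.75     # emergency flush at this fraction of used heap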

Cassandra takes this eagerness for consuming writes and organizing the writes 
in memory to such an extreme, that any given node will rather die than stop 
consuming writes.

In particular, I am looking for confirmation of the last one.

Jan

cassandra high bandwidth

2013-09-10 Thread Nikolay Mihaylov
Hi,

we have cassandra 1.2.6, single node.

we have a website there, running on a different server.

recently we noticed that we have 40 Mbit of traffic from the cassandra server to
the web server.

we use phpcassa.

in OpsCenter we have a "KeyCache Hits" value around 2000.

I found the most-used CFs with nodetool, but I do not think the traffic
problem is there.

Is there a way I can find out what those keycache hits are?

Nick.


Composite Column Grouping

2013-09-10 Thread Ravikumar Govindarajan
I have been faced with a problem of grouping composites on the second part.

Let's say my CF contains this


TimeSeriesCF
   key:UserID
   composite-col-name:TimeUUID:PKID

Some sample data

UserID = XYZ
 Time:PKID
   Col-Name1 = 200:1000
   Col-Name2 = 201:1001
   Col-Name3 = 202:1000
   Col-Name4 = 203:1000
   Col-Name5 = 204:1002

Whenever a time-series query is issued, it should return the following in
time-desc order.

UserID = XYZ
  Col-Name5 = 204:1002
  Col-Name4 = 203:1000
  Col-Name2 = 201:1001

Is something like this possible in Cassandra? Is there a different way to
design and achieve the same objective?

--
Ravi


Re: Streaming never completes during nodetool rebuild

2013-09-10 Thread Paulo Motta
Thanks for the reply Robert!

Actually increasing the property "streaming_socket_timeout_in_ms" fixed the
problem. :)

It seems 60 seconds is too low a value for this property for inter-region
streaming of very large files.

I increased it to 600 seconds, but a lower value should be enough.
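
For reference, the change is a one-liner in cassandra.yaml; a sketch with the
600-second value mentioned above (the unit is milliseconds):

streaming_socket_timeout_in_ms: 600000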


2013/9/9 Robert Coli 

> On Mon, Sep 9, 2013 at 12:28 PM, Paulo Motta wrote:
>
>> I've been trying to add a new data center to our Cassandra 1.1.10 cluster
>> for the last few days, but I've been unable to successfully rebuild the
>> nodes on the new DC due to streaming problems.
>>
>
> There are some upstream streaming fixes in 1.2. However, I do not know
> whether they would help in this case. A brief glance at the CHANGES.txt is
> not suggestive.
>
>  Unfortunately the only solution to hung streaming is to restart the
> affected nodes.
>
> https://issues.apache.org/jira/browse/CASSANDRA-3486
> https://issues.apache.org/jira/browse/CASSANDRA-5286
>
> =Rob
>



-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com


cassandra error on restart

2013-09-10 Thread Langston, Jim
Hi all,

I restarted my cassandra ring this morning, but it is refusing to
start. Everything was fine, but now I get this error in the log:

….
 INFO 14:05:14,420 Compacting [SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-20-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-21-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-23-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-22-Data.db')]
 INFO 14:05:14,493 Compacted 4 sstables to [/raid0/cassandra/data/system/local/system-local-ic-24,].  1,086 bytes to 486 (~44% of original) in 66ms = 0.007023MB/s.  4 total rows, 1 unique.  Row merge counts were {1:0, 2:0, 3:0, 4:1, }
 INFO 14:05:14,543 Starting Messaging Service on port 7000
java.lang.NullPointerException
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:745)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:554)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:451)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
        at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)
Cannot load daemon


and cassandra will not start. I get the same error on all the nodes in the ring.

Thoughts?

Thanks,

Jim


Re: cassandra error on restart

2013-09-10 Thread Mina Naguib

There was mention of a similar crash on the mailing list. Does this apply to 
your case?

http://mail-archives.apache.org/mod_mbox/cassandra-user/201306.mbox/%3ccdecfcfa.11e95%25agundabatt...@threatmetrix.com%3E


--
Mina Naguib
AdGear Technologies Inc.
http://adgear.com/

On 2013-09-10, at 10:09 AM, "Langston, Jim"  wrote:

> Hi all,
> 
> I restarted my cassandra ring this morning, but it is refusing to
> start. Everything was fine, but now I get this error in the log:
> 
> ….
>  INFO 14:05:14,420 Compacting 
> [SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-20-Data.db'),
>  
> SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-21-Data.db'),
>  
> SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-23-Data.db'),
>  
> SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-22-Data.db')]
>  INFO 14:05:14,493 Compacted 4 sstables to 
> [/raid0/cassandra/data/system/local/system-local-ic-24,].  1,086 bytes to 486 
> (~44% of original) in 66ms = 0.007023MB/s.  4 total rows, 1 unique.  Row 
> merge counts were {1:0, 2:0, 3:0, 4:1, }
>  INFO 14:05:14,543 Starting Messaging Service on port 7000
> java.lang.NullPointerException
> at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:745)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:554)
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:451)
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
> at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)
> Cannot load daemon
> 
> 
> and cassandra will not start. I get the same error on all the nodes in the 
> ring.
> 
> Thoughts?
> 
> Thanks,
> 
> Jim



Re: cassandra error on restart

2013-09-10 Thread Langston, Jim
Thanks Mina,

That was it exactly …

Jim

From: Mina Naguib <mina.nag...@adgear.com>
Reply-To: <user@cassandra.apache.org>
Date: Tue, 10 Sep 2013 10:16:17 -0400
To: <user@cassandra.apache.org>
Subject: Re: cassandra error on restart


There was mention of a similar crash on the mailing list.  Does this apply to 
your case ?

http://mail-archives.apache.org/mod_mbox/cassandra-user/201306.mbox/%3ccdecfcfa.11e95%25agundabatt...@threatmetrix.com%3E


--
Mina Naguib
AdGear Technologies Inc.
http://adgear.com/

On 2013-09-10, at 10:09 AM, "Langston, Jim" <jim.langs...@compuware.com> wrote:

Hi all,

I restarted my cassandra ring this morning, but it is refusing to
start. Everything was fine, but now I get this error in the log:

….
 INFO 14:05:14,420 Compacting 
[SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-20-Data.db'),
 
SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-21-Data.db'),
 
SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-23-Data.db'),
 
SSTableReader(path='/raid0/cassandra/data/system/local/system-local-ic-22-Data.db')]
 INFO 14:05:14,493 Compacted 4 sstables to 
[/raid0/cassandra/data/system/local/system-local-ic-24,].  1,086 bytes to 486 
(~44% of original) in 66ms = 0.007023MB/s.  4 total rows, 1 unique.  Row merge 
counts were {1:0, 2:0, 3:0, 4:1, }
 INFO 14:05:14,543 Starting Messaging Service on port 7000
java.lang.NullPointerException
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:745)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:554)
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:451)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)
Cannot load daemon


and cassandra will not start. I get the same error on all the nodes in the ring.

Thoughts?

Thanks,

Jim



Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-10 Thread Robert Coli
On Tue, Sep 10, 2013 at 10:17 AM, Robert Coli  wrote:

> On Tue, Sep 10, 2013 at 7:55 AM, Keith Freeman <8fo...@gmail.com> wrote:
>
>> On my 3-node cluster (v1.2.8) with 4-cores each and SSDs for commitlog
>> and data
>
>
BTW, is RF=3? If so, you effectively have a 1 node cluster while writing.

=Rob


Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-10 Thread Robert Coli
On Tue, Sep 10, 2013 at 7:55 AM, Keith Freeman <8fo...@gmail.com> wrote:

> On my 3-node cluster (v1.2.8) with 4-cores each and SSDs for commitlog and
> data


On SSD, you don't need to separate commitlog and data. You only win from
this separation if you have a head to not-move between appends to the
commit log. You will get better IO from a strip with an additional SSD.


>> Pool Name              Active   Pending   Completed   Blocked   All time blocked
>> MutationStage                1         9      290394         0                  0
>> FlushWriter                  1         2          20         0                  0
>>
>

> I can't seem to find information about the real meaning of MutationStage, is
> this just normal for lots of inserts?
>

The mutation stage is the stage in which mutations to rows in memtables
("writes") occur.

The FlushWriter stage is the stage that turns memtables into SSTables by
flushing them.

However, 9 pending mutations is a very small number. For reference on an
overloaded cluster which was being written to death I recently saw
1216434 pending MutationStage. What problem other than "high CPU load" are
you experiencing? 2 Pending FlushWriters is slightly suggestive of some
sort of bound related to flushing.


> Also, switching from spinning disks to SSDs didn't seem to significantly
> improve insert performance, so it seems clear my use-case is totally
> CPU-bound.  Cassandra docs say "Insert-heavy workloads are CPU-bound in
> Cassandra before becoming memory-bound.", so I guess that's what I'm
> seeing, but there's no explanation. So I'm wondering what's overloading my
> CPUs, and is there anything I can do about it short of adding more nodes?
>

Insert performance is pretty optimized from an I/O perspective. There is
probably not too much you can do. You can disable durability guarantees if
you truly require insert performance at all costs.
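
As an example of trading durability for write throughput, the commitlog can be
skipped per keyspace; a minimal CQL3 sketch (keyspace name and replication are
placeholders, and a crashed node then loses any writes not yet flushed to
sstables):

CREATE KEYSPACE demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}
  AND durable_writes = false;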

That said, the percentage of people running Cassandra on SSDs is still
relatively low. It is likely that performance improvements wrt CPU usage
are possible.

=Rob


FileNotFoundException while inserting (1.2.8)

2013-09-10 Thread Keith Freeman
While running a heavy insert load, one of my nodes started throwing this 
exception when trying a compaction:


 INFO [CompactionExecutor:23] 2013-09-09 16:08:07,528 CompactionTask.java (line 105) Compacting [SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-6-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-1-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-4-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-2-Data.db')]
ERROR [CompactionExecutor:23] 2013-09-09 16:08:07,611 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:23,1,main]
java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db (No such file or directory)
        at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
        at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1194)
        at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:54)
        at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:1014)
        at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:1026)
        at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:157)
        at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:163)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:117)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
        at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
        at org.apache.cassandra.io.util.ThrottledReader.<init>(ThrottledReader.java:35)
        at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:49)
        ... 18 more

This shows up many times in the log.  I figured running a repair on the 
node might fix it, but the repair ran for over an hour (the node only 
has about 3G of data), so I figured it was hung.  I tried restarting the 
repair, but each time it starts the node logs that same exception 
immediately:


 INFO [AntiEntropySessions:5] 2013-09-10 09:36:35,526 AntiEntropyService.java (line 651) [repair #c6ab9c00-1a2e-11e3-b0e5-05d1729cecff] new session: will sync /192.168.27.73, /192.168.27.75 on range (4925454539472655923,4991066214171147775] for smdb.[tracedata, processors]
 INFO [AntiEntropySessions:5] 2013-09-10 09:36:35,526 AntiEntropyService.java (line 857) [repair #c6ab9c00-1a2e-11e3-b0e5-05d1729cecff] requesting merkle trees for tracedata (to [/192.168.27.75, /192.168.27.73])
ERROR [ValidationExecutor:2] 2013-09-10 09:36:35,535 CassandraDaemon.java (line 192) Exception in thread Thread[ValidationExecutor:2,1,main]
java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db (No such file or directory)
        at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
        at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1194)

...

How can I fix this node?





cluster rename ?

2013-09-10 Thread Langston, Jim
Hi all,

Following these instructions:

http://comments.gmane.org/gmane.comp.db.cassandra.user/29753


I am trying to change the name of the cluster, but I'm getting an error:

ERROR [main] 2013-09-10 17:52:43,250 CassandraDaemon.java (line 247) Fatal 
exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name 
tmpCassandra != configured name cassandra
at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:450)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:243)
at org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)


For step 3 in the instructions, I moved LocationInfo, located in the system 
keyspace, to another directory, but when I try to restart the node the directory 
is re-created and I still get the error.

I'm running 1.2.8; is this still the correct system keyspace to move?

Also, although not indicated as a must-do in step 2, I did a drain, but did 
not remove the commitlogs.

If I name the cluster back to its original name, the cluster will come back up 
without any problems.


Jim


Leveled Compaction resetting tool in 2.0

2013-09-10 Thread Nate McCall
LCS fragmentation comes up a lot here and this issue caught a lot of us on
IRC by surprise so I'm going to pass it on here:

https://issues.apache.org/jira/browse/CASSANDRA-5271

See this thread for additional context:

http://www.mail-archive.com/user@cassandra.apache.org/msg31416.html


Re: Throughput and RAM

2013-09-10 Thread Robert Coli
On Tue, Sep 10, 2013 at 2:30 AM, Jan Algermissen  wrote:

> So in a sense, C* is designed to maximize IO write efficiency by
> pre-organizing write queries in memory. The more memory, the better the
> organization works (caveat GC).
>

http://en.wikipedia.org/wiki/Log-structured_merge-tree
"
The LSM-tree is a hybrid data structure. It is composed of two
tree-like
structures,
known as the C0 and C1 components. C0 is smaller and entirely resident in
memory, whereas C1 is resident on disk. New records are inserted into the
memory-resident C0 component. If the insertion causes the C0 component to
exceed a certain size threshold, a contiguous segment of entries is removed
from C0 and merged into C1 on disk. The performance characteristics of
LSM-trees stem for the fact that each component is tuned to the
characteristics of its underlying storage medium, and that data is
efficiently migrated across media in rolling batches, using an algorithm
reminiscent of merge sort .
"

> Cassandra takes this eagerness for consuming writes and organizing the
> writes in memory to such an extreme, that any given node will rather die
> than stop consuming writes.
>

Perhaps more simply: "RAM is faster than disk" and "Cassandra does not
prevent a given node from writing to RAM faster than it can flush to disk"?

=Rob


Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-10 Thread Keith Freeman


On 09/10/2013 11:17 AM, Robert Coli wrote:
> On Tue, Sep 10, 2013 at 7:55 AM, Keith Freeman <8fo...@gmail.com> wrote:
>
>> On my 3-node cluster (v1.2.8) with 4-cores each and SSDs for
>> commitlog and data
>
> On SSD, you don't need to separate commitlog and data. You only win
> from this separation if you have a head to not-move between appends to
> the commit log. You will get better IO from a strip with an additional
> SSD.

Right, actually both partitions are on the same SSD.   Assuming you
meant "stripe", would that really make a difference

>> Pool Name              Active   Pending   Completed   Blocked   All time blocked
>> MutationStage                1         9      290394         0                  0
>> FlushWriter                  1         2          20         0                  0
>>
>> I can't seem to find information about the real meaning of
>> MutationStage, is this just normal for lots of inserts?
>
> The mutation stage is the stage in which mutations to rows in
> memtables ("writes") occur.
>
> The FlushWriter stage is the stage that turns memtables into SSTables
> by flushing them.
>
> However, 9 pending mutations is a very small number. For reference on
> an overloaded cluster which was being written to death I recently
> saw 1216434 pending MutationStage. What problem other than "high
> CPU load" are you experiencing? 2 Pending FlushWriters is slightly
> suggestive of some sort of bound related to flushing.

So the basic problem is that write performance is lower than I
expected.  I can't get sustained writing of 5000 ~1024-byte records /
sec at RF=2 on a good 3-node cluster, and my only guess is that's
because of the heavy CPU loads on the server (loads over 10 on 4-CPU
systems).  I've tried both a single client writing 5000 rows/second and
2 clients (on separate boxes) writing 2500 rows/second, and in both
cases the server(s) doesn't respond quickly enough to maintain that
rate.  It keeps up ok with 2000 or 3000 rows per second (and has lower
server loads).





Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-10 Thread Nate McCall
With SSDs, you can turn up memtable_flush_writers - try 3 initially (1 by
default) and see what happens. However, given that there are no entries in
'All time blocked' for such, it may be something else.
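
A sketch of the corresponding cassandra.yaml change (the name is taken from
the 1.2 config):

memtable_flush_writers: 3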

How are you inserting the data?


On Tue, Sep 10, 2013 at 12:40 PM, Keith Freeman <8fo...@gmail.com> wrote:

>
> On 09/10/2013 11:17 AM, Robert Coli wrote:
>
> On Tue, Sep 10, 2013 at 7:55 AM, Keith Freeman <8fo...@gmail.com> wrote:
>
>> On my 3-node cluster (v1.2.8) with 4-cores each and SSDs for commitlog
>> and data
>
>
>  On SSD, you don't need to separate commitlog and data. You only win from
> this separation if you have a head to not-move between appends to the
> commit log. You will get better IO from a strip with an additional SSD.
>
> Right, actually both partitions are on the same SSD.   Assuming you meant
> "stripe", would that really make a difference
>
>
>
>>> Pool Name              Active   Pending   Completed   Blocked   All time blocked
>>> MutationStage                1         9      290394         0                  0
>>> FlushWriter                  1         2          20         0                  0
>>>
>>
>
>> I can't seem to find information about the real meaning of MutationStage,
>> is this just normal for lots of inserts?
>>
>
>  The mutation stage is the stage in which mutations to rows in memtables
> ("writes") occur.
>
>  The FlushWriter stage is the stage that turns memtables into SSTables by
> flushing them.
>
>  However, 9 pending mutations is a very small number. For reference on an
> overloaded cluster which was being written to death I recently saw
> 1216434 pending MutationStage. What problem other than "high CPU load" are
> you experiencing? 2 Pending FlushWriters is slightly suggestive of some
> sort of bound related to flushing.
>
> So the basic problem is that write performance is lower than I expected.
> I can't get sustained writing of 5000 ~1024-byte records / sec at RF=2 on a
> good 3-node cluster, and my only guess is that's because of the heavy CPU
> loads on the server (loads over 10 on 4-CPU systems).  I've tried both a
> single client writing 5000 rows/second and 2 clients (on separate boxes)
> writing 2500 rows/second, and in both cases the server(s) doesn't respond
> quickly enough to maintain that rate.  It keeps up ok with 2000 or 3000
> rows per second (and has lower server loads).
>
>
>


heavy insert load overloads CPUs, with MutationStage pending

2013-09-10 Thread Keith Freeman
On my 3-node cluster (v1.2.8) with 4-cores each and SSDs for commitlog 
and data, I get high CPU loads during a heavy-ish wide-row insert load 
into a single CF (5000 1k inserts/sec), e.g. uptime load avg for last 
minute 18/11/10.  Checking tpstats, I see MutationStage pending on all 
the nodes, e.g.:



Pool Name               Active   Pending   Completed   Blocked   All time blocked
ReadStage                    0         0         144         0                  0
RequestResponseStage         0         0      243529         0                  0
MutationStage                1         9      290394         0                  0
ReadRepairStage              0         0           0         0                  0
ReplicateOnWriteStage        0         0           0         0                  0
GossipStage                  0         0        1014         0                  0
AntiEntropyStage             0         0           0         0                  0
MigrationStage               0         0          13         0                  0
MemtablePostFlusher          1         2          35         0                  0
FlushWriter                  1         2          20         0                  0
MiscStage                    0         0           1         0                  0
commitlog_archiver           0         0           0         0                  0


I can't seem to find information about the real meaning of MutationStage, 
is this just normal for lots of inserts?


Also, switching from spinning disks to SSDs didn't seem to significantly 
improve insert performance, so it seems clear my use-case is totally 
CPU-bound.  Cassandra docs say "Insert-heavy workloads are CPU-bound in 
Cassandra before becoming memory-bound.", so I guess that's what I'm 
seeing, but there's no explanation. So I'm wondering what's overloading my 
CPUs, and is there anything I can do about it short of adding more nodes?





Re: making sure 1 copy per availability zone(rack) using EC2Snitch

2013-09-10 Thread Robert Coli
On Mon, Sep 9, 2013 at 11:21 AM, rash aroskar wrote:

> Are you suggesting deploying 1.2.9 only if using Cassandra "DC" outside of
> EC2 or if I wish to use rack replication at all?
>

1) use 1.2.9 no matter what, instead of 1.2.5
2) if only *ever* will have clusters in EC2, EC2Snitch is fine, but read
and understand CASSANDRA-3810, especially if not using vnodes
3) if *ever* may have clusters outside of EC2 + inside EC2, use
GossipingPropertyFileSnitch (see the config sketch after this list)
4) if using vnodes, just create a cluster out of hosts with 50% in each AZ
and you should be all set.
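
A minimal cassandra-rackdc.properties sketch for option (3); GossipingPropertyFileSnitch
reads this file on each node, and the dc/rack values below are placeholders that on
EC2 would typically mirror the region and availability zone:

dc=us-east
rack=1a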

=Rob


Re: Throughput and RAM

2013-09-10 Thread Jan Algermissen

On 10.09.2013, at 19:37, Robert Coli  wrote:

> "Cassandra does not prevent a given node from writing to RAM faster than it 
> can flush to disk"? 

Yes, that is what I meant.

What remains unclear to me is what the operational strategy is for handling 
an increase in writes, or peaks.

Seems to be: "wait until nodes die and then add capacity".

I guess what I am looking for is the switch so that *I* can tell C* not to 
write more to RAM than it is able to flush.

I have a hunch that coordinators pile up incoming requests and that the memory 
used by them causes the node to stop flushing completely.

I tried to reduce rpc connections and/or reduce write timeouts, but both had no 
effect.

Can anybody provide a direction in which to look?

This image (http://twitpic.com/dcwlmn) shows the typical situation for me, no 
matter what switches I work with. There is always this segment of an arc which 
shows the increasing unflushed memtables.

Jan

Re: FileNotFoundException while inserting (1.2.8)

2013-09-10 Thread sankalp kohli
Have you dropped and recreated a keyspace with the same name recently?


On Tue, Sep 10, 2013 at 8:40 AM, Keith Freeman <8fo...@gmail.com> wrote:

> While running a heavy insert load, one of my nodes started throwing this
> exception when trying a compaction:
>
>  INFO [CompactionExecutor:23] 2013-09-09 16:08:07,528 CompactionTask.java (line 105) Compacting [SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-6-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-1-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-4-Data.db'), SSTableReader(path='/var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-2-Data.db')]
> ERROR [CompactionExecutor:23] 2013-09-09 16:08:07,611 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:23,1,main]
> java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db (No such file or directory)
> at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
> at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1194)
> at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:54)
> at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:1014)
> at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:1026)
> at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:157)
> at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:163)
> at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:117)
> at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.FileNotFoundException: /var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db (No such file or directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
> at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
> at org.apache.cassandra.io.util.ThrottledReader.<init>(ThrottledReader.java:35)
> at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:49)
> ... 18 more
>
> This shows up many times in the log.  I figured running a repair on the
> node might fix it, but the repair ran for over an hour (the node only has
> about 3G of data), so I figured it was hung.  I tried restarting the
> repair, but each time it starts the node logs that same exception
> immediately:
>
>  INFO [AntiEntropySessions:5] 2013-09-10 09:36:35,526 AntiEntropyService.java (line 651) [repair #c6ab9c00-1a2e-11e3-b0e5-05d1729cecff] new session: will sync /192.168.27.73, /192.168.27.75 on range (4925454539472655923,4991066214171147775] for smdb.[tracedata, processors]
>  INFO [AntiEntropySessions:5] 2013-09-10 09:36:35,526 AntiEntropyService.java (line 857) [repair #c6ab9c00-1a2e-11e3-b0e5-05d1729cecff] requesting merkle trees for tracedata (to [/192.168.27.75, /192.168.27.73])
> ERROR [ValidationExecutor:2] 2013-09-10 09:36:35,535 CassandraDaemon.java (line 192) Exception in thread Thread[ValidationExecutor:2,1,main]
> java.lang.RuntimeException: java.io.FileNotFoundException: /var/lib/cassandra/data/smdb/tracedata/smdb-tracedata-ic-5-Data.db (No such file or directory)
> at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
> at org.apache.cassandra.io.sstable.

Re: Composite Column Grouping

2013-09-10 Thread Laing, Michael
You could try this. C* doesn't do it all for you, but it will efficiently
get you the right data.

-ml

-- put this in  and run using 'cqlsh -f 

DROP KEYSPACE latest;

CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};

USE latest;

CREATE TABLE time_series (
userid text,
pkid text,
colname map<text, text>,
PRIMARY KEY (userid, pkid)
);

UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE
userid = 'XYZ' AND pkid = '1000';
UPDATE time_series SET colname = colname +
{'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001';
UPDATE time_series SET colname = colname +
{'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000';
UPDATE time_series SET colname = colname +
{'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000';
UPDATE time_series SET colname = colname +
{'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002';

SELECT * FROM time_series WHERE userid = 'XYZ';

-- returns:
--  userid | pkid | colname
-- --------+------+-----------------------------------------------------------------
--  XYZ    | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': 'Col-Name-4'}
--  XYZ    | 1001 | {'201': 'Col-Name-2'}
--  XYZ    | 1002 | {'204': 'Col-Name-5'}

-- use an app to pop off the latest key/value from the map for each row,
then sort by key desc.


On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan <
ravikumar.govindara...@gmail.com> wrote:

> I have been faced with a problem of grouping composites on the second-part.
>
> Lets say my CF contains this
>
>
> TimeSeriesCF
>key:UserID
>composite-col-name:TimeUUID:PKID
>
> Some sample data
>
> UserID = XYZ
>  Time:PKID
>Col-Name1 = 200:1000
>Col-Name2 = 201:1001
>Col-Name3 = 202:1000
>Col-Name4 = 203:1000
>Col-Name5 = 204:1002
>
> Whenever a time-series query is issued, it should return the following in
> time-desc order.
>
> UserID = XYZ
>   Col-Name5 = 204:1002
>   Col-Name4 = 203:1000
>   Col-Name2 = 201:1001
>
> Is something like this possible in Cassandra? Is there a different way to
> design and achieve the same objective?
>
> --
> Ravi
>
>


Cassandra input paging for Hadoop

2013-09-10 Thread Renat Gilfanov
 Hi,

We have Hadoop jobs that read data from our Cassandra column families and write 
some data back to other column families.
The input column families are pretty simple CQL3 tables without wide rows.
In Hadoop jobs we set up a corresponding WHERE clause in 
ConfigHelper.setInputWhereClauses(...), so we don't process the whole table at 
once.
Nevertheless, sometimes the amount of data returned by the input query is big 
enough to cause TimedOutExceptions.

To mitigate this, I'd like to configure the Hadoop job in such a way that it 
sequentially fetches input rows in smaller portions.

I'm looking at the ConfigHelper.setRangeBatchSize() and 
CqlConfigHelper.setInputCQLPageRowSize() methods, but am a bit confused whether 
that's what I need and, if so, which one I should use for this purpose.

Any help is appreciated.

Hadoop version is 1.1.2, Cassandra version is 1.2.8.
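
For reference, a sketch of how both knobs are applied during job setup (class
and method names are the ones mentioned above, from Cassandra 1.2's hadoop
packages; the exact signatures are worth double-checking against your version):

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.hadoop.conf.Configuration;

public class InputPagingConfig {
    public static void configure(Configuration conf) {
        // Thrift-based ColumnFamilyInputFormat: key slices fetched per get_range_slices call
        ConfigHelper.setRangeBatchSize(conf, 1000);
        // CQL3 input format: CQL rows fetched per page within each split
        CqlConfigHelper.setInputCQLPageRowSize(conf, "1000");
    }
}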

Re: cluster rename ?

2013-09-10 Thread Robert Coli
On Tue, Sep 10, 2013 at 11:03 AM, Langston, Jim
wrote:

>  http://comments.gmane.org/gmane.comp.db.cassandra.user/29753
>

For step 3 in the instructions, I moved LocationInfo located in the system
> keyspace to another
> directory and when I try to restart the node, the directory is re-created,
> but still get the error.
>

As of 1.2.x, this information is now kept in a different place.

It is now kept in a column called "cluster_name" in a row with key "local"
in a CF called "local" in the keyspace "system".

You have two options:

1) change this key on all nodes
2) do a rolling restart
3) realize you will have two partial clusters until you complete your
rolling restart, with attendant consequences

This should work because the cluster name check only occurs at node start
time.
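
A sketch of the first option in cqlsh, assuming the 1.2 system schema described
above (update cluster_name in cassandra.yaml to match, and flush the system
keyspace so the change is durable before the restart):

UPDATE system.local SET cluster_name = 'NewClusterName' WHERE key = 'local';

-- then, from the shell: nodetool flush system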

... or you can ...

1) stop all nodes
2) move the system keyspace aside entirely
3) start nodes in such a way that your cluster re-coalesces (start seeds
first, etc.)
4) reload your schema

The latter is probably what I'd do. If you do either, please respond on
thread and let us know your results.

=Rob
PS - Prompted by this thread and my dissatisfaction with either of the
above workarounds, I have created the following feature request :

https://issues.apache.org/jira/browse/CASSANDRA-5997 : -D option to change
cluster name


Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-10 Thread sankalp kohli
What have you set these to?
# commitlog_sync may be either "periodic" or "batch."
# When in batch mode, Cassandra won't ack writes until the commit log
# has been fsynced to disk.  It will wait up to
# commitlog_sync_batch_window_in_ms milliseconds for other writes, before
# performing the sync.
#
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
#
# the other option is "periodic" where writes may be acked immediately
# and the CommitLog is simply synced every commitlog_sync_period_in_ms
# milliseconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1000


On Tue, Sep 10, 2013 at 10:42 AM, Nate McCall wrote:

> With SSDs, you can turn up memtable_flush_writers - try 3 initially (1 by
> default) and see what happens. However, given that there are no entries in
> 'All time blocked' for such, they may be something else.
>
> How are you inserting the data?
>
>
> On Tue, Sep 10, 2013 at 12:40 PM, Keith Freeman <8fo...@gmail.com> wrote:
>
>>
>> On 09/10/2013 11:17 AM, Robert Coli wrote:
>>
>> On Tue, Sep 10, 2013 at 7:55 AM, Keith Freeman <8fo...@gmail.com> wrote:
>>
>>> On my 3-node cluster (v1.2.8) with 4-cores each and SSDs for commitlog
>>> and data
>>
>>
>>  On SSD, you don't need to separate commitlog and data. You only win
>> from this separation if you have a head to not-move between appends to the
>> commit log. You will get better IO from a strip with an additional SSD.
>>
>> Right, actually both partitions are on the same SSD.   Assuming you meant
>> "stripe", would that really make a difference
>>
>>
>>
>>> Pool Name              Active   Pending   Completed   Blocked   All time blocked
>>> MutationStage                1         9      290394         0                  0
>>> FlushWriter                  1         2          20         0                  0
>>>
>>
>>> I can't seem to find information about the real meaning of MutationStage,
>>> is this just normal for lots of inserts?
>>>
>>
>>  The mutation stage is the stage in which mutations to rows in memtables
>> ("writes") occur.
>>
>>  The FlushWriter stage is the stage that turns memtables into SSTables
>> by flushing them.
>>
>>  However, 9 pending mutations is a very small number. For reference on
>> an overloaded cluster which was being written to death I recently saw
>> 1216434 pending MutationStage. What problem other than "high CPU load" are
>> you experiencing? 2 Pending FlushWriters is slightly suggestive of some
>> sort of bound related to flushing.
>>
>> So the basic problem is that write performance is lower than I expected.
>> I can't get sustained writing of 5000 ~1024-byte records / sec at RF=2 on a
>> good 3-node cluster, and my only guess is that's because of the heavy CPU
>> loads on the server (loads over 10 on 4-CPU systems).  I've tried both a
>> single client writing 5000 rows/second and 2 clients (on separate boxes)
>> writing 2500 rows/second, and in both cases the server(s) doesn't respond
>> quickly enough to maintain that rate.  It keeps up ok with 2000 or 3000
>> rows per second (and has lower server loads).
>>
>>
>>
>


Re: Composite Column Grouping

2013-09-10 Thread Laing, Michael
If you have set up the table as described in my previous message, you could
run this python snippet to return the desired result:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
logging.basicConfig()

from operator import itemgetter

import cassandra
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cql_cluster = Cluster()
cql_session = cql_cluster.connect()
cql_session.set_keyspace('latest')

select_stmt = "select * from time_series where userid = 'XYZ'"
query = SimpleStatement(select_stmt)
rows = cql_session.execute(query)

results = []
for row in rows:
    max_time = max(row.colname.keys())
    results.append((row.userid, row.pkid, max_time, row.colname[max_time]))

sorted_results = sorted(results, key=itemgetter(2), reverse=True)
for result in sorted_results: print result

# prints:

# (u'XYZ', u'1002', u'204', u'Col-Name-5')
# (u'XYZ', u'1000', u'203', u'Col-Name-4')
# (u'XYZ', u'1001', u'201', u'Col-Name-2')



On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael
wrote:

> You could try this. C* doesn't do it all for you, but it will efficiently
> get you the right data.
>
> -ml
>
> -- put this in  and run using 'cqlsh -f 
>
> DROP KEYSPACE latest;
>
> CREATE KEYSPACE latest WITH replication = {
> 'class': 'SimpleStrategy',
> 'replication_factor' : 1
> };
>
> USE latest;
>
> CREATE TABLE time_series (
> userid text,
> pkid text,
> colname map<text, text>,
> PRIMARY KEY (userid, pkid)
> );
>
> UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE
> userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname +
> {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001';
> UPDATE time_series SET colname = colname +
> {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname +
> {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000';
> UPDATE time_series SET colname = colname +
> {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002';
>
> SELECT * FROM time_series WHERE userid = 'XYZ';
>
> -- returns:
> --  userid | pkid | colname
> -- --------+------+-----------------------------------------------------------------
> --  XYZ    | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': 'Col-Name-4'}
> --  XYZ    | 1001 | {'201': 'Col-Name-2'}
> --  XYZ    | 1002 | {'204': 'Col-Name-5'}
>
> -- use an app to pop off the latest key/value from the map for each row,
> then sort by key desc.
>
>
> On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
>> I have been faced with a problem of grouping composites on the
>> second-part.
>>
>> Lets say my CF contains this
>>
>>
>> TimeSeriesCF
>>key:UserID
>>composite-col-name:TimeUUID:PKID
>>
>> Some sample data
>>
>> UserID = XYZ
>>  Time:PKID
>>Col-Name1 = 200:1000
>>Col-Name2 = 201:1001
>>Col-Name3 = 202:1000
>>Col-Name4 = 203:1000
>>Col-Name5 = 204:1002
>>
>> Whenever a time-series query is issued, it should return the following in
>> time-desc order.
>>
>> UserID = XYZ
>>   Col-Name5 = 204:1002
>>   Col-Name4 = 203:1000
>>   Col-Name2 = 201:1001
>>
>> Is something like this possible in Cassandra? Is there a different way to
>> design and achieve the same objective?
>>
>> --
>> Ravi
>>
>>
>
>


Re: Long running nodetool move operation

2013-09-10 Thread Ike Walker
Below is the output of "nodetool netstats".

I've never run that before, but from what I can read it shows no incoming 
streams, and a bunch of outgoing streams to two other nodes, all at 0%.

I'll try the restart.

Thanks.

nodetool netstats
Mode: MOVING
Streaming to: /10.xxx.xx.xx

...
Streaming to: /10.xxx.xx.xxx

...
Not receiving any streams.
Pool NameActive   Pending  Completed
Commandsn/a 0  243401039
Responses   n/a 0  295522535

On Sep 9, 2013, at 10:54 PM, Robert Coli  wrote:

>   On Mon, Sep 9, 2013 at 7:08 PM, Ike Walker  wrote:
> I've been using nodetool move to rebalance my cluster. Most of the moves take 
> under an hour, or a few hours at most. The current move has taken 4+ days so 
> I'm afraid it will never complete. What's the best way to cancel it and try 
> again?
> 
> What does "nodetool netstats" say? If it shows no streams in progress, the 
> move is probably hung...
> 
> Restart the affected node. If that doesn't work, restart other nodes which 
> might have been receiving a stream. I think in the case of "move" it should 
> work to just restart the affected node. Restart the move, you will re-stream 
> anything you already streamed once.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-3486
> 
> If this ticket were completed, it would presumably include the ability to 
> stop other hung streaming operations, like "move".
> 
> =Rob



read consistency and clock drift and ntp

2013-09-10 Thread Jimmy Lin
hi,
I have a few questions about how Cassandra uses a record's timestamp to
determine which version to return from its replicated nodes ...

-
A record's timestamp is determined by the Cassandra server node's system
clock when the request arrives at the server, and NOT by the timestamp of
the client who makes the request (unlike timeuuid)?

-
So clock synchronization between nodes is very important; clock drift,
however, is still possible even with NTP? I wonder what common practices
the Cassandra community uses to minimize clock drift?

-
Is there a recommended maximum drift allowed in a cluster before things
can get very ugly?

-
How can I determine whether two nodes in a cluster have out-of-sync clocks?
(as a monitor or alert, so appropriate action can be taken)
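
For what it's worth, CQL3 also lets the client pin a write's timestamp
explicitly, which takes the server clock out of the picture for that write.
A minimal python-driver sketch (keyspace, table, and values are hypothetical):

import time
from cassandra.cluster import Cluster

session = Cluster().connect('demo')  # keyspace name is a placeholder

ts = int(time.time() * 1e6)  # microseconds since the epoch, from the client clock
session.execute(
    "INSERT INTO events (id, payload) VALUES (42, 'hello') "
    "USING TIMESTAMP %d" % ts)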

Thanks


Re: Composite Column Grouping

2013-09-10 Thread Ravikumar Govindarajan
Thanks Michael,

But I cannot sort the rows in memory, as the number of columns will be
quite huge.

From the python script above:
   select_stmt = "select * from time_series where userid = 'XYZ'"

This would return me many hundreds of thousands of columns. I need to go in
time-series order using ranges [Pagination queries].
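
One possible redesign sketch for that access pattern: cluster on time first so
pages come back in time-desc order, and pass the last-seen time as the cursor
(CQL3; column types mirror the sample data above, and note this still returns
every PKID version per page, so keeping only the latest per PKID would remain
client-side work):

CREATE TABLE time_series_desc (
    userid text,
    time text,
    pkid text,
    colname text,
    PRIMARY KEY (userid, time, pkid)
) WITH CLUSTERING ORDER BY (time DESC, pkid ASC);

SELECT * FROM time_series_desc WHERE userid = 'XYZ' LIMIT 100;

-- next page, using the smallest time from the previous page as the cursor:
SELECT * FROM time_series_desc WHERE userid = 'XYZ' AND time < '200' LIMIT 100;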


On Wed, Sep 11, 2013 at 7:06 AM, Laing, Michael
wrote:

> If you have set up the table as described in my previous message, you
> could run this python snippet to return the desired result:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> import logging
> logging.basicConfig()
>
> from operator import itemgetter
>
> import cassandra
> from cassandra.cluster import Cluster
> from cassandra.query import SimpleStatement
>
> cql_cluster = Cluster()
> cql_session = cql_cluster.connect()
> cql_session.set_keyspace('latest')
>
> select_stmt = "select * from time_series where userid = 'XYZ'"
> query = SimpleStatement(select_stmt)
> rows = cql_session.execute(query)
>
> results = []
> for row in rows:
>     max_time = max(row.colname.keys())
>     results.append((row.userid, row.pkid, max_time, row.colname[max_time]))
>
> sorted_results = sorted(results, key=itemgetter(2), reverse=True)
> for result in sorted_results: print result
>
> # prints:
>
> # (u'XYZ', u'1002', u'204', u'Col-Name-5')
> # (u'XYZ', u'1000', u'203', u'Col-Name-4')
> # (u'XYZ', u'1001', u'201', u'Col-Name-2')
>
>
>
> On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael wrote:
>
>> You could try this. C* doesn't do it all for you, but it will efficiently
>> get you the right data.
>>
>> -ml
>>
>> -- put this in  and run using 'cqlsh -f 
>>
>> DROP KEYSPACE latest;
>>
>> CREATE KEYSPACE latest WITH replication = {
>> 'class': 'SimpleStrategy',
>> 'replication_factor' : 1
>> };
>>
>> USE latest;
>>
>> CREATE TABLE time_series (
>> userid text,
>> pkid text,
>> colname map<text, text>,
>> PRIMARY KEY (userid, pkid)
>> );
>>
>> UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE
>> userid = 'XYZ' AND pkid = '1000';
>> UPDATE time_series SET colname = colname +
>> {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001';
>> UPDATE time_series SET colname = colname +
>> {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000';
>> UPDATE time_series SET colname = colname +
>> {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000';
>> UPDATE time_series SET colname = colname +
>> {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002';
>>
>> SELECT * FROM time_series WHERE userid = 'XYZ';
>>
>> -- returns:
>> --  userid | pkid | colname
>> -- --------+------+-----------------------------------------------------------------
>> --  XYZ    | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': 'Col-Name-4'}
>> --  XYZ    | 1001 | {'201': 'Col-Name-2'}
>> --  XYZ    | 1002 | {'204': 'Col-Name-5'}
>>
>> -- use an app to pop off the latest key/value from the map for each row,
>> then sort by key desc.
>>
>>
>> On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan <
>> ravikumar.govindara...@gmail.com> wrote:
>>
>>> I have been faced with a problem of grouping composites on the
>>> second-part.
>>>
>>> Lets say my CF contains this
>>>
>>>
>>> TimeSeriesCF
>>>key:UserID
>>>composite-col-name:TimeUUID:PKID
>>>
>>> Some sample data
>>>
>>> UserID = XYZ
>>>  Time:PKID
>>>Col-Name1 = 200:1000
>>>Col-Name2 = 201:1001
>>>Col-Name3 = 202:1000
>>>Col-Name4 = 203:1000
>>>Col-Name5 = 204:1002
>>>
>>> Whenever a time-series query is issued, it should return the following
>>> in time-desc order.
>>>
>>> UserID = XYZ
>>>   Col-Name5 = 204:1002
>>>   Col-Name4 = 203:1000
>>>   Col-Name2 = 201:1001
>>>
>>> Is something like this possible in Cassandra? Is there a different way
>>> to design and achieve the same objective?
>>>
>>> --
>>> Ravi
>>>
>>>
>>
>>
>