cassandra disk access

2013-08-07 Thread Nikolay Mihaylov
Hi

I am researching various hash-tables and b-trees on disk.

While researching, I had some thoughts about Cassandra SSTables that I want
to verify here.

1. A Cassandra SSTable uses sequential disk I/O when it is created, i.e. the
disk head writes it from beginning to end. Assuming the disk is not
fragmented, the SSTable is placed on consecutive disk sectors.

2. When Cassandra looks up a key in an SSTable (assuming the bloom filter and
other "stuff" did not rule out the read, and assuming the key is located in
this single SSTable), Cassandra DOES NOT use sequential I/O. It will probably
read a hash-table slot or similar structure, then do another disk seek to
fetch the value (and probably the key). There will probably be another seek,
and if there is a key collision, additional seeks will be needed.

3. Once the data (i.e. the row) is located, a sequential read of the entire
row will occur. (Once again I assume there is a single, well-compacted
SSTable.) Also, if the disk is not fragmented, the data will be placed on
consecutive disk sectors.

Am I wrong?

Nick.


Re: cassandra disk access

2013-08-07 Thread Michał Michalski



2. When Cassandra looks up a key in an SSTable (assuming the bloom filter and
other "stuff" did not rule out the read, and assuming the key is located in
this single SSTable), Cassandra DOES NOT use sequential I/O. It will probably
read a hash-table slot or similar structure, then do another disk seek to
fetch the value (and probably the key). There will probably be another seek,
and if there is a key collision, additional seeks will be needed.


It will use the Index Sample (RAM) first, then it will use the "full" Index 
(disk) and finally it will read the data from the SSTable (disk). There's no 
such thing as a "collision" in this case.
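
As a toy model of that lookup order (all names here are hypothetical, not 
Cassandra's actual classes), the sample is searched in RAM and then at most 
two disk seeks follow:

  import java.util.Map;
  import java.util.NavigableMap;
  import java.util.NavigableSet;
  import java.util.TreeMap;
  import java.util.TreeSet;

  public class ReadPathSketch {
      // RAM: every Nth key from the index file (the "Index Sample").
      final NavigableSet<String> indexSample = new TreeSet<>();
      // Stand-in for the on-disk index file: partition key -> offset in -Data.db.
      final NavigableMap<String, Long> indexFile = new TreeMap<>();

      long locate(String key) {
          // Step 1 (RAM, no I/O): greatest sampled key <= the one we want.
          String start = indexSample.floor(key);
          if (start == null) return -1;
          // Step 2 (disk seek #1): scan the index file forward from `start`
          // until we hit the key; its value is the offset in the data file.
          for (Map.Entry<String, Long> e : indexFile.tailMap(start, true).entrySet()) {
              if (e.getKey().equals(key)) return e.getValue();
              if (e.getKey().compareTo(key) > 0) break; // passed it; key absent
          }
          return -1;
          // Step 3 (disk seek #2): the caller reads the row sequentially
          // from the returned offset in the data file.
      }
  }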



3. Once the data (i.e. the row) is located, a sequential read of the entire
row will occur. (Once again I assume there is a single, well-compacted
SSTable.) Also, if the disk is not fragmented, the data will be placed on
consecutive disk sectors.


Yes, this is how I understand it too.

M.



Re: cassandra disk access

2013-08-07 Thread Nikolay Mihaylov
thanks

It will use the Index Sample (RAM) first, then it will use the "full" Index
(disk) and finally it will read the data from the SSTable (disk). There's no
such thing as a "collision" in this case.

So it still has 2 seeks :)

Where can I see the internal structure of the SSTable? I tried to find it
documented, but was unable to find anything.




On Wed, Aug 7, 2013 at 11:27 AM, Michał Michalski  wrote:

>
>  2. When Cassandra looks up a key in an SSTable (assuming the bloom filter
>> and other "stuff" did not rule out the read, and assuming the key is located
>> in this single SSTable), Cassandra DOES NOT use sequential I/O. It will
>> probably read a hash-table slot or similar structure, then do another disk
>> seek to fetch the value (and probably the key). There will probably be
>> another seek, and if there is a key collision, additional seeks will be
>> needed.
>>
>
> It will use the Index Sample (RAM) first, then it will use the "full" Index
> (disk) and finally it will read the data from the SSTable (disk). There's no
> such thing as a "collision" in this case.
>
>
>  3. Once the data (i.e. the row) is located, a sequential read of the entire
>> row will occur. (Once again I assume there is a single, well-compacted
>> SSTable.) Also, if the disk is not fragmented, the data will be placed on
>> consecutive disk sectors.
>>
>
> Yes, this is how I understand it too.
>
> M.
>
>


Re: cassandra disk access

2013-08-07 Thread Michał Michalski
I'm not sure how accurate it is (it's from 2011, one of its sources is 
from 2010), but I'm pretty sure it's more or less OK:


http://blog.csdn.net/firecoder/article/details/7019435
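
If it helps as a starting point: the classic index file from that era is 
essentially a sequence of short-length-prefixed partition keys, each followed 
by the row's offset in the data file (the readWithShortLength calls you see in 
stack traces on this list are exactly that read). A rough, illustrative 
scanner (note: the real 1.2 format appends promoted index data to each entry, 
so treat this as a sketch of the old layout only):

  import java.io.DataInputStream;
  import java.io.EOFException;
  import java.io.FileInputStream;
  import java.io.IOException;

  public class IndexDump {
      public static void main(String[] args) throws IOException {
          // args[0]: path to a -Index.db file (old, pre-promoted-index layout assumed)
          try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
              while (true) {
                  int keyLen;
                  try {
                      keyLen = in.readUnsignedShort(); // 2-byte key length
                  } catch (EOFException eof) {
                      break;                           // clean end of file
                  }
                  byte[] key = new byte[keyLen];
                  in.readFully(key);                   // the partition key bytes
                  long dataOffset = in.readLong();     // row position in -Data.db
                  System.out.printf("%s -> %d%n", new String(key, "UTF-8"), dataOffset);
              }
          }
      }
  }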

M.

On 07.08.2013 10:34, Nikolay Mihaylov wrote:

thanks

It will use the Index Sample (RAM) first, then it will use the "full" Index
(disk) and finally it will read the data from the SSTable (disk). There's no
such thing as a "collision" in this case.

So it still has 2 seeks :)

Where can I see the internal structure of the SSTable? I tried to find it
documented, but was unable to find anything.




On Wed, Aug 7, 2013 at 11:27 AM, Michał Michalski  wrote:



  2. When Cassandra looks up a key in an SSTable (assuming the bloom filter and
other "stuff" did not rule out the read, and assuming the key is located in
this single SSTable), Cassandra DOES NOT use sequential I/O. It will probably
read a hash-table slot or similar structure, then do another disk seek to
fetch the value (and probably the key). There will probably be another seek,
and if there is a key collision, additional seeks will be needed.



It will use the Index Sample (RAM) first, then it will use the "full" Index
(disk) and finally it will read the data from the SSTable (disk). There's no
such thing as a "collision" in this case.


  3. Once the data (i.e. the row) is located, a sequential read of the entire
row will occur. (Once again I assume there is a single, well-compacted
SSTable.) Also, if the disk is not fragmented, the data will be placed on
consecutive disk sectors.



Yes, this is how I understand it too.

M.








Composite keys - terrible write performance issue when using BATCH

2013-08-07 Thread Przemek Maciolek
Hi All,

I found a significant performance problem when using a composite primary key,
a "wide" row, and BATCH.

Ideally, I would like to have following structure:
  CREATE TABLE bar1 (
some_id bigint,
some_type text,
some_value int,
some_data text,
PRIMARY KEY((some_id, some_type), some_value)
  );

For each (some_id, some_type) there might be hundreds of thousands of columns.
However, storing them gets incredibly slow.

So I played with the structure and used something like this (concatenating
some_value and some_data together):
  CREATE TABLE bar2 (
some_id bigint,
some_type text,
some_value_and_data text,
PRIMARY KEY((some_id, some_type))
  );

The speedup was unbelievable. I made some more tests, using BATCH vs
executing each statement separately. 10,000 entries took the following time
(in seconds):

Using composite keys:
  Separately: 12.892867
  Batch: 189.731306
Using just the partition key and a wide row:
  Separately: 11.292507
  Batch: 0.093355

So using BATCH for composite key was roughly 2000 times slower than it
should be, making it pretty much unusable.
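
For reference, the shape of my test, sketched here with the DataStax Java
driver (a hypothetical stand-in; the actual test used cql-rb, see the
pastebin link below):

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.Session;

  public class BatchVsSingle {
      public static void main(String[] args) {
          Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
          Session session = cluster.connect("test");
          int n = 10000;

          // Variant 1: each INSERT executed as its own statement.
          long t0 = System.nanoTime();
          for (int i = 0; i < n; i++)
              session.execute("INSERT INTO bar1 (some_id, some_type, some_value, some_data) "
                      + "VALUES (1, 't', " + i + ", 'd')");
          System.out.printf("separately: %.3f s%n", (System.nanoTime() - t0) / 1e9);

          // Variant 2: the same INSERTs wrapped in one CQL batch.
          StringBuilder b = new StringBuilder("BEGIN BATCH\n");
          for (int i = 0; i < n; i++)
              b.append("INSERT INTO bar1 (some_id, some_type, some_value, some_data) ")
               .append("VALUES (1, 't', ").append(i).append(", 'd');\n");
          b.append("APPLY BATCH;");
          t0 = System.nanoTime();
          session.execute(b.toString());
          System.out.printf("batch: %.3f s%n", (System.nanoTime() - t0) / 1e9);

          cluster.shutdown();
      }
  }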

Why!?

My code snippet (using cql-rb) is available here:
http://pastebin.com/qAcRcqbF

Thanks,
Przemek


System hints compaction stuck

2013-08-07 Thread David McNelis
Morning folks,

For the last couple of days all of my nodes (17, all running 1.2.8) have
been stuck at various percentages of completion for compacting
system.hints.  I've tried restarting the nodes (including a full rolling
restart of the cluster) to no avail.

When I turn on Debugging I am seeing this error on all of the nodes
constantly:

DEBUG 09:03:21,999 Thrift transport error occurred during processing of
message.
org.apache.thrift.transport.TTransportException
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)


When I turn on tracing, I see that shortly after this error there is a
message similar to:
TRACE 09:03:22,000 ClientState removed for socket addr /10.55.56.211:35431

The IP in this message is sometimes a client machine, sometimes another
cassandra node with no processes other than C* running on it (which I think
rules out an issue with a particular client library doing something funny
with Thrift).

While I wouldn't expect a Thrift issue to cause problems with compaction,
I'm out of other ideas at the moment.  Anyone have any thoughts they could
share?

Thanks,
David


Can't perform repair on a 1.1.5 cassandra node -SSTable corrupted

2013-08-07 Thread Madalina Matei
Hi,

I have a 5-node Cassandra (version 1.1.5) ring, RF=2, CL for read/write = 1.
After a node went down without any error reported in the OS syslog or the
Cassandra log, I decided to perform a repair.

Each time I run nodetool repair I get this error:

 INFO [FlushWriter:5] 2013-08-07 11:09:26,770 Memtable.java (line 305)
Completed flushing /data/-298-Data.db (18694 bytes) for commitlog
position ReplayPosition(segmentId=1375867548785, position=199)
ERROR [Thrift:286] 2013-08-07 11:10:04,448 CustomTThreadPoolServer.java
(line 204) Error occurred during processing of message.
java.lang.RuntimeException: error reading 1 of 1
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:116)
at
org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:203)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:117)
at
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:140)
at
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:107)
at
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:80)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1381)
at
org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1377)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1454)
at
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1433)
at
org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:50)
at
org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:870)
at
org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:691)
at
org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:3008)
at
org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:2996)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
at
org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94)
at
org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:91)
at
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:77)
at
org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
at
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
at
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
at
org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:88)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:83)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73)
at
org.apache.cass

Re: Can't perform repair on a 1.1.5 cassandra node -SSTable corrupted

2013-08-07 Thread Ondřej Černoš
Hello,

please see these issues:
https://issues.apache.org/jira/browse/CASSANDRA-5686 and
https://issues.apache.org/jira/browse/CASSANDRA-5391; you may have hit one of
them.

regards,

ondrej cernos


On Wed, Aug 7, 2013 at 5:00 PM, Madalina Matei wrote:

> Hi,
>
> I have a 5-node Cassandra (version 1.1.5) ring, RF=2, CL for read/write = 1.
> After a node went down without any error reported in the OS syslog or the
> Cassandra log, I decided to perform a repair.
>
> Each time I run nodetool repair I get this error:
>
>  INFO [FlushWriter:5] 2013-08-07 11:09:26,770 Memtable.java (line 305)
> Completed flushing /data/-298-Data.db (18694 bytes) for commitlog
> position ReplayPosition(segmentId=1375867548785, position=199)
> ERROR [Thrift:286] 2013-08-07 11:10:04,448 CustomTThreadPoolServer.java
> (line 204) Error occurred during processing of message.
> java.lang.RuntimeException: error reading 1 of 1
> at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
> at
> org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> at
> org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:116)
> at
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:203)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> at
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:117)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:140)
> at
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:107)
> at
> org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:80)
> at
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
> at
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> at
> org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1381)
> at
> org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1377)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> at
> org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1454)
> at
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1433)
> at
> org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:50)
> at
> org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:870)
> at
> org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:691)
> at
> org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:3008)
> at
> org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:2996)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
> at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
> at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
> at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
> at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
> at
> org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94)
> at
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:91)
> at
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:77)
> at
> org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
> at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
> at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
> at
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
> at
> org.apache

lots of small nodes vs fewer big nodes

2013-08-07 Thread Paul Ingalls
Quick question about systems architecture.

Would it be better to run 5 nodes with 7GB RAM and 4 CPUs, or 10 nodes with 
3.5GB RAM and 2 CPUs?

I'm currently running the former, but am considering the latter.  My goal would 
be to improve overall performance by spreading the IO across more disks.  My 
current cluster has low CPU utilization but does spend a good amount of time 
in iowait.  Would moving to more, smaller nodes help with that?  Or would I run 
into trouble with the smaller RAM and CPU?

Thanks!

Paul

Re: lots of small nodes vs fewer big nodes

2013-08-07 Thread Andrey Ilinykh
You still have the same amount of RAM, so you cache the same amount of
data. I don't think you gain much here. On the other hand, maintenance
procedures (compaction, repair) may hit your 2-CPU box. I wouldn't do it.

Thank you,
  Andrey


On Wed, Aug 7, 2013 at 10:24 AM, Paul Ingalls  wrote:

> Quick question about systems architecture.
>
> Would it be better to run 5 nodes with 7GB RAM and 4 CPUs, or 10 nodes with
> 3.5GB RAM and 2 CPUs?
>
> I'm currently running the former, but am considering the latter.  My goal
> would be to improve overall performance by spreading the IO across more
> disks.  My current cluster has low CPU utilization but does spend a good
> amount of time in iowait.  Would moving to more, smaller nodes help with
> that?  Or would I run into trouble with the smaller RAM and CPU?
>
> Thanks!
>
> Paul


Re: Is there update-in-place on maps?

2013-08-07 Thread Aaron Morton
> As for the atomic increment, I take it the answer is 'no, there is no atomic 
> increment, I have to pull the value to the client and send an update with the 
> new value'.
Saying "atomic increment" is probably confusing. 
You cannot have Counters, the thing most people would think about when you say 
"increment", in a collection type.

You can update the values in a map server side. 
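
For example (a sketch; the keyspace, table, and driver usage here are all
hypothetical), a single map entry can be set or removed server side without
reading the map first:

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.Session;

  public class MapUpdate {
      public static void main(String[] args) {
          Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
          Session session = cluster.connect("demo");
          // Assumes: CREATE TABLE users (id text PRIMARY KEY, favs map<text, text>);
          // Set one entry, no read-before-write:
          session.execute("UPDATE users SET favs['color'] = 'blue' WHERE id = 'smith'");
          // Remove one entry, also server side:
          session.execute("DELETE favs['color'] FROM users WHERE id = 'smith'");
          cluster.shutdown();
      }
  }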

If you can provide a concrete example of what you want to do it may be easier. 

Cheers
 
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/08/2013, at 10:05 PM, Andy Twigg  wrote:

> Counters can be atomically incremented 
> (http://wiki.apache.org/cassandra/Counters). Pick a UUID for the counter, and 
> use that: c=map.get(k); c.incr()
> 
> 
> On 6 August 2013 11:01, Jan Algermissen  wrote:
> 
> On 06.08.2013, at 11:36, Andy Twigg  wrote:
> 
> > Store pointers to counters as map values?
> 
> Sorry, but this fits into nothing I know about C* so far - can you explain?
> 
> Jan
> 
> 
> 
> 
> -- 
> Dr Andy Twigg
> Junior Research Fellow, St Johns College, Oxford
> Room 351, Department of Computer Science
> http://www.cs.ox.ac.uk/people/andy.twigg/
> andy.tw...@cs.ox.ac.uk | +447799647538



Re: Any good GUI based tool to manage data in Casandra?

2013-08-07 Thread Aaron Morton
I think one of the versions of ops centre has the feature 
http://www.datastax.com/what-we-offer/products-services/datastax-opscenter

otherwise people use the cassandra-cli or cqlsh. 

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/08/2013, at 1:28 AM, Tony Anecito  wrote:

> Thanks Aaron. I found that before I asked the question, and Helenos seems the 
> closest, but it does not allow easy CRUD like, say, the SQL Server Management 
> tools, where you can get a list of say 1,000 records in a grid control and 
> select rows for deletion, insert, or update.
>  
> I will look closer at that one since this is the reply from the team, but if 
> users on this email list have other suggestions, please do not hesitate to 
> reply.
>  
> Many Thanks,
> -Tony
> 
> From: Aaron Morton 
> To: Cassandra User  
> Sent: Tuesday, August 6, 2013 1:38 AM
> Subject: Re: Any good GUI based tool to manage data in Casandra?
> 
> There is a list here. 
> 
> http://wiki.apache.org/cassandra/Administration%20Tools
> 
> Cheers
> 
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com/
> 
> On 3/08/2013, at 6:19 AM, Tony Anecito  wrote:
> 
>> Hi All,
>> 
>> Is there a GUI tool for managing data in a Cassandra database? I have googled 
>> and seen tools, but they seem to be for schema management or explorers to just 
>> view data. It would be great to delete/insert rows or update values for a 
>> column via a GUI.
>> 
>> Thanks,
>> -Tony
> 
> 
> 



Re: System hints compaction stuck

2013-08-07 Thread Nate McCall
Thrift and ClientState are both unrelated to hints.

What do you see in the logs after "Started hinted handoff for
host:..." from HintedHandoffManager?

It should either have an error message or something along the lines of
"Finished hinted handoff of:..."

Were there any schema updates that preceded this happening?

As for the thrift stuff, which rpc_server_type are you using?



On Wed, Aug 7, 2013 at 6:14 AM, David McNelis  wrote:
> Morning folks,
>
> For the last couple of days all of my nodes (17, all running 1.2.8) have
> been stuck at various percentages of completion for compacting system.hints.
> I've tried restarting the nodes (including a full rolling restart of the
> cluster) to no avail.
>
> When I turn on Debugging I am seeing this error on all of the nodes
> constantly:
>
> DEBUG 09:03:21,999 Thrift transport error occurred during processing of
> message.
> org.apache.thrift.transport.TTransportException
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at
> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
> at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
>
>
> When I turn on tracing, I see that shortly after this error there is a
> message similar to:
> TRACE 09:03:22,000 ClientState removed for socket addr /10.55.56.211:35431
>
> The IP in this message is sometimes a client machine, sometimes another
> cassandra node with no processes other than C* running on it (which I think
> rules out an issue with a particular client library doing something funny
> with Thrift).
>
> While I wouldn't expect a Thrift issue to cause problems with compaction,
> I'm out of other ideas at the moment.  Anyone have any thoughts they could
> share?
>
> Thanks,
> David


Re: Unable to bootstrap node

2013-08-07 Thread Aaron Morton
Thanks for the update :)

A

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/08/2013, at 7:03 AM, sankalp kohli  wrote:

> @Aaron
> This problem happens when you drop and recreate a keyspace with the same name 
> and you do it very quickly. I have also filed a JIRA for it
> 
> https://issues.apache.org/jira/browse/CASSANDRA-5843
> 
> 
> On Tue, Aug 6, 2013 at 10:31 AM, Keith Wright  wrote:
> The file does not appear on disk and the permissions are definitely correct.  
> We have seen the file in snapshots.   This is completely blocking us from 
> adding the new node.  How can we recover?  Just run repairs?
> 
> Thanks
> 
> From: Aaron Morton 
> Reply-To: "user@cassandra.apache.org" 
> Date: Tuesday, August 6, 2013 4:06 AM
> To: "user@cassandra.apache.org" 
> Subject: Re: Unable to bootstrap node
> 
>> Caused by: java.io.FileNotFoundException: 
>> /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
>>  (No such file or directory)
>> at java.io.RandomAccessFile.open(Native Method)
>> at java.io.RandomAccessFile.(RandomAccessFile.java:233)
>> at 
>> org.apache.cassandra.io.util.RandomAccessReader.(RandomAccessReader.java:67)
>> at 
>> org.apache.cassandra.io.compress.CompressedRandomAccessReader.
> This is somewhat serious, especially if it's from a bug in dropping tables. 
> Though I would expect that to show up for a lot of people. 
> 
> Does the file exist on disk?
> Are the permissions correct? 
> 
> IMHO you need to address this issue on the existing nodes before worrying 
> about the new node. 
> 
> Cheers
>  
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 6/08/2013, at 1:25 PM, sankalp kohli  wrote:
> 
>> Let me know if this fixes the problem?
>> 
>> 
>> On Mon, Aug 5, 2013 at 6:24 PM, sankalp kohli  wrote:
>> So the problem is that when you dropped and recreated the table with the 
>> same name, somehow the old CFStore object was not purged. So now there were 
>> two objects, which caused the same SSTable to have 2 SSTableReader objects. 
>> 
>> The fix is to find all nodes which are emitting this FileNotFoundException 
>> and restart them. 
>> 
>> In your case, restart the node which is serving the data and emitting the 
>> FileNotFoundException. 
>> 
>> Once this is up, restart the bootstrapping node again with the bootstrap 
>> argument. Now it will successfully stream the data. 
>> 
>> 
>> On Mon, Aug 5, 2013 at 6:08 PM, Keith Wright  wrote:
>> 
>> Yes, we likely dropped and recreated tables.  If we stop the sending node, 
>> what will happen to the bootstrapping node?
>> 
>> sankalp kohli  wrote:
>> 
>> Hi,
>> The problem is that the node sending the stream is hitting this 
>> FileNotFoundException. You need to restart this node, and it should fix the 
>> problem. 
>> 
>> Are you seeing a lot of FileNotFoundExceptions? Did you make any schema 
>> changes recently?
>> 
>> Sankalp
>> 
>> 
>> On Mon, Aug 5, 2013 at 5:39 PM, Keith Wright  wrote:
>> Hi all,
>> 
>>I have been trying to bootstrap a new node into my 7-node 1.2.4 C* 
>> cluster with vnodes and RF=3, with no luck.  It gets close to completing and 
>> then the streaming just stalls at 99% from 1 or 2 nodes.  Nodetool netstats 
>> shows the items that have yet to stream, but the logs on the new node do not 
>> show any errors.  I tried shutting down the node, clearing all data/commit 
>> logs/caches, and re-bootstrapping, with no luck.  The nodes that are hanging 
>> while sending the data only have the error below, which is related to 
>> compactions (see below), although it involves one of the files that is 
>> waiting to be sent.  I tried nodetool scrub on the column family with the 
>> missing item but got an error indicating it could not get a hard link.  Any 
>> ideas?  We were able to bootstrap one of the new nodes with no issues, but 
>> this other one has been a real pain.  Note that when the new node is joining 
>> the cluster, it does not appear in nodetool status.  Is that expected?
>> 
>> Thanks all.  My next step is to try getting a new IP for this machine, my 
>> thought being that the cluster doesn't like me continuing to attempt to 
>> bootstrap the node repeatedly, each time getting a new host id.
>> 
>> [kwright@lxpcas008 ~]$ nodetool netstats | grep 
>> rts-40301_feedProducts-ib-1-Data.db
>>rts: 
>> /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
>>  sections=73 progress=0/1884669 - 0%
>> 
>> ERROR [ReadStage:427] 2013-08-05 23:23:29,294 CassandraDaemon.java (line 
>> 174) Exception in thread Thread[ReadStage:427,5,main]
>> java.lang.RuntimeException: java.io.FileNotFoundException: 
>> /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db
>>  (No such file or directory)
>> at 
>> org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(Com

Re: System hints compaction stuck

2013-08-07 Thread David McNelis
Nate,

We had a node that was flaking on us last week and had a lot of handoffs
fail to that node.  We ended up decommissioning that node entirely.  I
can't find the actual error we were getting at the time (logs have been
rotated out), but currently we're not seeing any errors there.

We haven't had any schema updates recently and we are using the sync rpc
server.  We had hsha turned on for a while, but we were getting a bunch of
transport frame size errors.


On Wed, Aug 7, 2013 at 1:55 PM, Nate McCall  wrote:

> Thrift and ClientState are both unrelated to hints.
>
> What do you see in the logs after "Started hinted handoff for
> host:..." from HintedHandoffManager?
>
> It should either have an error message or something along the lines of
> "Finished hinted handoff of:..."
>
> Were there any schema updates that preceded this happening?
>
> As for the thrift stuff, which rpc_server_type are you using?
>
>
>
> On Wed, Aug 7, 2013 at 6:14 AM, David McNelis  wrote:
> > Morning folks,
> >
> > For the last couple of days all of my nodes (17, all running 1.2.8) have
> > been stuck at various percentages of completion for compacting
> system.hints.
> > I've tried restarting the nodes (including a full rolling restart of the
> > cluster) to no avail.
> >
> > When I turn on Debugging I am seeing this error on all of the nodes
> > constantly:
> >
> > DEBUG 09:03:21,999 Thrift transport error occurred during processing of
> > message.
> > org.apache.thrift.transport.TTransportException
> > at
> >
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> > at
> > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at
> >
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> > at
> >
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> > at
> > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> > at
> >
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> > at
> >
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> > at
> >
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> > at
> org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
> > at
> >
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:724)
> >
> >
> > When I turn on tracing, I see that shortly after this error there is a
> > message similar to:
> > TRACE 09:03:22,000 ClientState removed for socket addr /
> 10.55.56.211:35431
> >
> > The IP in this message is sometimes a client machine, sometimes another
> > cassandra node with no processes other than C* running on it (which I
> think
> > rules out an issue with a particular client library doing something funny
> > with Thrift).
> >
> > While I wouldn't expect a Thrift issue to cause problems with compaction,
> > I'm out of other ideas at the moment.  Anyone have any thoughts they
> could
> > share?
> >
> > Thanks,
> > David
>


Re: Any good GUI based tool to manage data in Casandra?

2013-08-07 Thread Nick Bailey
OpsCenter allows CRUD of column families themselves (although not CQL3
column families). It only allows viewing the data inside column families
though; no support for writing or updating.


On Wed, Aug 7, 2013 at 12:54 PM, Aaron Morton wrote:

> I think one of the versions of ops centre has the feature
> http://www.datastax.com/what-we-offer/products-services/datastax-opscenter
>
> otherwise people use the cassandra-cli or cqlsh.
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/08/2013, at 1:28 AM, Tony Anecito  wrote:
>
> Thanks Aaron. I found that before I asked the question, and Helenos seems
> the closest, but it does not allow easy CRUD like, say, the SQL Server
> Management tools, where you can get a list of say 1,000 records in a grid
> control and select rows for deletion, insert, or update.
>
> I will look closer at that one since this is the reply from the team, but
> if users on this email list have other suggestions, please do not hesitate
> to reply.
>
> Many Thanks,
> -Tony
>
>   *From:* Aaron Morton 
> *To:* Cassandra User 
> *Sent:* Tuesday, August 6, 2013 1:38 AM
> *Subject:* Re: Any good GUI based tool to manage data in Casandra?
>
>  There is a list here.
>
> http://wiki.apache.org/cassandra/Administration%20Tools
>
> Cheers
>
>  -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com/
>
>  On 3/08/2013, at 6:19 AM, Tony Anecito  wrote:
>
>  Hi All,
>
> Is there a GUI tool for managing data in a Cassandra database? I have googled
> and seen tools, but they seem to be for schema management or explorers to just
> view data. It would be great to delete/insert rows or update values for a
> column via a GUI.
>
> Thanks,
> -Tony
>
>
>
>
>
>


Re: Any good GUI based tool to manage data in Casandra?

2013-08-07 Thread Tony Anecito
Thanks Nick for your reply. Good to know that. I knew OpsCenter was mainly 
schema management.

Best Regards,
-Tony





 From: Nick Bailey 
To: user@cassandra.apache.org 
Sent: Wednesday, August 7, 2013 12:04 PM
Subject: Re: Any good GUI based tool to manage data in Casandra?
 


OpsCenter allows CRUD of column families themselves (although not CQL3 column 
families). It only allows viewing the data inside column families though; no 
support for writing or updating.



On Wed, Aug 7, 2013 at 12:54 PM, Aaron Morton  wrote:

I think one of the versions of ops centre has the feature 
http://www.datastax.com/what-we-offer/products-services/datastax-opscenter
>
>
>otherwise people use the cassandra-cli or cqlsh. 
>
>
>Cheers
>
>
>-
>Aaron Morton
>Cassandra Consultant
>New Zealand
>
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 7/08/2013, at 1:28 AM, Tony Anecito  wrote:
>
>Thanks Aaron. I found that before I asked the question, and Helenos seems the 
>closest, but it does not allow easy CRUD like, say, the SQL Server Management 
>tools, where you can get a list of say 1,000 records in a grid control and 
>select rows for deletion, insert, or update.
>> 
>>I will look closer at that one since this is the reply from the team, but if 
>>users on this email list have other suggestions, please do not hesitate to 
>>reply.
>> 
>>Many Thanks,
>>-Tony
>>
>>
>>From: Aaron Morton 
>>To: Cassandra User  
>>Sent: Tuesday, August 6, 2013 1:38 AM
>>Subject: Re: Any good GUI based tool to manage data in Casandra?
>>
>>
>>
>>There is a list here.  
>>
>>
>>http://wiki.apache.org/cassandra/Administration%20Tools
>>
>>
>>Cheers
>>
>>
>>-
>>Aaron Morton
>>Cassandra Consultant
>>New Zealand
>>
>>
>>@aaronmorton
>>http://www.thelastpickle.com/
>>
>>On 3/08/2013, at 6:19 AM, Tony Anecito  wrote:
>>
>>Hi All,
>>>
>>>
>>>Is there a GUI tool for managing data in a Cassandra database? I have googled 
>>>and seen tools, but they seem to be for schema management or explorers to just 
>>>view data. It would be great to delete/insert rows or update values for a 
>>>column via a GUI.
>>>
>>>
>>>Thanks,
>>>-Tony
>>>
>>
>>
>>
>

Re: System hints compaction stuck

2013-08-07 Thread Nate McCall
Is there anything else on the network that could be attempting to
connect to 9160?

That is the exact error you would get when someone initiates a
connection and sends a stray byte. You can reproduce it thusly:
echo -n 'm' | nc localhost 9160
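
The same poke in plain Java, if nc isn't handy (a sketch):

  import java.net.Socket;

  public class ThriftPoke {
      public static void main(String[] args) throws Exception {
          // Open a raw connection to the Thrift port and send one stray byte;
          // the server logs it as the TTransportException shown above.
          try (Socket s = new Socket("localhost", 9160)) {
              s.getOutputStream().write('m');
              s.getOutputStream().flush();
          }
      }
  }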


On Wed, Aug 7, 2013 at 11:11 AM, David McNelis  wrote:
> Nate,
>
> We had a node that was flaking on us last week and had a lot of handoffs
> fail to that node.  We ended up decommissioning that node entirely.  I can't
> find the actual error we were getting at the time (logs have been rotated
> out), but currently we're not seeing any errors there.
>
> We haven't had any schema updates recently and we are using the sync rpc
> server.  We had hsha turned on for a while, but we were getting a bunch of
> transport frame size errors.
>
>
> On Wed, Aug 7, 2013 at 1:55 PM, Nate McCall  wrote:
>>
>> Thrift and ClientState are both unrelated to hints.
>>
>> What do you see in the logs after "Started hinted handoff for
>> host:..." from HintedHandoffManager?
>>
>> It should either have an error message or something along the lines of
>> "Finished hinted handoff of:..."
>>
>> Were there any schema updates that preceded this happening?
>>
>> As for the thrift stuff, which rpc_server_type are you using?
>>
>>
>>
>> On Wed, Aug 7, 2013 at 6:14 AM, David McNelis  wrote:
>> > Morning folks,
>> >
>> > For the last couple of days all of my nodes (17, all running 1.2.8) have
>> > been stuck at various percentages of completion for compacting
>> > system.hints.
>> > I've tried restarting the nodes (including a full rolling restart of the
>> > cluster) to no avail.
>> >
>> > When I turn on Debugging I am seeing this error on all of the nodes
>> > constantly:
>> >
>> > DEBUG 09:03:21,999 Thrift transport error occurred during processing of
>> > message.
>> > org.apache.thrift.transport.TTransportException
>> > at
>> >
>> > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
>> > at
>> > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>> > at
>> >
>> > org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>> > at
>> >
>> > org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>> > at
>> > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>> > at
>> >
>> > org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>> > at
>> >
>> > org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>> > at
>> >
>> > org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>> > at
>> > org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
>> > at
>> >
>> > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
>> > at
>> >
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > at
>> >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > at java.lang.Thread.run(Thread.java:724)
>> >
>> >
>> > When I turn on tracing, I see that shortly after this error there is a
>> > message similar to:
>> > TRACE 09:03:22,000 ClientState removed for socket addr
>> > /10.55.56.211:35431
>> >
>> > The IP in this message is sometimes a client machine, sometimes another
>> > cassandra node with no processes other than C* running on it (which I
>> > think
>> > rules out an issue with a particular client library doing something
>> > funny
>> > with Thrift).
>> >
>> > While I wouldn't expect a Thrift issue to cause problems with
>> > compaction,
>> > I'm out of other ideas at the moment.  Anyone have any thoughts they
>> > could
>> > share?
>> >
>> > Thanks,
>> > David
>
>


Re: System hints compaction stuck

2013-08-07 Thread David McNelis
Fwiw, similar to another issue of stuck compaction that was on the list
several days ago: if I cleared out the hints, either by removing the files
while the node was down, or by running a scrub on system.hints during node
startup, I was able to get these compactions cleared, and the nodes are
starting to catch up on tasks that had been blocked.

Nate, there are definitely a number of things that could be hitting the
9160 port... but I was seeing the transport frame size error even between
nodes (and there was nothing running on any node other than C*)... after
switching back to sync I no longer get that error.


On Wed, Aug 7, 2013 at 2:58 PM, Nate McCall  wrote:

> Is there anything else on the network that could be attempting to
> connect to 9160?
>
> That is the exact error you would get when someone initiates a
> connection and sends a null byte. You can reproduce it thusly:
> echo -n 'm' | nc localhost 9160
>
>
> On Wed, Aug 7, 2013 at 11:11 AM, David McNelis  wrote:
> > Nate,
> >
> > We had a node that was flaking on us last week and had a lot of handoffs
> > fail to that node.  We ended up decommissioning that node entirely.  I
> can't
> > find the actual error we were getting at the time (logs have been rotated
> > out), but currently we're not seeing any errors there.
> >
> > We haven't had any schema updates recently and we are using the sync rpc
> > server.  We had hsha turned on for a while, but we were getting a bunch
> of
> > transport frame size errors.
> >
> >
> > On Wed, Aug 7, 2013 at 1:55 PM, Nate McCall  wrote:
> >>
> >> Thrift and ClientState are both unrelated to hints.
> >>
> >> What do you see in the logs after "Started hinted handoff for
> >> host:..." from HintedHandoffManager?
> >>
> >> It should either have an error message or something along the lines of
> >> "Finished hinted handoff of:..."
> >>
> >> Were there any schema updates that preceded this happening?
> >>
> >> As for the thrift stuff, which rpc_server_type are you using?
> >>
> >>
> >>
> >> On Wed, Aug 7, 2013 at 6:14 AM, David McNelis 
> wrote:
> >> > Morning folks,
> >> >
> >> > For the last couple of days all of my nodes (17, all running 1.2.8)
> have
> >> > been stuck at various percentages of completion for compacting
> >> > system.hints.
> >> > I've tried restarting the nodes (including a full rolling restart of
> the
> >> > cluster) to no avail.
> >> >
> >> > When I turn on Debugging I am seeing this error on all of the nodes
> >> > constantly:
> >> >
> >> > DEBUG 09:03:21,999 Thrift transport error occurred during processing
> of
> >> > message.
> >> > org.apache.thrift.transport.TTransportException
> >> > at
> >> >
> >> >
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >> > at
> >> > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> >> > at
> >> >
> >> >
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> >> > at
> >> >
> >> >
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> >> > at
> >> > org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> >> > at
> >> >
> >> >
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> >> > at
> >> >
> >> >
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> >> > at
> >> >
> >> >
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> >> > at
> >> > org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
> >> > at
> >> >
> >> >
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
> >> > at
> >> >
> >> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > at
> >> >
> >> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > at java.lang.Thread.run(Thread.java:724)
> >> >
> >> >
> >> > When I turn on tracing, I see that shortly after this error there is a
> >> > message similar to:
> >> > TRACE 09:03:22,000 ClientState removed for socket addr
> >> > /10.55.56.211:35431
> >> >
> >> > The IP in this message is sometimes a client machine, sometimes
> another
> >> > cassandra node with no processes other than C* running on it (which I
> >> > think
> >> > rules out an issue with a particular client library doing something
> >> > funny
> >> > with Thrift).
> >> >
> >> > While I wouldn't expect a Thrift issue to cause problems with
> >> > compaction,
> >> > I'm out of other ideas at the moment.  Anyone have any thoughts they
> >> > could
> >> > share?
> >> >
> >> > Thanks,
> >> > David
> >
> >
>


Re: Large number of pending gossip stage tasks in nodetool tpstats

2013-08-07 Thread Aaron Morton
>  When looking at nodetool
> gossipinfo, I notice that this node has updated to the latest schema hash, but
> that it thinks other nodes in the cluster are on the older version.
What does describe cluster in cassandra-cli say? It will let you know if there 
are multiple schema versions in the cluster. 

Can you include the output from nodetool gossipinfo? 

You may also get some value from increasing the log level for 
org.apache.cassandra.gms.Gossiper to DEBUG so you can see what's going on. It's 
unusual for only the gossip pool to back up. If there were issues with GC taking 
CPU, we would expect to see it across the board. 

Cheers



-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/08/2013, at 7:52 AM, Faraaz Sareshwala  wrote:

> I'm running cassandra-1.2.8 in a cluster with 45 nodes across three racks. All
> nodes are well behaved except one. Whenever I start this node, it starts
> churning CPU. Running nodetool tpstats, I notice that the number of pending
> gossip stage tasks is constantly increasing [1]. When looking at nodetool
> gossipinfo, I notice that this node has updated to the latest schema hash, but
> that it thinks other nodes in the cluster are on the older version. I've tried
> to drain, decommission, wipe node data, bootstrap, and repair the node. 
> However,
> the node just started doing the same thing again.
> 
> Has anyone run into this issue before? Can anyone provide any insight into why
> this node is the only one in the cluster having problems? Are there any easy
> fixes?
> 
> Thank you,
> Faraaz
> 
> [1] $ /cassandra/bin/nodetool tpstats
> Pool Name               Active   Pending   Completed   Blocked   All time blocked
> ReadStage                    0         0           8         0                  0
> RequestResponseStage         0         0       49198         0                  0
> MutationStage                0         0      224286         0                  0
> ReadRepairStage              0         0           0         0                  0
> ReplicateOnWriteStage        0         0           0         0                  0
> GossipStage                  1      2213          18         0                  0
> AntiEntropyStage             0         0           0         0                  0
> MigrationStage               0         0          72         0                  0
> MemtablePostFlusher          0         0         102         0                  0
> FlushWriter                  0         0          99         0                  0
> MiscStage                    0         0           0         0                  0
> commitlog_archiver           0         0           0         0                  0
> InternalResponseStage        0         0          19         0                  0
> HintedHandoff                0         0           2         0                  0
> 
> Message type   Dropped
> RANGE_SLICE  0
> READ_REPAIR  0
> BINARY   0
> READ 0
> MUTATION 0
> _TRACE   0
> REQUEST_RESPONSE 0



Re: cassandra disk access

2013-08-07 Thread Aaron Morton
Some background on the read and write paths, some of the extra details are a 
little out of date but mostly correct in 1.2

http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118/40
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/08/2013, at 9:07 PM, Michał Michalski  wrote:

> I'm not sure how accurate it is (it's from 2011, one of its sources is from 
> 2010), but I'm pretty sure it's more or less OK:
> 
> http://blog.csdn.net/firecoder/article/details/7019435
> 
> M.
> 
On 07.08.2013 10:34, Nikolay Mihaylov wrote:
>> thanks
>> 
>> It will use the Index Sample (RAM) first, then it will use the "full" Index
>> (disk) and finally it will read the data from the SSTable (disk). There's no
>> such thing as a "collision" in this case.
>> 
>> So it still has 2 seeks :)
>> 
>> Where can I see the internal structure of the SSTable? I tried to find it
>> documented, but was unable to find anything.
>> 
>> 
>> 
>> 
>> On Wed, Aug 7, 2013 at 11:27 AM, Michał Michalski  wrote:
>> 
>>> 
>>>  2. When Cassandra looks up a key in an SSTable (assuming the bloom filter
>>> and other "stuff" did not rule out the read, and assuming the key is located
>>> in this single SSTable), Cassandra DOES NOT use sequential I/O. It will
>>> probably read a hash-table slot or similar structure, then do another disk
>>> seek to fetch the value (and probably the key). There will probably be
>>> another seek, and if there is a key collision, additional seeks will be
>>> needed.
 
>>> 
>>> It will use the Index Sample (RAM) first, then it will use the "full" Index
>>> (disk) and finally it will read the data from the SSTable (disk). There's no
>>> such thing as a "collision" in this case.
>>> 
>>> 
>>>  3. Once the data (i.e. the row) is located, a sequential read of the entire
>>> row will occur. (Once again I assume there is a single, well-compacted
>>> SSTable.) Also, if the disk is not fragmented, the data will be placed on
>>> consecutive disk sectors.
 
>>> 
>>> Yes, this is how I understand it too.
>>> 
>>> M.
>>> 
>>> 
>> 
> 



Re: Large number of pending gossip stage tasks in nodetool tpstats

2013-08-07 Thread Faraaz Sareshwala
Thanks Aaron. The node that was behaving this way was a production node so I 
had to take some drastic measures to get it back to doing the right thing. It's 
no longer behaving this way after wiping the system tables and having cassandra 
resync the schema from other nodes. In hindsight, maybe I could have gotten 
away with a nodetool resetlocalschema. Since the node has been restored to a 
working state, I sadly can't run commands on it to investigate any longer.

When the node was in this hosed state, I did check nodetool gossipinfo. The bad 
node had the correct schema hash; the same as the rest of the nodes in the 
cluster. However, it thought every other node in the cluster had another schema 
hash, most likely the older one everyone migrated from.

This issue occurred again today on three machines, so I feel it may occur again. 
Typically I see it when our entire datacenter updates its configuration and 
restarts over the course of an hour. All nodes point to the same list of seeds, 
but the restart order is random across that hour. I'm not sure if this 
information helps at all.

Are there any specific things I should look for when it does occur again?

Thank you,
Faraaz

On Aug 7, 2013, at 7:23 PM, "Aaron Morton"  wrote:

>> When looking at nodetool
>> gossipinfo, I notice that this node has updated to the latest schema hash, 
>> but
>> that it thinks other nodes in the cluster are on the older version.
> What does describe cluster in cassandra-cli say? It will let you know if 
> there are multiple schema versions in the cluster. 
> 
> Can you include the output from nodetool gossipinfo? 
> 
> You may also get some value from increasing the log level for 
> org.apache.cassandra.gms.Gossiper to DEBUG so you can see what's going on. 
> It's unusual for only the gossip pool to back up. If there were issues with GC 
> taking CPU, we would expect to see it across the board. 
> 
> Cheers
> 
> 
> 
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 7/08/2013, at 7:52 AM, Faraaz Sareshwala  wrote:
> 
>> I'm running cassandra-1.2.8 in a cluster with 45 nodes across three racks. 
>> All
>> nodes are well behaved except one. Whenever I start this node, it starts
>> churning CPU. Running nodetool tpstats, I notice that the number of pending
>> gossip stage tasks is constantly increasing [1]. When looking at nodetool
>> gossipinfo, I notice that this node has updated to the latest schema hash, 
>> but
>> that it thinks other nodes in the cluster are on the older version. I've 
>> tried
>> to drain, decommission, wipe node data, bootstrap, and repair the node. 
>> However,
>> the node just started doing the same thing again.
>> 
>> Has anyone run into this issue before? Can anyone provide any insight into 
>> why
>> this node is the only one in the cluster having problems? Are there any easy
>> fixes?
>> 
>> Thank you,
>> Faraaz
>> 
>> [1] $ /cassandra/bin/nodetool tpstats
>> Pool Name               Active   Pending   Completed   Blocked   All time blocked
>> ReadStage                    0         0           8         0                  0
>> RequestResponseStage         0         0       49198         0                  0
>> MutationStage                0         0      224286         0                  0
>> ReadRepairStage              0         0           0         0                  0
>> ReplicateOnWriteStage        0         0           0         0                  0
>> GossipStage                  1      2213          18         0                  0
>> AntiEntropyStage             0         0           0         0                  0
>> MigrationStage               0         0          72         0                  0
>> MemtablePostFlusher          0         0         102         0                  0
>> FlushWriter                  0         0          99         0                  0
>> MiscStage                    0         0           0         0                  0
>> commitlog_archiver           0         0           0         0                  0
>> InternalResponseStage        0         0          19         0                  0
>> HintedHandoff                0         0           2         0                  0
>> 
>> Message type   Dropped
>> RANGE_SLICE  0
>> READ_REPAIR  0
>> BINARY   0
>> READ 0
>> MUTATION 0
>> _TRACE   0
>> REQUEST_RESPONSE 0
> 


Re: Large number of pending gossip stage tasks in nodetool tpstats

2013-08-07 Thread Faraaz Sareshwala
And by that last statement, I mean are there any further things I should look 
for given the information in my response? I'll definitely look at implementing 
your suggestions and see what I can find.

On Aug 7, 2013, at 7:31 PM, "Faraaz Sareshwala"  
wrote:

> Thanks Aaron. The node that was behaving this way was a production node so I 
> had to take some drastic measures to get it back to doing the right thing. 
> It's no longer behaving this way after wiping the system tables and having 
> cassandra resync the schema from other nodes. In hindsight, maybe I could 
> have gotten away with a nodetool resetlocalschema. Since the node has been 
> restored to a working state, I sadly can't run commands on it to investigate 
> any longer.
> 
> When the node was in this hosed state, I did check nodetool gossipinfo. The 
> bad node had the correct schema hash; the same as the rest of the nodes in 
> the cluster. However, it thought every other node in the cluster had another 
> schema hash, most likely the older one everyone migrated from.
> 
> This issue occurred again today on three machines, so I feel it may occur 
> again. Typically I see it when our entire datacenter updates its 
> configuration and restarts over the course of an hour. All nodes point to the 
> same list of seeds, but the restart order is random across that hour. I'm not 
> sure if this information helps at all.
> 
> Are there any specific things I should look for when it does occur again?
> 
> Thank you,
> Faraaz
> 
> On Aug 7, 2013, at 7:23 PM, "Aaron Morton"  wrote:
> 
>>> When looking at nodetool
>>> gossipinfo, I notice that this node has updated to the latest schema hash, 
>>> but
>>> that it thinks other nodes in the cluster are on the older version.
>> What does describe cluster in cassandra-cli say? It will let you know if 
>> there are multiple schema versions in the cluster. 
>> 
>> Can you include the output from nodetool gossipinfo? 
>> 
>> You may also get some value from increasing the log level for 
>> org.apache.cassandra.gms.Gossiper to DEBUG so you can see what's going on. 
>> It's unusual for only the gossip pool to back up. If there were issues with 
>> GC taking CPU, we would expect to see it across the board. 
>> 
>> Cheers
>> 
>> 
>> 
>> -
>> Aaron Morton
>> Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 7/08/2013, at 7:52 AM, Faraaz Sareshwala  
>> wrote:
>> 
>>> I'm running cassandra-1.2.8 in a cluster with 45 nodes across three racks. 
>>> All
>>> nodes are well behaved except one. Whenever I start this node, it starts
>>> churning CPU. Running nodetool tpstats, I notice that the number of pending
>>> gossip stage tasks is constantly increasing [1]. When looking at nodetool
>>> gossipinfo, I notice that this node has updated to the latest schema hash, 
>>> but
>>> that it thinks other nodes in the cluster are on the older version. I've 
>>> tried
>>> to drain, decommission, wipe node data, bootstrap, and repair the node. 
>>> However,
>>> the node just started doing the same thing again.
>>> 
>>> Has anyone run into this issue before? Can anyone provide any insight into 
>>> why
>>> this node is the only one in the cluster having problems? Are there any easy
>>> fixes?
>>> 
>>> Thank you,
>>> Faraaz
>>> 
>>> [1] $ /cassandra/bin/nodetool tpstats
>>> Pool Name               Active   Pending   Completed   Blocked   All time blocked
>>> ReadStage                    0         0           8         0                  0
>>> RequestResponseStage         0         0       49198         0                  0
>>> MutationStage                0         0      224286         0                  0
>>> ReadRepairStage              0         0           0         0                  0
>>> ReplicateOnWriteStage        0         0           0         0                  0
>>> GossipStage                  1      2213          18         0                  0
>>> AntiEntropyStage             0         0           0         0                  0
>>> MigrationStage               0         0          72         0                  0
>>> MemtablePostFlusher          0         0         102         0                  0
>>> FlushWriter                  0         0          99         0                  0
>>> MiscStage                    0         0           0         0                  0
>>> commitlog_archiver           0         0           0         0                  0
>>> InternalResponseStage        0         0          19         0                  0
>>> HintedHandoff                0         0           2         0                  0
>>> 
>>> Message type   Dropped
>>> RANGE_SLICE  0
>>> READ_REPAIR  0
>>> BINARY   0
>>> READ 0

Re: Is there update-in-place on maps?

2013-08-07 Thread Alex Popescu
On Wed, Aug 7, 2013 at 10:47 AM, Aaron Morton wrote:

> As for the atomic increment, I take it the answer is 'no, there is no atomic
> increment, I have to pull the value to the client and send an update with
> the new value'.
>
> Saying "atomic increment" is probably confusing.
> You cannot have Counters, the thing most people would think about when you
> say "increment", in a collection type.
>
> You can update the values in a map server side.
>
> If you can provide a concrete example of what you want to do it may be
> easier.
>
>

I think the OP is asking if the following op is atomic:

UPDATE users SET favs['posts'] = favs['posts'] + 1 WHERE id = 'smith'


:- a)

Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/08/2013, at 10:05 PM, Andy Twigg  wrote:
>
> Counters can be atomically incremented (
> http://wiki.apache.org/cassandra/Counters). Pick a UUID for the counter,
> and use that: c=map.get(k); c.incr()
>
>
> On 6 August 2013 11:01, Jan Algermissen wrote:
>
>>
>> On 06.08.2013, at 11:36, Andy Twigg  wrote:
>>
>> > Store pointers to counters as map values?
>>
>> Sorry, but this fits into nothing I know about C* so far - can you
>> explain?
>>
>> Jan
>>
>>
>
>
> --
> Dr Andy Twigg
> Junior Research Fellow, St Johns College, Oxford
> Room 351, Department of Computer Science
> http://www.cs.ox.ac.uk/people/andy.twigg/
> andy.tw...@cs.ox.ac.uk | +447799647538
>
>
>


-- 

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru