cassandra disk access
Hi, I am researching various on-disk hash tables and B-trees. While researching, I had some thoughts about Cassandra SSTables that I want to verify here.

1. An SSTable is written with sequential disk I/O when it is created, i.e. the disk head writes it from beginning to end. Assuming the disk is not fragmented, the SSTable occupies consecutive disk sectors.

2. When Cassandra looks up a key in an SSTable (assuming the bloom filter and other "stuff" failed, and assuming the key is located in this single SSTable), it does NOT use sequential I/O. It will probably read a hash-table slot or similar structure, then do another disk seek to get the value (and probably the key); and if there is a key collision, additional seeks will be needed.

3. Once the data (i.e. the row) is located, a sequential read of the entire row occurs (once again, I assume a single well-compacted SSTable). Also, if the disk is not fragmented, the data will be placed on consecutive disk sectors.

Am I wrong?

Nick.
Re: cassandra disk access
> 2. when cassandra lookups a key in sstable (assuming bloom-filter and other "stuff" failed, also assuming the key is located in this single sstable), cassandra DO NOT USE sequential I/O. [...]

It will use the index sample (RAM) first, then it will use the "full" index (disk), and finally it will read the data from the SSTable (disk). There's no such thing as a "collision" in this case.

> 3. once the data (e.g. the row) is located, a sequential read for entire row will occur. [...]

Yes, this is how I understand it too.

M.
Re: cassandra disk access
Thanks.

> It will use the Index Sample (RAM) first, then it will use "full" Index (disk) and finally it will read data from SSTable (disk). There's no such thing like "collision" in this case.

So it still takes two seeks :) Where can I see the internal structure of the SSTable? I tried to find it documented but was unable to find anything.
Re: cassandra disk access
I'm not sure how accurate it is (it's from 2011, and one of its sources is from 2010), but I'm pretty sure it's more or less OK: http://blog.csdn.net/firecoder/article/details/7019435

M.

On 07.08.2013 10:34, Nikolay Mihaylov wrote:
> Thanks.
>
> So it still takes two seeks :) Where can I see the internal structure of the SSTable? I tried to find it documented but was unable to find anything.
> [...]
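Since the thread asks where the SSTable internals are documented, here is a toy sketch of the two-seek lookup path described above (in-RAM index sample, then on-disk "full" index, then the data file). All class names and the sampling interval are illustrative, not Cassandra's actual code.

```python
from bisect import bisect_right

# Toy model of the SSTable read path discussed in this thread. Names and
# the sampling interval are illustrative, not Cassandra's real internals.
class ToySSTable:
    def __init__(self, rows, sample_every=128):
        # Both the index and the data file are written sequentially,
        # sorted by key (by token of the key in real Cassandra).
        self.keys = sorted(rows)
        self.data = [rows[k] for k in self.keys]  # "data file" (disk)
        self.every = sample_every
        # Index sample: every Nth indexed key, kept in RAM.
        self.sample = self.keys[::sample_every]

    def get(self, key):
        # Step 1: binary-search the in-RAM sample (no disk I/O).
        block = bisect_right(self.sample, key) - 1
        if block < 0:
            return None
        # Step 2 (seek #1): scan one block of the on-disk "full" index.
        start = block * self.every
        for offset, k in enumerate(self.keys[start:start + self.every]):
            if k == key:
                # Step 3 (seek #2): read the row from the data file.
                return self.data[start + offset]
        return None
```

In this model a point read that reaches a single SSTable costs roughly two seeks, matching the "2 seeks" observation above.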
Composite keys - terrible write performance issue when using BATCH
Hi All, I found a significant performance problem when using a composite primary key, "wide" rows and BATCH. Ideally, I would like to have the following structure:

CREATE TABLE bar1 (
  some_id bigint,
  some_type text,
  some_value int,
  some_data text,
  PRIMARY KEY((some_id, some_type), some_value)
);

For each (some_id, some_type) there may be hundreds of thousands of columns. However, storing them gets incredibly slow. So I played with the structure and used something like this (concatenating some_type and some_value together):

CREATE TABLE bar2 (
  some_id bigint,
  some_type text,
  some_value_and_data text,
  PRIMARY KEY((some_id, some_type))
);

The speedup was unbelievable. I ran some more tests, using BATCH vs. executing each statement separately. Inserting 10,000 entries took the following time (seconds):

Using composite keys - separately: 12.892867, batch: 189.731306
Using just the partition key and a wide row - separately: 11.292507, batch: 0.093355

So using BATCH with the composite key was roughly 2000 times slower than it should be, making it pretty much unusable. Why!? My code snippet (using cql-rb) is available here: http://pastebin.com/qAcRcqbF

Thanks, Przemek
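For what it's worth, a common workaround (whatever the root cause in cql-rb turns out to be) is to avoid one giant BATCH and issue many small unlogged batches instead. A hedged sketch that only builds the CQL text, using the bar1 table above; the chunk size of 100 and the helper names are arbitrary choices for illustration:

```python
# Sketch: split many inserts into small unlogged batches instead of one
# huge BATCH. Table/column names match the bar1 schema above; the chunk
# size (100) and helper names are arbitrary, for illustration only.
def insert_stmt(some_id, some_type, some_value, some_data):
    return ("INSERT INTO bar1 (some_id, some_type, some_value, some_data) "
            "VALUES (%d, '%s', %d, '%s')"
            % (some_id, some_type, some_value, some_data))

def chunked_batches(rows, chunk=100):
    """Yield the CQL text of one unlogged batch per `chunk` rows."""
    for i in range(0, len(rows), chunk):
        stmts = [insert_stmt(*row) for row in rows[i:i + chunk]]
        yield ("BEGIN UNLOGGED BATCH\n"
               + ";\n".join(stmts)
               + ";\nAPPLY BATCH;")
```

Each yielded string can then be executed by the client as a single statement.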
System hints compaction stuck
Morning folks, For the last couple of days all of my nodes (17, all running 1.2.8) have been stuck at various percentages of completion for compacting system.hints. I've tried restarting the nodes (including a full rolling restart of the cluster) to no avail. When I turn on Debugging I am seeing this error on all of the nodes constantly: DEBUG 09:03:21,999 Thrift transport error occurred during processing of message. org.apache.thrift.transport.TTransportException at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) When I turn on tracing, I see that shortly after this error there is a message similar to: TRACE 09:03:22,000 ClientState removed for socket addr /10.55.56.211:35431 The IP in this message is sometimes a client machine, sometimes another cassandra node with no processes other than C* running on it (which I think rules out an issue with a particular client library doing something funny with Thrift). 
While I wouldn't expect a Thrift issue to cause problems with compaction, I'm out of other ideas at the moment. Anyone have any thoughts they could share? Thanks, David
Can't perform repair on a 1.1.5 Cassandra node - SSTable corrupted
Hi,

I have a 5-node Cassandra (version 1.1.5) ring, RF=2, CL read/write = 1. After a node went down without any error reported in the OS syslog or the Cassandra log, I decided to perform a repair.

Each time I run nodetool repair I get this error:

INFO [FlushWriter:5] 2013-08-07 11:09:26,770 Memtable.java (line 305) Completed flushing /data/-298-Data.db (18694 bytes) for commitlog position ReplayPosition(segmentId=1375867548785, position=199)
ERROR [Thrift:286] 2013-08-07 11:10:04,448 CustomTThreadPoolServer.java (line 204) Error occurred during processing of message.
java.lang.RuntimeException: error reading 1 of 1
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:116)
    at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:203)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:117)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:140)
    at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:107)
    at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:80)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1381)
    at org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1377)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1454)
    at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1433)
    at org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:50)
    at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:870)
    at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:691)
    at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:3008)
    at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.getResult(Cassandra.java:2996)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
    at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
    at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
    at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
    at org.apache.cassandra.io.compress.SnappyCompressor.uncompress(SnappyCompressor.java:94)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:91)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:77)
    at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:302)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:324)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:398)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:380)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:88)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:83)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73)
    at org.apache.cass
Re: Can't perform repair on a 1.1.5 Cassandra node - SSTable corrupted
Hello, please see these issues: https://issues.apache.org/jira/browse/CASSANDRA-5686 and https://issues.apache.org/jira/browse/CASSANDRA-5391 if you hit any of them.

regards, ondrej cernos

On Wed, Aug 7, 2013 at 5:00 PM, Madalina Matei wrote:
> Hi,
>
> I have a 5 nodes cassandra (version 1.1.5) ring, RF=2, CL- READ/Write =1. After a node went down without any error reported in OS syslog or Cassandra syslog i decided to perform a repair.
>
> Each time i run a nodetool repair I get this error:
> [...]
lots of small nodes vs fewer big nodes
Quick question about systems architecture.

Would it be better to run 5 nodes with 7GB RAM and 4 CPUs, or 10 nodes with 3.5GB RAM and 2 CPUs?

I'm currently running the former, but am considering the latter. My goal would be to improve overall performance by spreading the I/O across more disks. My current cluster has low CPU utilization but does spend a good amount of time in iowait. Would moving to more, smaller nodes help with that? Or would I run into trouble with the smaller RAM and CPU?

Thanks!

Paul
Re: lots of small nodes vs fewer big nodes
You still have the same amount of RAM, so you cache the same amount of data. I don't think you gain much here. On the other side, maintenance procedures (compaction, repair) may hit your 2CPU box. I wouldn't do it.

Thank you, Andrey

On Wed, Aug 7, 2013 at 10:24 AM, Paul Ingalls wrote:
> Quick question about systems architecture.
>
> Would it be better to run 5 nodes with 7GB RAM and 4CPU's or 10 nodes with 3.5GB RAM and 2CPUS?
> [...]
Re: Is there update-in-place on maps?
> As for the atomic increment, I take the answer is 'no, there is no atomic > increment, I have to pull the value to the client and send an update with the > new value'. Saying "atomic increment" is probably confusing. You cannot have Counters, the thing most people would think about when you say "increment", in a collection type. You can update the values in a map server side. If you can provide a concrete example of what you want to do it may be easier. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 6/08/2013, at 10:05 PM, Andy Twigg wrote: > Counters can be atomically incremented > (http://wiki.apache.org/cassandra/Counters). Pick a UUID for the counter, and > use that: c=map.get(k); c.incr() > > > On 6 August 2013 11:01, Jan Algermissen wrote: > > On 06.08.2013, at 11:36, Andy Twigg wrote: > > > Store pointers to counters as map values? > > Sorry, but this fits into nothing I know about C* so far - can you explain? > > Jan > > > > > -- > Dr Andy Twigg > Junior Research Fellow, St Johns College, Oxford > Room 351, Department of Computer Science > http://www.cs.ox.ac.uk/people/andy.twigg/ > andy.tw...@cs.ox.ac.uk | +447799647538
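To make "you can update the values in a map server side" concrete, here is a small hedged example in CQL3; the table and key names are invented for illustration:

```cql
-- Illustrative table; names are made up for this example.
CREATE TABLE user_prefs (
  user_id bigint PRIMARY KEY,
  prefs map<text, text>
);

-- Set or overwrite a single entry without reading the map first:
UPDATE user_prefs SET prefs['theme'] = 'dark' WHERE user_id = 42;

-- Remove a single entry:
DELETE prefs['theme'] FROM user_prefs WHERE user_id = 42;
```

Counters, however, cannot be map values, so there is no server-side "increment an entry in a map".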
Re: Any good GUI based tool to manage data in Cassandra?
I think one of the versions of OpsCenter has that feature: http://www.datastax.com/what-we-offer/products-services/datastax-opscenter

Otherwise people use cassandra-cli or cqlsh.

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/08/2013, at 1:28 AM, Tony Anecito wrote:
> Thanks Aaron. I found that before I asked the question, and Helenos seems the closest, but it does not allow you to easily use CRUD like, say, SQL Server Management tools, where you can get a list of say 1,000 records in a grid control and select rows for deletion or insert or update.
>
> I will look closer at that one since this is the reply from the team, but if users on this email list have other suggestions please do not hesitate to reply.
>
> Many Thanks,
> -Tony
>
> From: Aaron Morton
> To: Cassandra User
> Sent: Tuesday, August 6, 2013 1:38 AM
> Subject: Re: Any good GUI based tool to manage data in Cassandra?
>
> There is a list here.
>
> http://wiki.apache.org/cassandra/Administration%20Tools
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com/
>
> On 3/08/2013, at 6:19 AM, Tony Anecito wrote:
>
>> Hi All,
>>
>> Is there a GUI tool for managing data in the Cassandra database? I have googled and seen tools, but they seem to be for schema management, or explorers that just view data. It would be great to delete/insert rows or update values for a column via a GUI.
>>
>> Thanks,
>> -Tony
Re: System hints compaction stuck
Thrift and ClientState are both unrelated to hints.

What do you see in the logs after "Started hinted handoff for host:..." from HintedHandoffManager?

It should either have an error message or something along the lines of "Finished hinted handoff of:..."

Were there any schema updates that preceded this happening?

As for the thrift stuff, which rpc_server_type are you using?

On Wed, Aug 7, 2013 at 6:14 AM, David McNelis wrote:
> Morning folks,
>
> For the last couple of days all of my nodes (17, all running 1.2.8) have been stuck at various percentages of completion for compacting system.hints. I've tried restarting the nodes (including a full rolling restart of the cluster) to no avail.
>
> When I turn on debugging I am seeing this error on all of the nodes constantly:
> [...]
Re: Unable to bootstrap node
Thanks for the update :)

A
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/08/2013, at 7:03 AM, sankalp kohli wrote:
> @Aaron
> This problem happens when you drop and recreate a keyspace with the same name and you do it very quickly. I have also filed a JIRA for it:
> https://issues.apache.org/jira/browse/CASSANDRA-5843
>
> On Tue, Aug 6, 2013 at 10:31 AM, Keith Wright wrote:
> The file does not appear on disk and the permissions are definitely correct. We have seen the file in snapshots. This is completely blocking us from adding the new node. How can we recover? Just run repairs?
>
> Thanks
>
> From: Aaron Morton
> Reply-To: "user@cassandra.apache.org"
> Date: Tuesday, August 6, 2013 4:06 AM
> To: "user@cassandra.apache.org"
> Subject: Re: Unable to bootstrap node
>
>> Caused by: java.io.FileNotFoundException: /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db (No such file or directory)
>> at java.io.RandomAccessFile.open(Native Method)
>> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>> at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
>> at org.apache.cassandra.io.compress.CompressedRandomAccessReader.
>
> This is somewhat serious, especially if it's from a bug in dropping tables. Though I would expect that would show up for a lot of people.
>
> Does the file exist on disk?
> Are the permissions correct?
>
> IMHO you need to address this issue on the existing nodes before worrying about the new node.
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/08/2013, at 1:25 PM, sankalp kohli wrote:
>
>> Let me know if this fixes the problem?
>>
>> On Mon, Aug 5, 2013 at 6:24 PM, sankalp kohli wrote:
>> So the problem is that when you dropped and recreated the table with the same name, somehow the old CFStore object was not purged. So now there were two objects, which caused the same sstable to have two SSTableReader objects.
>>
>> The fix is to find all nodes which are emitting this FileNotFound exception and restart them.
>>
>> In your case, restart the node which is serving the data and emitting the FileNotFound exception.
>>
>> Once this is up, again restart the bootstrapping node with the bootstrap argument. Now it will successfully stream the data.
>>
>> On Mon, Aug 5, 2013 at 6:08 PM, Keith Wright wrote:
>>
>> Yes, we likely dropped and recreated tables. If we stop the sending node, what will happen to the bootstrapping node?
>>
>> sankalp kohli wrote:
>>
>> Hi,
>> The problem is that the node sending the stream is hitting this FileNotFound exception. You need to restart this node and it should fix the problem.
>>
>> Are you seeing a lot of FileNotFoundExceptions? Did you do any schema change recently?
>>
>> Sankalp
>>
>> On Mon, Aug 5, 2013 at 5:39 PM, Keith Wright wrote:
>> Hi all,
>>
>> I have been trying to bootstrap a new node into my 7-node 1.2.4 C* cluster with vnodes, RF=3, with no luck. It gets close to completing and then the streaming just stalls, with streaming at 99% from 1 or 2 nodes. Nodetool netstats shows the items that have yet to stream, but the logs on the new node do not show any errors. I tried shutting down the node, clearing all data/commit logs/caches, and re-bootstrapping, with no luck. The nodes that are hanging sending the data only have the error below, but that's related to compactions (see below), although it is one of the files that is waiting to be sent. I tried nodetool scrub on the column family with the missing item but got an error indicating it could not get a hard link. Any ideas? We were able to bootstrap one of the new nodes with no issues, but this other one has been a real pain. Note that when the new node is joining the cluster, it does not appear in nodetool status. Is that expected?
>>
>> Thanks all, my next step is to try getting a new IP for this machine, my thought being that the cluster doesn't like me continuing to attempt to bootstrap the node repeatedly, each time getting a new host id.
>>
>> [kwright@lxpcas008 ~]$ nodetool netstats | grep rts-40301_feedProducts-ib-1-Data.db
>> rts: /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db sections=73 progress=0/1884669 - 0%
>>
>> ERROR [ReadStage:427] 2013-08-05 23:23:29,294 CassandraDaemon.java (line 174) Exception in thread Thread[ReadStage:427,5,main]
>> java.lang.RuntimeException: java.io.FileNotFoundException: /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db (No such file or directory)
>> at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(Com
Re: System hints compaction stuck
Nate,

We had a node that was flaking on us last week and had a lot of handoffs fail to that node. We ended up decommissioning that node entirely. I can't find the actual error we were getting at the time (logs have been rotated out), but currently we're not seeing any errors there.

We haven't had any schema updates recently and we are using the sync rpc server. We had hsha turned on for a while, but we were getting a bunch of transport frame size errors.

On Wed, Aug 7, 2013 at 1:55 PM, Nate McCall wrote:
> Thrift and ClientState are both unrelated to hints.
>
> What do you see in the logs after "Started hinted handoff for host:..." from HintedHandoffManager?
>
> It should either have an error message or something along the lines of "Finished hinted handoff of:..."
>
> Where there any schema updates that preceded this happening?
>
> As for the thrift stuff, which rpc_server_type are you using?
> [...]
Re: Any good GUI based tool to manage data in Casandra?
OpsCenter allows CRUD of column families themselves (although not CQL3 column families). It only allows viewing the data inside column families though, no support for writing or updating. On Wed, Aug 7, 2013 at 12:54 PM, Aaron Morton wrote: > I think one of the versions of ops centre has the feature > http://www.datastax.com/what-we-offer/products-services/datastax-opscenter > > otherwise people use the cassandra-cli or cqlsh. > > Cheers > > - > Aaron Morton > Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 7/08/2013, at 1:28 AM, Tony Anecito wrote: > > Thanks Aaron. I found that before I asked the question and Helenos seems > the closest but it does not allow you to easily use CRUD like say SQL > Server Management tools where you can get a list of say 1,000 records in a > grid control and select rows for deletion or insert or update. > > I will look closer at that one since this is the reply from the team but > if users on this email list have other suggestions please do not hesitate > to reply. > > Many Thanks, > -Tony > > *From:* Aaron Morton > *To:* Cassandra User > *Sent:* Tuesday, August 6, 2013 1:38 AM > *Subject:* Re: Any good GUI based tool to manage data in Casandra? > > There is a list here. > > http://wiki.apache.org/cassandra/Administration%20Tools > > Cheers > > - > Aaron Morton > Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com/ > > On 3/08/2013, at 6:19 AM, Tony Anecito wrote: > > Hi All, > > Is there a GUI tool for managing data in Cassandra database? I have googled and seen tools but they seem to be schema management or explorer to just > view data. It would be great to delete/insert rows or update values for a > column via GUI. > > Thanks, > -Tony
Re: Any good GUI based tool to manage data in Casandra?
Thanks Nick for your reply. Good to know that. I knew OpsCenter was mainly schema management. Best Regards, -Tony From: Nick Bailey To: user@cassandra.apache.org Sent: Wednesday, August 7, 2013 12:04 PM Subject: Re: Any good GUI based tool to manage data in Casandra?
Re: System hints compaction stuck
Is there anything else on the network that could be attempting to connect to 9160? That is the exact error you would get when someone initiates a connection and sends a null byte. You can reproduce it thusly: echo -n 'm' | nc localhost 9160 On Wed, Aug 7, 2013 at 11:11 AM, David McNelis wrote:
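The "transport frame size" errors mentioned in this thread typically come from a framing mismatch: Thrift's framed transport reads the first four bytes of a connection as a big-endian i32 frame length, so any client or probe that opens the socket and sends plain (unframed) bytes makes the server see an absurd frame size. A minimal sketch of that misreading in plain Python (this is not Thrift code; the padding bytes and the 15 MB limit are illustrative assumptions):

```python
import struct

MAX_FRAME_SIZE = 15 * 1024 * 1024  # assumed limit, like thrift_framed_transport_size_in_mb: 15

def read_frame_length(first_four_bytes: bytes) -> int:
    """A framed transport treats the first 4 bytes on the wire as a
    big-endian signed 32-bit frame length."""
    return struct.unpack(">i", first_four_bytes)[0]

# A client speaking *unframed* Thrift (or a stray probe) starts its message
# with ordinary bytes, not a length prefix, so the server misreads them.
stray = b"m\x00\x00\x00"  # e.g. the lone 'm' from `echo -n 'm'`, padded for illustration
length = read_frame_length(stray)
print(length)                   # 1828716544 - far larger than any sane frame
print(length > MAX_FRAME_SIZE)  # True -> a "frame size exceeds limit" style error
```

This is one reason a non-Thrift client hitting port 9160 shows up as transport errors rather than anything hint-related.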
Re: System hints compaction stuck
Fwiw, similar to another issue of stuck compaction that was on the list several days ago, if I cleared out the hints, either by removing files while the node was down, or running a scrub on system.hints during node startup, I was able to get these compactions cleared, and the nodes are starting to get caught up on tasks that had been blocked. Nate, there are definitely a number of things that could be hitting the 9160 port... but I was seeing the transport size error even between nodes (and there was nothing running on any node other than C*)... since switching back to sync we no longer get that error. On Wed, Aug 7, 2013 at 2:58 PM, Nate McCall wrote:
Re: Large number of pending gossip stage tasks in nodetool tpstats
> When looking at nodetool > gossipinfo, I notice that this node has updated to the latest schema hash, but > that it thinks other nodes in the cluster are on the older version. What does describe cluster in cassandra-cli say ? It will let you know if there are multiple schema versions in the cluster. Can you include the output from nodetool gossipinfo ? You may also get some value from increasing the log level for org.apache.cassandra.gms.Gossiper to DEBUG so you can see what's going on. It's unusual for only the gossip pool to back up. If there were issues with GC taking CPU we would expect to see it across the board. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 7/08/2013, at 7:52 AM, Faraaz Sareshwala wrote: > I'm running cassandra-1.2.8 in a cluster with 45 nodes across three racks. All > nodes are well behaved except one. Whenever I start this node, it starts > churning CPU. Running nodetool tpstats, I notice that the number of pending > gossip stage tasks is constantly increasing [1]. When looking at nodetool > gossipinfo, I notice that this node has updated to the latest schema hash, but > that it thinks other nodes in the cluster are on the older version. I've tried > to drain, decommission, wipe node data, bootstrap, and repair the node. > However, > the node just started doing the same thing again. > > Has anyone run into this issue before? Can anyone provide any insight into why > this node is the only one in the cluster having problems? Are there any easy > fixes? 
>
> Thank you,
> Faraaz
>
> [1] $ /cassandra/bin/nodetool tpstats
> Pool Name               Active   Pending   Completed   Blocked   All time blocked
> ReadStage                    0         0           8         0                  0
> RequestResponseStage         0         0       49198         0                  0
> MutationStage                0         0      224286         0                  0
> ReadRepairStage              0         0           0         0                  0
> ReplicateOnWriteStage        0         0           0         0                  0
> GossipStage                  1      2213          18         0                  0
> AntiEntropyStage             0         0           0         0                  0
> MigrationStage               0         0          72         0                  0
> MemtablePostFlusher          0         0         102         0                  0
> FlushWriter                  0         0          99         0                  0
> MiscStage                    0         0           0         0                  0
> commitlog_archiver           0         0           0         0                  0
> InternalResponseStage        0         0          19         0                  0
> HintedHandoff                0         0           2         0                  0
>
> Message type       Dropped
> RANGE_SLICE              0
> READ_REPAIR              0
> BINARY                   0
> READ                     0
> MUTATION                 0
> _TRACE                   0
> REQUEST_RESPONSE         0
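One way to act on the schema-version suggestion is to diff what gossip reports per endpoint across the cluster. A rough sketch that groups endpoints by advertised schema version, assuming a 1.2-era `nodetool gossipinfo` layout (an endpoint line starting with `/`, followed by `KEY:value` lines for that endpoint); the sample text and addresses below are made up:

```python
from collections import defaultdict

def schema_versions(gossipinfo_text):
    """Group endpoints by the SCHEMA uuid they advertise in gossip output."""
    versions = defaultdict(list)
    endpoint = None
    for line in gossipinfo_text.splitlines():
        line = line.strip()
        if line.startswith("/"):            # a new endpoint block begins
            endpoint = line
        elif line.startswith("SCHEMA:") and endpoint:
            versions[line.split(":", 1)[1]].append(endpoint)
    return dict(versions)

sample = """\
/10.0.0.1
  SCHEMA:aaaa-1111
/10.0.0.2
  SCHEMA:aaaa-1111
/10.0.0.3
  SCHEMA:bbbb-2222
"""
print(schema_versions(sample))
# more than one key in the result means the cluster disagrees on schema
```

Running this over the gossipinfo output from each node would show whether only the bad node sees stale versions for its peers.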
Re: cassandra disk access
Some background on the read and write paths, some of the extra details are a little out of date but mostly correct in 1.2 http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118/40 http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 7/08/2013, at 9:07 PM, Michał Michalski wrote: > I'm not sure how accurate it is (it's from 2011, one of its sources is from > 2010), but I'm pretty sure it's more or less OK: > > http://blog.csdn.net/firecoder/article/details/7019435 > > M. > > On 07.08.2013 10:34, Nikolay Mihaylov wrote: >> thanks >> >> It will use the Index Sample (RAM) first, then it will use "full" Index >> (disk) and finally it will read data from SSTable (disk). There's no such >> thing like "collision" in this case. >> >> so it still has 2 seeks :) >> >> Where can I see the internal structure of the sstable? I tried to find it >> documented but was unable to find anything. >> >> On Wed, Aug 7, 2013 at 11:27 AM, Michał Michalski wrote:
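The lookup described in this thread - index sample in RAM, then the partition index on disk, then the data file - can be sketched roughly as follows. These are toy structures, not Cassandra's actual on-disk formats; the keys and offsets are illustrative:

```python
import bisect

# In-memory sample: every Nth entry of the index as (key, position in the index).
# The full index (on disk) maps every key to its offset in the data file.
index_sample = [("apple", 0), ("mango", 3), ("tomato", 6)]
full_index = [("apple", 0), ("banana", 100), ("kiwi", 230),   # index "block" 1
              ("mango", 310), ("peach", 450), ("plum", 500),  # index "block" 2
              ("tomato", 640), ("zucchini", 720)]

def lookup(key):
    # Step 0 (RAM, no seek): binary-search the sample for the nearest
    # preceding entry.
    keys = [k for k, _ in index_sample]
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None
    # Seek 1 (disk): jump to that position in the index file and scan forward.
    start = index_sample[i][1]
    for k, data_offset in full_index[start:]:
        if k == key:
            # Seek 2 (disk): jump to data_offset in the data file and read
            # the row sequentially from there.
            return data_offset
        if k > key:
            break
    return None

print(lookup("peach"))   # 450
print(lookup("grape"))   # None - in practice the bloom filter rejects this first
```

This matches the "2 seeks" observation above: one into the index file, one into the data file, with the sample search costing no I/O.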
Re: Large number of pending gossip stage tasks in nodetool tpstats
Thanks Aaron. The node that was behaving this way was a production node so I had to take some drastic measures to get it back to doing the right thing. It's no longer behaving this way after wiping the system tables and having cassandra resync the schema from other nodes. In hindsight, maybe I could have gotten away with a nodetool resetlocalschema. Since the node has been restored to a working state, I sadly can't run commands on it to investigate any longer. When the node was in this hosed state, I did check nodetool gossipinfo. The bad node had the correct schema hash; the same as the rest of the nodes in the cluster. However, it thought every other node in the cluster had another schema hash, most likely the older one everyone migrated from. This issue occurred again today on three machines so I feel it may occur again. Typically I see it when our entire datacenter updates its configuration and restarts over the course of an hour. All nodes point to the same list of seeds, but the restart order is random across that hour. I'm not sure if this information helps at all. Are there any specific things I should look for when it does occur again? Thank you, Faraaz On Aug 7, 2013, at 7:23 PM, "Aaron Morton" wrote:
Re: Large number of pending gossip stage tasks in nodetool tpstats
And by that last statement, I mean are there any further things I should look for given the information in my response? I'll definitely look at implementing your suggestions and see what I can find. On Aug 7, 2013, at 7:31 PM, "Faraaz Sareshwala" wrote:
Re: Is there update-in-place on maps?
On Wed, Aug 7, 2013 at 10:47 AM, Aaron Morton wrote: > As for the atomic increment, I take the answer is 'no, there is no atomic > increment, I have to pull the value to the client and send an update with > the new value'. > > Saying "atomic increment" is probably confusing. > You cannot have Counters, the thing most people would think about when you > say "increment", in a collection type. > > You can update the values in a map server side. > > If you can provide a concrete example of what you want to do it may be > easier. > > I think the OP is asking if the following op is atomic: UPDATE users SET favs['posts'] = favs['posts'] + 1 WHERE id = 'smith' :- a) Cheers > > - > Aaron Morton > Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 6/08/2013, at 10:05 PM, Andy Twigg wrote: > > Counters can be atomically incremented ( > http://wiki.apache.org/cassandra/Counters). Pick a UUID for the counter, > and use that: c=map.get(k); c.incr() > > > On 6 August 2013 11:01, Jan Algermissen wrote: > >> >> On 06.08.2013, at 11:36, Andy Twigg wrote: >> >> > Store pointers to counters as map values? >> >> Sorry, but this fits into nothing I know about C* so far - can you >> explain? >> >> Jan >> >> > > > -- > Dr Andy Twigg > Junior Research Fellow, St Johns College, Oxford > Room 351, Department of Computer Science > http://www.cs.ox.ac.uk/people/andy.twigg/ > andy.tw...@cs.ox.ac.uk | +447799647538 > > > -- :- a) Alex Popescu Sen. Product Manager @ DataStax @al3xandru
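Since map values are not counters, an "increment" has to be done as a client-side read followed by a write, and two clients interleaving those steps can lose an update - which is why the atomicity question matters. A toy illustration in plain Python (this simulates two clients against one stored map; it is not driver code):

```python
# A stand-in for the stored row: users['smith'] has a map column 'favs'.
store = {"smith": {"posts": 10}}

def read(key, field):
    """Client-side read of one map entry (a SELECT, conceptually)."""
    return store[key][field]

def write(key, field, value):
    """Client-side write of one map entry (an UPDATE, conceptually)."""
    store[key][field] = value

# Both clients read before either writes - the classic lost update.
a = read("smith", "posts")   # client A sees 10
b = read("smith", "posts")   # client B also sees 10
write("smith", "posts", a + 1)
write("smith", "posts", b + 1)
print(store["smith"]["posts"])  # 11, not 12 - one increment is lost
```

A real counter column avoids this because the increment itself is sent to the server, rather than the new value computed on the client.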