Re: Datastructure time tracking
Thank you very much, this was very helpful. I'll post an update here once I've finished my data structure design.
Re: What sort of load do the tombstones create on the cluster?
What do you mean by "performance loss"? For example, are you seeing it on the read or write side? During compactions? Deletions themselves shouldn't be expensive, but if you have a lot of tombstones that haven't been compacted away, reads will be slower since there is more data to scan. One thing to try is kicking off major compactions more often, so they're smaller (less load) and clean out the deleted data more frequently.

You should be able to tell whether it is disk or CPU pretty easily via the JMX interface (jconsole or OpsCenter can read those values) or something like iostat. Basically, look for high disk IO wait: if you see that, it's disk; if not, it's CPU.

One optimization I'm doing in my application is choosing row keys so that I can delete an entire row at a time rather than individual columns, so there is only one tombstone for the whole row. This isn't always possible, but if you can lay out your data in a way that makes it possible, it's a good optimization.

On Thu, Nov 17, 2011 at 10:01 AM, Maxim Potekhin wrote:
> In view of my unpleasant discovery last week that deletions in Cassandra lead to a very real
> and serious performance loss, I'm working on a strategy of moving forward.
>
> If the tombstones do cause such problem, where should I be looking for performance bottlenecks?
> Is it disk, CPU or something else? Thing is, I don't see anything outstanding in my Ganglia plots.
>
> TIA,
>
> Maxim

--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
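To make the row-key idea above concrete, here is a minimal sketch using the pycassa client (the keyspace, column family, and key layout are made up for illustration; this is one possible approach, not the poster's actual application). Bucketing time-series data by day in the row key lets a whole day be expired with a single row-level remove, i.e. one row tombstone instead of one tombstone per column:

    # Sketch only: assumes a pycassa client and a hypothetical "events" CF.
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    events = ColumnFamily(pool, 'events')

    # Write: one row per (source, day), one column per event timestamp.
    events.insert('sensor42:2011-11-17', {'12:00:01': 'value-a',
                                          '12:00:02': 'value-b'})

    # Expire a whole day: a single row tombstone...
    events.remove('sensor42:2011-11-17')

    # ...instead of a tombstone per column:
    # events.remove('sensor42:2011-11-17', columns=['12:00:01', '12:00:02'])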
storage space and compaction speed
I recently changed the default_validation_class on a bunch of CFs from BytesType to UTF8Type and observed two things: first, during the migration I saw a number of compactions whose log entries showed ~200% to ~400% of original; second, it seems that compaction speed has now halved. I'm using v1.0.1, leveled compaction, and compression. Before I create tests I thought I'd quickly ask: is there any difference in storage efficiency between BytesType, UTF8Type, and AsciiType when storing plain US-ASCII strings? And is there any expected compaction speed difference? (It would be nice to have some docs about the expected storage space used by the various data types.) Thanks much! Thorsten
Re: storage space and compaction speed
I'm guessing something else is responsible for the compaction difference you're seeing -- Bytes, UTF8, and Ascii types all use the same lexical byte comparison code. The only place you should expect to lose a small amount of performance by using the latter two is on insert when it sanity-checks the input.

On Sat, Nov 19, 2011 at 12:43 PM, Thorsten von Eicken wrote:
> I recently changed the default_validation_class on a bunch of CFs from
> BytesType to UTF8Type and I observed two things: first I saw a number of
> compactions during the migration that showed ~200% to ~400% of original
> in the log entry. Second, it seems that compaction speed has now halved.
> I'm using v1.0.1, level compaction and compression. Before I create
> tests I thought I'd quickly ask: is there any difference in storage
> efficiency between BytesType, UTF8Type, and AsciiType when storing plain
> us-ascii strings? And is there any expected compaction speed difference?
> (It would be nice to have some docs about the expected storage space
> used for the various data types.)
> Thanks much!
> Thorsten

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
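For intuition about where that small insert-time cost comes from, here is a rough sketch of what a UTF8Type-style validation amounts to (my approximation, not Cassandra's actual validator code):

    # Rough sketch of the idea only, not Cassandra's actual validator code.
    def validate_bytes(value: bytes) -> None:
        pass  # BytesType: any byte sequence is acceptable

    def validate_utf8(value: bytes) -> None:
        # UTF8Type-style check: reject byte sequences that are not valid UTF-8.
        try:
            value.decode('utf-8')
        except UnicodeDecodeError:
            raise ValueError('not valid UTF-8: %r' % value)

    # Plain US-ASCII strings pass both checks and are stored as the same bytes,
    # so on-disk size is identical; only the insert-time check differs.
    validate_utf8(b'hello')          # fine
    validate_bytes(b'\xff\xfe')      # fine for BytesType
    # validate_utf8(b'\xff\xfe')     # would raise ValueError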
Added column does not sort as the last column
Hi, We got an "Added column does not sort as the last column" error in the logs after upgrading to Cassandra 1.0.3 from 0.6.13. After running scrub, we are still getting the error. Here is the stack trace:

java.lang.AssertionError: Added column does not sort as the last column
        at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:126)
        at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:122)
        at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:117)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:147)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:231)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:115)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:127)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:132)
        at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Does anyone have any idea what might be causing this issue? We use the CompositeType class from https://github.com/edanuff/CassandraCompositeType/commit/a584bf2dadd3e6bb6071db7cf181e1546d8c93db. Would it have anything to do with the error?

Thanks!
Huy
Causes of a High Memtable Live Ratio
Hi All,

From what I've read in the source, a Memtable's "live ratio" is the ratio of Memtable memory usage to the current write throughput. If this is too high, I imagine the system could be in a possibly unsafe state, as the comment in Memtable.java indicates. Today, while bulk loading some data, I got the following message:

WARN [pool-1-thread-1] 2011-11-18 21:08:57,331 Memtable.java (line 172) setting live ratio to maximum of 64 instead of 78.87903667214012

Should I be worried? If so, does anybody have any suggestions for how to address it?

Thanks :)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com
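For a rough sense of what the warning means, a back-of-the-envelope sketch (my reading of the ratio's purpose, not code from Memtable.java): the live ratio is used to estimate a memtable's heap footprint from the raw bytes written to it, and since Cassandra caps the ratio at 64, a measured value above the cap (like the 78.9 in the warning) suggests the estimate may understate real heap usage - the possibly unsafe state mentioned above.

    # Back-of-the-envelope sketch of how the live ratio is applied; the values
    # 64 and 78.879... come from the log line above, everything else is assumed.
    MAX_LIVE_RATIO = 64.0

    def estimated_heap_bytes(serialized_bytes_written: float,
                             measured_live_ratio: float) -> float:
        # Heap usage is estimated as raw bytes written times the live ratio;
        # clamping at 64 means a higher measured ratio (as in the warning)
        # can leave actual heap usage underestimated.
        ratio = min(measured_live_ratio, MAX_LIVE_RATIO)
        return serialized_bytes_written * ratio

    # With the warned-about ratio of ~78.9, 10 MB of tiny columns would be
    # estimated at 640 MB of heap after clamping (~789 MB unclamped):
    print(estimated_heap_bytes(10 * 1024 * 1024, 78.87903667214012) / 1024 / 1024)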
Re: split large sstable
On 17.11.2011 17:42, Dan Hendry wrote:
> What do you mean by 'better file offset caching'? Presumably you mean 'better page cache hit rate'?

The filesystem metadata used to find blocks in smaller files is cached better. Large files use indirect blocks, so you need more reads to find the correct block during a seek syscall. For example, if a large file uses 3 indirect levels, you need 3 disk seeks to find the correct block.
http://computer-forensics.sans.org/blog/2008/12/24/understanding-indirect-blocks-in-unix-file-systems/

Metadata caching in the OS is far worse than file caching - one "find /" will effectively nullify the metadata cache. If Cassandra could use raw storage, it would eliminate the fs overhead and could be over 100% faster on reads, because fragmentation would be the exception - there would be no need to design an fs like FAT or UFS, whose designers expect files to be stored in non-contiguous areas on disk. Implementing something log-based like http://logfs.sourceforge.net/ would be enough. Cleaning would not be much needed because compaction would clean it naturally.

> Perhaps what you are actually seeing is row fragmentation across your SSTables? Easy to check with nodetool cfhistograms (SSTables column).

I have a 1.5% hit rate to 2 sstables and 3% to 3 sstables. That's pretty low with min. compaction set to 5; I will probably set it to 6.

I would really like to see tests with user-defined sizes and file counts for tiered compaction, because it works best if you do not leave the largest file alone in a bucket. If your data in Cassandra is not growing, it can be fine-tuned better. I haven't done experiments with it, but maybe a max sstable size defined per CF would be enough. Let's say I have 5 GB of data per CF - the ideal setting would be a max sstable size slightly less than 1 GB. Cassandra would then not keep old data stuck in one 4 GB compacted sstable, waiting for other 4 GB sstables to be created before compaction removes the old data.

> To answer your question, I know of no tools to split SSTables. If you want to switch compaction strategies, levelled compaction (1.0.x) creates many smaller sstables instead of fewer, bigger ones.

I don't use levelled compaction; it compacts too often. It might get better if you could tune how many and how large files to use at each level. But I will try switching to levelled compaction and back again - it might do what I want.
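To make the "largest file alone in its bucket" point concrete, here is a rough sketch of size-tiered bucketing logic (my approximation, not Cassandra's actual code): sstables are grouped with peers of similar size, and a bucket is only eligible for compaction once it holds at least min_threshold files, so a lone 4 GB sstable keeps its old data until enough similarly sized sstables accumulate.

    # Rough approximation of size-tiered bucketing, not Cassandra's actual code.
    # An sstable joins a bucket if its size is within 0.5x-1.5x of the bucket's
    # average size; buckets smaller than min_threshold are not compacted.
    def bucket_sstables(sizes_mb, low=0.5, high=1.5):
        buckets = []  # each bucket is a list of sstable sizes
        for size in sorted(sizes_mb):
            for bucket in buckets:
                avg = sum(bucket) / len(bucket)
                if low * avg <= size <= high * avg:
                    bucket.append(size)
                    break
            else:
                buckets.append([size])
        return buckets

    def compactable(buckets, min_threshold=5):
        return [b for b in buckets if len(b) >= min_threshold]

    # One 4096 MB sstable plus a stream of ~100 MB sstables: the big one stays
    # alone in its bucket (holding old data) until similar-sized peers exist.
    buckets = bucket_sstables([4096, 100, 110, 95, 105, 98])
    print(buckets)               # [[95, 98, 100, 105, 110], [4096]]
    print(compactable(buckets))  # only the bucket of small sstables qualifies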
read performance problem
Hi, On my computer with 2 GB of RAM and a Core 2 Duo E4600 CPU @ 2.40GHz, I am testing the performance of Cassandra. The write performance is good: it can write a million records in 10 minutes. However, the query performance is poor: it takes 10 minutes to read 10K records with sequential keys from 0 to (about 100 QPS). This is far from the 3,xxx QPS figures found on the net. Cassandra decided to use 1G as the Java heap size, which seems to be fine, as at the end of the benchmark swap was barely used (only 1M). I understand that my computer may not be as powerful as those used in the other benchmarks, but it shouldn't be that far off (1:30), right? Any suggestions? Thanks in advance!
Re: read performance problem
Try to see if there is a lot of paging going on, and run some benchmarks on the disk itself. Are you running Windows or Linux? Do you think the disk may be fragmented?

Maxim

On 11/19/2011 8:58 PM, Kent Tong wrote:
> Hi, On my computer with 2G RAM and a core 2 duo CPU E4600 @ 2.40GHz, I am testing the performance
> of Cassandra. The write performance is good: It can write a million records in 10 minutes. However,
> the query performance is poor and it takes 10 minutes to read 10K records with sequential keys from
> 0 to (about 100 QPS). This is far away from the 3,xxx QPS found on the net. Cassandra decided to use
> 1G as the Java heap size which seems to be fine as at the end of the benchmark the swap was barely
> used (only 1M used). I understand that my computer may be not as powerful as those used in the other
> benchmarks, but it shouldn't be that far off (1:30), right? Any suggestion? Thanks in advance!
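If it helps, a quick way to put numbers on both suggestions: watch the si/so columns of vmstat for paging, and time random reads against a large file for raw seek latency. Below is a minimal random-read sketch (the file path is just a placeholder; the file should be much larger than RAM so reads actually hit the disk):

    # Minimal random-read latency sketch; /path/to/large/file is a placeholder
    # for any file considerably bigger than RAM so reads are not served from cache.
    import os, random, time

    PATH = '/path/to/large/file'
    READS, BLOCK = 1000, 4096

    size = os.path.getsize(PATH)
    fd = os.open(PATH, os.O_RDONLY)
    start = time.time()
    for _ in range(READS):
        os.lseek(fd, random.randrange(0, size - BLOCK), os.SEEK_SET)
        os.read(fd, BLOCK)
    elapsed = time.time() - start
    os.close(fd)

    # Rough reads-per-second figure; a single SATA disk typically manages on the
    # order of 100 random reads/s, the same ballpark as the ~100 QPS reported.
    print('%.1f random reads/s' % (READS / elapsed))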