Re: Datastructure time tracking

2011-11-19 Thread RobinUs2
Thank you very much. This was very helpful. I'll post an update here once I
manage to finish my data structure design.



Re: What sort of load do the tombstones create on the cluster?

2011-11-19 Thread Aaron Turner
What do you mean by "performance loss"?  For example, are you seeing it on
the read or write side?  During compactions?  Deletions themselves shouldn't
be expensive, but if you have a lot of tombstones that haven't been compacted
away, reads will be slower since there is more data to scan.  One thing to
try is kicking off major compactions more often so they're smaller (less
load) and clean out the deleted data more often.
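
For reference, a major compaction can be kicked off by hand with nodetool;
the keyspace and column family names below are just placeholders:

  # force a major compaction of a single column family
  nodetool -h localhost compact MyKeyspace MyColumnFamily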

You should be able to tell whether it is disk or CPU pretty easily via the
JMX interface (jconsole or OpsCenter can read those values) or something
like iostat.  Basically, look for high disk I/O wait: if you see that, it's
disk; if not, it's CPU.
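
As a rough illustration (assuming a Linux box with the sysstat package
installed), something along these lines:

  # extended device stats every 5 seconds; high %iowait, await or %util
  # points at disk, otherwise look at user/system CPU time
  iostat -x 5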

One optimization I'm doing in my application is choosing row keys so that I
can delete an entire row at a time rather than individual columns, so there
is only one tombstone for the whole row.  This isn't always possible, but if
you can lay out your data in a way that makes this possible, it's a good
optimization.
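
A minimal sketch of that idea in cassandra-cli, assuming time-bucketed data
(the column family name and row keys are hypothetical):

  # one row per day; each event is written as a column under that day's row key
  create column family EventsByDay with comparator = TimeUUIDType;
  # expiring a whole day later costs a single row tombstone:
  del EventsByDay['2011-11-18'];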



On Thu, Nov 17, 2011 at 10:01 AM, Maxim Potekhin  wrote:
> In view of my unpleasant discovery last week that deletions in Cassandra
> lead to a very real
> and serious performance loss, I'm working on a strategy of moving forward.
>
> If the tombstones do cause such a problem, where should I be looking for
> performance bottlenecks?
> Is it disk, CPU or something else? Thing is, I don't see anything
> outstanding in my Ganglia plots.
>
> TIA,
>
> Maxim
>
>



-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"


storage space and compaction speed

2011-11-19 Thread Thorsten von Eicken
I recently changed the default_validation_class on a bunch of CFs from
BytesType to UTF8Type and I observed two things: first, I saw a number of
compactions during the migration whose log entries reported ~200% to ~400%
of original size.  Second, it seems that compaction speed has now halved.
I'm using v1.0.1 with leveled compaction and compression.  Before I write
tests I thought I'd quickly ask: is there any difference in storage
efficiency between BytesType, UTF8Type, and AsciiType when storing plain
us-ascii strings?  And is there any expected compaction speed difference?
(It would be nice to have some docs about the expected storage space used
for the various data types.)
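
For context, the change was made with cassandra-cli along these lines (the
column family name here is just a placeholder):

  update column family Events with default_validation_class = UTF8Type;
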
Thanks much!
Thorsten


Re: storage space and compaction speed

2011-11-19 Thread Jonathan Ellis
I'm guessing something else is responsible for the compaction
difference you're seeing -- Bytes, UTF8, and Ascii types all use the
same lexical byte comparison code.  The only place you should expect
to lose a small amount of performance by using the latter two is on
insert when it sanity-checks the input.




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Added column does not sort as the last column

2011-11-19 Thread huyle
Hi,

We got an "Added column does not sort as the last column" error in the logs
after upgrading to Cassandra 1.0.3 from 0.6.13.  After running scrub, we are
still getting the error.

Here is the stack trace:

java.lang.AssertionError: Added column does not sort as the last column
    at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:126)
    at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:122)
    at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:117)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:147)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:231)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:115)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:102)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:127)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:102)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:87)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:172)
    at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:132)
    at org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)


Does anyone have any idea what might be causing this issue?  We use the
CompositeType class from
https://github.com/edanuff/CassandraCompositeType/commit/a584bf2dadd3e6bb6071db7cf181e1546d8c93db.
Could it have anything to do with the error?  Thanks!

Huy



Causes of a High Memtable Live Ratio

2011-11-19 Thread Caleb Rackliffe
Hi All,

From what I've read in the source, a Memtable's "live ratio" is the ratio of
Memtable usage to the current write throughput.  If this is too high, I
imagine the system could be in a possibly unsafe state, as the comment in
Memtable.java indicates.

Today, while bulk loading some data, I got the following message:

WARN [pool-1-thread-1] 2011-11-18 21:08:57,331 Memtable.java (line 172) setting live ratio to maximum of 64 instead of 78.87903667214012

Should I be worried?  If so, does anybody have any suggestions for how to 
address it?

Thanks :)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

Re: split large sstable

2011-11-19 Thread Radim Kolar

On 17.11.2011 17:42, Dan Hendry wrote:

> What do you mean by 'better file offset caching'? Presumably you mean
> 'better page cache hit rate'?
The fs metadata used to find blocks in smaller files is cached better.
Large files use indirect blocks, so you need more reads to find the correct
block during a seek syscall.  For example, if a large file uses 3 indirect
levels, you need 3 extra disk seeks to find the correct block.
http://computer-forensics.sans.org/blog/2008/12/24/understanding-indirect-blocks-in-unix-file-systems/
Metadata caching in the OS is far worse than file caching - one "find /"
will effectively nullify the metadata cache.
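
To put rough numbers on that, assuming a classic ext2/ext3-style layout with
4 KB blocks and 4-byte block pointers (1024 pointers per indirect block):

  direct pointers (12):  12 * 4 KB      =  48 KB
  single indirect:       1024 * 4 KB    ~   4 MB
  double indirect:       1024^2 * 4 KB  ~   4 GB
  triple indirect:       1024^3 * 4 KB  ~   4 TB

so a seek into a multi-gigabyte sstable may need two or three extra metadata
reads unless those indirect blocks stay cached.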


If Cassandra could use raw storage, it would eliminate fs overhead and could
be over 100% faster on reads, because fragmentation would be the exception -
there is no need to design an fs like FAT or UFS where the designers expect
files to be stored in non-contiguous areas on disk.  Implementing something
log-based like http://logfs.sourceforge.net/ would be enough.  Cleaning
would not be much needed because compaction would clean it up naturally.



> Perhaps what you are actually seeing is row fragmentation across your
> SSTables? Easy to check with nodetool cfhistograms (SSTables column).
I have a 1.5% hit rate to 2 sstables and 3% to 3 sstables.  That's pretty
low with the min. compaction threshold set to 5; I will probably set it to 6.
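
For anyone wanting to run the same check, a minimal example (keyspace and
column family names are placeholders):

  # the SSTables column shows how many sstables recent reads had to touch
  nodetool -h localhost cfhistograms MyKeyspace MyColumnFamily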


I would really like to see tests with user-defined sizes and file counts for
tiered compaction, because it works best if you do not leave the largest
file alone in a bucket.  If your data in Cassandra is not growing, it can be
tuned better.  I haven't done experiments with it, but maybe a max sstable
size defined per CF would be enough.  Let's say I have 5 GB of data per CF -
the ideal setting would be a max sstable size slightly under 1 GB.
Cassandra would then not keep old data stuck in one 4 GB compacted sstable,
waiting for other 4 GB sstables to be created before a compaction removes
the old data.



> To answer your question, I know of no tools to split SSTables. If you want
> to switch compaction strategies, levelled compaction (1.0.x) creates many
> smaller sstables instead of fewer, bigger ones.
I don't use leveled compaction; it compacts too often.  It might get better
if you could tune how many and how large files to use at each level.  But I
will try switching to leveled compaction and back again - it might do what I
want.
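
As an aside, leveled compaction in 1.0 does expose a per-sstable target size;
a cassandra-cli sketch with placeholder names (double-check the option name
against your version before relying on it):

  update column family MyColumnFamily
    with compaction_strategy = 'LeveledCompactionStrategy'
    and compaction_strategy_options = {sstable_size_in_mb : 10};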


read performance problem

2011-11-19 Thread Kent Tong
Hi,

On my computer with 2 GB RAM and a Core 2 Duo E4600 @ 2.40GHz, I am testing
the performance of Cassandra.  The write performance is good: it can write a
million records in 10 minutes.  However, the query performance is poor and
it takes 10 minutes to read 10K records with sequential keys from 0 to 9999
(about 100 QPS).  This is far off from the 3,xxx QPS figures found on the net.

Cassandra decided to use 1G as the Java heap size, which seems to be fine,
as at the end of the benchmark the swap was barely used (only 1M used).

I understand that my computer may not be as powerful as those used in the
other benchmarks, but it shouldn't be that far off (1:30), right?

Any suggestions?  Thanks in advance!


Re: read performance problem

2011-11-19 Thread Maxim Potekhin

Try to see if there is a lot of paging going on,
and run some benchmarks on the disk itself.
Are you running Windows or Linux? Do you think
the disk may be fragmented?
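
A quick way to check for paging on Linux, as a sketch:

  # memory and swap stats every 5 seconds; sustained non-zero si/so columns
  # mean the box is actively swapping
  vmstat 5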


Maxim

