Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread Andras Szerdahelyi
Aaron,

What version are you using ?

1.1.9

Have you changed the bf_chance? The sstables need to be rebuilt for it to
take effect.

I did ( several times ) and I ran upgradesstables after

Not sure what this means.
Are you saying it's in a boat on a river, with tangerine trees and marmalade 
skies ?

You nailed it. A significant number of reads are done from hundreds of sstables
(I have to add, compaction is apparently constantly 6000-7000 tasks behind, and
the vast majority of the reads access recently written data).

Take a look at the nodetool cfhistograms to get a better idea of the row size
and use that info when considering the sstable size.

It's around 1-20K; what should I optimise the LCS sstable size for? I suppose
"I want to fit as many complete rows as possible into a single sstable, to keep
the file count down while avoiding compactions of oversized (double-digit
gigabytes?) sstables at higher levels"?
Do I have to run a major compaction after a change to sstable_size_in_mb? The
larger sstable size wouldn't really affect sstables on levels above L0, would
it?
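
(For reference, a sketch of the change under discussion in 1.1-era
cassandra-cli syntax; the CF name and the size value are placeholders, so
verify the exact syntax against your version:)

    update column family CF
        with compaction_strategy_options = {sstable_size_in_mb : 10};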



Thanks!!
Andras


From: aaron morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday 26 March 2013 21:46
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

What version are you using ?
1.2.0 allowed a null bf chance, and I think it returned .1 for LCS and .01 for 
STS compaction.
Have you changed the bf_chance? The sstables need to be rebuilt for it to
take effect.

and sstables read is in the skies
Not sure what this means.
Are you saying it's in a boat on a river, with tangerine trees and marmalade 
skies ?

SSTable count: 22682
Lots of files there, I imagine this would dilute the effectiveness of the key 
cache. It's caching (sstable, key) tuples.
You may want to look at increasing the sstable_size with LCS.

Compacted row minimum size: 104
Compacted row maximum size: 263210
Compacted row mean size: 3041
Take a look at the nodetool cfhistograms to get a better idea of the row size
and use that info when considering the sstable size.

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 6:16 AM, Andras Szerdahelyi
<andras.szerdahe...@ignitionone.com> wrote:

Hello list,

Could anyone shed some light on how an FP chance of 0.01 can coexist with a
measured FP ratio of… 0.98? Am I reading this wrong, or do 98% of the
requests hitting the bloom filter create a false positive while the "target"
false ratio is 0.01?
( Also, the key cache hit ratio is around 0.001 and sstables read is in the skies (
non-exponential (non-) drop-off for LCS ), but that should be filed under
"effect" and not "cause"? )

[default@unknown] use KS;
Authenticated to keyspace: KS
[default@KS] describe CF;
ColumnFamily: CF
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  GC grace seconds: 691200
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Replicate on write: true
  Caching: ALL
  Bloom Filter FP chance: 0.01
  Built indexes: []
  Compaction Strategy: 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy
  Compaction Strategy Options:
sstable_size_in_mb: 5
  Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Keyspace: KS
Read Count: 628950
Read Latency: 93.19921121869784 ms.
Write Count: 1219021
Write Latency: 0.14352380885973254 ms.
Pending Tasks: 0
Column Family: CF
SSTable count: 22682
Space used (live): 119771434915
Space used (total): 119771434915
Number of Keys (estimate): 203837952
Memtable Columns Count: 13125
Memtable Data Size: 33212827
Memtable Switch Count: 15
Read Count: 629009
Read Latency: 88.434 ms.
Write Count: 1219038
Write Latency: 0.095 ms.
Pending Tasks: 0
Bloom Filter False Positives: 37939419
Bloom Filter False Ratio: 0.97928
Bloom Filter Space Used: 261572784
Compacted row minimum size: 104
Compacted row maximum size: 263210
Compacted row mean size: 3041

I upgraded sstables after changing the FP chance
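
(A note on reading the numbers above: as far as I understand it, the reported
ratio is falsePositives / (falsePositives + trueNegatives), counted per bloom
filter probe rather than per client read. A back-of-envelope sketch in Scala
from the figures shown, with that formula as the assumption:)

    val fp    = 37939419L  // Bloom Filter False Positives, from cfstats above
    val ratio = 0.97928    // Bloom Filter False Ratio, from cfstats above

    // implied true negatives: tn = fp * (1 - ratio) / ratio
    val tn = (fp * (1 - ratio) / ratio).toLong   // ~800K

    // with 629009 reads against 22682 sstables, each read probes many
    // filters, so per-probe counts dwarf the client read count
    val probesPerRead = (fp + tn) / 629009.0     // ~62 probes per read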

Thanks!
Andras



Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Has anyone else experienced this?  After upgrading to VNodes, I am having
Repair issues.

If I run `nodetool -h localhost repair`, then it will repair only the first
Keyspace and then hang... I let it go for a week and nothing.

If I run `nodetool -h localhost repair -pr`, then it appears to only repair
the first VNode range, but does do all keyspaces...

I can't find anything in my cassandra logs to point to a problem for either
scenario.

My workaround is to run a repair command independently for my different
keyspaces

nodetool -h localhost repair Keyspace1
nodetool -h localhost repair Keyspace2
...

But that is silly!

I am using Cassandra 1.2.2

Thanks!
Ryan


Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Marco Matarazzo
> If I run `nodetool -h localhost repair`, then it will repair only the first 
> Keyspace and then hang... I let it go for a week and nothing.

Do the node logs show any errors ?

> If I run `nodetool -h localhost repair -pr`, then it appears to only repair 
> the first VNode range, but does do all keyspaces…

As far as I know, this is fixed in cassandra 1.2.3


--
Marco Matarazzo



Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Marco,

No there are no errors... the last line I see in my logs related to repair
is :

[repair #...] Sending completed merkle tree to /[node] for
(keyspace1,columnfamily1)

Ryan



On Wed, Mar 27, 2013 at 8:49 AM, Marco Matarazzo <
marco.matara...@hexkeep.com> wrote:

> > If I run `nodetool -h localhost repair`, then it will repair only the
> first Keyspace and then hang... I let it go for a week and nothing.
>
> Do the node logs show any errors ?
>
> > If I run `nodetool -h localhost repair -pr`, then it appears to only
> repair the first VNode range, but does do all keyspaces…
>
> As far as I know, this is fixed in cassandra 1.2.3
>
>
> --
> Marco Matarazzo
>
>


Re: TimeUUID Order Partitioner

2013-03-27 Thread Lanny Ripple
A type 4 UUID can be created from two Longs.  You could MD5 your strings giving 
you 128 hashed bits and then make UUIDs out of that.  Using Scala:
 
   import java.nio.ByteBuffer
   import java.security.MessageDigest
   import java.util.UUID

   val key = "Hello, World!"

   val md = MessageDigest.getInstance("MD5")
   val dig = md.digest(key.getBytes("UTF-8"))
   val bb = ByteBuffer.wrap(dig)

   val msb = bb.getLong
   val lsb = bb.getLong

   val uuid = new UUID(msb, lsb)
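
(One caveat worth noting: new UUID(msb, lsb) does not set the UUID
version/variant bits. If a well-formed UUID matters, the JDK can build a
name-based, MD5-backed version 3 UUID from the same input:)

    // same MD5 input, but with the version/variant bits set properly
    val uuid2 = UUID.nameUUIDFromBytes(key.getBytes("UTF-8"))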


On Mar 26, 2013, at 3:22 PM, aaron morton  wrote:

>> Any idea?
> Not off the top of my head.
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 26/03/2013, at 2:13 AM, Carlos Pérez Miguel  wrote:
> 
>> Yes it does. Thank you Aaron.
>> 
>> Now I realized that the system keyspace uses string as keys, like "Ring" or 
>> "ClusterName", and I don't know how to convert these type of keys into UUID. 
>> Any idea?
>> 
>> 
>> Carlos Pérez Miguel
>> 
>> 
>> 2013/3/25 aaron morton 
>> The best thing to do is start with a look at ByteOrderedPartitioner and 
>> AbstractByteOrderedPartitioner. 
>> 
>> You'll want to create a new TimeUUIDToken extends Token and a new 
>> UUIDPartitioner that extends AbstractPartitioner<>
>> 
>> Usual disclaimer that ordered partitioners cause problems with load 
>> balancing. 
>> 
>> Hope that helps. 
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 25/03/2013, at 1:12 AM, Carlos Pérez Miguel  wrote:
>> 
>>> Hi,
>>> 
>>> I store in my system rows where the key is a UUID version1, TimeUUID. I 
>>> would like to maintain rows ordered by time. I know that in this case, it 
>>> is recommended to use an external CF where column names are UUID ordered by 
>>> time. But in my use case this is not possible, so I would like to use a 
>>> custom Partitioner in order to do this. If I use ByteOrderedPartitioner 
>>> rows are not correctly ordered because of the way a UUID stores the 
>>> timestamp. What is needed in order to implement my own Partitioner?
>>> 
>>> Thank you.
>>> 
>>> Carlos Pérez Miguel
>> 
>> 
> 



Re: TimeUUID Order Partitioner

2013-03-27 Thread Lanny Ripple
Ah. TimeUUID.  Not as useful for you then but still something for the toolbox.

On Mar 27, 2013, at 8:42 AM, Lanny Ripple  wrote:

> A type 4 UUID can be created from two Longs.  You could MD5 your strings 
> giving you 128 hashed bits and then make UUIDs out of that.  Using Scala:
> 
>   import java.nio.ByteBuffer
>   import java.security.MessageDigest
>   import java.util.UUID
> 
>   val key = "Hello, World!"
> 
>   val md = MessageDigest.getInstance("MD5")
>   val dig = md.digest(key.getBytes("UTF-8"))
>   val bb = ByteBuffer.wrap(dig)
> 
>   val msb = bb.getLong
>   val lsb = bb.getLong
> 
>   val uuid = new UUID(msb, lsb)
> 
> 
> On Mar 26, 2013, at 3:22 PM, aaron morton  wrote:
> 
>>> Any idea?
>> Not off the top of my head.
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 26/03/2013, at 2:13 AM, Carlos Pérez Miguel  wrote:
>> 
>>> Yes it does. Thank you Aaron.
>>> 
>>> Now I realized that the system keyspace uses string as keys, like "Ring" or 
>>> "ClusterName", and I don't know how to convert these type of keys into 
>>> UUID. Any idea?
>>> 
>>> 
>>> Carlos Pérez Miguel
>>> 
>>> 
>>> 2013/3/25 aaron morton 
>>> The best thing to do is start with a look at ByteOrderedPartitioner and 
>>> AbstractByteOrderedPartitioner. 
>>> 
>>> You'll want to create a new TimeUUIDToken extends Token and a new 
>>> UUIDPartitioner that extends AbstractPartitioner<>
>>> 
>>> Usual disclaimer that ordered partitioners cause problems with load 
>>> balancing. 
>>> 
>>> Hope that helps. 
>>> 
>>> -
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>> 
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 25/03/2013, at 1:12 AM, Carlos Pérez Miguel  wrote:
>>> 
 Hi,
 
 I store in my system rows where the key is a UUID version1, TimeUUID. I 
 would like to maintain rows ordered by time. I know that in this case, it 
 is recommended to use an external CF where column names are UUID ordered by 
 time. But in my use case this is not possible, so I would like to use a 
 custom Partitioner in order to do this. If I use ByteOrderedPartitioner 
 rows are not correctly ordered because of the way a UUID stores the 
 timestamp. What is needed in order to implement my own Partitioner?
 
 Thank you.
 
 Carlos Pérez Miguel
>>> 
>>> 
>> 
> 



Re: java.io.IOException: FAILED_TO_UNCOMPRESS(5) exception when running nodetool rebuild

2013-03-27 Thread Ondřej Černoš
Hi Aaron,

I switched to 1.2.3 with no luck. I created
https://issues.apache.org/jira/browse/CASSANDRA-5391 describing the
problem. Maybe it's related to the EOFException problem, but I am not
sure - I don't know Cassandra internals well and I have never seen the
EOFException.

regards,

ondrej

On Tue, Mar 26, 2013 at 9:26 PM, aaron morton  wrote:
> If you are still on 1.2.1 may be this
>
> https://issues.apache.org/jira/browse/CASSANDRA-5105
>
> Fixed in 1.2.2
>
> If you are on 1.2.3 there is also
> https://issues.apache.org/jira/browse/CASSANDRA-5381
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/03/2013, at 5:10 AM, Ondřej Černoš  wrote:
>
> Hi all,
>
> I am still unable to move forward with this issue.
>
> - when I switch SSL off in inter-DC communication, nodetool rebuild  works
> well
> - when I switch internode_compression off, I still get
> java.io.IOException: FAILED_TO_UNCOMPRESS exception. Does
> internode_compression: none really switch off the snappy compression
> of the internode communication? The stacktrace - see the previous mail
> - clearly demonstrates some compression is involved
> - I managed to trigger another exception:
>
> java.lang.RuntimeException: javax.net.ssl.SSLException: bad record MAC
> at com.google.common.base.Throwables.propagate(Throwables.java:160)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: javax.net.ssl.SSLException: bad record MAC
> at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
> at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1607)
> at
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:859)
> at
> com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
> at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
> at
> org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ... 1 more
>
> I managed to trigger this exception only once however.
>
> The fact the transfer works when SSL is off and fails with SSL is
> another strange thing with this issue.
>
> Any ideas or hints?
>
> regards,
>
> Ondrej Cernos
>
> On Tue, Mar 19, 2013 at 5:51 PM, Ondřej Černoš  wrote:
>
> Hi all,
>
> I am running into strange error when bootstrapping Cassandra cluster
> in multiple datacenter setup.
>
> The setup is as follows: 3 nodes in AWS east, 3 nodes somewhere on
> Rackspace/Openstack. I use my own snitch based on EC2MultiRegionSnitch
> (it just adds some ec2 avalability zone parsing capabilities). Nodes
> in the cluster connect to each other and all seems ok.
>
> When I start the Rackspace cluster first, populate it with data and
> then let the AWS cluster bootstrap from it, it works great. However
> the other way round it just breaks.
>
> The breakage demonstrates as follows:
>
> - nodetool rebuild us-east command hangs
> - cassandra's log contains the following:
>
> 2013-03-19 12:42:15.796+0100 [Thread-14] [DEBUG]
> IncomingTcpConnection.java(63)
> org.apache.cassandra.net.IncomingTcpConnection: Connection version 6
> from ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com/xxx.xxx.xxx.xxx
> 2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG]
> StreamInSession.java(104)
> org.apache.cassandra.streaming.StreamInSession: Adding file
> /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-2-Data.db
> to Stream Request queue
> 2013-03-19 12:42:15.803+0100 [Thread-14] [DEBUG]
> StreamInSession.java(104)
> org.apache.cassandra.streaming.StreamInSession: Adding file
> /path/to/cassandra/data/key_space/column_family/key_space-column_family-ib-1-Data.db
> to Stream Request queue
> 2013-03-19 12:42:15.806+0100 [Thread-14] [DEBUG]
> IncomingStreamReader.java(112)
> org.apache.cassandra.streaming.IncomingStreamReader: Receiving stream
> 2013-03-19 12:42:15.807+0100 [Thread-14] [DEBUG]
> IncomingStreamReader.java(113)
> org.apache.cassandra.streaming.IncomingStreamReader: Creating file for
> /path/to/cassandra/data/key_space/column_family/key_space-column_family-tmp-ib-2-Data.db
> with 7808 estimated keys
> 2013-03-19 12:42:15.808+0100 [Thread-14] [DEBUG]
> ColumnFamilyStore.java(863) org.apache.cassandra.db.ColumnFamilyStore:
> component=key_space Checking for sstables overlapping []
> 2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110)
> org.apache.cassandra.io.util.FileUtils: Deleting
> key_space-column_family-tmp-ib-2-Data.db
> 2013-03-19 12:42:15.962+0100 [Thread-14] [DEBUG] FileUtils.java(110)
> org.apache.cassandra.io.util.FileUtils: Deleting
> key_space-column_family-tmp-ib-2-CompressionInfo.db
> 2013-03-19 12:42:15.962+0100 [Thread-14]

Re: Clearing tombstones

2013-03-27 Thread Joel Samuelsson
I see. The cleanup operation took several minutes though. This doesn't seem
normal then? My replication settings should be very normal (simple strategy
and replication factor 1).


2013/3/26 Tyler Hobbs 

>
> On Tue, Mar 26, 2013 at 5:39 AM, Joel Samuelsson <
> samuelsson.j...@gmail.com> wrote:
>
>> Sorry. I failed to mention that all my CFs had a gc_grace_seconds of 0
>> since it's a 1 node cluster. I managed to accomplish what I wanted by first
>>  running cleanup and then compact. Is there any logic to this or should my
>> tombstones be cleared by just running compact?
>
>
> There's nothing for cleanup to do on a single node cluster (unless you've
> changed your replication settings in a strange way, like setting no
> replicas for a keyspace).  Just doing a major compaction will take care of
> tombstones that are gc_grace_seconds old.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Digest Query Seems to be corrupt on certain cases

2013-03-27 Thread Ravikumar Govindarajan
We started receiving OOMs in our Cassandra grid and took a heap dump. We
are running version 1.0.7 with LOCAL_QUORUM for both reads and writes.

After some analysis, we kind of identified the problem with
SliceByNamesReadCommand involving a single Super-Column. This seems to be
happening only in digest queries and not during actual reads.

I am pasting the serialized byte array of SliceByNamesReadCommand, which
seems to be corrupt on issuing certain digest queries.

//Type is SliceByNamesReadCommand
body[0] = (byte) 1;
//This is a digest query here.
body[1] = (byte) 1;

//Table-Name from bytes 2-8

//Key-Name from bytes 9-18

//QueryPath deserialization here

//CF-Name from bytes 19-30

//Super-Col-Name from byte 31 onwards, but gets corrupt as found in heap dump

//body[32-37] = 0, body[38] = 1, body[39] = 0. This causes the
//SliceByNamesDeserializer to mark both ColName=NULL and SuperColName=NULL,
//fetching the entire wide row!!!

//Actual super-col-name starts only from byte 40, whereas it should have
//started from byte 31

Has someone already encountered such an issue? Why is the super-col-name
not correctly deserialized during a digest query?

--
Ravi


Re: TimeUUID Order Partitioner

2013-03-27 Thread Carlos Pérez Miguel
Thanks, Lanny. That is what I am doing.

Actually I'm having another problem. My UUIDOrderedPartitioner doesn't
order by time. Instead, it orders by byte order and I cannot find out why.
Which functions control the ordering between tokens? I have
implemented time ordering in the "compareTo" function of my UUID token
class, but it seems that Cassandra is ignoring it. For example:

Let's suppose that I have a Users CF where each row represents a user in a
cluster of 1 node. Rows are ordered by TimeUUID. I create some users in the
next order:

user a created with user_id: eac850fa-96f4-11e2-9f22-72ad6af0e500
user b created with user_id: f17f9ae8-96f4-11e2-98aa-421151417092
user c created with user_id: f82fccfa-96f4-11e2-8d99-26f8461d074c
user d created with user_id: fee21cec-96f4-11e2-945b-f9a2a2e32308
user e created with user_id: 058ec180-96f5-11e2-8c88-4aaf94e4f04e
user f created with user_id: 0c5032ba-96f5-11e2-95a5-60a128c0b3f4
user g created with user_id: 13036b86-96f5-11e2-80dd-566654c686cb
user h created with user_id: 19b245f6-96f5-11e2-9c8f-b315f455e5e0

That is the order I would expect to find if I read the CF, but if I do, I
obtain (with any client or library I've tried):

user_id: 058ec180-96f5-11e2-8c88-4aaf94e4f04e name:"e"
user_id: 0c5032ba-96f5-11e2-95a5-60a128c0b3f4 name:"f"
user_id: 13036b86-96f5-11e2-80dd-566654c686cb name:"g"
user_id: 19b245f6-96f5-11e2-9c8f-b315f455e5e0 name:"h"
user_id: eac850fa-96f4-11e2-9f22-72ad6af0e500 name:"a"
user_id: f17f9ae8-96f4-11e2-98aa-421151417092 name:"b"
user_id: f82fccfa-96f4-11e2-8d99-26f8461d074c name:"c"
user_id: fee21cec-96f4-11e2-945b-f9a2a2e32308 name:"d"

Any idea what's happening?
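
(A minimal Scala check of the intended ordering, using two of the ids above:
version-1 UUIDs expose their 60-bit time field via UUID.timestamp(), so a
compareTo based on that, rather than on raw bytes, should put user a before
user e:)

    import java.util.UUID

    val a = UUID.fromString("eac850fa-96f4-11e2-9f22-72ad6af0e500")  // user a
    val e = UUID.fromString("058ec180-96f5-11e2-8c88-4aaf94e4f04e")  // user e

    // timestamp() is only defined for version-1 UUIDs, which these are;
    // a was created before e even though "ea" > "05" as raw bytes
    assert(a.timestamp() < e.timestamp())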


Carlos Pérez Miguel


2013/3/27 Lanny Ripple 

> Ah. TimeUUID.  Not as useful for you then but still something for the
> toolbox.
>
> On Mar 27, 2013, at 8:42 AM, Lanny Ripple  wrote:
>
> > A type 4 UUID can be created from two Longs.  You could MD5 your strings
> giving you 128 hashed bits and then make UUIDs out of that.  Using Scala:
> >
> >   import java.nio.ByteBuffer
> >   import java.security.MessageDigest
> >   import java.util.UUID
> >
> >   val key = "Hello, World!"
> >
> >   val md = MessageDigest.getInstance("MD5")
> >   val dig = md.digest(key.getBytes("UTF-8"))
> >   val bb = ByteBuffer.wrap(dig)
> >
> >   val msb = bb.getLong
> >   val lsb = bb.getLong
> >
> >   val uuid = new UUID(msb, lsb)
> >
> >
> > On Mar 26, 2013, at 3:22 PM, aaron morton 
> wrote:
> >
> >>> Any idea?
> >> Not off the top of my head.
> >>
> >> Cheers
> >>
> >> -
> >> Aaron Morton
> >> Freelance Cassandra Consultant
> >> New Zealand
> >>
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 26/03/2013, at 2:13 AM, Carlos Pérez Miguel 
> wrote:
> >>
> >>> Yes it does. Thank you Aaron.
> >>>
> >>> Now I realized that the system keyspace uses string as keys, like
> "Ring" or "ClusterName", and I don't know how to convert these type of keys
> into UUID. Any idea?
> >>>
> >>>
> >>> Carlos Pérez Miguel
> >>>
> >>>
> >>> 2013/3/25 aaron morton 
> >>> The best thing to do is start with a look at ByteOrderedPartitioner and
> AbstractByteOrderedPartitioner.
> >>>
> >>> You'll want to create a new TimeUUIDToken extends Token and a
> new UUIDPartitioner that extends AbstractPartitioner<>
> >>>
> >>> Usual disclaimer that ordered partitioners cause problems with load
> balancing.
> >>>
> >>> Hope that helps.
> >>>
> >>> -
> >>> Aaron Morton
> >>> Freelance Cassandra Consultant
> >>> New Zealand
> >>>
> >>> @aaronmorton
> >>> http://www.thelastpickle.com
> >>>
> >>> On 25/03/2013, at 1:12 AM, Carlos Pérez Miguel 
> wrote:
> >>>
>  Hi,
> 
>  I store in my system rows where the key is a UUID version1, TimeUUID.
> I would like to maintain rows ordered by time. I know that in this case, it
> is recommended to use an external CF where column names are UUID ordered by
> time. But in my use case this is not possible, so I would like to use a
> custom Partitioner in order to do this. If I use ByteOrderedPartitioner
> rows are not correctly ordered because of the way a UUID stores the
> timestamp. What is needed in order to implement my own Partitioner?
> 
>  Thank you.
> 
>  Carlos Pérez Miguel
> >>>
> >>>
> >>
> >
>
>


Re: Repair hangs after Upgrade to VNodes & 1.2.2

2013-03-27 Thread Ryan Lowe
Upgrading to 1.2.3 fixed the -pr repair. I'll just use that from now on
(which is what I prefer!)

Thanks,
Ryan


On Wed, Mar 27, 2013 at 9:11 AM, Ryan Lowe  wrote:

> Marco,
>
> No there are no errors... the last line I see in my logs related to repair
> is :
>
> [repair #...] Sending completed merkle tree to /[node] for
> (keyspace1,columnfamily1)
>
> Ryan
>
>
>
> On Wed, Mar 27, 2013 at 8:49 AM, Marco Matarazzo <
> marco.matara...@hexkeep.com> wrote:
>
>> > If I run `nodetool -h localhost repair`, then it will repair only the
>> first Keyspace and then hang... I let it go for a week and nothing.
>>
>> Do the node logs show any errors ?
>>
>> > If I run `nodetool -h localhost repair -pr`, then it appears to only
>> repair the first VNode range, but does do all keyspaces…
>>
>> As far as I know, this is fixed in cassandra 1.2.3
>>
>>
>> --
>> Marco Matarazzo
>>
>>
>


Re: recv_describe_keyspace bug in org.apache.cassandra.thrift.Cassandra ?

2013-03-27 Thread cscetbon.ext
Okay. I found an issue already opened for that,
https://issues.apache.org/jira/browse/CASSANDRA-5234, and added my comment to
it, as it's labeled 'Not a problem'.

thanks
--
Cyril SCETBON

On Mar 26, 2013, at 9:24 PM, aaron morton <aa...@thelastpickle.com> wrote:

Is there a way to have the column family defined the new way in a DC and the 
old way (WITH COMPACT STORAGE) in another DC ?
No.

Try a search of https://issues.apache.org/jira/browse/CASSANDRA to see if there 
is an existing ticket for PIG to support CQL 3. If not, raise one describing 
your use case and is possible offering to help.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 2:52 AM, 
cscetbon@orange.com wrote:

Is no one else concerned by the fact that we must define column families the
old way to access them with Pig?
Is there a way to have the column family defined the new way in one DC and the
old way (WITH COMPACT STORAGE) in another DC?

Thanks
--
Cyril SCETBON

On Mar 20, 2013, at 9:59 AM, 
cscetbon@orange.com wrote:

On Mar 20, 2013, at 5:21 AM, aaron morton 
mailto:aa...@thelastpickle.com>> wrote:

By design. There may be a plan to change in the future, I'm not aware of one 
though.
Bad news. If someone else has more information about that, don't hesitate!
Do you know how hard it would be to change this behaviour, i.e. to not skip
tables without the compact storage format?

CQL 3 tables created without COMPACT STORAGE store all keys and columns using 
Composite Types. They also store some additional columns you may not expect.
I suppose that if we are aware of that, we can take it into account. And it's
the job of the Pig script to take only the columns it wants.

If you want to interrop with thrift based API's like PIG it's best to use 
COMPACT STORAGE.
yes, but it means that I must recreate tables in production, and that rows will
be stored in a single column on disk, which may hurt performance. It's said in
the documentation that this format is an old one that should be avoided. I
suppose there are other issues with it that could be found?

You can always create CF's the old way using the cassandra-cli.

Regards

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/03/2013, at 12:09 AM, 
cscetbon@orange.com wrote:

Hi,

I'm testing Pig (0.11) with Cassandra (1.2.2). I've noticed that when the
column family is created without the WITH COMPACT STORAGE clause, Pig can't
find it :(
After searching in the code, I've found that the issue comes from the function
recv_describe_keyspace. This function returns a KsDef with an empty cf_defs
array when there is no column family with the COMPACT STORAGE clause. I
conclude that all column families that must be accessed by Pig must be defined
with this storage clause, but I'm wondering if it is a bug? I suppose so…

Thanks.
--
Cyril SCETBON


Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread Wei Zhu
Welcome to the wonderland of SSTableSize of LCS. There is some discussion 
around it, but no guidelines yet. 

I asked people on IRC; someone is running as high as 128M in production with
no problem. I guess you have to test it on your system and see how it
performs.

Attached is the related thread for your reference.

-Wei

- Original Message -
From: "Andras Szerdahelyi" 
To: user@cassandra.apache.org
Sent: Wednesday, March 27, 2013 1:19:06 AM
Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01



Timeseries data

2013-03-27 Thread Kanwar Sangha
Hi - I have a query on reads with Cassandra. We are planning to have a dynamic
column family and each column would be based on a timeseries.

Inserting data - key => 'xxx', {column_name => TimeUUID(now), :column_value
=> 'value'}, {column_name => TimeUUID(now), :column_value => 'value'}, ...

Now this key might be spread across multiple SSTables over a period of days. 
When we do a READ query to fetch say a slice of data from this row based on 
time X->Y , would it need to get data from ALL sstables ?
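
(In CQL 3 terms, a sketch of that model; the names are illustrative, and the
X->Y slice becomes a range over the clustering column within the one row:)

    CREATE TABLE timeline (
        key   text,
        ts    timeuuid,
        value blob,
        PRIMARY KEY (key, ts)
    );

    -- X and Y stand for timeuuid bounds of the slice
    SELECT value FROM timeline WHERE key = 'xxx' AND ts > X AND ts < Y;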

Thanks,
Kanwar



Re: Timeseries data

2013-03-27 Thread Bryan Talbot
In the worst case, that is possible, but compaction strategies try to
minimize the number of SSTables that a row appears in, so a row being in ALL
SSTables is not likely for most cases.

-Bryan



On Wed, Mar 27, 2013 at 12:17 PM, Kanwar Sangha  wrote:

> Hi - I have a query on reads with Cassandra. We are planning to have a
> dynamic column family and each column would be based on a timeseries.
>
> Inserting data - key => 'xxx', {column_name => TimeUUID(now),
> :column_value => 'value'}, {column_name => TimeUUID(now), :column_value =>
> 'value'}, ...
>
> Now this key might be spread across multiple SSTables over a period of
> days. When we do a READ query to fetch say a slice of data from this row
> based on time X->Y, would it need to get data from ALL sstables?
>
> Thanks,
>
> Kanwar


Re: cfhistograms

2013-03-27 Thread aaron morton
> I think we all go through this learning curve.  Here is the answer I gave 
> last time this question was asked:
+1

> What I don't understand here is the "Row Size" column. Why is it always 0?

Is it zero all the way down? 
What does cfstats say about the compacted max row size?

Cheers 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 6:37 AM, Derek Williams  wrote:

> On Mon, Mar 25, 2013 at 10:36 AM, Brian Tarbox  
> wrote:
> I think we all go through this learning curve.  Here is the answer I gave 
> last time this question was asked:
> 
> The output of this command seems to make no sense unless I think of it as 5 
> completely separate histograms that just happen to be displayed together.
> 
> Using this example output, should I read it as: my reads all took either 1 or
> 2 sstables. And separately, I had write latencies of 3, 7, 19. And separately
> I had read latencies of 2, 8, 69, etc?
> 
> Little correction: The actual value is in the Offset column, all the other 
> columns are the count for that bucket of the histogram. For example in write 
> latency the 3, 7, and 19 refer to how many requests had that latency. 3 write 
> requests took 17us, 7 requests took 20us, and 19 took 24us. 
> 
> -- 
> Derek Williams



Re: nodetool repair hung?

2013-03-27 Thread aaron morton
> > nodetool repair is not coming back on the command line
As an aside, the nodetool command makes a call to the server for each KS you
are repairing. The calls are done in serial, and if your terminal session
times out the repair will stop after the last call nodetool made.

If I'm manually running nodetool I run it in a screen session.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 4:00 PM, S C  wrote:

> Thank you. It helped me.
> 
> > Date: Mon, 25 Mar 2013 15:22:32 -0700
> > From: wz1...@yahoo.com
> > Subject: Re: nodetool repair hung?
> > To: user@cassandra.apache.org
> > 
> > check nodetool tpstats and looking for AntiEntropySessions/AntiEntropyStages
> > grep the log and looking for "repair" and "merkle tree"
> > 
> > - Original Message -
> > From: "S C" 
> > To: user@cassandra.apache.org
> > Sent: Monday, March 25, 2013 2:55:30 PM
> > Subject: nodetool repair hung?
> > 
> > 
> > I am using Cassandra 1.1.5. 
> > 
> > 
> > nodetool repair is not coming back on the command line. Did it ran 
> > successfully? Did it hang? How do you find if the repair was successful? 
> > I did not find anything in the logs."nodetool compactionstats" and 
> > "nodetool netstats" are clean. 
> > 
> > 
> > 
> > nodetool compactionstats 
> > pending tasks: 0 
> > Active compaction remaining time : n/a 
> > 
> > 
> > 
> > 
> > 
> > nodetool netstats 
> > Mode: NORMAL 
> > Not sending any streams. 
> > Not receiving any streams. 
> > Pool Name Active Pending Completed 
> > Commands n/a 0 121103621 
> > Responses n/a 0 209564496 
> > 
> > 
> > 
> > 
> > 
> >



Re: Delete Issues with cassandra cluster

2013-03-27 Thread aaron morton
> Node1 seeds Node2
> Node2 seeds Node1
> Node3 seeds Node1
General best practice is to have the same seed list for all nodes. You want 2 
or 3 seeds per data centre. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 7:04 PM, Byron Wang  wrote:

> I've actually tried all or 1. Anyway I think I've solved the issue. Seems 
> like node1 is having some issues with regards to connections.  
> 
> Thanks!  
> 
> --  
> Byron Wang
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 
> 
> On Monday, March 25, 2013 at 9:11 PM, Víctor Hugo Oliveira Molinar wrote:
> 
>> What is the consistency level of your read and write operations?
>> 
>> On Mon, Mar 25, 2013 at 8:39 AM, Byron Wang <byron.w...@woowteam.com> wrote:
>>> Hi,
>>> 
>>> I'm using cassandra 1.2.3.
>>> 
>>> I've successfully clustered 3 machines and created a keyspace with 
>>> replication factor 3.
>>> 
>>> Node1 seeds Node2
>>> Node2 seeds Node1
>>> Node3 seeds Node1
>>> 
>>> I insert an entry using node1.
>>> 
>>> Using cqlsh from another node, I try to delete the item by sending out the 
>>> delete command.
>>> 
>>> After sending the command, there seems to be no error but when I try to 
>>> select the item it is still there.
>>> 
>>> When I try to send the same delete command from node1 cqlsh it seems to 
>>> work.
>>> 
>>> Basically any delete command I send from the other nodes doesn't work
>>> unless I send it from node1. However I can select the items using the other
>>> nodes.
>>> 
>>> Is this a problem? I can't seem to modify objects from node1 using other 
>>> nodes. Truncate works though.
>>> 
>>> Please help
>>> 
>>> Thanks!
>>> Byron
>> 
> 
> 
> 



Re: Multiple Primary Keys on an IN clause or 2i?

2013-03-27 Thread aaron morton
> CREATE TABLE msg_archive(
> thread_id varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
Use this, with reversed clustering so the most recent columns are at the start
(it makes it quicker to get the last X messages); see
http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_TABLE#cql-create-columnfamily
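
(A sketch of the reversed-clustering variant, the same table with the newest
ts first:)

    CREATE TABLE msg_archive (
        thread_id varchar,
        ts        timestamp,
        msg       blob,
        PRIMARY KEY (thread_id, ts)
    ) WITH CLUSTERING ORDER BY (ts DESC);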

>  SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) 
> AND ts < 1234567890;
> 

Go with this, but do not request 1000 rows per call. Each row becomes a request
on a node, and there is a max of 32 threads processing those reads. Requests of
50 to 100 are reasonable, depending on the number of nodes you have. If you
only have 3 nodes I would start smaller.

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 7:17 PM, Byron Wang  wrote:

> Hi,  
> 
> I'm currently trying to implement an offline message retrieval solution 
> wherein I retrieve messages after a particular timestamp for specific users. 
> My question is which route I should go for… multiple primary keys in an IN
> clause, or using 2i?
> 
> 
> The current model of the messages table looks something like this
> 
> CREATE TABLE msg_archive(
> thread_id varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
> 
> 
> 
> where thread_id is an alphabetized order of sender and recipient such as 
> "brian|john"
> 
> Now, in order to retrieve the messages, I will have to retrieve them based on 
> the number of contacts you have and as such the query will look something 
> like this
> 
> SELECT * FROM msg_archive WHERE thread_id IN ('brian|john', 'brian|james'….) 
> AND ts < 1234567890;
> 
> Of course, the list of friends a user can have can potentially reach around 500
> or, even worse, 1000, so the IN clause can potentially contain a large number
> of primary keys.
> 
> 
> 
> 
> The question is: will this work well, or do I have to modify the schema such
> that we incorporate secondary indexes, and look something like this
> instead?
> 
> CREATE TABLE msg_archive(
> thread_id varchar,
> recipient varchar,
> ts timestamp,
> msg blob,
> PRIMARY KEY (thread_id, ts))
> 
> 
> 
> CREATE INDEX ON msg_archive (recipient);
> 
> For the select statement, of course it will be as simple as
> 
> SELECT * FROM msg_archive WHERE recipient = 'brian' AND ts < 1234567890;
> 
> 
> 
> Which is actually better in terms of performance? Or are there other 
> suggestions to this kind of model?
> 
> Thanks!
> Byron
> 
> 
> 
> --  
> Byron Wang
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> 
> 



Re: Infinit Loop in CompactionExecutor

2013-03-27 Thread aaron morton
>  Is there a workaround besides upgrading? We are not ready to upgrade just yet.

Cannot see one. 

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 7:42 PM, Arya Goudarzi  wrote:

> Hi,
> 
> I am experiencing this bug on our 1.1.6 cluster:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-4765
> 
> The pending compactions has been stuck on a constant value, so I suppose 
> something is not compacting due to this. Is there a workaround besides
> upgrading? We are not ready to upgrade just yet.
> 
> Thanks,
> -Arya 



Re: schema disagrement exception

2013-03-27 Thread aaron morton
Your cluster is angry http://wiki.apache.org/cassandra/FAQ#schema_disagreement

If you are just starting, I suggest blasting it away and restarting.
 
Hope that helps

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 8:57 PM, zg  wrote:

> Hi,
> I just tried to set up a 2-node cluster. It seems to work, but when I use the CLI to
> create a keyspace I get the error "SchemaDisagreementException()". Does
> anyone know how to solve it?
> 
> Thanks
> 
> 



Re: weird behavior with RAID 0 on EC2

2013-03-27 Thread aaron morton
I noticed this on an m1.xlarge (Cassandra 1.1.10) instance today as well: 1 or
2 disks in a raid 0 running at 85 to 100%, the others at 35 to 50ish.

Have not looked into it. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ  wrote:

> We use C* on m1.xLarge AWS EC2 servers, with 4 disks xvdb, xvdc, xvdd, xvde 
> parts of a logical Raid0 (md0).
> 
> I usually see their use increase in the same way. This morning there was a
> normal minor compaction, followed by messages dropped on one node (out of 12).
> 
> Looking closely at this node I saw the following:
> 
> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
> 
> On this node, one of the four disks (xvdd) started working hard while the
> others worked less intensively.
> 
> This is quite weird, since I have always seen these 4 disks being used in exactly
> the same way at every moment (as you can see on the 5 other nodes, or when the
> node ".239" comes back to normal).
> 
> Any idea on what happened and on how it can be avoided ?
> 
> Alain



Re: nodetool status inconsistencies, repair performance and system keyspace compactions

2013-03-27 Thread aaron morton
> During one of my tests - see this thread in this mailing list:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
That thread has been updated; check the bug Ondrej created.

> How will this perform in production with much bigger data if repair
> takes 25 minutes on 7MB and 11k compactions were triggered by the
> repair run?
Seems a little odd. 
See what happens the next time you run repair. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/03/2013, at 2:36 AM, Ondřej Černoš  wrote:

> Hi all,
> 
> I have 2 DCs, 3 nodes each, RF:3, I use local quorum for both reads and 
> writes.
> 
> Currently I test various operational qualities of the setup.
> 
> During one of my tests - see this thread in this mailing list:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
> - I ran into this situation:
> 
> - all nodes have all data and agree on it:
> 
> [user@host1-dc1:~] nodetool status
> 
> Datacenter: na-prod
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
> UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
> UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
> Datacenter: us-east
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d
> 
> - only one node disagrees:
> 
> [user@host1-dc2:~] nodetool status
> Datacenter: us-east
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns   Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
> UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
> Datacenter: na-prod
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns   Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
> UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
> 
> I tried to rebuild the node from scratch, repair the node, no results.
> Still the same owns stats.
> 
> The cluster is built from cassandra 1.2.3 and uses vnodes.
> 
> 
> On the related note: the data size, as you can see, is really small.
> The cluster was created by setting up the us-east datacenter,
> populating it with the dataset, then building the na-prod datacenter
> and running nodetool rebuild us-east. When I tried to run nodetool
> repair it took 25 minutes to finish, on this small dataset. Is this
> ok?
> 
> One other thing I noticed is the amount of compactions on the system keyspace:
> 
> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db
> 
> This is just after running the repair. Is this ok, considering the
> dataset is 7MB and during the repair no operations were running
> against the database, neither read, nor write, nothing?
> 
> How will this perform in production with much bigger data if repair
> takes 25 minutes on 7MB and 11k compactions were triggered by the
> repair run?
> 
> regards,
> 
> Ondrej Cernos



Re: CQL vs. non-CQL data models

2013-03-27 Thread aaron morton
> Is this data model defined by Thrift? How closely does it reflect the
> Cassandra internal data model?
Yes. 
Astyanax is a thrift-based API, and the thrift model closely matches the
internal model.
CQL 3 provides some abstractions on top of the internal model. 

> Is there any documentation or other online pointers describing these
> different data models?
See data modelling http://www.datastax.com/docs

> Can I use both access methods for a particular column family?
Yes, if the CQL 3 table is created WITH COMPACT STORAGE.
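
(For example, a sketch with illustrative names; as I understand it, such a
table maps to a classic static thrift CF:)

    CREATE TABLE employees1 (
        key  blob PRIMARY KEY,
        name text,
        dept text
    ) WITH COMPACT STORAGE;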

> Can a column family that was created using CQL have columns added to
> it dynamically?
It can have the schema updated without locking. 
It cannot have new column names added on a per row basis, like the dynamic 
schema with the thrift API. 
However if you squint hard enough the Grouping Columns in the CREATE TABLE 
statement look like they create new columns (they do internally). 

> Can I add CQL required metadata to column families created using
> Cassandra CLI later, so that they can be accessed via CQL?
No

> I'm trying to access data created using CQL by using non-CQL based
> access methods and vice versa and I'm confused about the following
> behaviour:
Unless you have a need for this I would avoid it. 

> - trying to use MutationBatch for modifying a rows in column family
> that has been created using CQL results in:
>  InvalidRequestException(why:Not enough bytes to read value of component 0)
Astyanax is not handling the way CQL 3 stores things; see above.

> - column family created using CQL is not visible via Cassandra CLI
>   [default@test1] list employees1;
>   employees1 not found in current keyspace.
Mmmm, it used to be. 
Is it there when you do show schema ? 

There is some interop between the two, but unless you really need to I would 
suggest avoiding it. 

Hope that helps.


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/03/2013, at 5:08 AM, Marko Asplund  wrote:

> Hi,
> 
> I'm experimenting with CQL3 and the non-CQL Cassandra data access
> methods through Astyanax client API. Being new to Cassandra I'm a bit
> puzzled by differences between the CQL3 data model and the non-CQL
> based data model exposed by the Astyanax client API.
> Is this data model defined by Thrift? How closely does it reflect the
> Cassandra internal data model?
> 
> Is there any documentation or other online pointers describing these
> different data models?
> Can I use both access methods for a particular column family?
> Can a column family that was created using CQL have columns added to
> it dynamically?
> Can I add CQL required metadata to column families created using
> Cassandra CLI later, so that they can be accessed via CQL?
> 
> I'm trying to access data created using CQL by using non-CQL based
> access methods and vice versa and I'm confused about the following
> behaviour:
> 
> - trying to use MutationBatch for modifying a rows in column family
> that has been created using CQL results in:
>  InvalidRequestException(why:Not enough bytes to read value of component 0)
> 
> - column family created using CQL is not visible via Cassandra CLI
>   [default@test1] list employees1;
>   employees1 not found in current keyspace.
> 
> - row data in column family created using Cassandra CLI is not
> deserialized when read using cqlsh (select * from X)
> 
> - when accessing data in a column family that was created using CQL
>   keyspace.prepareQuery(CF).getKey(id)
>  the column names seem to be encoded strangely and can't be identified
> 
> - in CQL query a result row a call on Row.getKey() returns null
> 
> 
> I'm using Cassandra v1.2.3 and Astyanax v1.56.31.
> 
> thanks,
> 
> marko



Re: Returning A Generated Id From An Insert

2013-03-27 Thread aaron morton
>> Is it possible to do something similar with CQL (e.g. could I be
>> returned the generated timeuuid from
>> now() somehow?)
No.
Writes do not read, and the state of the row after your write may or may not
include all of the columns your write contained.
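
(The usual workaround is to generate the id client-side before the insert, so
nothing needs to come back; a minimal Scala sketch:)

    import java.util.UUID

    // the application holds the key it is about to write
    val id = UUID.randomUUID()  // plain (version 4) uuid; for timeuuid
                                // columns, use your driver's time-UUID
                                // helper rather than server-side now()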

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/03/2013, at 8:56 AM, "Hiller, Dean"  wrote:

> Not really, but things like PlayOrm generate your id for you and set it
> when you have @NoSqlId (that is, if you are in Java).
> 
> Later,
> Dean
> 
> On 3/26/13 1:42 PM, "Gareth Collins"  wrote:
> 
>> Hi,
>> 
>> I have a question on if I could do something in Cassandra similar to
>> what I can do in SQL.
>> 
>> In SQL (e.g. SQL Server), if I have a generated primary key, I can get
>> the generated primary key
>> back as a result for the insert statement.
>> 
>> Is it possible to do something similar with CQL (e.g. could I be
>> returned the generated timeuuid from
>> now() somehow?). It certainly makes my client code cleaner if this
>> were possible (it is a "nice to have").
>> 
>> thanks in advance,
>> Gareth
> 



Re: For clients, which node to connect too? (Python, CQL 1.4 driver)

2013-03-27 Thread aaron morton
Heard back from one of the contributors: the interface is as outlined by
Python DB-API 2.0, so it's by design.

Your choices are to add it to the library and push a patch, or to handle it
outside of the library.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/03/2013, at 9:19 AM, aaron morton  wrote:

>> It doesn't give me the option to specify multiple client addresses, just 
>> one. Will this be an issue?
> No, but it's good to have a list of servers to balance the load or to keep
> working when a node is not running.
> 
> I'm sure there is an approach to getting it in there. Will try to ping one of 
> the contributors. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 26/03/2013, at 1:57 AM, Adam Venturella  wrote:
> 
>> I am currently running 4 nodes, @ 1.2.2.
>> 
>> I was curious if it mattered what node I have my clients connect to. Using 
>> the python cql driver :
>> https://pypi.python.org/pypi/cql/
>> 
>> It doesn't give me the option to specify multiple client addresses, just 
>> one. Will this be an issue?
>> 
>> My assumption is that given that the data is spread out across the ring, it 
>> shouldn't really matter since chances are it will need to pull data from 
>> another node anyway.
>> 
>> 
> 



Re: Insert v/s Update performance

2013-03-27 Thread aaron morton
* Check for GC activity in the logs.
* Check the volume the commit log is on to see if it's over-utilised.
* Check if the dropped messages correlate to compaction; look at the
compaction_* settings in yaml and consider reducing the throughput (see the
sketch below).
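
(The knob in question in cassandra.yaml; 16 MB/s is the usual default, and
lowering it throttles compaction I/O:)

    compaction_throughput_mb_per_sec: 16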

Like Dean says if you have existing data it will result in more compaction. You 
may be able to get a lot of writes through in a clean new cluster, but it also 
has to work when compaction and repair are running. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/03/2013, at 1:43 PM, Jay Svc  wrote:

> Thanks Dean again!
>  
> My use case is a high number of reads and writes; out of that, I am just focusing
> on writes now. I thought LCS was suitable for my situation. I tried the same
> with STCS and the results are the same.
>  
> I ran nodetool tpstats and the MutationStage pending count is very high. At the
> same time the SSTable count and pending compactions are high too during my
> updates.
>  
> Please find the snapshot of my syslog.
>  
> INFO [ScheduledTasks:1] 2013-03-26 15:05:48,560 StatusLogger.java (line 116) 
> OpsCenter.rollups864000,0
> INFO [FlushWriter:55] 2013-03-26 15:05:48,608 Memtable.java (line 264) 
> Writing Memtable-InventoryPrice@1051586614(11438914/129587272 serialized/live 
> bytes, 404320 ops)
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,561 MessagingService.java (line 
> 658) 2701 MUTATION messages dropped in last 5000ms
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,562 StatusLogger.java (line 57) 
> Pool Name Active   Pending   Blocked
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,563 StatusLogger.java (line 72) 
> ReadStage 0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,568 StatusLogger.java (line 72) 
> RequestResponseStage  0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,627 StatusLogger.java (line 72) 
> ReadRepairStage   0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,627 StatusLogger.java (line 72) 
> MutationStage 32 19967 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,628 StatusLogger.java (line 72) 
> ReplicateOnWriteStage 0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,628 StatusLogger.java (line 72) 
> GossipStage   0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,628 StatusLogger.java (line 72) 
> AntiEntropyStage  0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,629 StatusLogger.java (line 72) 
> MigrationStage0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,629 StatusLogger.java (line 72) 
> StreamStage   0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,629 StatusLogger.java (line 72) 
> MemtablePostFlusher   1 1 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,673 StatusLogger.java (line 72) 
> FlushWriter   1 1 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,673 StatusLogger.java (line 72) 
> MiscStage 0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,673 StatusLogger.java (line 72) 
> commitlog_archiver0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,674 StatusLogger.java (line 72) 
> InternalResponseStage 0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,674 StatusLogger.java (line 72) 
> HintedHandoff 0 0 0
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,674 StatusLogger.java (line 77) 
> CompactionManager 127
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,675 StatusLogger.java (line 89) 
> MessagingService n/a  0,22
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,724 StatusLogger.java (line 99) 
> Cache Type Size Capacity   
> KeysToSave Provider
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,725 StatusLogger.java (line 100) 
> KeyCache 142315  2118997  
> all
>  INFO [ScheduledTasks:1] 2013-03-26 15:05:53,725 StatusLogger.java (line 106) 
> RowCache  0 0  
> all  org.apache.cassandra.cache.SerializingCacheProvider
> INFO [ScheduledTasks:1] 2013-03-26 15:05:53,725 StatusLogger.java (line 113) 
> ColumnFamily Memtable ops,data
> INFO [ScheduledTasks:1] 2013-03-26 15:0
>  
> Thanks,
> Jay
>  
>  
> 
> 
> On Tue, Mar 26, 2013 at 7:15 PM, Hiller, Dean  wrote:
> LCS is generally used for high read vs. write ratio though it sounds like you 

Re: Clearing tombstones

2013-03-27 Thread aaron morton
> The cleanup operation took several minutes though. This doesn't seem normal 
> then
It read all the data and made sure the node was a replica for it. Since a 
single-node cluster replicates all data, there was not a lot to throw away. 

> My replication settings should be very normal (simple strategy and 
> replication factor 1).
Most people use the Network Topology Strategy and RF 3, even if they don't have 
multiple DCs. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 3:34 AM, Joel Samuelsson  wrote:

> I see. The cleanup operation took several minutes though. This doesn't seem 
> normal then? My replication settings should be very normal (simple strategy 
> and replication factor 1).
> 
> 
> 2013/3/26 Tyler Hobbs 
> 
> On Tue, Mar 26, 2013 at 5:39 AM, Joel Samuelsson  
> wrote:
> Sorry. I failed to mention that all my CFs had a gc_grace_seconds of 0 since 
> it's a 1 node cluster. I managed to accomplish what I wanted by first  
> running cleanup and then compact. Is there any logic to this or should my 
> tombstones be cleared by just running compact?
> 
> There's nothing for cleanup to do on a single node cluster (unless you've 
> changed your replication settings in a strange way, like setting no replicas 
> for a keyspace).  Just doing a major compaction will take care of tombstones 
> that are gc_grace_seconds old.
> 
> 
> -- 
> Tyler Hobbs
> DataStax
> 
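
For reference, the major compaction Tyler describes can be triggered per
column family with nodetool (host, keyspace and CF names are placeholders):

    nodetool -h 127.0.0.1 compact MyKeyspace MyCF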



Re: Digest Query Seems to be corrupt on certain cases

2013-03-27 Thread aaron morton
> We started receiving OOMs in our cassandra grid and took a heap dump
What are the JVM settings ? 
What was the error stack? 

> I am pasting the serialized byte array of SliceByNamesReadCommand, which 
> seems to be corrupt on issuing certain digest queries.
Sorry I don't follow what you are saying here. 
Can you enable DEBUG logging and identify the behaviour you think is 
incorrect ?
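
For reference, on a 1.0.x node DEBUG logging is enabled in
conf/log4j-server.properties; scoping it to a package keeps the volume down
(which logger is worth widening here is an assumption):

    # conf/log4j-server.properties
    log4j.logger.org.apache.cassandra.db=DEBUG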

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 4:15 AM, Ravikumar Govindarajan 
 wrote:

> We started receiving OOMs in our cassandra grid and took a heap dump. We are 
> running version 1.0.7 with LOCAL_QUORUM from both reads/writes.
> 
> After some analysis, we kind of identified the problem, with 
> SliceByNamesReadCommand, involving a single Super-Column. This seems to be 
> happening only in digest query and not during actual reads.
> 
> I am pasting the serialized byte array of SliceByNamesReadCommand, which 
> seems to be corrupt on issuing certain digest queries.
> 
>   //Type is SliceByNamesReadCommand
>   body[0] = (byte)1;
>   
>   //This is a digest query here.
>   body[1] = (byte)1;
> 
> //Table-Name from 2-8 bytes
> 
> //Key-Name from 9-18 bytes
> 
> //QueryPath deserialization here
>  
>  //CF-Name from 19-30 bytes
> 
> //Super-Col-Name from 31st byte onwards, but gets corrupt 
> as found in heap dump
> 
> //body[32-37] = 0, body[38] = 1, body[39] = 0.  This 
> causes the SliceByNamesDeserializer to mark both ColName=NULL and 
> SuperColName=NULL, fetching entire wide-row!!!
> 
>//Actual super-col-name starts only from byte 40, whereas 
> it should have started from 31st byte itself
> 
> Has someone already encountered such an issue? Why is the super-col-name not 
> correctly de-serialized during a digest query?
> 
> --
> Ravi
> 



Re: TimeUUID Order Partitioner

2013-03-27 Thread aaron morton
> That is the order I would expect to find if I read the CF, but if I do, I 
> obtain (with any client or library I've tried):
> 
What happens if you export sstables with sstable2json ?

Put some logging in Memtable.FlushRunnable.writeSortedContents to see the order 
the rows are written in. 
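
For what it's worth, the byte ordering reported below is exactly what raw UUID
comparison gives: a version 1 UUID stores time_low in its leading bytes, so
lexical order diverges from chronological order whenever time_low wraps.
Checking two of the user_ids from the example with the Python standard library:

    import uuid

    a = uuid.UUID("eac850fa-96f4-11e2-9f22-72ad6af0e500")  # user a, created first
    e = uuid.UUID("058ec180-96f5-11e2-8c88-4aaf94e4f04e")  # user e, created later

    print(a.time < e.time)    # True: a is chronologically earlier
    print(a.bytes < e.bytes)  # False: raw byte order sorts e ahead of a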

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 5:05 AM, Carlos Pérez Miguel  wrote:

> Thanks, Lanny. That is what I am doing.
> 
> Actually I'm having another problem. My UUIDOrderedPartitioner doesn't order 
> by time. Instead, it orders by byte order and I cannot find out why. Which 
> functions control the ordering between tokens? I have implemented time 
> ordering in the "compareTo" method of my UUID token class, but it seems 
> that Cassandra is ignoring it. For example:
> 
> Let's suppouse that I have a Users CF where each row represents a user in a 
> cluster of 1 node. Rows are ordered by TimeUUID. I create some users in the 
> next order:
> 
> user a created with user_id: eac850fa-96f4-11e2-9f22-72ad6af0e500
> user b created with user_id: f17f9ae8-96f4-11e2-98aa-421151417092
> user c created with user_id: f82fccfa-96f4-11e2-8d99-26f8461d074c
> user d created with user_id: fee21cec-96f4-11e2-945b-f9a2a2e32308
> user e created with user_id: 058ec180-96f5-11e2-8c88-4aaf94e4f04e
> user f created with user_id: 0c5032ba-96f5-11e2-95a5-60a128c0b3f4
> user g created with user_id: 13036b86-96f5-11e2-80dd-566654c686cb
> user h created with user_id: 19b245f6-96f5-11e2-9c8f-b315f455e5e0
> 
> That is the order I would expect to find if I read the CF, but if I do, I 
> obtain (with any client or library I've tried):
> 
> user_id: 058ec180-96f5-11e2-8c88-4aaf94e4f04e name:"e"
> user_id: 0c5032ba-96f5-11e2-95a5-60a128c0b3f4 name:"f"
> user_id: 13036b86-96f5-11e2-80dd-566654c686cb name:"g"
> user_id: 19b245f6-96f5-11e2-9c8f-b315f455e5e0 name:"h"
> user_id: eac850fa-96f4-11e2-9f22-72ad6af0e500 name:"a"
> user_id: f17f9ae8-96f4-11e2-98aa-421151417092 name:"b"
> user_id: f82fccfa-96f4-11e2-8d99-26f8461d074c name:"c"
> user_id: fee21cec-96f4-11e2-945b-f9a2a2e32308 name:"d"
> 
> Any idea what's happening?
> 
> 
> Carlos Pérez Miguel
> 
> 
> 2013/3/27 Lanny Ripple 
> Ah. TimeUUID.  Not as useful for you then but still something for the toolbox.
> 
> On Mar 27, 2013, at 8:42 AM, Lanny Ripple  wrote:
> 
> > A UUID can be created from two Longs.  You could MD5 your strings 
> > giving you 128 hashed bits and then make UUIDs out of that.  Using Scala:
> >
> >   import java.nio.ByteBuffer
> >   import java.security.MessageDigest
> >   import java.util.UUID
> >
> >   val key = "Hello, World!"
> >
> >   // hash the key to 128 bits and wrap the digest for reading
> >   val md = MessageDigest.getInstance("MD5")
> >   val dig = md.digest(key.getBytes("UTF-8"))
> >   val bb = ByteBuffer.wrap(dig)
> >
> >   // split the 16 digest bytes into two longs
> >   val msb = bb.getLong
> >   val lsb = bb.getLong
> >
> >   // note: this constructor does not set the RFC 4122 version/variant bits
> >   val uuid = new UUID(msb, lsb)
> >
> >
> > On Mar 26, 2013, at 3:22 PM, aaron morton  wrote:
> >
> >>> Any idea?
> >> Not off the top of my head.
> >>
> >> Cheers
> >>
> >> -
> >> Aaron Morton
> >> Freelance Cassandra Consultant
> >> New Zealand
> >>
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 26/03/2013, at 2:13 AM, Carlos Pérez Miguel  wrote:
> >>
> >>> Yes it does. Thank you Aaron.
> >>>
> >>> Now I realized that the system keyspace uses strings as keys, like "Ring" 
> >>> or "ClusterName", and I don't know how to convert these types of keys into 
> >>> UUID. Any idea?
> >>>
> >>>
> >>> Carlos Pérez Miguel
> >>>
> >>>
> >>> 2013/3/25 aaron morton 
> >>> The best thing to do is start with a look at ByteOrderedPartitoner and 
> >>> AbstractByteOrderedPartitioner.
> >>>
> >>> You'll want to create a new TimeUUIDToken extends Token and a new 
> >>> UUIDPartitioner that extends AbstractPartitioner<>
> >>>
> >>> Usual disclaimer that ordered partitioners cause problems with load 
> >>> balancing.
> >>>
> >>> Hope that helps.
> >>>
> >>> -
> >>> Aaron Morton
> >>> Freelance Cassandra Consultant
> >>> New Zealand
> >>>
> >>> @aaronmorton
> >>> http://www.thelastpickle.com
> >>>
> >>> On 25/03/2013, at 1:12 AM, Carlos Pérez Miguel  
> >>> wrote:
> >>>
>  Hi,
> 
>  I store in my system rows where the key is a UUID version1, TimeUUID. I 
>  would like to maintain rows ordered by time. I know that in this case, 
>  it is recomended to use an external CF where column names are UUID 
>  ordered by time. But in my use case this is not possible, so I would 
>  like to use a custom Partitioner in order to do this. If I use 
>  ByteOrderedPartitioner rows are not correctly ordered because of the way 
>  a UUID stores the timestamp. What is needed in order to implement my own 
>  Partitioner?
> 
>  Thank you.
> 
>  Carlos Pérez Miguel
> >>>
> >>>
> >>
> >
> 
> 



Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-27 Thread aaron morton
> You nailed it. A significant number of reads are done from hundreds of 
> sstables ( I have to add, compaction is apparently constantly 6000-7000 tasks 
> behind and the vast majority of the reads access recently written data )
So that's not good. 
If IO is saturated then maybe LCS is not for you; remember it uses more IO than 
STS. 
Otherwise look at the compaction yaml settings to see if you can make it go 
faster, but watch out that you don't hurt normal requests. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 7:00 AM, Wei Zhu  wrote:

> Welcome to the wonderland of SSTableSize of LCS. There is some discussion 
> around it, but no guidelines yet. 
> 
> I asked people in IRC; someone is running as high as 128M in production with 
> no problem. I guess you have to test it on your system and see how it 
> performs. 
> 
> Attached is the related thread for your reference.
> 
> -Wei
> 

Re: Timeseries data

2013-03-27 Thread aaron morton
sstablekeys can help you find which sstables your keys are in. 

But yes, a slice call will need to read from all sstables the row has a 
fragment in. This is one reason we normally suggest partitioning time series 
data by month or year or something sensible in your problem domain. 

You will probably also want to use reversed comparators so you do not have to 
use reversed in your query. 
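
A minimal sketch of that partitioning idea: fold a time bucket into the row
key, so a slice for a given month only has to touch that month's row (the key
layout is hypothetical):

    from datetime import datetime

    def row_key(entity_id, when):
        # one row per entity per month, e.g. "xxx:2013-03"
        return "%s:%s" % (entity_id, when.strftime("%Y-%m"))

    print(row_key("xxx", datetime(2013, 3, 27)))  # -> xxx:2013-03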

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 8:25 AM, Bryan Talbot  wrote:

> In the worst case, that is possible, but compaction strategies try to 
> minimize the number of SSTables that a row appears in, so a row being in ALL 
> SSTables is not likely in most cases.
> 
> -Bryan
> 
> 
> 
> On Wed, Mar 27, 2013 at 12:17 PM, Kanwar Sangha  wrote:
> Hi – I have a query on reads with Cassandra. We are planning to have a dynamic 
> column family and each column would be based on a timeseries. 
> 
>  
> 
> Inserting data — key => ‘xxx’, {column_name => TimeUUID(now), 
> :column_value => ‘value’ }, {column_name => TimeUUID(now), :column_value => 
> ‘value’ },..
> 
>  
> 
> Now this key might be spread across multiple SSTables over a period of days. 
> When we do a READ query to fetch say a slice of data from this row based on 
> time X->Y, would it need to get data from ALL sstables ?
> 
>  
> 
> Thanks,
> 
> Kanwar
> 
>  
> 
> 



Re: Digest Query Seems to be corrupt on certain cases

2013-03-27 Thread Ravikumar Govindarajan
VM Settings are
-javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms8G -Xmx8G -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

The error stack contained 2 threads for the same key, stalling on the digest
query.

The bytes below are the actual value of the "_body" variable in the
org.apache.cassandra.net.Message object, taken from the heap dump.

As I understand from the code, ReadVerbHandler will deserialize this
"_body" variable into a SliceByNamesReadCommand object.

When I manually inspected this byte array, it seems to hold all details
correctly, except the super-column name, causing it to fetch the entire
wide row.

--
Ravi

On Thu, Mar 28, 2013 at 8:36 AM, aaron morton wrote:

> We started receiving OOMs in our cassandra grid and took a heap dump
>
> What are the JVM settings ?
> What was the error stack?
>
> I am pasting the serialized byte array of SliceByNamesReadCommand, which
> seems to be corrupt on issuing certain digest queries.
>
> Sorry I don't follow what you are saying here.
> Can you enable DEBUG logging and identify the behaviour you think is
> incorrect ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/03/2013, at 4:15 AM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> We started receiving OOMs in our cassandra grid and took a heap dump. We
> are running version 1.0.7 with LOCAL_QUORUM from both reads/writes.
>
> After some analysis, we kind of identified the problem, with
> SliceByNamesReadCommand, involving a single Super-Column. This seems to be
> happening only in digest query and not during actual reads.
>
> I am pasting the serialized byte array of SliceByNamesReadCommand, which
> seems to be corrupt on issuing certain digest queries.
>
> //Type is SliceByNamesReadCommand
>  body[0] = (byte)1;
>  //This is a digest query here.
>  body[1] = (byte)1;
>
> //Table-Name from 2-8 bytes
>
> //Key-Name from 9-18 bytes
>
> //QueryPath deserialization here
>
>  //CF-Name from 19-30 bytes
>
> //Super-Col-Name from 31st byte onwards, but gets
> corrupt as found in heap dump
>
> //body[32-37] = 0, body[38] = 1, body[39] = 0.  This
> causes the SliceByNamesDeserializer to mark both ColName=NULL and
> SuperColName=NULL, fetching entire wide-row!!!
>
>//Actual super-col-name starts only from byte 40,
> whereas it should have started from 31st byte itself
>
> Has someone already encountered such an issue? Why is the super-col-name
> not correctly de-serialized during a digest query?
>
> --
> Ravi
>
>
>