Re: C++ Thrift client

2013-05-21 Thread aaron morton
> Aaron, whenever I get a GCInspector event log, does it mean that I'm having 
> a GC pause?
Messages about ParNew are GC pauses that went over 200ms. 
CMS GC has two relatively small pauses during its cycle. 

Every ParNew GC pauses the application threads; Cassandra is just logging the 
ones that take over 200ms. If you want to see them all, enable GC logging in 
/etc/cassandra/cassandra-env.sh
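(From memory the options to uncomment are the GC logging lines in cassandra-env.sh, the 
JVM_OPTS entries that add things like -XX:+PrintGCDetails, -XX:+PrintGCDateStamps and 
-Xloggc:<log file>; the exact set varies a little between versions.)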

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/05/2013, at 10:29 AM, Sorin Manolache  wrote:

> On 2013-05-16 02:58, Bill Hastings wrote:
>> Hi All
>> 
>> I am doing very small inserts into Cassandra in the range of say 64
>> bytes. I use a C++ Thrift client and seem to consistently get latencies
>> anywhere between 35-45 ms. Could someone please advise as to what
>> might be happening?
> 
> Sniff the network traffic in order to check whether you use the same 
> connection or you open a new connection for each new insert.
> 
> Also check if the client does a set_keyspace (or "use keyspace") before every 
> insert. That would be wasteful too.
> 
> In the worst case, the client would perform an authentication too.
> 
> Inspect timestamps of the network packets in the capture file in order to 
> determine which part takes too long: the connection phase? The 
> authentication? The interval between sending the request and getting the 
> response?
> 
> I do something similar (C++ Thrift, small inserts of roughly the same size as 
> you) and I get response times of 100ms for the first request when opening the 
> connection, authenticating, and setting the keyspace. But subsequent requests 
> on the same connection have response times in the range of 8-11ms.
> 
> Sorin
> 



Re: Multiple cursors

2013-05-21 Thread aaron morton
> We were successfully using a sync thrift client. With it we could send 
> multiple requests through the single connection and wait for answers.
> 
> 
Can you provide an example ? 

With the sync server, the thread that handles your client socket blocks waiting for 
the request to complete.  There is also state associated with the connection 
which, from memory, is essentially per-request state. 
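For what it's worth, this is the pattern I mean, as a rough Java Thrift sketch (not from 
any particular client library; the host, keyspace, CF and key names are made up). Each 
call blocks until its own response has been read off the socket, so two requests on the 
same connection always run one after the other:

import java.nio.ByteBuffer;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SyncThriftSketch
{
    public static void main(String[] args) throws Exception
    {
        // One connection, reused for every request.
        TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("my_keyspace");

        ColumnPath path = new ColumnPath("my_cf").setColumn("col1".getBytes("UTF-8"));

        // get() does not return until the response for this request has been read,
        // so the second call cannot start until the first one completes; the sync
        // client gives you no way to interleave them on one connection.
        ColumnOrSuperColumn first = client.get(ByteBuffer.wrap("key1".getBytes("UTF-8")), path, ConsistencyLevel.QUORUM);
        ColumnOrSuperColumn second = client.get(ByteBuffer.wrap("key2".getBytes("UTF-8")), path, ConsistencyLevel.QUORUM);

        transport.close();
    }
}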

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/05/2013, at 9:57 PM, Vitalii Tymchyshyn  wrote:

> We were successfully using a sync thrift client. With it we could send 
> multiple requests through the single connection and wait for answers.
> 
> 17 трав. 2013 02:51, "aaron morton"  напис.
> We don't have cursors in the RDBMS sense of things.
> 
> If you are using thrift the recommendation is to use connection pooling and 
> re-use connections for different requests. Note that you can not multiplex 
> queries over the same thrift connection, you must wait for the response 
> before issuing another request. The native binary transport allows 
> multiplexing though. 
> 
> In general you should use one of the pre-built client libraries as they will 
> take care of connection pooling etc for you 
> https://wiki.apache.org/cassandra/ClientOptions
> 
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/05/2013, at 9:03 AM, Sam Mandes  wrote:
> 
>> Hello All,
>> 
>> Is using multiple cursors simultaneously on the same C* connection a good 
>> practice?
>> 
>> I've an internal api for a project running thrift, I then need to query 
>> something from C*. I do not like to create a new connection for every api 
>> request. Thus, when my service initially starts I open a connection to C* 
>> and with every request I create a new cursor.
>> 
>> Thanks a lot
> 



Re: Cassandra read repair

2013-05-21 Thread aaron morton
> Only some keys of one CF are corrupt. 
Just checking that you do not mean the row key itself is corrupt and cannot be read. 

> I thought using CL ALL would correct the problem with READ REPAIR, but by 
> returning to CL QUORUM, the problem persists.
> 

By default in 1.X and beyond the read repair chance is 0.1, so it's only 
triggered on 10% of requests. 


In the absence of further writes all reads (at any CL) should return the same 
value. 

What CL are you writing at ? 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/05/2013, at 1:28 AM, Kais Ahmed  wrote:

> Hi all,
> 
> I encountered a consistency problem on some keys using phpcassa and 
> Cassandra 1.2.3 since a server crash. 
> 
> Only some keys of one CF are corrupt. 
> 
> I launched a nodetool repair that completed successfully but didn't correct the 
> issue.
> 
> 
> 
> When i try to get a corrupt Key with :
> 
> CL ONE, the result contains 7 or 8 or 9 columns
> 
> CL QUORUM, result contains 8 or 9 columns
> 
> CL ALL, the data is consistent and returns always 9 columns
> 
> 
> 
> I thought using CL ALL would correct the problem with READ REPAIR, but by 
> returning to CL QUORUM, the problem persists.
> 
> 
> Thank you for your help



Re: Repair of tombstones

2013-05-21 Thread aaron morton
Because that ticket is closed, I think the best way to have a conversation is 
to create a new ticket to back port it to 1.1. 

Given that 1.2 has been out for a while it may be a tough sell, and it depends 
on the complexity of the back port. But on the other side, see 
https://issues.apache.org/jira/browse/CASSANDRA-4905?focusedCommentId=13493206&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13493206

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/05/2013, at 3:30 AM, Michael Theroux  wrote:

> There has been a lot of discussion on the list recently concerning issues 
> with repair, runtime, etc.
> 
> We recently have had issues with this cassandra bug:
> 
>   https://issues.apache.org/jira/browse/CASSANDRA-4905
> 
> Basically, if you do regular staggered repairs, and you have tombstones that 
> can be gc_graced, those tombstones may never be cleaned up if those 
> tombstones don't get compacted away before the next repair.  This is because 
> these tombstones are essentially recopied to other nodes during the next 
> repair.  This has been fixed in 1.2, however, we aren't ready to make the 
> jump to 1.2 yet.
> 
> Is there a reason why this hasn't been back-ported to 1.1?  Is it a risky 
> change? Although not a silver bullet, it seems it may help a lot of people 
> with repair issues (certainly seems it would help us),
> 
> -Mike
> 



Re: Logging Cassandra queries

2013-05-21 Thread aaron morton
>  (but still the list of classes would be helpful :) )
The source code is the most up to date list of the classes.

This is a talk I did on Cassandra Internals at ApacheCon in Feb 2013; 
unfortunately it looks like the videos will never be released owing to 
technical snafus: http://www.slideshare.net/aaronmorton/apachecon-nafeb2013

Cass 1.2 also has probabilistic logging of queries. 
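(From memory that is the request tracing added in 1.2, which can be turned on for a 
sample of requests per node with nodetool settraceprobability.)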

I'm doing a cut down version at the Cassandra SF conference in June. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/05/2013, at 10:28 AM, Ilya Kirnos  wrote:

> sure, i think it'd be a useful feature
> 
> 
> On Sat, May 18, 2013 at 4:13 PM, Tomàs Núnez  
> wrote:
> If you're looking for logging like "get keyX with CL quorum and slice Y took 
> n millis" 
> 
> That would be even better! Maybe I should file a ticket in the Cassandra Jira for 
> this feature? Do you think it would be helpful?
> 
> BTW, just "get keyX" or "set keyX" would work for me. I'll check 
> org.apache.cassandra.thrift.CassandraServer as Aaron suggested (but still the 
> list of classes would be helpful :) )
> 
> Thanks!
> 
> 2013/5/19 Ilya Kirnos 
> If you're looking for logging like "get keyX with CL quorum and slice Y took 
> n millis" there's nothing like that from what I could find.  We had to modify 
> c* source (CassandraServer.java) to add this query logging to the thrift 
> codepath.
> 
> On May 18, 2013 3:20 PM, "Tomàs Núnez"  wrote:
> Yes, I read how to do that here, as well:
> http://www.datastax.com/docs/1.1/configuration/logging_options
> 
> But I didn't know which classes to enable logging for to see the queries... Is there 
> any document with the list of classes with a bit of explanation for each of 
> them? I can't find any, and I don't understand java enough to dive through 
> the code
> 
> Thanks!
> 
> 
> 2013/5/17 aaron morton 
>> And... could I be more precise when enabling logging? Because right now, 
>> with log4j.rootLogger=DEBUG,stdout,R I'm getting a lot of information I 
>> won't use ever, and I'd like to enable just what I need to see gets and 
>> sets.
> 
> see the example at the bottom of this file about setting the log level for a 
> single class 
> https://github.com/apache/cassandra/blob/trunk/conf/log4j-server.properties
> 
> You probably want to set it for the 
> org.apache.cassandra.thrift.CassandraServer class. But I cannot remember what 
> the logging is like in 0.8. 
> 
> Cassandra gets faster in the later versions, which normally means doing less 
> work. Upgrading to 1.1 would be the first step I would take in improving 
> performance.  
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/05/2013, at 4:00 AM, Tomàs Núnez  wrote:
> 
>> Hi!
>> 
>> For quite some time I've been having some unexpected loadavg in the cassandra 
>> servers. I suspect there are lots of uncontrolled queries to the cassandra 
>> servers causing this load, but the developers say that there are none, and 
>> the load is due to cassandra internal processes. 
>> 
>> Trying to get to the bottom, I've been looking into completed ReadStage and 
>> MutationStage through JMX, and the numbers seem to confirm my theory, but 
>> I'd like to go one step forward and, if possible, list all the queries from 
>> the webservers to the cassandra cluster (just one node would be enough). 
>> 
>> I've been playing with cassandra loglevels, and I can see when a Read or a 
>> Write is done, but it would be better if I could know the CF of the query. 
>> For my tests I've put "log4j.rootLogger=DEBUG,stdout,R" in the 
>> log4j-server.properties, writing and reading a test CF, and I can't 
>> see the name of it anywhere.
>> 
>> For the test I'm using Cassandra 0.8.4 (yes, still), as my production 
>> servers, and also 1.0.11. Maybe this changes in 1.1? Maybe I'm doing 
>> something wrong? Any hint?
>> 
>> And... could I be more precise when enabling logging? Because right now, 
>> with log4j.rootLogger=DEBUG,stdout,R I'm getting a lot of information I 
>> won't use ever, and I'd like to enable just what I need to see gets and 
>> sets
>> 
>> Thanks in advance, 
>> Tomàs
>> 
> 
> 
> 
> 
> 
> -- 
> -ilya



Re: Cassandra read repair

2013-05-21 Thread Kais Ahmed
> Checking you do not mean the row key is corrupt and cannot be read.
Yes, I can read it, but reads don't all return the same result except at CL
ALL.

> By default in 1.X and beyond the read repair chance is 0.1, so it's only
> triggered on 10% of requests.
You are right, read repair chance is set to 0.1, but I launched a read
repair which did not solve the problem. Any idea?

>What CL are you writing at ?
All writes are at CL QUORUM.

Thank you Aaron for your answer.


2013/5/21 aaron morton 

> Only some keys of one CF are corrupt.
>
> Checking you do not mean the row key is corrupt and cannot be read.
>
> I thought using CF ALL, would correct the problem with READ REPAIR, but by
> returning to CL QUORUM, the problem persists.
>
> By default in 1.X and beyond the default read repair chance is 0.1, so
> it's only enabled on 10% of requests.
>
> In the absence of further writes all reads (at any CL) should return the
> same value.
>
> What CL are you writing at ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/05/2013, at 1:28 AM, Kais Ahmed  wrote:
>
> Hi all,
>
> I encountered a consistency problem one some keys using phpcassa and
> Cassandra 1.2.3 since a server crash
>
> Only some keys of one CF are corrupt.
>
> I lauched a nodetool repair that successfully completed but don't correct
> the issue.
>
>
>
> When i try to get a corrupt Key with :
>
> CL ONE, the result contains 7 or 8 or 9 columns
>
> CL QUORUM, result contains 8 or 9 columns
>
> CL ALL, the data is consistent and returns always 9 columns
>
>
> I thought using CF ALL, would correct the problem with READ REPAIR, but by
> returning to CL QUORUM, the problem persists.
>
>
> Thank you for your help


Cassandra hangs on large hinted handoffs

2013-05-21 Thread Vladimir Volkov
Hello.

I'm stress-testing our Cassandra (version 1.0.9) cluster, and tried turning
off two of the four nodes for half an hour under heavy load. As a result I
got a large volume of hints on the alive nodes - HintsColumnFamily takes
about 1.5 GB of disk space on each of the nodes. It seems these hints are
never replayed successfully.

After I bring other nodes back online, tpstats shows active handoffs, but I
can't see any writes on the target nodes.
The log indicates memory pressure - the heap is >80% full (heap size is 8GB
total, 1GB young).

A fragment of the log:
 INFO 18:34:05,513 Started hinted handoff for token: 1 with IP: /
84.201.162.144
 INFO 18:34:06,794 GC for ParNew: 300 ms for 1 collections, 5974181760
used; max is 8588951552
 INFO 18:34:07,795 GC for ParNew: 263 ms for 1 collections, 6226018744
used; max is 8588951552
 INFO 18:34:08,795 GC for ParNew: 256 ms for 1 collections, 6559918392
used; max is 8588951552
 INFO 18:34:09,796 GC for ParNew: 231 ms for 1 collections, 6846133712
used; max is 8588951552
 WARN 18:34:09,805 Heap is 0.7978131149667941 full.  You may need to reduce
memtable and/or cache sizes.  Cassandra will now flush up to the two
largest memtables to free up memory.
 WARN 18:34:09,805 Flushing CFS(Keyspace='test', ColumnFamily='t2') to
relieve memory pressure
 INFO 18:34:09,806 Enqueuing flush of Memtable-t2@639524673(60608588/571839171
serialized/live bytes, 743266 ops)
 INFO 18:34:09,807 Writing Memtable-t2@639524673(60608588/571839171
serialized/live bytes, 743266 ops)
 INFO 18:34:11,018 GC for ParNew: 449 ms for 2 collections, 6573394480
used; max is 8588951552
 INFO 18:34:12,019 GC for ParNew: 265 ms for 1 collections, 6820930056
used; max is 8588951552
 INFO 18:34:13,112 GC for ParNew: 331 ms for 1 collections, 6900566728
used; max is 8588951552
 INFO 18:34:14,181 GC for ParNew: 269 ms for 1 collections, 7101358936
used; max is 8588951552
 INFO 18:34:14,691 Completed flushing
/mnt/raid/cassandra/data/test/t2-hc-244-Data.db (56156246 bytes)
 INFO 18:34:15,381 GC for ParNew: 280 ms for 1 collections, 7268441248
used; max is 8588951552
 INFO 18:34:35,306 InetAddress /84.201.162.144 is now dead.
 INFO 18:34:35,306 GC for ConcurrentMarkSweep: 19223 ms for 1 collections,
3774714808 used; max is 8588951552
 INFO 18:34:35,309 InetAddress /84.201.162.144 is now UP

After taking off the load and restarting the service, I still see pending
handoffs:
$ nodetool -h localhost tpstats
Pool Name                    Active   Pending   Completed   Blocked   All time blocked
ReadStage                         0         0     1004257         0                  0
RequestResponseStage              0         0       92555         0                  0
MutationStage                     0         0           6         0                  0
ReadRepairStage                   0         0       57773         0                  0
ReplicateOnWriteStage             0         0           0         0                  0
GossipStage                       0         0      143332         0                  0
AntiEntropyStage                  0         0           0         0                  0
MigrationStage                    0         0           0         0                  0
MemtablePostFlusher               0         0           2         0                  0
StreamStage                       0         0           0         0                  0
FlushWriter                       0         0           2         0                  0
MiscStage                         0         0           0         0                  0
InternalResponseStage             0         0           0         0                  0
HintedHandoff                     1         3          15         0                  0

These 3 handoffs remain pending for a long time (>12 hours).
Most of the time Cassandra uses 100% of one CPU core, the stack trace of
the busy thread is:
"HintedHandoff:1" daemon prio=10 tid=0x01220800 nid=0x3843 runnable
[0x7fa1e1146000]
   java.lang.Thread.State: RUNNABLE
at java.util.ArrayList$Itr.remove(ArrayList.java:808)
at
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedSuper(ColumnFamilyStore.java:908)
at
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:857)
at
org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:850)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1195)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1150)
at
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpointInternal(HintedHandOffManager.java:324)
at
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:256)
at
org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:84)
at
org.apache.cassandra.db.HintedHandOffManager$3.runMayThrow(HintedHandOffManager.java:437)
at
org.apache.cassandra.utils.WrappedRunnable.r

Re: Problem with streaming data from Hadoop: DecoratedKey(-1, )

2013-05-21 Thread Michal Michalski
I've finally had some time to experiment a bit with this problem (it 
occurred twice again) and here's what I found:


1. So far (three occurrences in total), *when* it happened, it happened 
only for streaming to *one* specific C* node (but it works on this node 
too for 99.9% of the time)
2. It happens with compression turned on 
(cassandra.output.compression.class set to 
org.apache.cassandra.io.compress.DeflateCompressor, but it doesn't 
matter what the chunk length is)

3. Everything works fine when compression is turned off.

So it looks like I have a workaround for now, but I don't really 
understand the root cause of this problem and what's the "right" 
solution if we want to keep using compression.


Anyway, the thing that interests me the most is why does it fail so 
rarely and - assuming it's not a coincidence - why only for one C* node...


Could it be a DeflateCompressor bug?
Any other ideas?

Regards,
Michał


W dniu 31.03.2013 12:01, aaron morton pisze:

  but yesterday one of 600 mappers failed


:)


 From what I can understand by looking into the C* source, it seems to me that 
the problem is caused by an empty (or unexpectedly finished?) input buffer (?) 
causing the token to be set to -1, which is improper for RandomPartitioner:

Yes, there is a zero length key which has a -1 token.


However, I can't figure out what's the root cause of this problem.
Any ideas?

mmm, the BulkOutputFormat uses an SSTableSimpleUnsortedWriter and neither of 
them checks for zero length row keys. I would look there first.

There is no validation in the AbstractSSTableSimpleWriter; not sure if that is 
by design or an oversight. Can you catch the zero length key in your map job?
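Something like this on the map side would do it. A rough sketch only; it assumes the job 
writes ByteBuffer row keys and thrift Mutations to BulkOutputFormat, and the class, 
method and counter names are made up:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Mutation;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public abstract class GuardedBulkLoadMapper extends Mapper<Object, Text, ByteBuffer, List<Mutation>>
{
    @Override
    protected void map(Object offset, Text line, Context context) throws IOException, InterruptedException
    {
        ByteBuffer rowKey = buildRowKey(line);           // job specific, not shown
        List<Mutation> mutations = buildMutations(line); // job specific, not shown

        // A zero length key hashes to token -1 under RandomPartitioner and later
        // trips the "last written key >= current key" check on the receiving node,
        // so count it and drop it here instead of streaming it.
        if (rowKey == null || rowKey.remaining() == 0)
        {
            context.getCounter("bulkload", "empty_row_key").increment(1);
            return;
        }
        context.write(rowKey, mutations);
    }

    protected abstract ByteBuffer buildRowKey(Text line);
    protected abstract List<Mutation> buildMutations(Text line);
}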

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/03/2013, at 2:26 PM, Michal Michalski  wrote:


We're streaming data to Cassandra directly from MapReduce job using 
BulkOutputFormat. It's been working for more than a year without any problems, 
but yesterday one of 600 mappers failed and we got a strange-looking exception 
on one of the C* nodes.

IMPORTANT: It happens on one node and on one cluster only. We've loaded the 
same data to test cluster and it worked.


ERROR [Thread-1340977] 2013-03-28 06:35:47,695 CassandraDaemon.java (line 133) 
Exception in thread Thread[Thread-1340977,5,main]
java.lang.RuntimeException: Last written key 
DecoratedKey(5664330507961197044404922676062547179, 
302c6461696c792c32303133303332352c312c646f6d61696e2c756e6971756575736572732c633a494e2c433a6d63635f6d6e635f636172726965725f43656c6c4f6e655f4b61726e6174616b615f2842616e67616c6f7265295f494e2c643a53616d73756e675f47542d49393037302c703a612c673a3133)
 >= current key DecoratedKey(-1, ) writing into 
/cassandra/production/IndexedValues/production-IndexedValues-tmp-ib-240346-Data.db
at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133)
at 
org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:209)
at 
org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:226)
at 
org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:166)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


 From what I can understand by looking into the C* source, it seems to me that 
the problem is caused by an empty (or unexpectedly finished?) input buffer (?) 
causing the token to be set to -1, which is improper for RandomPartitioner:

public BigIntegerToken getToken(ByteBuffer key)
{
if (key.remaining() == 0)
return MINIMUM; // Which is -1
return new BigIntegerToken(FBUtilities.hashToBigInteger(key));
}

However, I can't figure out what's the root cause of this problem.
Any ideas?

Of course I can't exclude a bug in my code which streams this data, but - as I 
said - it works when loading the same data to a test cluster (which has a different 
number of nodes, thus a different token assignment, which might be a factor too).

Michał







bootstrapping a new node...

2013-05-21 Thread Hiller, Dean
We are using 1.2.2 cassandra and have rolled on 3 additional nodes to our 6 
node cluster (totalling 9 so far).  We are trying to roll on node 10, but during 
the streaming a compaction kicked off, which seemed very odd to us.  "nodetool 
netstats" still reported tons of files that were not transferred yet.  Is it 
normal for compaction to kick off while bootstrapping a new node?  Our 
node still says "Joining" in "nodetool netstats" as well.  The ring does not 
show the new node yet either.  Lastly, "nodetool netstats" reports 0% on EVERY 
single file and this doesn't seem to change.  The bootstrap node seems hung, so 
a few questions:

 1.  Is compaction supposed to kick off while bootstrapping a new node?
 2.  I seem to recall a bootstrap node setting in cassandra.yaml, but that was 
not one of the steps I recall in the datastax docs we went off of… in 1.2.2, is 
there any setting we need to set for a bootstrapping node that we missed (our 
other nodes joined just fine though and seem to be working great)?
 3.  What can I do to get this node to start streaming files again… can I just 
restart cassandra, or should I start from scratch somehow?
 4.  If I need to start from scratch, I assume I a) stop the node, b) wipe the 
commitlog and data directories, c) start the node back up.  Would that be 
correct?  After all, the other nodes don't seem to know about this new node 
according to the "nodetool ring" command.

Thanks for any help on this one,
Dean


Re: Repair of tombstones

2013-05-21 Thread Edward Capriolo
I would not make any bets on 1.1. Ironically 1.1 seems to be fairly stable
and 1.2.X has been a bit "hairy" in terms of the releases and the scope of
the bugs fixed in each of the minors. However, not having many shiny new
buttons makes the release less attractive, I guess.
On Tue, May 21, 2013 at 4:27 AM, aaron morton wrote:

> Because that ticket is closed, I think the best way to have a conversation
> is to create a new ticket to back port it to 1.1.
>
> Given that 1.2 has been out for a while it may be a tough sell, and it
> depends on the complexity of the back port. But on the other side
> https://issues.apache.org/jira/browse/CASSANDRA-4905?focusedCommentId=13493206&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13493206
>
> Hope that helps.
>
>-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/05/2013, at 3:30 AM, Michael Theroux  wrote:
>
> There has been a lot of discussion on the list recently concerning issues
> with repair, runtime, etc.
>
> We recently have had issues with this cassandra bug:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4905
>
> Basically, if you do regular staggered repairs, and you have tombstones
> that can be gc_graced, those tombstones may never be cleaned up if those
> tombstones don't get compacted away before the next repair.  This is
> because these tombstones are essentially recopied to other nodes during the
> next repair.  This has been fixed in 1.2, however, we aren't ready to make
> the jump to 1.2 yet.
>
> Is there a reason why this hasn't been back-ported to 1.1?  Is it a risky
> change? Although not a silver bullet, it seems it may help a lot of people
> with repair issues (certainly seems it would help us),
>
> -Mike
>
>
>


Cassandra 1.2 TTL histogram problem

2013-05-21 Thread cem
Hi all,

I have a question about ticket
https://issues.apache.org/jira/browse/CASSANDRA-3442

Why does Cassandra single table compaction skip the keys that are in the
other sstables? Please correct me if I am wrong.

I also don't understand why we have this line in the worthDroppingTombstones
method:

double remainingColumnsRatio = ((double) columns) /
    (sstable.getEstimatedColumnCount().count() * sstable.getEstimatedColumnCount().mean());

remainingColumnsRatio is always 0 in my case and the droppableRatio is 0.9.
Cassandra skips all sstables which are already expired.

This line was introduced by
https://issues.apache.org/jira/browse/CASSANDRA-4022.

Best Regards,
Cem


Re: Cassandra 1.2 TTL histogram problem

2013-05-21 Thread Yuki Morishita
> Why does Cassandra single table compaction skip the keys that are in the 
> other sstables?

because we don't want to resurrect deleted columns. Say, sstable A has
the column with timestamp 1, and sstable B has the same column which was
deleted at timestamp 2. Then if we purge that column only from sstable
B, we would see the column with timestamp 1 again.

> I also don't understand why we have this line in the worthDroppingTombstones method

What the method is trying to do is to "guess" how many columns are in
rows that don't overlap with the other sstables, without actually going
through every row in the sstable. We have statistics like the column count
histogram and the min and max row token for every sstable, and we use those
in the method to estimate how much the sstables overlap.
You may get a remainingColumnsRatio of 0 when the two sstables overlap
almost entirely.
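Roughly, the arithmetic works out like this (a toy Java sketch with made up numbers,
not the actual Cassandra code):

public class TombstoneEstimateSketch
{
    public static void main(String[] args)
    {
        long estimatedKeys = 1000000;      // rows in the candidate sstable
        double meanColumnsPerRow = 10.0;   // from the column count histogram

        // Rows estimated to fall outside the token ranges of the overlapping
        // sstables; with random/GUID keys this is usually close to 0.
        long nonOverlappingKeys = 100;

        double columns = nonOverlappingKeys * meanColumnsPerRow;
        double totalColumns = estimatedKeys * meanColumnsPerRow;

        double remainingColumnsRatio = columns / totalColumns; // ~0.0001
        double droppableRatio = 0.9;  // 90% of the tombstones are past gc_grace

        // The check is roughly: run the single-sstable compaction only if
        // remainingColumnsRatio * droppableRatio > tombstone_threshold (0.2 by
        // default), which is why a mostly-overlapping sstable gets skipped.
        System.out.printf("effective ratio = %.6f%n", remainingColumnsRatio * droppableRatio);
    }
}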


On Tue, May 21, 2013 at 3:43 PM, cem  wrote:
> Hi all,
>
> I have a question about ticket
> https://issues.apache.org/jira/browse/CASSANDRA-3442
>
> Why does Cassandra single table compaction skips the keys that are in the
> other sstables? Please correct if I am wrong.
>
> I also dont understand why we have this line in worthDroppingTombstones
> method:
>
> double remainingColumnsRatio = ((double) columns) /
> (sstable.getEstimatedColumnCount().count() *
> sstable.getEstimatedColumnCount().mean());
>
> remainingColumnsRatio  is always 0 in my case and the droppableRatio  is
> 0.9. Cassandra skips all sstables which are already expired.
>
> This line was introduced by
> https://issues.apache.org/jira/browse/CASSANDRA-4022.
>
> Best Regards,
> Cem



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Cassandra 1.2 TTL histogram problem

2013-05-21 Thread cem
Thank you very much for the swift answer.

I have one more question about the second part. Can the method count
non-overlapping keys as overlapping? I mean it uses the min and max tokens and
the column count, and with random keys those token ranges can end up covering
nearly the same span.

In my use case I generate a GUID for each key and send a single write
request.

Cem

On Tue, May 21, 2013 at 11:13 PM, Yuki Morishita  wrote:

> > Why does Cassandra single table compaction skips the keys that are in
> the other sstables?
>
> because we don't want to resurrect deleted columns. Say, sstable A has
> the column with timestamp 1, and sstable B has the same column which
> deleted at timestamp 2. Then if we purge that column only from sstable
> B, we would see the column with timestamp 1 again.
>
> > I also dont understand why we have this line in worthDroppingTombstones
> method
>
> What the method is trying to do is to "guess" how many columns that
> are not in the rows that don't overlap, without actually going through
> every rows in the sstable. We have statistics like column count
> histogram, min and max row token for every sstables, we use those in
> the method to estimate how many columns the two sstables overlap.
> You may have remainingColumnsRatio of 0 when the two sstables overlap
> almost entirely.
>
>
> On Tue, May 21, 2013 at 3:43 PM, cem  wrote:
> > Hi all,
> >
> > I have a question about ticket
> > https://issues.apache.org/jira/browse/CASSANDRA-3442
> >
> > Why does Cassandra single table compaction skips the keys that are in the
> > other sstables? Please correct if I am wrong.
> >
> > I also dont understand why we have this line in worthDroppingTombstones
> > method:
> >
> > double remainingColumnsRatio = ((double) columns) /
> > (sstable.getEstimatedColumnCount().count() *
> > sstable.getEstimatedColumnCount().mean());
> >
> > remainingColumnsRatio  is always 0 in my case and the droppableRatio  is
> > 0.9. Cassandra skips all sstables which are already expired.
> >
> > This line was introduced by
> > https://issues.apache.org/jira/browse/CASSANDRA-4022.
> >
> > Best Regards,
> > Cem
>
>
>
> --
> Yuki Morishita
>  t:yukim (http://twitter.com/yukim)
>