high context switches

2014-11-21 Thread Jan Karlsson
Hello,

We are running a 3 node cluster with RF=3 and 5 clients in a test environment. 
The C* settings are mostly default. We noticed quite high context switching 
during our tests. On 100 000 000 keys/partitions we averaged around 260 000 cs 
(with a max of 530 000).

We were running ~12 000 transactions per second: 10 000 reads and 2 000 updates.

Nothing is really wrong with that; however, I would like to understand why these 
numbers are so high. Have others noticed this behavior? How much context 
switching is expected, and why? What are the variables that affect it?

/J


Re: A question about adding a new data center

2014-11-21 Thread Mark Reddy
Hi Boying,

I'm not sure I fully understand your question here, so some clarification
may be needed. However, if you are asking which steps need to be performed
on the current datacenter and which on the new one, here is the breakdown
(a sketch of the two commands people ask about most often follows the list):

Step 1 - Current DC
Step 2 - New DC
Step 3 - Depending on the snitch you may need to make changes on both the
current and new DCs
Step 4 - Client config
Step 5 - Client config
Step 6 - New DC
Step 7 - New DC
Step 8 - New DC
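
Without mapping them to specific step numbers in the linked doc, the two commands
from this procedure that come up most often are updating the keyspace replication
and running the rebuild on the new-DC nodes. A minimal sketch, with illustrative
keyspace and datacenter names (substitute your own):

ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};

nodetool rebuild -- DC1    # run on each node in the new DC, streaming from the existing DC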


Mark

On 21 November 2014 03:27, Lu, Boying  wrote:

> Hi, all,
>
>
>
> I read the document about how to adding a new data center to existing
> clusters posted at
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
>
> But I have a question: Are all those steps executed *only* at the new
> adding cluster or on  existing clusters also? ( Step 7 is to be executed on
> the new cluster according to the document).
>
>
>
> Thanks
>
>
>
> Boying
>
>
>


[jira] Akhtar Hussain shared a search result with you

2014-11-21 Thread Akhtar Hussain (JIRA)
Akhtar Hussain shared a search result with you
-


https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=reporter+%3D+currentUser%28%29+ORDER+BY+createdDate+DESC

We have a geo-redundant setup with 2 data centers having 3 nodes each. When we 
bring down a single Cassandra node in DC2 with kill -9, 
reads fail on DC1 with TimedOutException for a brief period of time (~15-20 
sec). 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Repair completes successfully but data is still inconsistent

2014-11-21 Thread André Cruz
On 19 Nov 2014, at 19:53, Robert Coli  wrote:
> 
> My hunch is that you originally triggered this by picking up some obsolete 
> SSTables during the 1.2 era. Probably if you clean up the existing zombies 
> you will not encounter them again, unless you encounter another "obsolete 
> sstables marked live" bug. I agree that your compaction exceptions in the 1.2 
> era are likely implicated.

I’m still in the “1.2 era”; I upgraded from 1.2.16 to 1.2.19. You mentioned 
that obsolete sstables may have been picked up, and each node had 3-9 
exceptions like this when it was brought down prior to being upgraded:

ERROR [CompactionExecutor:15173] 2014-10-21 15:04:01,923 CassandraDaemon.java 
(line 191) Exception in thread Thread[CompactionExecutor:15173,1,main]
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@1dad30c0 
rejected from 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@555b9c78[Terminated,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 14052]
   at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown 
Source)
   at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
   at 
java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(Unknown Source)
   at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(Unknown 
Source)
   at java.util.concurrent.ScheduledThreadPoolExecutor.submit(Unknown 
Source)
   at 
org.apache.cassandra.io.sstable.SSTableDeletingTask.schedule(SSTableDeletingTask.java:65)

Can it be that they were all in the middle of a compaction (Leveled compaction) 
and the new sstables were written but the old ones were not deleted? Will 
Cassandra blindly pick up old and new sstables when it restarts?

If so, I have a few questions:

1- What is the correct sequence of commands to bring down a node safely? I know 
that “drain” was used here, because it is in the log. I’ve read somewhere that 
drain should not be used and that “disablethrift”, “disablegossip”, “flush” and then 
waiting a while is the correct way.

2- Why won’t repair propagate this column value to the other nodes? Repairs 
have run every day and the value is still missing on the other nodes.

3- If I have these rogue sstables loaded, this seems like a time bomb. Down the 
line I will again delete columns that will reappear after some time. Is there a 
way I can find these sstables that should not be there? I thought the timestamp 
of the file would help, but this zombie column is present in one of the latest 
sstables.

Thanks,
André

Data not replicating consistently

2014-11-21 Thread Rahul Neelakantan
I have a setup that looks like this

Dc1: 9 nodes
Dc2: 9 nodes
Dc3: 9 nodes
C* version: 2.0.10
RF: 2 in each DC
Empty CF with no data at the beginning of the test

Scenario 1 (happy path): I connect to a node in DC1 using CQLsh, validate that 
I am using CL=1, and insert 10 rows.
Then, using CQLsh, I connect to one node in each of the 3 DCs and, with CL=1, select 
* on the table; each DC shows all 10 rows.

Scenario 2: Using a program based on the DataStax drivers, I write 10 rows to DC1. 
The program uses CL=1 and also does a read after write.
Then, using CQLsh, I connect to one node in each of the 3 DCs and, with CL=1 or 
LOCAL_QUORUM, select * on the table:
DC1 shows all 10 rows.
DC2 shows 8 or 9 rows.
DC3 shows 8 or 9 rows.
The missing rows never show up in DC2 and DC3 unless I do a CQLsh lookup with 
CL=ALL.
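
For anyone reproducing this, the consistency level used by cqlsh can be checked
and pinned explicitly before the comparison reads; a minimal example (the table
name is illustrative):

CONSISTENCY;              -- show the current consistency level
CONSISTENCY LOCAL_QUORUM; -- or ONE / ALL
SELECT * FROM my_table;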

Why is there a difference in replication between writes performed using the 
DataStax drivers and writes performed using CQLsh?


Rahul

Re: [jira] Akhtar Hussain shared a search result with you

2014-11-21 Thread Mark Reddy
I believe you were attempting to share:
https://issues.apache.org/jira/browse/CASSANDRA-8352

Your Cassandra log outputs the following:

> DEBUG [Thrift:4] 2014-11-20 15:36:50,653 CustomTThreadPoolServer.java
> (line 204) Thrift transport error occurred during processing of message.
> org.apache.thrift.transport.TTransportException: Cannot read. Remote side
> has closed. Tried to read 4 bytes, but only got 0 bytes. (This is often
> indicative of an internal error on the server side. Please check your
> server logs.)


This indicates that your server was under pressure at that moment and
points you to your server logs for further diagnosis.


Mark

On 21 November 2014 11:15, Akhtar Hussain (JIRA)  wrote:

> Akhtar Hussain shared a search result with you
> -
>
>
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=reporter+%3D+currentUser%28%29+ORDER+BY+createdDate+DESC
>
> We have a Geo-red setup with 2 Data centers having 3 nodes each. When
> we bring down a single Cassandra node down in DC2 by kill -9
> , reads fail on DC1 with TimedOutException for a brief
> amount of time (15-20 sec~).
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


Re: [jira] Akhtar Hussain shared a search result with you

2014-11-21 Thread Akhtar Hussain
That's true. Will take another look at the server logs and get back.

Br/Akhtar

On Fri, Nov 21, 2014 at 5:09 PM, Mark Reddy  wrote:

> I believe you were attempting to share:
> https://issues.apache.org/jira/browse/CASSANDRA-8352
>
> Your cassandra logs outputs the following:
>
>> DEBUG [Thrift:4] 2014-11-20 15:36:50,653 CustomTThreadPoolServer.java
>> (line 204) Thrift transport error occurred during processing of message.
>> org.apache.thrift.transport.TTransportException: Cannot read. Remote side
>> has closed. Tried to read 4 bytes, but only got 0 bytes. (This is often
>> indicative of an internal error on the server side. Please check your
>> server logs.)
>
>
> Which indicates that your server is under pressure at that moment and
> points you to look at your server logs for further diagnosis.
>
>
> Mark
>
> On 21 November 2014 11:15, Akhtar Hussain (JIRA)  wrote:
>
>> Akhtar Hussain shared a search result with you
>> -
>>
>>
>> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=reporter+%3D+currentUser%28%29+ORDER+BY+createdDate+DESC
>>
>> We have a Geo-red setup with 2 Data centers having 3 nodes each. When
>> we bring down a single Cassandra node down in DC2 by kill -9
>> , reads fail on DC1 with TimedOutException for a brief
>> amount of time (15-20 sec~).
>>
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>>
>
>


Re: Repair/Compaction Completion Confirmation

2014-11-21 Thread Paulo Ricardo Motta Gomes
Hey guys,

Just reviving this thread. In case anyone is using the cassandra_range_repair
tool (https://github.com/BrianGallew/cassandra_range_repair), please sync your
repositories, because the tool was not working before due to a critical bug in
the token range definition method. For more information on the bug please check
here:
https://github.com/BrianGallew/cassandra_range_repair/pull/18

Cheers,

On Tue, Oct 28, 2014 at 7:53 AM, Colin  wrote:

> When I use virtual nodes, I typically use a much smaller number - usually
> in the range of 10.  This gives me the ability to add nodes easier without
> the performance hit.
>
>
>
> --
> *Colin Clark*
> +1-320-221-9531
>
>
> On Oct 28, 2014, at 10:46 AM, Alain RODRIGUEZ  wrote:
>
> I have been trying this yesterday too.
>
> https://github.com/BrianGallew/cassandra_range_repair
>
> "Not 100% bullet proof" --> Indeed I found that operations are done
> multiple times, so it is not very optimised. Though it is open sourced so
> I guess you can improve things as much as you want and contribute. Here is
> the issue I raised yesterday
> https://github.com/BrianGallew/cassandra_range_repair/issues/14.
>
> I am also trying to improve our repair automation since we now have
> multiple DC and up to 800 GB per node. Repairs are quite heavy right now.
>
> Good luck,
>
> Alain
>
> 2014-10-28 4:59 GMT+01:00 Ben Bromhead :
>
>> https://github.com/BrianGallew/cassandra_range_repair
>>
>> This breaks down the repair operation into very small portions of the
>> ring as a way to try and work around the current fragile nature of repair.
>>
>> Leveraging range repair should go some way towards automating repair
>> (this is how the automatic repair service in DataStax opscenter works, this
>> is how we perform repairs).
>>
>> We have had a lot of success running repairs in a similar manner against
>> vnode enabled clusters. Not 100% bullet proof, but way better than nodetool
>> repair
>>
>>
>>
>> On 28 October 2014 08:32, Tim Heckman  wrote:
>>
>>> On Mon, Oct 27, 2014 at 1:44 PM, Robert Coli 
>>> wrote:
>>>
 On Mon, Oct 27, 2014 at 1:33 PM, Tim Heckman  wrote:

> I know that when issuing some operations via nodetool, the command
> blocks until the operation is finished. However, is there a way to 
> reliably
> determine whether or not the operation has finished without monitoring 
> that
> invocation of nodetool?
>
> In other words, when I run 'nodetool repair' what is the best way to
> reliably determine that the repair is finished without running something
> equivalent to a 'pgrep' against the command I invoked? I am curious about
> trying to do the same for major compactions too.
>

 This is beyond a FAQ at this point, unfortunately; non-incremental
 repair is awkward to deal with and probably impossible to automate.

 In The Future [1] the correct solution will be to use incremental
 repair, which mitigates but does not solve this challenge entirely.

 As brief meta commentary, it would have been nice if the project had
 spent more time optimizing the operability of the critically important
 thing you must do once a week [2].

 https://issues.apache.org/jira/browse/CASSANDRA-5483

 =Rob
 [1] http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1
 [2] Or, more sensibly, once a month with gc_grace_seconds set to 34
 days.

>>>
>>> Thank you for getting back to me so quickly. Not the answer that I was
>>> secretly hoping for, but it is nice to have confirmation. :)
>>>
>>> Cheers!
>>> -Tim
>>>
>>
>>
>>
>> --
>>
>> Ben Bromhead
>>
>> Instaclustr | www.instaclustr.com | @instaclustr
>>  | +61 415 936 359
>>
>
>


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br *
+55 48 3232.3200


Re: high context switches

2014-11-21 Thread Nikolai Grigoriev
How do the clients connect, which protocol is used, and do they use
keep-alive connections? Is it possible that the clients use Thrift and the
server type is "sync"? It is just my guess, but in a scenario with a high
number of clients connecting and disconnecting often, there can be a high
amount of context switching.
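
If that guess is right, the relevant knob is rpc_server_type in cassandra.yaml.
A minimal excerpt (the stock default is shown first; hsha serves all connections
from a small fixed thread pool instead of one thread per client connection):

rpc_server_type: sync    # default: one thread per Thrift connection
# rpc_server_type: hsha  # half sync/half async: fixed thread pool, fewer threads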

On Fri, Nov 21, 2014 at 4:21 AM, Jan Karlsson 
wrote:

>  Hello,
>
>
>
> We are running a 3 node cluster with RF=3 and 5 clients in a test
> environment. The C* settings are mostly default. We noticed quite high
> context switching during our tests. On 100 000 000 keys/partitions we
> averaged around 260 000 cs (with a max of 530 000).
>
>
>
> We were running 12 000~ transactions per second. 10 000 reads and 2000
> updates.
>
>
>
> Nothing really wrong with that however I would like to understand why
> these numbers are so high. Have others noticed this behavior? How much
> context switching is expected and why? What are the variables that affect
> this?
>
>
>
> /J
>



-- 
Nikolai Grigoriev
(514) 772-5178


Re: Cassandra backup via snapshots in production

2014-11-21 Thread Jens Rantil
> The main purpose is to protect us from human errors (eg. unexpected 
> manipulations: delete, drop tables, …).




If that is the main purpose, having "auto_snapshot: true" in cassandra.yaml 
will be enough to protect you.




Regarding backup, I have a small script that creates a named snapshot and, for 
each snapshotted sstable, encrypts it, uploads it to S3, and deletes the local 
copy. It took me an hour to write and roll out to all our nodes. The whole process 
is currently logged, but eventually I will also send an e-mail if a backup fails.
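
A minimal sketch of that kind of script, assuming nodetool, gpg and the AWS CLI
are installed; the keyspace, GPG recipient, S3 bucket and data path are
illustrative, not our actual setup:

#!/bin/sh
set -e
TAG="backup_$(date +%Y%m%d)"
nodetool snapshot my_keyspace -t "$TAG"
for f in /var/lib/cassandra/data/my_keyspace/*/snapshots/"$TAG"/*; do
  gpg --encrypt --recipient backups@example.com "$f"     # writes $f.gpg
  aws s3 cp "$f.gpg" "s3://example-backup-bucket/$TAG/"  # upload encrypted copy
  rm -f "$f" "$f.gpg"                                    # free local space
done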


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook Linkedin Twitter

On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO 
wrote:

> Hello all,
> We are looking for a solution to backup data in our C* cluster (v2.0.x, 16 
> nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
> The main purpose is to protect us from human errors (eg. unexpected 
> manipulations: delete, drop tables, …).
> We are thinking of:
> -  Backup: add a 2TB HDD on each node for C* daily/weekly snapshots.
> -  Restore: load the most recent snapshots or latest “non-corrupted” 
> ones and replay missing data imports from other data source.
> We would like to know if somebody are using Cassandra’s backup feature in 
> production and could share your experience with us.
> Your help would be greatly appreciated.
> Best regards,
> Minh

max ttl for column

2014-11-21 Thread Rajanish GJ
Does Hector or Cassandra impose a limit on the max TTL value for a column?

I am trying to insert a record into one of the column families and am seeing the
following error.
Cassandra version : 1.1.12
Hector  : 1.1-4

Any pointers appreciated.

me.prettyprint.hector.api.exceptions.HInvalidRequestException:
InvalidRequestException(why:ttl is too large. requested (951027277) maximum
(63072))
at
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
~[hector-core-1.1-4.jar:na]
at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260)
~[hector-core-1.1-4.jar:na]
at
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
~[hector-core-1.1-4.jar:na]
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
~[hector-core-1.1-4.jar:na]
at
me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeBatch(AbstractColumnFamilyTemplate.java:115)
~[hector-core-1.1-4.jar:na]
at
me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeIfNotBatched(AbstractColumnFamilyTemplate.java:163)
~[hector-core-1.1-4.jar:na]
at
me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.update(ColumnFamilyTemplate.java:69)
~[hector-core-1.1-4.jar:na]

=
Also tried using CQL, and it seems to hang and not respond; trying again with a
few combinations:

INSERT INTO users (key, id) VALUES ('test6', '951027277 secs') USING TTL 951027277;



Regards
Rajanish GJ
apigee | rajan...@apigee.com


Re: max ttl for column

2014-11-21 Thread Mark Reddy
Hi Rajanish,

Cassandra imposes a max TTL of 20 years.

public static final int MAX_TTL = 20 * 365 * 24 * 60 * 60; // 20 years in
> seconds


See:
https://github.com/apache/cassandra/blob/8d8fed52242c34b477d0384ba1d1ce3978efbbe8/src/java/org/apache/cassandra/db/ExpiringCell.java#L37
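
For reference, that constant works out to 20 * 365 * 24 * 60 * 60 = 630,720,000
seconds, so the requested TTL of 951,027,277 (~30 years) is over the limit. A
minimal check against the users table from the original message (values are
illustrative):

INSERT INTO users (key, id) VALUES ('test6', 'ok')  USING TTL 630720000; -- accepted: exactly 20 years
INSERT INTO users (key, id) VALUES ('test6', 'bad') USING TTL 630720001; -- rejected: "ttl is too large"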



Mark

On 21 November 2014 17:29, Rajanish GJ  wrote:

>
> Does hector or cassandra imposes a limit on max ttl value for a column?
>
> I am trying to insert record into one of the column family and seeing the
> following error..
> Cassandra version : 1.1.12
> Hector  : 1.1-4
>
> Any pointers appreciated.
>
> me.prettyprint.hector.api.exceptions.HInvalidRequestException:
> InvalidRequestException(why:ttl is too large. requested (951027277) maximum
> (63072))
> at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeBatch(AbstractColumnFamilyTemplate.java:115)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeIfNotBatched(AbstractColumnFamilyTemplate.java:163)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.update(ColumnFamilyTemplate.java:69)
> ~[hector-core-1.1-4.jar:na]
>
> =
> *Also tried using cql, and it seems to hangs and not responding.. trying
> again with few combinations*
>
> *INSERT INTO users (key,id) values ('test6','951027277 secs') USING
> TTL 951027277 ; *
>
>
>
> Regards
> Rajanish GJ
> apigee | rajan...@apigee.com
>


Re: max ttl for column

2014-11-21 Thread Philip Thompson
With the newest versions of Cassandra, CQL is not hanging but returns the
same Invalid Query Exception you are seeing through Hector. I would assume
from the exception that 63072 is in fact the largest TTL you can use.
What are you doing that you need to set a TTL of approximately 30 years?

On Fri, Nov 21, 2014 at 11:29 AM, Rajanish GJ  wrote:

>
> Does hector or cassandra imposes a limit on max ttl value for a column?
>
> I am trying to insert record into one of the column family and seeing the
> following error..
> Cassandra version : 1.1.12
> Hector  : 1.1-4
>
> Any pointers appreciated.
>
> me.prettyprint.hector.api.exceptions.HInvalidRequestException:
> InvalidRequestException(why:ttl is too large. requested (951027277) maximum
> (63072))
> at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeBatch(AbstractColumnFamilyTemplate.java:115)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.executeIfNotBatched(AbstractColumnFamilyTemplate.java:163)
> ~[hector-core-1.1-4.jar:na]
> at
> me.prettyprint.cassandra.service.template.ColumnFamilyTemplate.update(ColumnFamilyTemplate.java:69)
> ~[hector-core-1.1-4.jar:na]
>
> =
> *Also tried using cql, and it seems to hangs and not responding.. trying
> again with few combinations*
>
> *INSERT INTO users (key,id) values ('test6','951027277 secs') USING
> TTL 951027277 ; *
>
>
>
> Regards
> Rajanish GJ
> apigee | rajan...@apigee.com
>


bootstrapping node stuck in JOINING state

2014-11-21 Thread Chris Hornung

Hello,

I have been bootstrapping 4 new nodes into an existing production 
cluster. Each node was bootstrapped one at a time; the first 2 
completed without errors, but I ran into issues with the 3rd one. The 4th 
node has not been started yet.


On bootstrapping the third node, the data streaming sessions completed 
without issue, but bootstrapping did not finish. The node is still stuck in 
the JOINING state some 19 hours after data streaming completed.


Other reports of this issue seem to be related either to network 
connectivity issues between nodes or to multiple nodes bootstrapping 
simultaneously. I haven't found any evidence of either of these 
situations: no errors or stack traces in the logs.


I'm just looking for the safest way to proceed. I'm fine with removing 
the hanging node altogether; I'm just looking for confirmation that this wouldn't 
leave the cluster in a bad state, and for guidance on which data points to look 
at to gauge the situation.


If removing the node and starting over is OK, is any other maintenance 
on the existing nodes recommended? I've read of people 
scrubbing/rebuilding nodes coming out of this situation, but I'm not sure if 
that's necessary.


Please let me know if any additional info would be helpful.

Thanks!
--
Chris Hornung





Re: Cassandra backup via snapshots in production

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil  wrote:

> > The main purpose is to protect us from human errors (eg. unexpected
> manipulations: delete, drop tables, …).
>
> If that is the main purpose, having "auto_snapshot: true” in
> cassandra.yaml will be enough to protect you.
>

OP includes "delete" in their list of "unexpected manipulations", and
auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba


Re: high context switches

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 1:21 AM, Jan Karlsson 
wrote:

>  Nothing really wrong with that however I would like to understand why
> these numbers are so high. Have others noticed this behavior? How much
> context switching is expected and why? What are the variables that affect
> this?
>

I +1 Nikolai's conjecture that you are probably using a very high number of
client threads.

However as a general statement Cassandra is highly multi-threaded. Threads
are assigned within thread pools and these thread pools can be thought of
as a type of processing pipeline, such that one is often the input to
another. When pushing Cassandra near its maximum capacity, you will
therefore spend a lot of time switching between threads.
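
One way to see this directly is nodetool tpstats, which lists the per-stage
thread pools with their active and pending task counts; the pool names are real,
the numbers below are purely illustrative:

nodetool tpstats
# Pool Name               Active   Pending   Completed
# ReadStage                    4         2     9043921
# MutationStage                1         0     2113098
# RequestResponseStage         2         0    11210539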

=Rob
http://twitter.com/rcolidba


Re: bootstrapping node stuck in JOINING state

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 9:44 AM, Chris Hornung 
wrote:

> On bootstrapping the third node, the data steaming sessions completed
> without issue, but bootstrapping did not finish. The node is stuck in
> JOINING state even 19 hours or so after data streaming completed.
>

Stop the joining node. Wipe the data dir including system keyspace.
Re-bootstrap.
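
In shell terms that is roughly the following (a sketch; the service name and the
data/commitlog/saved_caches paths are the packaged defaults and may differ on
your hosts):

sudo service cassandra stop                       # stop the stuck joining node
sudo rm -rf /var/lib/cassandra/data/*             # wipe data, including the system keyspace
sudo rm -rf /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start                      # the node bootstraps again from scratch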

=Rob
http://twitter.com/rcolidba


Re: Data not replicating consistently

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 3:38 AM, Rahul Neelakantan  wrote:

> The missing rows never show up in DC2 and DC3 unless I do a CQLsh lookup
> with CL=all
>
> Why is there a difference in the replication between writes performed
> using the datastax drivers and while using CQLsh?
>

If reproducible, this sounds like a bug to me; all writes should show up in
all DCs in relatively short order.

http://issues.apache.org ?

=Rob


Re: Repair completes successfully but data is still inconsistent

2014-11-21 Thread Robert Coli
On Fri, Nov 21, 2014 at 3:11 AM, André Cruz  wrote:

> Can it be that they were all in the middle of a compaction (Leveled
> compaction) and the new sstables were written but the old ones were not
> deleted? Will Cassandra blindly pick up old and new sstables when it
> restarts?
>

Yes.

https://issues.apache.org/jira/browse/CASSANDRA-6756

Also note linked tickets there like :

https://issues.apache.org/jira/browse/CASSANDRA-6503

6503 is fix version 1.2.14 ...


> 1- What is the correct sequence of commands to bring down a node safely? I
> know that “drain" was used here, because it is in the log. I’ve read
> somewhere that drain should not be used and “disablethrift”,
> “disablegossip”, “flush” and then waiting a while is the correct way.
>

"drain" is the canonical answer. But "drain" has historically not always
worked. In general when it hasn't worked, it has flushed properly but not
marked clean properly.

https://issues.apache.org/jira/browse/CASSANDRA-4446

https://issues.apache.org/jira/browse/CASSANDRA-5911

5911 is "too late for 1.2" and will not be merged there.


> 2- Why won’t repair propagate this column value to the other nodes?
> Repairs have run everyday and the value is still missing on the other nodes.
>

No idea. Are you sure it's not expired via TTL or masked in some other way?
When you ask that node for it at CL.ONE, do you get this value?


> 3- If I have these rogue sstables loaded this seems like a time bomb. Down
> the line I will again delete columns that will reappear after some time. Is
> there a way I can find these sstables that should not be there? I thought
> the timestamp of the file would help but this zombie column is present on
> one of the latest sstables.
>

Unfortunately, not really. You'd need the patch from CASSANDRA-6568 and
you'd need to continuously have been capturing your live SStable set to
determine when

https://issues.apache.org/jira/browse/CASSANDRA-6568

The experience you're having is, AFAICT, irrecoverably fatal to
consistency. This is why, at the summit this year, when Apple announced it
had encountered and fixed like 5 bugs of this type, I summarized their talk
in chats as "the talk where Apple showed that, despite what you've heard
about quorum reads and writes, Cassandra has never stored data consistently
except by fortunate accident."

=Rob
http://twitter.com/rcolidba


Partial row read

2014-11-21 Thread Andy Stec
We're getting strange results when reading from Cassandra 2.0 in PHP using
this driver:

http://code.google.com/a/apache-extras.org/p/cassandra-pdo/


Here's the schema:

CREATE TABLE events (
  day text,
  last_event text,
  event_text text,
  mdn text,
  PRIMARY KEY ((day), last_event)
) WITH CLUSTERING ORDER BY (last_event DESC) AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};


Here's the database contents:

cqlsh:msa> SELECT * FROM events;
 day    | last_event          | event_text | mdn
--------+---------------------+------------+--------
 141121 | 2014-46-21 20:46:45 | event text | 847111

(1 rows)


Here's the simple PHP program that reads the database:

exec ("USE msa");
$stmt = $db->query ("SELECT day, last_event, mdn, event_text FROM events");
if ($stmt === false)
{
   var_dump ($db->errorInfo ());
}
else
{
   $stmt->execute ();
   var_dump ($stmt->fetchAll());
}
?>

And this is the output the program produces.  Why is it not returning the
full row?


array(1) {
  [0]=>
  array(6) {
["day"]=>
string(6) "141121"
[0]=>
string(6) "141121"
[""]=>
string(0) ""
[1]=>
string(0) ""
[2]=>
string(0) ""
[3]=>
string(0) ""
  }
}


Re: read repair across DC and latency

2014-11-21 Thread Tyler Hobbs
On Wed, Nov 19, 2014 at 4:51 PM, Jimmy Lin  wrote:

>
> #
> When you said send "read digest request" to the rest of the replica, do
> you mean all replica(s) in current and other DC? or just the one last
> replica in my current DC and one of the co-ordinate node in other DC?
>
> (our read and write is all "local_quorum" of replication factor of 3,
> local_dc_repair_chance=0))
>

For read_repair_chance, all replicas get one of two messages: a data
request, or a digest request.  Two of the nodes will get a data request,
the others will get digest requests.  The behavior is different for
local_dc_repair_chance.


>
> #
> Sending "read digest request" to other DC, happen sequently correct? If
> network latency between DC is bad during time, will that affect overall
> read latency?
>

No, the coordinator will not wait for those responses before replying to
the client.  It will only wait for enough responses to satisfy the
consistency level.  The rest are handled asynchronously in the background.


>
> #
> We observe that one of our cql query perform okay during normal load, but
> degrade greatly when we have batch of  same cql(looking for the exact
> columns and key) sending to server in short period of time(say 100 of them
> within a sec).
> Our other table or keyspace don't see any latency drop during the time, so
> i am not sure we are hitting the capacity yet. So we suspect read_repair
> chance may have something to do wit it.
> Anything we can look into and see what may cause the latency spike when we
> have large number of same cql hitting the server?
>

I doubt read repair is related.  I would try tracing a few of your queries.
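
A quick way to do that from cqlsh (the query shown is a placeholder; substitute
the statement that degrades under load):

TRACING ON;
SELECT * FROM my_keyspace.my_table WHERE key = 'x';
TRACING OFF;

You can also sample traces server-side with "nodetool settraceprobability 0.001"
and read them back from the system_traces keyspace.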




-- 
Tyler Hobbs
DataStax