Re: Weird timeouts

2014-03-10 Thread Joel Samuelsson
I am on Cassandra 2.0.5. How can I use the trace functionality?

I did not check for exceptions. I will rerun and check.

Thanks for suggestions.

/Joel


2014-03-07 17:54 GMT+01:00 Duncan Sands :

> Hi Joel,
>
>
> On 07/03/14 15:22, Joel Samuelsson wrote:
>
>> I try to fetch all the row keys from a column family (there should only
>> be a
>> couple of hundred in that CF) in several different ways but I get timeouts
>> whichever way I try:
>>
>
> did you check the node logs for exceptions?  You can get this kind of
> thing if there is an assertion failure when reading a particular row due to
> corruption for example.
>
> Ciao, Duncan.
>
>
>
>> Through the cassandra cli:
>> Fetching 45 rows is fine:
>> list cf limit 45 columns 0;
>> .
>> .
>> .
>> 45 Rows Returned.
>> Elapsed time: 298 msec(s).
>>
>> Fetching 46 rows however gives me a timeout after a minute or so:
>> list cf limit 46 columns 0;
>> null
>> TimedOutException()...
>>
>> Through pycassa:
>> keys = cf.get_range(column_count = 1, buffer_size = 2)
>>
>> for key, val in keys:
>>   print key
>>
>> This prints some keys and then gets stuck at the same place each time and
>> then
>> timeouts.
>>
>> The columns (column names + value) in the rows should be less than 100
>> bytes
>> each, though there may be a lot of them on a particular row.
>>
>> To me it seems like one of the rows takes too long to fetch, but I
>> don't know
>> why since I am limiting the number of columns to 0. Without seeing the
>> row, I
>> have a hard time knowing what could be wrong. Do you have any ideas?
>>
>>
>>
>


Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram

2014-03-10 Thread Oleg Dulin

I get that :)

What I'd like to know is how to fix that :)

On 2014-03-09 20:24:54 +, Takenori Sato said:

You have millions of org.apache.cassandra.db.DeletedColumn instances on 
the snapshot.


This means you have lots of column tombstones which, I guess, are 
read into memory by slice queries. 



On Sun, Mar 9, 2014 at 10:55 PM, Oleg Dulin  wrote:
I am trying to understand why one of my nodes keeps doing full GCs.

I have Xmx set to 8gigs, memtable total size is 2 gigs.

Consider the top entries from jmap -histo:live @ http://pastebin.com/UaatHfpJ

--
Regards,
Oleg Dulin
http://www.olegdulin.com



--
Regards,
Oleg Dulin
http://www.olegdulin.com




RE: How many Data centers can Cassandra support?

2014-03-10 Thread Lu, Boying
This is very helpful.

Thanks a lot :)

From: Tupshin Harper [mailto:tups...@tupshin.com]
Sent: March 9, 2014 23:36
To: user@cassandra.apache.org
Subject: Re: How many Data centers can Cassandra support?


20, easily.  Probably far more, but I lack data points beyond that.

-Tupshin
On Mar 9, 2014 10:26 AM, "Lu, Boying" <boying...@emc.com> wrote:
Hi, experts,

Since Cassandra 2.x supports databases that span multiple DCs, my question is: 
how many DCs can Cassandra support in practice?

Thanks

Boying



Replication with virtual nodes

2014-03-10 Thread motta.lrd
Hello, 

I have just learnt about virtual nodes in Cassandra. 
Let's assume this scenario where we have 16 tokens, 4 physical nodes, and
each physical node is responsible for 4 tokens. 

If we have a replication factor of 2, how does the replication work? 
Where will token *1* be replicated? 

Thank you 
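Under the usual SimpleStrategy-style placement, the answer can be sketched: walk the ring clockwise from the token and take the next RF distinct physical nodes. The round-robin token assignment below is an assumption for illustration only; real vnode tokens are assigned randomly, so actual layouts differ.

```python
# Sketch of SimpleStrategy-style replica placement on a vnode ring.
# Assumptions: 16 tokens, 4 physical nodes, round-robin ownership
# (S1 owns T1, T5, T9, T13, and so on) -- illustrative only.

RF = 2
NUM_TOKENS = 16
owner = {t: (t - 1) % 4 + 1 for t in range(1, NUM_TOKENS + 1)}

def replicas(token, rf=RF):
    """Walk the ring clockwise from `token`, collecting rf distinct nodes."""
    nodes = []
    t = token
    while len(nodes) < rf:
        if owner[t] not in nodes:
            nodes.append(owner[t])
        t = t % NUM_TOKENS + 1  # next token on the ring, wrapping around
    return nodes

print(replicas(1))  # -> [1, 2]
```

With RF = 2, the data in token 1's range lands on its owner (S1) plus the owner of the next token clockwise that belongs to a different machine (here S2).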



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replication-with-virtual-nodes-tp7593310.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
I installed DSC 2.0.5 on ubuntu 12.04 with Oracle JRE 7 but dsc 2.0.5 does
not start after installation. When I check the running status..

$ sudo service cassandra status


it says

> * could not access pidfile for Cassandra


and no other messages or anything in the logs.

This is happening with 2.0.5 but not with 2.0.4.

Did anyone else come across this issue? Any idea how to fix this?


Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
If it was not very clear, I could not manually start cassandra

$ sudo service cassandra start



* could not access pidfile for Cassandra








sending notifications through data replication on remote clusters

2014-03-10 Thread DE VITO Dominique
Hi,

I have the following use case:
If I update a data on DC1, I just want apps "connected-first" to DC2 to be 
informed when this data is available on DC2 after replication.

When using Thrift, one way could be to modify the CassandraServer class, to send 
notifications to apps as replicated data arrives at the coordinator node of 
DC2.
Is it "common" (~ the way to do it)?
Is there another way to do so?

When using CQL, is there a precise "src code" place to modify for the same 
purpose?

Thanks.

Regards,
Dominique



Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram

2014-03-10 Thread Jonathan Lacefield
Hello,

  You have several options:

  1) going forward lower gc_grace_seconds
http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configStorage_r.html?pagename=docs&version=1.2&file=configuration/storage_configuration#gc-grace-seconds
   - this is very use case specific.  Default is 10 days.  Some users
will put this at 0 for specific use cases.
  2) you could also lower tombstone compaction threshold and interval to
get tombstone compaction to fire more often on your tables/cfs:
https://datastax.jira.com/wiki/pages/viewpage.action?pageId=54493436
  3) to clean out old tombstones you could always run a manual compaction,
though these aren't typically recommended:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/tools/toolsNodetool_r.html

  For 1 and 2, be sure your disks can keep up with compaction to ensure
tombstone, or other, compaction fires regularly enough to clean out old
tombstones.  Also, you probably want to ensure you are using Leveled
Compaction:  http://www.datastax.com/dev/blog/when-to-use-leveled-compaction.
 Again, this assumes your disk system can handle the increased io from
Leveled Compaction.

  Also, you may be running into this with the older version of Cassandra:
https://issues.apache.org/jira/browse/CASSANDRA-6541
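The timestamp arithmetic behind option 1 can be sketched as follows (simplified: real compaction also checks that no overlapping older data exists in other sstables before dropping a tombstone):

```python
# Sketch of when a tombstone becomes purgeable under gc_grace_seconds.
# Simplified model -- real Cassandra compaction also checks overlapping
# sstables before dropping anything.

import time

GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra 1.2 default: 10 days

def tombstone_purgeable(local_deletion_time, now=None,
                        gc_grace=GC_GRACE_SECONDS):
    """A tombstone may only be dropped once it is older than gc_grace,
    so it has had time to reach all replicas (e.g. via repair)."""
    now = time.time() if now is None else now
    return now >= local_deletion_time + gc_grace

deleted_at = 1_394_000_000  # epoch seconds of the delete
print(tombstone_purgeable(deleted_at, now=deleted_at + 5 * 24 * 3600))   # False
print(tombstone_purgeable(deleted_at, now=deleted_at + 11 * 24 * 3600))  # True
```

Lowering gc_grace_seconds shrinks the window in the comparison above, which is exactly why it must be balanced against how often repairs run.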

  Hope this helps.

Jonathan


Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487








Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Duncan Sands

Hi user 01,

On 10/03/14 13:11, user 01 wrote:

I installed DSC 2.0.5 on ubuntu 12.04 with Oracle JRE 7 but dsc 2.0.5 does not
start after installation. When I check the running status..

*$ sudo service cassandra status*


it says

** could not access pidfile for Cassandra*


& no other messages or anything in logs.

This is happening with 2.0.5 but not with 2.0.4.

Did anyone else came across this issue ? Any idea how to fix this ?


this probably means that the Cassandra process exited.  For example, if there is 
an incorrect value in cassandra.yaml then the process will start (so "sudo 
service cassandra start" will return a success code), but will then exit once it 
discovers the wrong value.  I suggest you start Cassandra in the foreground 
(with -f IIRC), so you get all error output directly on your console.
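Duncan's diagnosis can be made concrete: the init script's status check is essentially "read the pidfile and probe that pid", so a process that exits before (or just after) writing its pid yields exactly this error. A sketch, assuming a POSIX system:

```python
# Sketch of what an init script's status check boils down to: the pidfile
# must exist and name a process that is still alive. Illustrative only.

import os

def service_status(pidfile):
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return "could not access pidfile"
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return "dead (stale pidfile)"
    except PermissionError:
        return "running (owned by another user)"
    return "running"

# If Cassandra dies on startup, the pidfile is missing or stale and the
# status check fails, even though "service cassandra start" succeeded:
print(service_status("/var/run/cassandra/cassandra.pid"))
```

Running in the foreground with -f sidesteps all of this and surfaces the actual startup error.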


Ciao, Duncan.












Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram

2014-03-10 Thread Keith Wright
I also want to point out an issue I filed that was closed as not an issue: 
CASSANDRA-6654. Basically, if you're using mixed TTLs on columns in a row, the 
"oldest" TTL is used to determine whether tombstones of other columns can be 
removed. In other words, if you have a column with a 1 day TTL and a column 
with a 1 year TTL, the 1 day TTLed data (now tombstoned) will not be removed on 
compaction.
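A toy model of the behaviour Keith describes (an illustration of the symptom reported in CASSANDRA-6654, not Cassandra's actual compaction logic):

```python
# Toy model of the mixed-TTL symptom: expired cells in a row are only
# purged once the row's *longest* TTL has also expired. Illustrative
# only -- this is not Cassandra's compaction code.

DAY = 24 * 3600

def purgeable_cells(cells, now):
    """cells: list of (name, written_at, ttl_seconds).
    Returns names of expired cells this model allows compaction to drop."""
    if not cells:
        return []
    row_max_expiry = max(written + ttl for _, written, ttl in cells)
    return [name for name, written, ttl in cells
            if now >= written + ttl      # the cell itself has expired...
            and now >= row_max_expiry]   # ...but it only drops past the row max

t0 = 0
row = [("short", t0, 1 * DAY), ("long", t0, 365 * DAY)]
print(purgeable_cells(row, now=t0 + 2 * DAY))    # [] -- 1-day data lingers
print(purgeable_cells(row, now=t0 + 366 * DAY))  # ['short', 'long']
```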


Re: sending notifications through data replication on remote clusters

2014-03-10 Thread Michael Shuler

On 03/10/2014 07:49 AM, DE VITO Dominique wrote:

If I update a data on DC1, I just want apps “connected-first” to DC2 to
be informed when this data is available on DC2 after replication.


If I run a SELECT, I'm going to receive the latest data per the read 
conditions (ONE, TWO, QUORUM), regardless of location of the client 
connection. If using network aware topology, you'll get the most current 
data in that DC.



When using Thrift, one way could be to modify CassandraServer class, to
send notification to apps according to data coming in into the
coordinator node of DC2.

Is it “common” (~ the way to do it) ?

Is there another way to do so ?

When using CQL, is there a precise “src code” place to modify for the
same purpose ?


Notifying connected clients about random INSERT or UPDATE statements 
that ran somewhere seems to be far, far outside the scope of storing 
data. Just configure your client to SELECT in the manner that you need.


I may not fully understand your problem and could be simplifying things 
in my head, so feel free to expand.
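Michael's first point in miniature: the coordinator collects answers from as many replicas as the consistency level requires and reconciles them by write timestamp (illustrative sketch only, not Cassandra's actual read path):

```python
# Miniature of read reconciliation: ask `cl` replicas and return the
# value carrying the highest write timestamp. Illustrative only.

def read(replica_responses, cl):
    """replica_responses: list of (value, write_timestamp) per replica;
    cl: how many replicas must answer (the consistency level)."""
    answered = replica_responses[:cl]  # pretend the first cl responded
    return max(answered, key=lambda r: r[1])[0]

# One replica holds a newer write than the other two:
responses = [("old", 100), ("new", 200), ("old", 100)]
print(read(responses, cl=2))  # new -- these two replicas include the newer write
print(read(responses, cl=1))  # old -- consistency ONE can miss it
```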


--
Michael


RE: sending notifications through data replication on remote clusters

2014-03-10 Thread DE VITO Dominique

First of all, thanks for your answer and your attention.

I know about SELECT.
The idea, here, is to avoid doing POLLING regularly, as it could easily become 
a performance nightmare.
The idea is to replace POLLING with PUSH, as in SEDA architectures, CQRS 
architectures, or the continuous querying offered by some data stores.

So, following this PUSH idea, it would be nice to inform apps connected to a 
preferred DC that some new data has been replicated and is now "available". 

I hope it's clearer.
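Independently of where the hook would live in Cassandra (which is the open question in this thread), the push pattern itself is simple; a minimal sketch, with illustrative names:

```python
# Sketch of the push (vs. polling) idea: a store that fires callbacks
# when a write for a watched key arrives. The names and the hook point
# are made up -- wiring this into Cassandra's write path is exactly
# what is being asked about here.

from collections import defaultdict

class NotifyingStore:
    def __init__(self):
        self.data = {}
        self.listeners = defaultdict(list)

    def watch(self, key, callback):
        self.listeners[key].append(callback)

    def apply_replicated_write(self, key, value):
        """Called when a mutation replicated from another DC is applied."""
        self.data[key] = value
        for cb in self.listeners[key]:
            cb(key, value)  # push: no client ever has to poll

events = []
store = NotifyingStore()
store.watch("order:42", lambda k, v: events.append((k, v)))
store.apply_replicated_write("order:42", "shipped")
print(events)  # [('order:42', 'shipped')]
```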

Dominique




Re: sending notifications through data replication on remote clusters

2014-03-10 Thread Eric Plowe
You should be able to achieve what you're looking for with a trigger vs. a
modification to the core of Cassandra.

http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support






about trigger execution ??? // RE: sending notifications through data replication on remote clusters

2014-03-10 Thread DE VITO Dominique
> You should be able to achieve what you're looking for with a trigger vs. a 
> modification to the core of Cassandra.
>
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support

Well, good point.

It leads to the question: (a) are triggers executed on all (local+remote) 
coordinator nodes (and then, N DC => N coordinator nodes => N executions of the 
triggers) ?

(b) Or are triggers executed only on the first coordinator node, and not the 
(next/remote DC) coordinator nodes ?

My opinion is (b), and in that case, triggers won't do the job.
(b) would make sense, because the first coordinator node would augment the 
original row mutations and propagate them towards other coordinator nodes. 
Then, there is no need to execute triggers on the other (remote) coordinator 
nodes.

Does somebody know how trigger execution works: is it (a) or (b)?

Thanks.

Dominique
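Hypothesis (b) can be illustrated with a small simulation, in which the first coordinator runs the trigger once and remote DCs merely apply the already-augmented mutation. This models the hypothesis only, not Cassandra internals:

```python
# Toy model of hypothesis (b): the trigger fires once, on the coordinator
# in the DC that received the write; remote-DC replicas only apply the
# already-augmented mutation. Purely illustrative.

trigger_calls = []

def trigger(mutation):
    trigger_calls.append(mutation["key"])
    return {**mutation, "audit": "added-by-trigger"}

def coordinator_write(mutation, local_replicas, remote_dcs):
    augmented = trigger(mutation)          # trigger runs here, once
    for replica in local_replicas:
        replica.append(augmented)
    for dc in remote_dcs:                  # remote DCs replay, no re-trigger
        for replica in dc:
            replica.append(augmented)

dc1 = [[], []]
dc2 = [[], []]
coordinator_write({"key": "k1", "value": "v1"}, dc1, [dc2])

print(trigger_calls)        # ['k1'] -- fired exactly once
print(dc2[0][0]["audit"])   # added-by-trigger -- remote copy is augmented
```

Note that under this model no code runs in DC2 at apply time, which is why triggers would not help with the notification use case.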








Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
Thanks Duncan, very helpful response indeed! Now I can run cassandra
properly in the foreground using "cassandra -f", but when I try to run
cassandra as a service in ubuntu it fails. Initially I get the status as
"running", but very soon it says "could not access pidfile for Cassandra".





Re: about trigger execution ??? // RE: sending notifications through data replication on remote clusters

2014-03-10 Thread Edward Capriolo
Just so you know, you should probably apply the patch from
CASSANDRA-6790 ("Triggers are broken in trunk"), because triggers are
currently only called on batch_mutate and will fail if called on insert.




Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Michael Shuler

As the cassandra user?  (is your install from packages or tar?)

$ sudo -iu cassandra
#  you should now be the cassandra user
$ cassandra -f -p /var/run/cassandra/cassandra.pid
#  not sure if -f also works with -p - might drop -f and look at
#  /var/log/cassandra/system.log

--
Michael




[RELEASE] Apache Cassandra 2.0.6 released

2014-03-10 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.6.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/UXgyZh (CHANGES.txt)
[2]: http://goo.gl/VxSAiN (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


RE: about trigger execution ??? // RE: sending notifications through data replication on remote clusters

2014-03-10 Thread DE VITO Dominique
Thanks a lot.


From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Monday, March 10, 2014 16:47
To: user@cassandra.apache.org
Subject: Re: about trigger execution ??? // RE: sending notifications through 
data replication on remote clusters

Just so you know, you should probably apply the patch from CASSANDRA-6790 
("Triggers are broken in trunk"), because triggers are currently only called 
on batch_mutate and will fail if called on insert.






Re: Replication with virtual nodes

2014-03-10 Thread motta.lrd
I will reply to myself and raise a flag in case someone is interested.

Assuming the tokens are replicated as follows:
"""
Request to update row X
Compute the token from the row key  
Identify the server with that token  
Place one replica there  
Increment the token until you get to a different server  
Place the next token there  
"""

Then the number of virtual nodes may affect the availability.
If we consider the previous example.

Let's call the tokens Tx and the servers Sx,
where x is the number of the token or server respectively.

With a RF = 3 this means that on S1:
T1 replicated to S2 (owner of T2) and S4 (owner of T3)
T5 replicated to S2 (owner of T6) and S4 (owner of T7)
T13 replicated to S2 (owner of T14) and S3 (owner of T15)
T9 replicated to S2 (owner of T10) and S3 (owner of T11)

These are the possible data loss scenarios involving S1:

I lose S1, S2 and S3  =>  I lose T13 and T9 
I lose S1, S2 and S4  =>  I lose T1 and T5 


=> This is lower availability compared to a scenario in which each
server has only one token.
Consider this simple case:

Again with RF = 3 I will have:
T1 is replicated in S2 and S3.

The only data loss scenario involving S1 is the following:
I lose S1, S2, and S3  =>  I lose token 1

which is one fewer than in the previous case.
The gap between the two increases with the number of virtual nodes.

Can anyone confirm this conjecture?
Thanks
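The conjecture can be checked numerically. Under the same walk-the-ring placement assumption described above (a sketch, not Cassandra's exact strategy code), many vnodes make the replica sets cover far more distinct server triples, so a larger fraction of simultaneous 3-server failures loses at least one token:

```python
# Numerical check of the conjecture: count which 3-server outages lose at
# least one token's data, single-token vs. vnodes. Placement assumption:
# walk the ring clockwise from each token and take RF distinct servers.

import itertools
import random

def replica_sets(ring, rf=3):
    """ring: server ids in token order; returns one replica set per token."""
    n = len(ring)
    sets = []
    for i in range(n):
        nodes, j = [], i
        while len(nodes) < rf:
            if ring[j % n] not in nodes:
                nodes.append(ring[j % n])
            j += 1
        sets.append(frozenset(nodes))
    return sets

def loss_fraction(ring, servers, rf=3, failures=3):
    """Fraction of `failures`-sized server outages losing some token."""
    sets = replica_sets(ring, rf)
    combos = list(itertools.combinations(servers, failures))
    lost = sum(1 for c in combos if any(s <= set(c) for s in sets))
    return lost / len(combos)

random.seed(1)
servers = list(range(8))
single = servers[:]                                    # one token per server
vnodes = [random.choice(servers) for _ in range(128)]  # ~16 vnodes per server
print(loss_fraction(single, servers))  # 8/56: only adjacent triples are fatal
print(loss_fraction(vnodes, servers))  # near 1.0: almost any triple loses data
```

So yes, under this model the conjecture holds: vnodes spread replicas over many more server combinations, and more failure combinations therefore lose some data (each loss is, however, a smaller fraction of the data).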



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replication-with-virtual-nodes-tp7593310p7593326.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


NULL in CQL

2014-03-10 Thread Rahul Gupta
Hi,

How do I check for NULL values in CQL3?
I am trying to write a CQL equivalent for below SQL:

SELECT * FROM table1 WHERE col2 IS NULL;

While inserting data into C*, CQL won't let me insert NULL; I have to pass '' 
for Strings and 0 for integers.
Both '' and 0 are valid data for me, so they conflict with the fillers I have 
to use instead of NULL.

Please advise.

Thanks,
Rahul Gupta
DEKA Research & Development
340 Commercial St  Manchester, NH  03101
P: 603.666.3908 extn. 6504 | C: 603.718.9676

This e-mail and the information, including any attachments, it contains are 
intended to be a confidential communication only to the person or entity to 
whom it is addressed and may contain information that is privileged. If the 
reader of this message is not the intended recipient, you are hereby notified 
that any dissemination, distribution or copying of this communication is 
strictly prohibited. If you have received this communication in error, please 
immediately notify the sender and destroy the original message.





Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
When running as root everything works fine; however, when I run your
commands as a normal user, here is what happens:



> user01@server1:~$
> *sudo -iu cassandra*

user01@server1:~$
> * cassandra -f -p /var/run/cassandra/cassandra.pid*

*...*

ERROR 16:46:09,355 Exception encountered during startup

java.lang.AssertionError: Directory /var/lib/cassandra/data is not
> accessible.

..

java.io.FileNotFoundException: /var/log/cassandra/system.log (Permission
> denied)

...

...



Here are the folder permissions..

user01@server1:~$ ls /var/lib/cassandra -lh

total 12K

drwxr-xr-x 2 cassandra cassandra 4.0K Mar 10 16:43 commitlog

drwxr-xr-x 4 cassandra cassandra 4.0K Mar 10 14:02 data

drwxr-xr-x 2 cassandra cassandra 4.0K Mar 10 11:34 saved_caches


Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Michael Shuler

On 03/10/2014 11:56 AM, user 01 wrote:

When running as root everything works fine, however when I run your
commands a normal user, here is what happens..

user01@server1:~$ sudo -iu cassandra


Yeah, you never actually became the cassandra user.  Of course, root 
works ;)



user01@server1:~$ cassandra -f -p /var/run/cassandra/cassandra.pid


  ^^  user01

So, the rest is expected, since you have no permissions.


ERROR 16:46:09,355 Exception encountered during startup
java.lang.AssertionError: Directory /var/lib/cassandra/data is
not accessible.
..
java.io.FileNotFoundException: /var/log/cassandra/system.log
(Permission denied)
...

Here are the folder permissions..

user01@server1:~$ ls /var/lib/cassandra -lh
total 12K
drwxr-xr-x 2 cassandra cassandra 4.0K Mar 10 16:43 commitlog
drwxr-xr-x 4 cassandra cassandra 4.0K Mar 10 14:02 data
drwxr-xr-x 2 cassandra cassandra 4.0K Mar 10 11:34 saved_caches


maybe:
$ sudo su - cassandra

Make sure you are actually the cassandra user, then try again?

--
Michael



Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
>
> $* sudo su - cassandra*


I don't know why, but this isn't actually working. It does not switch me to
the *cassandra* user (btw, should this actually switch me to the cassandra
user?). User switching on my servers does not work for users like *tomcat7*
and *cassandra*, but works for users that were created manually. I tested
this on two of my test servers with the same results on both.


RE: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Donald Smith
You may need to do "chown -R cassandra /var/lib/cassandra /var/log/cassandra" .

Don

From: user 01 [mailto:user...@gmail.com]
Sent: Monday, March 10, 2014 10:23 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for 
Cassandra"

$ sudo su - cassandra

I don't know why but this isn't actually working. It does not switch me to 
cassandra user[btw .. should this actually switch me to cassandra user?? ]. 
This user switching on my servers does not work for users like tomcat7 user, 
cassandra user but works for users that were manually created by user. Actually 
I tested this on two of my test servers but same results on both.


Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Sholes, Joshua
Depending on how it was installed, try making sure that the cassandra user has 
an actual login shell defined in /etc/passwd and not something like 
/sbin/nologin or /bin/false.


From: user 01 mailto:user...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, March 10, 2014 at 1:22 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for 
Cassandra"

$ sudo su - cassandra

I don't know why but this isn't actually working. It does not switch me to 
cassandra user[btw .. should this actually switch me to cassandra user?? ]. 
This user switching on my servers does not work for users like tomcat7 user, 
cassandra user but works for users that were manually created by user. Actually 
I tested this on two of my test servers but same results on both.


Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
@Donald:
This did not work out. Also, as I already showed above (I think), folder
permissions seem to be correct for both the */var/lib/cassandra* &
*/var/log/cassandra* folders.



On Mon, Mar 10, 2014 at 11:06 PM, Donald Smith <
donald.sm...@audiencescience.com> wrote:

>  You may need to do "chown -R cassandra /var/lib/cassandra
> /var/log/cassandra" .
>
>
>
> Don
>
>
>
> *From:* user 01 [mailto:user...@gmail.com]
> *Sent:* Monday, March 10, 2014 10:23 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra DSC 2.0.5 not starting - "* could not access
> pidfile for Cassandra"
>
>
>
> $* sudo su - cassandra*
>
>
>
> I don't know why but this isn't actually working. It does not switch me to
> *cassandra* user[btw .. should this actually switch me to cassandra
> user?? ]. This user switching on my servers does not work for users like
> *tomcat7* user, *cassandra* user but works for users that were manually
> created by user. Actually I tested this on two of my test servers but same
> results on both.
>


Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
@Sholes, Joshua: It was installed using Datastax DSC20 package for
cassandra 2.0.5.

Checked out */etc/passwd*. An entry for the cassandra user does exist. It
looks like this:


*cassandra:x:107:113:Cassandra database,,,:/var/lib/cassandra:/bin/false*


On Mon, Mar 10, 2014 at 11:21 PM, user 01  wrote:

> @Donald:
> This did not worked out.. Also as I already have shown above (I think)
> folder permissions seem to be correct for both *cassandra
> /var/lib/cassandra*  & /*var/log/cassandra *folders
>
>
>
> On Mon, Mar 10, 2014 at 11:06 PM, Donald Smith <
> donald.sm...@audiencescience.com> wrote:
>
>>  You may need to do "chown -R cassandra /var/lib/cassandra
>> /var/log/cassandra" .
>>
>>
>>
>> Don
>>
>>
>>
>> *From:* user 01 [mailto:user...@gmail.com]
>> *Sent:* Monday, March 10, 2014 10:23 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra DSC 2.0.5 not starting - "* could not access
>> pidfile for Cassandra"
>>
>>
>>
>> $* sudo su - cassandra*
>>
>>
>>
>> I don't know why but this isn't actually working. It does not switch me
>> to *cassandra* user[btw .. should this actually switch me to cassandra
>> user?? ]. This user switching on my servers does not work for users like
>> *tomcat7* user, *cassandra* user but works for users that were manually
>> created by user. Actually I tested this on two of my test servers but same
>> results on both.
>>
>
>


Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Sholes, Joshua
That’s why you can’t sudo, then.   You’d need to edit the end of that line to 
be /bin/bash instead of /bin/false.
--
Josh Sholes

From: user 01 mailto:user...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, March 10, 2014 at 1:55 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for 
Cassandra"

@Sholes, Joshua: It was installed using Datastax DSC20 package for cassandra 
2.0.5.

Checked out  /etc/passwd . An entry for cassandra user does seem to exist. 
Looks like this:

cassandra:x:107:113:Cassandra database,,,:/var/lib/cassandra:/bin/false



On Mon, Mar 10, 2014 at 11:21 PM, user 01 
mailto:user...@gmail.com>> wrote:
@Donald:
This did not worked out.. Also as I already have shown above (I think) folder 
permissions seem to be correct for both cassandra /var/lib/cassandra  & 
/var/log/cassandra folders



On Mon, Mar 10, 2014 at 11:06 PM, Donald Smith 
mailto:donald.sm...@audiencescience.com>> 
wrote:
You may need to do “chown –R cassandra /var/lib/cassandra /var/log/cassandra” .

Don

From: user 01 [mailto:user...@gmail.com]
Sent: Monday, March 10, 2014 10:23 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for 
Cassandra"

$ sudo su - cassandra

I don't know why but this isn't actually working. It does not switch me to 
cassandra user[btw .. should this actually switch me to cassandra user?? ]. 
This user switching on my servers does not work for users like tomcat7 user, 
cassandra user but works for users that were manually created by user. Actually 
I tested this on two of my test servers but same results on both.




Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Michael Shuler
That all makes sense, then. I, personally, would not give the c* user a 
valid shell, unless it was just for a little testing - then change it 
back to /bin/false as intended.


Since the initial problem was that the service command had trouble with 
the PID file, and the OP has tinkered with starting as root, it's possible 
root owns the pid file or directory?


(or that 'service cassandra status' doesn't work as expected, and it's 
actually running fine?)


Honestly, there are too many unknowns for me to be able to help accurately 
without poking around a shell on the node - I'm just guessing.


--
Michael

On 03/10/2014 12:50 PM, Sholes, Joshua wrote:

Depending on how it was installed, try making sure that the cassandra
user has an actual login shell defined in /etc/passwd and not something
like /sbin/nologin or /bin/false.


From: user 01 mailto:user...@gmail.com>>
Reply-To: "user@cassandra.apache.org "
mailto:user@cassandra.apache.org>>
Date: Monday, March 10, 2014 at 1:22 PM
To: "user@cassandra.apache.org "
mailto:user@cassandra.apache.org>>
Subject: Re: Cassandra DSC 2.0.5 not starting - "* could not access
pidfile for Cassandra"

$*sudo su - cassandra*

*
*
I don't know why but this isn't actually working. It does not switch me
to *cassandra* user[btw .. should this actually switch me to cassandra
user?? ]. This user switching on my servers does not work for users like
*tomcat7* user, *cassandra* user but works for users that were manually
created by user. Actually I tested this on two of my test servers but
same results on both.




Authoritative failed write: Using paxos to "cancel" failed quorum writes

2014-03-10 Thread Wayne Schroeder
As I understand it, even though a quorum write fails, the data is still (more 
than likely) saved and will become eventually consistent through the well known 
mechanisms.  I have a case where I would rather this not happen--where I would 
prefer that if the quorum write fails, that data NEVER becomes consistent, and 
the old values remain.

After a bit of pondering, I came up with the idea of simply making my write a 
conditional update based on a previous value.  In my use case, I will not be 
contending with any other writes of the same primary key, and this write 
operation is rare in the grand scheme of things.  Using this approach, the 
desired effect is that if the write fails, it will not eventually happen 
without the app's knowledge.
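The semantics being relied on here are essentially compare-and-set. A toy in-memory model (not Cassandra code -- a real conditional update goes through Paxos as a lightweight transaction) of the all-or-nothing behavior:

```python
def conditional_update(store, key, expected, new_value):
    """Compare-and-set, mirroring the intent of a CQL conditional update
    (UPDATE tab SET col = ? WHERE key = ? IF col = ?), which reports
    [applied] = true/false."""
    if store.get(key) == expected:
        store[key] = new_value
        return True   # applied
    return False      # rejected; old value remains untouched

store = {"row1": "old"}
print(conditional_update(store, "row1", "old", "new"))    # True
print(conditional_update(store, "row1", "old", "other"))  # False: condition no longer holds
print(store["row1"])  # "new" -- the failed attempt changed nothing
```

The point of the model: a rejected compare-and-set leaves the old value in place rather than becoming eventually consistent later, which is exactly the property being asked about.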

Is this approach sound?

If so, it sounds like a really cool potential addition to CQL like:  UPDATE tab 
SET col=? WHERE key=? AUTHORITATIVE

Thoughts?

Wayne



Re: Authoritative failed write: Using paxos to "cancel" failed quorum writes

2014-03-10 Thread Tupshin Harper
Take a 3 node cluster with RF=3, and QUORUM reads and writes. Consistency
is achieved by ensuring that at least two nodes acknowledge a write, and at
least two nodes have to participate in a read. As a result, you know that
at least one of the two nodes that you are reading from has received the
latest copy, and therefore its latest timestamp wins, even if the write
isn't fully propagated to all nodes.
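The overlap argument can be checked exhaustively for the RF=3 / QUORUM case (a toy enumeration, nothing Cassandra-specific):

```python
from itertools import combinations

replicas = ["A", "B", "C"]
quorum = len(replicas) // 2 + 1  # 2 of 3

# Every possible write quorum shares at least one node with every
# possible read quorum, so a quorum read always includes a node
# holding the latest quorum-acknowledged write.
overlaps = [
    bool(set(w) & set(r))
    for w in combinations(replicas, quorum)
    for r in combinations(replicas, quorum)
]
print(all(overlaps))  # R + W > N guarantees intersection
```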

From a practical point of view, your approach to "cancel" a write would
have to involve the coordinator writing a second deletion mutation after
the first regular mutation in order to effectively cancel it.
The main problem with this is that you can't guarantee it. If the
coordinator fails after the first write goes out but before detecting the
failure, then the write will never be cancelled, despite being only
partially written and marked "authoritative".

If you really need to rely on this behavior, you should probably do the
whole write as a lightweight transaction, despite the additional overhead.
If you don't need to rely on this auto-cancelling behavior, I'd strongly
suggest that reacting to a failed write (or at least an exception on write)
 should be better handled in application code.

-Tupshin


On Mon, Mar 10, 2014 at 4:29 PM, Wayne Schroeder <
wschroe...@pinsightmedia.com> wrote:

> As I understand it, even though a quorum write fails, the data is still
> (more than likely) saved and will become eventually consistent through the
> well known mechanisms.  I have a case where I would rather this not
> happen--where I would prefer that if the quorum write fails, that data
> NEVER becomes consistent, and the old values remain.
>
> After a bit of pondering, I came up the idea of simply making my write a
> conditional update based on a previous value.  In my use case, I will not
> be contending with any other writes of the same primary key, and this write
> operation is rare in the grand scheme of things.  Using this approach, the
> desired effect is that if the write fails, it will not eventually happen
> without the app's knowledge.
>
> Is this approach sound?
>
> If so, it sounds like a really cool potential addition to CQL like:
>  UPDATE tab SET col=? WHERE key=? AUTHORITATIVE
>
> Thoughts?
>
> Wayne
>
>


Re: NULL in CQL

2014-03-10 Thread Mikhail Stepura
You can use blobAs functions to insert empty blobs/values right 
into your cells, for example blobAsInt(0x) will insert an *empty* integer.


http://cassandra.apache.org/doc/cql3/CQL.html#blobFun


On 3/10/14, 9:56, Rahul Gupta wrote:

Hi,

How do I check for NULL values in CQL3?

I am trying to write a CQL equivalent for below SQL:

SELECT * FROM table1 WHERE col2 IS NULL;

While inserting data into C*, CQL won’t let me insert NULL, I have to
pass ‘’ for Strings and 0 for integers.

I have ‘’ and 0s as valid data so that’s conflicting with fillers I have
to use instead of NULL.

Please advise.

Thanks,

*Rahul Gupta*
*DEKA* *Research & Development* 

340 Commercial St  Manchester, NH  03101

P: 603.666.3908 extn. 6504 | C: 603.718.9676

This e-mail and the information, including any attachments, it contains
are intended to be a confidential communication only to the person or
entity to whom it is addressed and may contain information that is
privileged. If the reader of this message is not the intended recipient,
you are hereby notified that any dissemination, distribution or copying
of this communication is strictly prohibited. If you have received this
communication in error, please immediately notify the sender and destroy
the original message.







Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread user 01
As I mentioned, I had given ownership to the cassandra user for the folder
containing cassandra.pid, /var/lib/cassandra & /var/log/cassandra. All of
them are owned by cassandra, as I verified.

However, after a suggestion by someone, I tried removing the folders
/var/lib/cassandra & /var/log/cassandra and restarted cassandra, and it
seems to work fine now. After this I can start cassandra as a service using

* sudo service cassandra start*



& can also see its status using

*sudo service cassandra status*


However, is this a practical approach? Should I just delete the entire
folders and let cassandra create new ones? Wouldn't that lose some
information from cassandra (like the opscenter keyspaces which were there
by default but won't be created by cassandra itself, or anything else I
don't know about)?
Also, doesn't this mean the folder permissions were somehow incorrect?
Shouldn't the dsc installer be taking care of configuring this properly, or
is it my responsibility?


On Tue, Mar 11, 2014 at 12:20 AM, Michael Shuler wrote:

> That all makes sense, then. I, personally, would not give the c* user a
> valid shell, unless it was just for a little testing - then change it back
> to /bin/false as intended.
>
> Since the initial problem was that the service command had trouble with
> the PID file, and OP has tinkered with starting as root, it's possible root
> owns the pid file or directory?
>
> (or that 'service cassandra status' doesn't work as expected, and it's
> actually running fine?)
>
> Honestly, there's too many unknowns for me to accurately be able to help
> without poking around a shell on the node - I'm just guessing.
>
> --
> Michael
>
>
> On 03/10/2014 12:50 PM, Sholes, Joshua wrote:
>
>> Depending on how it was installed, try making sure that the cassandra
>> user has an actual login shell defined in /etc/passwd and not something
>> like /sbin/nologin or /bin/false.
>>
>>
>> From: user 01 mailto:user...@gmail.com>>
>> Reply-To: "user@cassandra.apache.org "
>> mailto:user@cassandra.apache.org>>
>>
>> Date: Monday, March 10, 2014 at 1:22 PM
>> To: "user@cassandra.apache.org "
>> mailto:user@cassandra.apache.org>>
>>
>> Subject: Re: Cassandra DSC 2.0.5 not starting - "* could not access
>> pidfile for Cassandra"
>>
>> $*sudo su - cassandra*
>>
>> *
>>
>> *
>> I don't know why but this isn't actually working. It does not switch me
>> to *cassandra* user[btw .. should this actually switch me to cassandra
>>
>> user?? ]. This user switching on my servers does not work for users like
>> *tomcat7* user, *cassandra* user but works for users that were manually
>>
>> created by user. Actually I tested this on two of my test servers but
>> same results on both.
>>
>
>


Re: GCInspector GC for ConcurrentMarkSweep running every 15 seconds

2014-03-10 Thread Jeremiah D Jordan
Also it might be:
https://issues.apache.org/jira/browse/CASSANDRA-6541

That is causing the high heap.

-Jeremiah

On Feb 18, 2014, at 5:01 PM, Jonathan Ellis  wrote:

> Sounds like you have CMSInitiatingOccupancyFraction set close to 60.
> You can raise that and/or figure out how to use less heap.
> 
> On Mon, Feb 17, 2014 at 5:06 PM, John Pyeatt  
> wrote:
>> I have a 6 node cluster running on AWS. We are using m1.large instances with
>> heap size set to 3G.
>> 
>> 5 of the 6 nodes seem quite healthy. The 6th one however is running
>> GCInspector GC for ConcurrentMarkSweep every 15 seconds or so. There is
>> nothing going on on this box. No repairs and almost no user activity. But
>> the CPU is almost continuously at 50% or more.
>> 
>> The only message in the log at all is the
>> INFO 2014-02-17 22:58:53,429 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 213 ms for 1 collections, 1964940024 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:07,431 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 250 ms for 1 collections, 1983269488 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:21,522 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 280 ms for 1 collections, 1998214480 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:36,527 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 305 ms for 1 collections, 2013065592 used; max is
>> 3200253952
>> INFO 2014-02-17 22:59:50,529 [ScheduledTasks:1] GCInspector GC for
>> ConcurrentMarkSweep: 334 ms for 1 collections, 2028069232 used; max is
>> 3200253952
>> 
>> We don't see any of these messages on the other nodes in the cluster.
>> 
>> We are seeing similar behaviour for both our production and QA clusters.
>> Production is running cassandra 1.2.9 and QA is running 1.2.13.
>> 
>> Here are some of the cassandra settings that I would think might be
>> relevant.
>> 
>> flush_largest_memtables_at: 0.75
>> reduce_cache_sizes_at: 0.85
>> reduce_cache_capacity_to: 0.6
>> in_memory_compaction_limit_in_mb: 64
>> 
>> Does anyone have any ideas why we are seeing this so selectively on one box?
>> 
>> Any cures???
>> --
>> John Pyeatt
>> Singlewire Software, LLC
>> www.singlewire.com
>> --
>> 608.661.1184
>> john.pye...@singlewire.com
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
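For reference, the used/max ratios from the GCInspector lines quoted above sit just over the 60% mark, which is consistent with CMSInitiatingOccupancyFraction being set close to 60 and CMS firing continuously (quick arithmetic, not a diagnosis):

```python
# (used, max) pairs taken from the GCInspector log lines in this thread
samples = [
    (1964940024, 3200253952),
    (1983269488, 3200253952),
    (1998214480, 3200253952),
    (2013065592, 3200253952),
    (2028069232, 3200253952),
]
ratios = [used / total for used, total in samples]
print([round(r, 3) for r in ratios])
# Heap occupancy hovers around 61-63%, so a CMS trigger near 60%
# would start a new collection almost immediately after each one ends.
```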



Re: NULL in CQL

2014-03-10 Thread DuyHai Doan
@Rahul

*Null* has special semantics in CQL3: setting a column value to *null* means
deleting it...


On Mon, Mar 10, 2014 at 5:56 PM, Rahul Gupta wrote:

>  Hi,
>
>
>
> How do I check for NULL values in CQL3?
>
> I am trying to write a CQL equivalent for below SQL:
>
>
>
> SELECT * FROM table1 WHERE col2 IS NULL;
>
>
>
> While inserting data into C*, CQL won't let me insert NULL, I have to pass
> '' for Strings and 0 for integers.
>
> I have '' and 0s as valid data so that's conflicting with fillers I have
> to use instead of NULL.
>
>
>
> Please advise.
>
>
>
> Thanks,
>
> *Rahul Gupta*
> *DEKA* *Research & Development* 
>
> 340 Commercial St  Manchester, NH  03101
>
> P: 603.666.3908 extn. 6504 | C: 603.718.9676
>
>
>
> This e-mail and the information, including any attachments, it contains
> are intended to be a confidential communication only to the person or
> entity to whom it is addressed and may contain information that is
> privileged. If the reader of this message is not the intended recipient,
> you are hereby notified that any dissemination, distribution or copying of
> this communication is strictly prohibited. If you have received this
> communication in error, please immediately notify the sender and destroy
> the original message.
>
>
>
>


Re: Authoritative failed write: Using paxos to "cancel" failed quorum writes

2014-03-10 Thread Tupshin Harper
Oh sorry, I misunderstood. But now I'm confused about why what you are
trying to do isn't accomplished by the existing IF NOT EXISTS syntax.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html

-Tupshin
On Mar 10, 2014 4:24 PM, "Wayne Schroeder" 
wrote:

>  The plan IS to do the whole write as a lightweight transaction because I
> do need to rely on the behavior.  I am just vetting the expected
> behavior--that doing it as a conditional update, i.e. a light weight
> transaction, that I am not missing something and it will behave as I
> outlined without some other unrealized consequence.  Additionally, it
> sounds like a potential nice CQL level feature -- that a language keyword
> could be added to indicate that a LWT should be done to ensure that the
> quorum update is an all or nothing update at the expense of using LWT.
>
>  Wayne
>
>
>  On Mar 10, 2014, at 3:52 PM, Tupshin Harper  wrote:
>
> If you really need to rely on this behavior, you should probably do the
> whole write as a lightweight transaction, despite the additional overhead.
>
>
>


Re: Authoritative failed write: Using paxos to "cancel" failed quorum writes

2014-03-10 Thread Wayne Schroeder
The plan IS to do the whole write as a lightweight transaction, because I do 
need to rely on the behavior.  I am just vetting the expected behavior of doing 
it as a conditional update, i.e. a lightweight transaction -- making sure I am 
not missing something and that it will behave as I outlined, without some other 
unrealized consequence.  Additionally, it sounds like a potential nice CQL-level 
feature: a language keyword could be added to indicate that a LWT should be done 
to ensure that the quorum update is all-or-nothing, at the expense of using LWT.

Wayne


On Mar 10, 2014, at 3:52 PM, Tupshin Harper 
mailto:tups...@tupshin.com>> wrote:

If you really need to rely on this behavior, you should probably do the whole 
write as a lightweight transaction, despite the additional overhead.



Re: NULL in CQL

2014-03-10 Thread Tupshin Harper
And to be clear, and to elaborate, null is the default state for a
Cassandra cell if you don't write to it, so you can always create a row
with a null column by writing the row without that column being specified.

Additionally, cql's delete statement optionally takes a columns argument,
so if you want to set an existing column to null, just delete it.

DELETE [COLUMNS] FROM <table> [USING <consistency>] WHERE KEY = keyname1;
DELETE [COLUMNS] FROM <table> [USING <consistency>] WHERE KEY IN (keyname1, keyname2);
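Since this version of CQL3 has no IS NULL predicate, the original "WHERE col2 IS NULL" query has to be emulated client-side. A sketch over driver-style row dicts (made-up rows, not actual pycassa/driver API calls):

```python
# Rows as a driver might hand them back; an unset or deleted column
# shows up as None / absent rather than a sentinel like '' or 0.
rows = [
    {"key": "k1", "col2": "x"},
    {"key": "k2", "col2": None},  # column deleted
    {"key": "k3"},                # column never written
]

# Client-side equivalent of:  SELECT * FROM table1 WHERE col2 IS NULL;
null_keys = [r["key"] for r in rows if r.get("col2") is None]
print(null_keys)  # ['k2', 'k3']
```

This avoids the ''/0 sentinel collision entirely: absence of the cell is the "null", so valid empty strings and zeros stay distinguishable.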

-Tupshin
On Mar 10, 2014 4:31 PM, "DuyHai Doan"  wrote:

> @Rahul
>
> *Null *has a special semantics for CQL3. Setting a column value to *null 
> *means
> deleting it...
>
>
> On Mon, Mar 10, 2014 at 5:56 PM, Rahul Gupta wrote:
>
>>  Hi,
>>
>>
>>
>> How do I check for NULL values in CQL3?
>>
>> I am trying to write a CQL equivalent for below SQL:
>>
>>
>>
>> SELECT * FROM table1 WHERE col2 IS NULL;
>>
>>
>>
>> While inserting data into C*, CQL won't let me insert NULL, I have to
>> pass '' for Strings and 0 for integers.
>>
>> I have '' and 0s as valid data so that's conflicting with fillers I have
>> to use instead of NULL.
>>
>>
>>
>> Please advise.
>>
>>
>>
>> Thanks,
>>
>> *Rahul Gupta*
>> *DEKA* *Research & Development* 
>>
>> 340 Commercial St  Manchester, NH  03101
>>
>> P: 603.666.3908 extn. 6504 | C: 603.718.9676
>>
>>
>>
>> This e-mail and the information, including any attachments, it contains
>> are intended to be a confidential communication only to the person or
>> entity to whom it is addressed and may contain information that is
>> privileged. If the reader of this message is not the intended recipient,
>> you are hereby notified that any dissemination, distribution or copying of
>> this communication is strictly prohibited. If you have received this
>> communication in error, please immediately notify the sender and destroy
>> the original message.
>>
>>
>>
>>
>
>


Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Michael Shuler

On 03/10/2014 04:12 PM, user 01 wrote:

As I mentioned I had given ownership to cassandra user for the
cassandra.pid containing folder, var/lib/cassandra & /var/log/cassandra.
All of them are owned by cassandra as I verified.

However after suggestion by someone, I tried removing the
folders  var/lib/cassandra & /var/log/cassandra & restarted cassandra &
it seems to work fine now.. after this I can start the cassandra as a
service using

* sudo service cassandra start*

& can also see status using

*sudo service cassandra startus*

*
*
However is this a practical approach? should I just delete the entire
folders as such & let cassandra create new folders ? Wouldn't that
reduce some information from cassandra? (like the opscenter keyspaces
which were there by default but wont be created by cassandra itself ? or
anything else I dont know!).
However this means that folder permissions is not somehow correct, isn;t
it ? Shouldn't dsc installer be taking care of configuring this properly
? Or is it my responsibility ?


You shouldn't have needed to set any permissions - the package is 
responsible for that. You shouldn't need to remove/remake directories. 
It should "Just Work".


I went through a scratch install of dsc20 on an Ubuntu LTS machine to 
see what the problem might be and attached the console log. I did find a 
bug with the same behavior you described. I have no idea why someone 
*removed* the dependency on a functional JRE from the cassandra package 
- this is *not* the same Depends: line as the upstream OSS cassandra 
package [0]. (You can see this in 'apt-cache show cassandra=2.0.5' as in 
my console output.)


Did you discover your problem with using 'service cassandra 
{start,status}' *before* you installed a JRE?  I did.


As soon as I installed openjdk-7-jdk, the cassandra service start/status 
"Just Worked", as I would expect.


I do see in the documentation, that the prerequisite lists "Java is 
installed." [1] - in my opinion, it should be installed via package 
dependency, just as the upstream package does. I will try to follow up 
on this bug.


I appreciate your diligence in verifying as best you can.  Clear 
reproduction steps, commands run, output text, etc., as I'm attaching, 
are super helpful for future reference  :)


[0] https://github.com/apache/cassandra/blob/cassandra-2.0/debian/control
[1] 
http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedDeb_t.html


--
Kind regards,
Michael
mshuler@hana:~$ ec2-run-instances ami-c9d7d1a0 -k mshuler_hana -t m1.medium
RESERVATION r-02eacc23  570516133972default
INSTANCEi-e47165c5  ami-c9d7d1a0pending 
mshuler_hana0   m1.medium   2014-03-10T21:46:21+
us-east-1d  aki-919dcaf8monitoring-disabled 
instance-store  
paravirtualxen  sg-70ed6619 default false

mshuler@hana:~$ ec2-describe-instances |grep mshuler_hana
INSTANCEi-e47165c5  ami-c9d7d1a0
ec2-54-204-51-35.compute-1.amazonaws.comip-10-164-0-109.ec2.internal
pending mshuler_hana0  m1.medium2014-03-10T21:46:21+
us-east-1d  aki-919dcaf8monitoring-disabled 
54.204.51.3510.164.0.109   instance-store   
paravirtual xen sg-70ed6619 default false

mshuler@hana:~$ ssh -i .ssh/mshuler_hana.pem 
ubu...@ec2-54-204-51-35.compute-1.amazonaws.com

ubuntu@ip-10-164-0-109:~$ sudo sh -c 'echo "deb 
http://debian.datastax.com/community stable main" >> /etc/apt/sources.list'

ubuntu@ip-10-164-0-109:~$ curl -L http://debian.datastax.com/debian/repo_key | 
sudo apt-key add -

ubuntu@ip-10-164-0-109:~$ sudo apt-get update -q2

ubuntu@ip-10-164-0-109:~$ sudo apt-get install dsc20
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following extra packages will be installed:
  cassandra libcap2 libjna-java libopts25 ntp python-support
Suggested packages:
  libjna-java-doc ntp-doc
The following NEW packages will be installed:
  cassandra dsc20 libcap2 libjna-java libopts25 ntp python-support
0 upgraded, 7 newly installed, 0 to remove and 11 not upgraded.
Need to get 15.4 MB of archives.
After this operation, 18.8 MB of additional disk space will be used.
Do you want to continue [Y/n]? 
Get:1 http://debian.datastax.com/community/ stable/main cassandra all 2.0.5 
[14.3 MB]
Get:2 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ precise/main libcap2 
amd64 1:2.22-1ubuntu3 [12.0 kB]
Get:3 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ precise/main libopts25 
amd64 1:5.12-0.1ubuntu1 [59.9 kB]
Get:4 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ precise-updates/main ntp 
amd64 1:4.2.6.p3+dfsg-1ubuntu3.1 [612 kB]
Get:5 http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ precise/universe 
libjna-

Re: Cassandra DSC 2.0.5 not starting - "* could not access pidfile for Cassandra"

2014-03-10 Thread Michael Shuler

On 03/10/2014 05:15 PM, Michael Shuler wrote:

I did find a bug with the same behavior you described. I have no idea
why someone *removed* the dependency on a functional JRE from the
cassandra package - this is *not* the same Depends: line as the
upstream OSS cassandra package


Quick follow-up on why this package deviates from the upstream deb 
package: the JRE dependency was dropped to avoid force-installing the 
OpenJDK packages on users, since most users install the Oracle JRE from 
tar [0]. Installing OpenJDK via a package dependency steps on the 
hand-installed java alternatives symlinks. I had assumed, incorrectly, 
that the cassandra package in the DSC repository was identical to the 
package in the Apache repository.
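The clobbering happens at the symlink level. A sketch of the mechanism using a throwaway directory (all paths here are hypothetical; on a real Debian/Ubuntu box, `update-alternatives --display java` shows the actual chain):

```shell
# Simulate a hand-installed Oracle JRE registered via a symlink, the way
# /etc/alternatives/java points at whatever was configured last.
work=$(mktemp -d)
mkdir -p "$work/oracle-jre/bin" "$work/alternatives"
printf '#!/bin/sh\necho oracle\n' > "$work/oracle-jre/bin/java"
chmod +x "$work/oracle-jre/bin/java"
ln -s "$work/oracle-jre/bin/java" "$work/alternatives/java"
target=$(readlink "$work/alternatives/java")
echo "alternatives/java -> $target"
# Installing openjdk-7-jdk as a package dependency would repoint this
# link, which is why the DSC package leaves the JRE choice to the admin.
rm -rf "$work"
```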


[0] 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJreDeb.html


--
Kind regards,
Michael