RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-22 Thread Phil Luckhurst
Definitely no TTL and records are only written once with no deletions.

Phil


DuyHai Doan wrote
> Are you sure there is no TTL set on your data? It might explain the shrink
> in sstable size after compaction.







Change number of vnodes on an existing cluster

2014-05-22 Thread Phil Luckhurst
We have a 3-node 2.0.7 cluster with RF=3. At the moment these are configured
with the default 256 vnodes; we'd like to try reducing that to see what
effect it has on some of our CQL query times.

It seems from CASSANDRA-7057 that there is no automatic method for this, but
is it possible to do it manually and, if so, what are the steps required? Do
we need to add new nodes with the number of vnodes we require and then
decommission the existing ones, or is it possible to do it with just our
existing 3 nodes?

Thanks
Phil





Managing truststores with inter-node encryption

2014-05-22 Thread Jeremy Jongsma
The docs say that each node needs every other node's certificate in its
local truststore:

http://www.datastax.com/documentation/cassandra/1.2/cassandra/security/secureSSLCertificates_t.html

This seems like a bit of a headache for adding nodes to a cluster. How do
others deal with this?

1) If I am self-signing the client certificates (with puppetmaster), is it
enough that the truststore just contains the CA certificate used to sign
them? This is the typical PKI mechanism for verifying trust, so I am hoping
it works here.

2) If not, can I use the same certificate for every node? If so, what is
the downside? I'm mainly concerned with encryption over public internet
links, not node identity verification.
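
For context, here is roughly what I have in mind for (1) -- a minimal,
untested sketch where every node trusts a single CA rather than every peer
certificate (file names, paths and passwords below are placeholders):

# import only the CA certificate (e.g. the puppetmaster CA) into a shared truststore
keytool -importcert -alias cluster-ca -file ca.crt \
        -keystore truststore.jks -storepass changeit -noprompt

# each node's cassandra.yaml would then point at that shared truststore, e.g.:
# server_encryption_options:
#     internode_encryption: all
#     keystore: /etc/cassandra/conf/node-keystore.jks
#     keystore_password: changeit
#     truststore: /etc/cassandra/conf/truststore.jks
#     truststore_password: changeit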


RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-22 Thread Andreas Finke
Hi Phil,

I found an interesting blog entry that may address your problem.

http://www.datastax.com/dev/blog/optimizations-around-cold-sstables

It seems that compaction is skipped for sstables which do not satisfy a
certain read rate. Please check.
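
If that turns out to be the cause, I believe it can be tuned via the STCS
subproperty controlling this cold-sstable behaviour (cold_reads_to_omit, if I
remember the name correctly -- please verify against your version). A
hypothetical example, with made-up keyspace/table names, that disables the
optimization entirely:

# setting the option to 0.0 should make STCS consider cold sstables again
cqlsh -e "ALTER TABLE myks.mytable WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'cold_reads_to_omit': 0.0 };"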


Kind regards

Andreas Finke
Java Developer
Solvians IT-Solutions GmbH


 Phil Luckhurst wrote 

Definitely no TTL and records are only written once with no deletions.

Phil


DuyHai Doan wrote
> Are you sure there is no TTL set on your data? It might explain the shrink
> in sstable size after compaction.







Re: Change number of vnodes on an existing cluster

2014-05-22 Thread Robert Coli
On Thu, May 22, 2014 at 4:31 AM, Phil Luckhurst <
phil.luckhu...@powerassure.com> wrote:

> We have a 3 node 2.0.7 cluster with RF=3. At the moment these are
> configured
> to have the default 256 vnodes we'd like to try reducing that to see what
> effect it has on some of our CQL query times.
>

Because you have RF=N, all nodes have all data. This means that it is
actually irrelevant how many vnodes (or nodes) you have; you just lose by
using them at all.

However, to reduce the number of vnodes to a number that is OK in your case
but might also be reasonable when you have RF != N, you can just:

1) get a list of tokens per node via a one-liner like this :

nodetool info -T | grep Token | awk '{print $3}' | paste -s -d,

2) modify this list by removing however many tokens you want to get to the
new number of vnodes

3) insert this list into the initial_token line of cassandra.yaml on each
node [1]

4) rolling re-start nodes with auto_bootstrap:false [2]

My *belief* is that you do not need a step 3.5 "nuke the system keyspace
and reload schema, potentially with the entire cluster down", but it's
possible that other nodes may remember your old vnodes unless you do. Test
in a non-production environment, obviously.
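
Concretely, steps 1-3 on one node might look something like this (an untested
sketch; the target count of 64 tokens and the file names are just
illustrative):

# 1) dump this node's current tokens as a comma-separated list
nodetool info -T | grep Token | awk '{print $3}' | paste -s -d, > tokens.csv

# 2) trim the list to the desired number of vnodes, e.g. keep 64 of them
tr ',' '\n' < tokens.csv | head -64 | paste -s -d, > tokens.trimmed.csv

# 3) put the trimmed list into cassandra.yaml on that node, e.g.
#      num_tokens: 64
#      initial_token: <contents of tokens.trimmed.csv>
# then 4) do the rolling restart with auto_bootstrap: false as above.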

If the above is too complicated and you have the spare hosts, adding 3 new
nodes and then decommissioning the old ones is a safe and simple way to
achieve the same goal.

=Rob
[1] Note that I recommend this as a best practice for the use of vnodes,
always populate initial_token.
[2]
https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/


Re: autoscaling cassandra cluster

2014-05-22 Thread Robert Coli
On Wed, May 21, 2014 at 4:35 AM, Jabbar Azam  wrote:

> Has anybody got a cassandra cluster which autoscales depending on load or
> times of the day?
>

Netflix probably does, managed with Priam.

In general I personally do not consider Cassandra's mechanisms for joining
and parting nodes to currently work well enough to design a production
system that would do so as part of regular operation.

=Rob


Re: Is the tarball for a given release in a Maven repository somewhere?

2014-05-22 Thread Chris Burroughs
Maven central has "bin.tar.gz"  "src.tar.gz" downloads for the 
'apache-cassandra' artifact.  Does that work for your use case?


http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22apache-cassandra%22
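
For example, something along these lines should pull the binary tarball down
(untested; the plugin version and artifact coordinates are guessed from the
search result above, so adjust to taste):

# copy the "bin" classifier tar.gz into the build output directory
mvn org.apache.maven.plugins:maven-dependency-plugin:2.8:copy \
    -Dartifact=org.apache.cassandra:apache-cassandra:2.0.7:tar.gz:bin \
    -DoutputDirectory=target/dev-env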

On 05/20/2014 05:30 PM, Clint Kelly wrote:

Hi all,

I am using the maven assembly plugin to build a project that contains
a development environment for a project that we've built at work on
top of Cassandra.  I'd like this development environment to include
the latest release of Cassandra.

Is there a maven repo anywhere that contains an artifact with the
Cassandra release in it?  I'd like to have the same Cassandra tarball
that you can download from the website be a dependency for my project.
I can then have the assembly plugin untar it and customize some of
the conf files before tarring up our entire development environment.
That way, anyone using our development environment would have access
to the various shell scripts and tools.

I poked around online and could not find what I was looking for.  Any
help would be appreciated!

Best regards,
Clint





Re: Cassandra pre 2.1 vs 2.1 counter implementation

2014-05-22 Thread Robert Coli
On Wed, May 21, 2014 at 1:15 PM, Localhost shell <
universal.localh...@gmail.com> wrote:

>
> I am new to C* community.
>
> We are planning to use Datastax C* (pre 2.1) in production. We heavily
> use counters and it is mostly what we do apart from storing a few months of
> raw logs in C*.
>

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

Most people using Cassandra Counters in production have some combo of low
accuracy requirements, low update rate, or high update rate behind an
in-memory accumulator which is flushed to Cassandra periodically.


> 2.  What were the major concerns (other than idempotency and overcount
> due to timeout exceptions) in the pre 2.1 counters architecture that led
> to a rewrite of counters implementation?
>

https://issues.apache.org/jira/browse/CASSANDRA-4417

And slowness...

https://issues.apache.org/jira/browse/CASSANDRA-4775
and
https://issues.apache.org/jira/browse/CASSANDRA-6504

have a lot of details/background.

=Rob


Re: How to enable a Cassandra node to participate in multiple cluster

2014-05-22 Thread Wojciech Meler
As far as I know, nodes are identified by the IP address and ports defined in
the yaml. If you carefully prepare the configuration, you should be able to
run multiple Cassandra processes on a single box, participating in different
clusters. If you use a separate IP address and the default ports for each, it
should be quite easy. Nowadays, with Docker available, you could also put
Cassandra in a container or use any kind of VM.
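
Roughly, something like this (addresses, paths and names below are just
examples, not a tested recipe):

# a second instance on the same host, started from its own config directory;
# instance2/cassandra.yaml would differ from the first instance in at least:
#   cluster_name: 'ClusterB'
#   listen_address: 192.168.1.11        # a second IP bound on the box
#   rpc_address: 192.168.1.11
#   data_file_directories: [ /var/lib/cassandra2/data ]
#   commitlog_directory: /var/lib/cassandra2/commitlog
#   saved_caches_directory: /var/lib/cassandra2/saved_caches
CASSANDRA_CONF=/etc/cassandra/instance2 cassandra -p /var/run/cassandra2.pid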
21 May 2014 20:07, "Salih Kardan" wrote:

> Hello everyone,
>
> I want to use Cassandra cluster for some specific purpose across data
> centers. What I want to figure out is how can I enable a single Cassandra
> node to participate in multiple clusters at the same time? I googled it,
> however I could not find any use case of Cassandra as I mentioned above. Is
> this possible with the current architecture of Cassandra?
>
> Salih
>


Re: Is the tarball for a given release in a Maven repository somewhere?

2014-05-22 Thread Clint Kelly
Probably!  I'll give that a try, thanks!

On Thu, May 22, 2014 at 12:11 PM, Chris Burroughs
 wrote:
> Maven central has "bin.tar.gz"  "src.tar.gz" downloads for the
> 'apache-cassandra' artifact.  Does that work for your use case?
>
> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22apache-cassandra%22
>
>
> On 05/20/2014 05:30 PM, Clint Kelly wrote:
>>
>> Hi all,
>>
>> I am using the maven assembly plugin to build a project that contains
>> a development environment for a project that we've built at work on
>> top of Cassandra.  I'd like this development environment to include
>> the latest release of Cassandra.
>>
>> Is there a maven repo anywhere that contains an artifact with the
>> Cassandra release in it?  I'd like to have the same Cassandra tarball
>> that you can download from the website be a dependency for my project.
>>   I can then have the assembly plugin untar it and customize some of
>> the conf files before taring up our entire development environment.
>> That way, anyone using our development environment would have access
>> to the various shell scripts and tools.
>>
>> I poked around online and could not find what I was looking for.  Any
>> help would be appreciated!
>>
>> Best regards,
>> Clint
>>
>


Re: autoscaling cassandra cluster

2014-05-22 Thread Jabbar Azam
Netflix uses Scryer
(http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html)
for predictive and reactive autoscaling, but they only refer to EC2
instances. They don't mention anything about Cassandra scaling or adding
and removing nodes.

I've just looked at the Priam wiki and it doesn't mention scaling either. It
also mentions that vnodes aren't fully supported, which is no use for me as
I'm using 2.x. The other issue, or rather feature, of Cassandra is that adding
a new node increases the load on the system, so this surge would need to be
taken into account.

I think I'll leave this problem for more intelligent people than me and
concentrate on the application logic, which can scale by adding or removing
application and front end servers.

Thanks for all your comments.

Thanks

Jabbar Azam


On 22 May 2014 19:55, Robert Coli  wrote:

> On Wed, May 21, 2014 at 4:35 AM, Jabbar Azam  wrote:
>
>> Has anybody got a cassandra cluster which autoscales depending on load or
>> times of the day?
>>
>
> Netflix probably does, managed with Priam.
>
> In general I personally do not consider Cassandra's mechanisms for joining
> and parting nodes to currently work well enough to consider designing a
> production system which would do so as part of regular operation.
>
> =Rob
>
>


Re: CassandraStorage loader generating 2x many record?

2014-05-22 Thread Robert Coli
On Tue, May 20, 2014 at 1:44 PM, Kevin Burton  wrote:

> This has to be a bug; either that or I'm insane.
>

If it turns out you're not insane (;D) I suggest filing a JIRA ticket with
your repro steps at :

http://issues.apache.org

=Rob


Re: Change number of vnodes on an existing cluster

2014-05-22 Thread Phil Luckhurst
Thanks Rob, I didn't realize that you could use the initial_token when using
vnodes.

I see what you mean now that with RF=N having multiple vnodes is not
actually achieving anything unless we add further nodes; we hadn't really
considered that when we initially installed with the default yaml file.

For a small cluster, e.g. 9 nodes with RF=3, would you actually recommend
using vnodes at all, and if so how many?

Phil



Robert Coli-3 wrote
> On Thu, May 22, 2014 at 4:31 AM, Phil Luckhurst <phil.luckhurst@> wrote:
> 
>> We have a 3 node 2.0.7 cluster with RF=3. At the moment these are
>> configured
>> to have the default 256 vnodes we'd like to try reducing that to see what
>> effect it has on some of our CQL query times.
>>
> 
> Because you have RF=N, all nodes have all data. This means that it is
> actually irrelevant how many vnodes (or nodes) you have; you just lose by
> using them at all.
> 
> However to reduce the number of vnodes to a number that is ok in your case
> but also might be reasonable when you have RF != N, you can just :
> 
> 1) get a list of tokens per node via a one-liner like this :
> 
> nodetool info -T | grep Token | awk '{print $3}' | paste -s -d,
> 
> 2) modify this list by removing however many tokens you want to get to the
> new number of vnodes
> 
> 3) insert this list into the initial_token line of cassandra.yaml on each
> node [1]
> 
> 4) rolling re-start nodes with auto_bootstrap:false [2]
> 
> My *belief* is that you do not need a step 3.5 "nuke the system keyspace
> and reload schema, potentially with the entire cluster down", but it's
> possible that other nodes may remember your old vnodes unless you do. Test
> in a non-production environment, obviously.
> 
> If the above is too complicated and you have the spare hosts, adding 3 new
> nodes and then decommissioning the old ones is a safe and simple way to
> achieve the same goal.
> 
> =Rob
> [1] Note that I recommend this as a best practice for the use of vnodes,
> always populate initial_token.
> [2]
> https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/







RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-22 Thread Phil Luckhurst
Hi Andreas,

So does that mean it can compact the 'hottest' partitions into a new sstable,
but the old sstables may not immediately be removed, so the same data could
be in more than one sstable? That would certainly explain the difference we
see when we manually run nodetool compact.

Thanks
Phil


Andreas Finke wrote
> Hi Phil,
> 
> I found an interesting blog entry that may address your problem.
> 
> http://www.datastax.com/dev/blog/optimizations-around-cold-sstables
> 
> It seems that compaction is skipped for sstables which do not satisfy a
> certain read rate. Please check.
> 
> 
> Kind regards
> 
> Andreas Finke
> Java Developer
> Solvians IT-Solutions GmbH
> 
> 
>  Phil Luckhurst wrote 
> 
> Definitely no TTL and records are only written once with no deletions.
> 
> Phil
> 
> 
> DuyHai Doan wrote
>> Are you sure there is no TTL set on your data? It might explain the
>> shrink
>> in sstable size after compaction.







Re: Change number of vnodes on an existing cluster

2014-05-22 Thread Robert Coli
On Thu, May 22, 2014 at 2:16 PM, Phil Luckhurst <
phil.luckhu...@powerassure.com> wrote:

> Thanks Rob, I didn't realize that you could use the initial_token when
> using
> vnodes.
>

I am planning to write a blog post on the always-specify-initial_token best
practice. It protects users in a variety of cases, at the cost of
comparatively little management overhead.

For reference, the format of this when using vnodes is a comma-delimited
list.
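
e.g. something like this on each node (token values made up):

initial_token: -9181802382475887657,-3074457345618258603,3074457345618258602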


> I see what you mean now that with RF=N having multiple vnodes is not
> actually achieving anything unless we add further nodes, we hadn't really
> considered that when we initially installed with the default yaml file.
>

Cassandra "design" and its implicit perspective on defaults often seem to
not be able to make up its mind : are they for the developer running a
single node on his laptop, or the operator deploying a reasonably sized
production cluster? The answer seems to be "it depends on who is asking for
what and who is merging the patch."

It is my view that vnodes as a default is unreasonable because most
clusters are unreasonably small and will stay that way, and vnodes only
lose in that case. Vnodes only help you if N is some multiple of RF, but
100% of vnode users pay fixed overhead costs. As it is clear that vnodes
lose in many common cases, I strongly doubt there was any formal-esque
argument made for it as a reasonable default for all users.

The paper trail suggests that virtually no public justification was made
for making it the default; perhaps there was some discussion not
immediately visible in Google:

https://issues.apache.org/jira/browse/CASSANDRA-5869 - "Make vnodes default
in 2.0"
"
Description

_Click To Add Description_
"

http://grokbase.com/t/cassandra/dev/1383gva1fm/time-for-a-release-candidate
"
We hadn't actually set vnodes to be on by default, which we should do
after having 1.2 default to old single token per node. Fixed in
CASSANDRA-5869 so let's roll an rc2.
"

The stated rationale here appears to be that a major version has passed,
and therefore it is appropriate to change the default. This decision does
not seem related in any way to the volume of still quite serious problems
vnodes had as of the 2.0 rc1 era.

In real reality, vnodes were almost certainly set to default in 2.0 so that
they could be hardened by both early adopters (cassandra experts) and noobs
(cassandra anti-experts) encountering problems with this default. As Eric
Evans mentioned at the time in a related post, this is to some extent the
reality of open source software development. Where he and I appear to
disagree is on whether it is reasonable to set new features as defaults and
thereby use noobs as part of your QA process for same.

If a Cassandra Operator notices something new and shiny has been made
default or is being pushed heavily in a brand new version (vnodes, level
compaction at 5mb, serial repair, etc., etc.) they should do their own
evaluation and determine whether they want to participate in the QA process
for this new feature, in production. My view is that it would be a lot more
reasonable to continue to promote such new things heavily but with VERY
LARGE REPEATED CAVEATS so that only the former class of willful early
adopter experts will catch these bullets.

tl;dr - Prematurely and optimistically making immature features like vnodes
the default seems to mostly ensure that noobs such as yourself get a very
bad initial experience with Cassandra.

> For a small cluster, e.g. 9 nodes with RF=3, would you actually recommend
> using vnodes at all, and if so how many?


The key consideration when deciding vnodes or not is not really how many
nodes you initially deploy; it's how many nodes you are likely to ever
have. If you have a reasonable expectation of ever having over RF*3 nodes
where RF>=3, you probably want to use vnodes, so that you have a faster and
less degrading rebuild experience when 1/9 nodes fails. If you don't, it's
probably not worth the trouble.

The caveat here is that it is non-trivial to get from non-vnodes to vnodes
if RF

Re: Memory issue

2014-05-22 Thread opensaf dev
Well Romain, I had tried restarting the VM as well, but the problem still
remained.

What I noticed is that after some time, irrespective of whether I run
Cassandra as the other user or as the normal cassandra user, the problem
still remains. As soon as it starts, the JVM gets killed because of the
memory issue. Are there other settings besides the limits.conf file that I
need to configure?

Also note that I don't have an /etc/limits.d/cassandra.conf file; I just
configured the limits in limits.conf. Even though I made the group ID of
both users (cassandra, X) the same, it was no use. Does Cassandra have to be
started under the cassandra user only? What special configuration is
required if we try to run Cassandra under a different user?
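
For reference, this is roughly what I have configured and how I am checking
it (the user name X and the exact values are placeholders, taken from the
usual recommendations rather than anything verified on this box):

# entries added to /etc/security/limits.conf for the user that starts Cassandra:
#   X  -  memlock  unlimited
#   X  -  nofile   100000
#   X  -  nproc    32768
#   X  -  as       unlimited

# checking that the limits actually apply in that user's login session:
su - X -c 'ulimit -l'    # memlock, should print "unlimited"
su - X -c 'ulimit -n'    # max open files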

Thanks
Dev






On Tue, May 20, 2014 at 10:44 PM, Romain HARDOUIN  wrote:

> Well... you have already changed the limits ;-)
> Keep in mind that changes in the limits.conf file will not affect
> processes that are already running.
>
> opensaf dev wrote on 21/05/2014 06:59:05:
>
> > From: opensaf dev 
> > To: user@cassandra.apache.org,
> > Date: 21/05/2014 07:00
> > Subject: Memory issue
> >
> > Hi guys,
> >
> > I am trying to run Cassandra on CentOS as a user X other than root
> > or cassandra. When I run as user cassandra, it starts and runs fine.
> > But when I run under user X, I get the below error once
> > Cassandra has started, and the system freezes totally.
> >
> > Insufficient memlock settings:
> > WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable
> > to lock JVM memory (ENOMEM).
> > This can result in part of the JVM being swapped out, especially
> > with mmapped I/O enabled.
> > Increase RLIMIT_MEMLOCK or run Cassandra as root.
> >
> >
> > I have tried the tips available online to change the memlock and
> > other limits, both for users cassandra and X, but it did not solve the
> > problem.
> >
>
> > What else should I consider when I run Cassandra as a user other than
> > cassandra/root?
> >
> >
> > Any help is much appreciated.
> >
> >
> > Thanks
> > Dev
> >
>