Transitioning to incremental repair

2015-12-01 Thread Sam Klock
Hi folks,

A question like this was recently asked, but I don't think anyone ever 
supplied an unambiguous answer.  We have a set of clusters currently 
using sequential repair, and we'd like to transition them to 
incremental repair.  According to the documentation, this is a very 
manual (and likely time-consuming) process:

http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html

Our understanding is that this process is necessary for tables that use 
LCS, as unrepaired sstables are compacted using STCS and (without the 
process described in the doc) all sstables start in the unrepaired 
state.  The pain of this migration strategy is supposed to be offset by 
the savings in undesired compaction activity.  The docs aren't 
especially clear, but it sounds like this strategy is not needed for 
tables that use STCS.

However, CASSANDRA-8004 (resolved against 2.1.2) appears intended to 
have both the repaired and unrepaired sstable sets use the same 
compaction strategy.  It seems like that obviates the rationale for a 
migration procedure, which is supported by offhand comments on this 
list, e.g.:

https://www.mail-archive.com/user%40cassandra.apache.org/msg40303.html
https://www.mail-archive.com/user%40cassandra.apache.org/msg44896.html

In other words, it *looks* like the docs are obsolete, and the 
migration process for existing clusters only consists of flipping the 
switch (i.e., adding "-inc" to invocations of "nodetool repair").

Our questions:

1) Is our understanding of the status quo following 2.1.2 correct?  
Does migrating existing clusters to incremental repair only require 
adding the "-inc" argument, or is a process still required?

2) If a process is still required, have there been any changes since 
2.1.2?  Are the docs up-to-date?

3) If there is no process or if the process has changed, are there 
plans on the DataStax side to update the documentation accordingly?

Thanks,
SK


Re: Transitioning to incremental repair

2015-12-01 Thread Marcus Eriksson
Yes, it should now be safe to just run a repair with -inc -par to migrate
to incremental repairs.
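
As a rough illustration of "flipping the switch", here is a minimal Python
sketch that assembles the nodetool invocation with the two flags named above.
The function name and the keyspace name are hypothetical, and the flags are
the ones quoted in this thread for 2.1; other releases may use different
defaults and options, so check "nodetool help repair" on your version first.

```python
def build_repair_command(keyspace=None, incremental=True, parallel=True):
    """Return the argv list for a nodetool repair run.

    Uses the -inc and -par flags mentioned in this thread (Cassandra 2.1).
    """
    argv = ["nodetool", "repair"]
    if incremental:
        argv.append("-inc")
    if parallel:
        argv.append("-par")
    if keyspace:
        # Hypothetical keyspace name supplied by the caller.
        argv.append(keyspace)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_repair_command("my_keyspace")))
```

The list form can be passed to subprocess.run on a node where nodetool is on
the PATH.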

BUT, if you currently use, for example, the repair service in OpsCenter or
Spotify's Cassandra Reaper, you might still want to migrate the way it is
documented. Migrating to incremental repairs requires running one full
repair rather than many subrange repairs, and that might not be possible for
some users with a lot of data or with vnodes etc.

I would also wait until
https://issues.apache.org/jira/browse/CASSANDRA-10768 has been committed
and released as it will improve anticompaction performance

/Marcus



Re: Running sstableloader from every node when migrating?

2015-12-01 Thread George Sigletos
Thank you Robert and Anuja,

It does not seem that sstable2json is the right tool for the job: there is no
documentation beyond Cassandra 1.2, and it requires a specific sstable to be
given, which means a lot of manual work.

The documentation also mentions it is good for testing/debugging, but I
would need to migrate nearly 1 TB of data from a 6-node cluster to a 3-node
one. Copying sstables and running nodetool refresh does not seem like a great
option either, unless I am missing something.

Using sstableloader seems a more logical option. It is still a bottleneck if
you need to do it for every node in your source cluster. What if you had a
100-node cluster?

I am thinking of just running a simple script instead, one that selects data
from the source cluster and inserts it into the target one.
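
A minimal sketch of what such a script could look like, assuming session
objects that expose execute(query, parameters) in the style of the DataStax
Python driver. The FakeSession class, the table ks.users and its columns are
hypothetical and exist only so the sketch runs without a cluster. It copies
rows one at a time and does not preserve TTLs or write timestamps, so treat
it as a starting point rather than a tuned migration tool.

```python
def copy_rows(source_session, target_session, select_cql, insert_cql, row_to_params):
    """Read every row from the source and insert it into the target.

    Returns the number of rows copied.
    """
    copied = 0
    for row in source_session.execute(select_cql):
        target_session.execute(insert_cql, row_to_params(row))
        copied += 1
    return copied


class FakeSession:
    """In-memory stand-in for a driver session, used only to exercise copy_rows."""

    def __init__(self, rows=()):
        self.rows = list(rows)
        self.executed = []

    def execute(self, query, parameters=None):
        # Record the call and return the canned rows.
        self.executed.append((query, parameters))
        return iter(self.rows)


if __name__ == "__main__":
    source = FakeSession([(1, "alice"), (2, "bob")])
    target = FakeSession()
    count = copy_rows(
        source,
        target,
        "SELECT id, name FROM ks.users",
        "INSERT INTO ks.users (id, name) VALUES (%s, %s)",
        lambda row: (row[0], row[1]),
    )
    print("rows copied:", count)
```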

Kind regards,
George

On Tue, Dec 1, 2015 at 7:54 AM, anuja jain wrote:

> Hello George,
> You can use sstable2json to create the json of your keyspace and then load
> this json to your keyspace in new cluster using json2sstable utility.
>
> On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli wrote:
>
>> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos wrote:
>>
>>> We would like to migrate one keyspace from a 6-node cluster to a 3-node
>>> one.
>>>
>>
>> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/
>>
>> =Rob
>>
>>
>
>


CPU usage is increased on node addition, even after bootstrap is finished

2015-12-01 Thread Thanasis Naskos

Hi,

CLUSTER SETUP
I'm using Cassandra 2.2.3 running on a small private cloud infrastructure 
(supported by ganeti+KVM).
I have an initial Cassandra cluster of 8 nodes and a keyspace with 
SimpleStrategy and replication factor 3, which is loaded with 2 GB of 
data (2 GB * 3 replicas = ~6 GB of data in total).
On every Cassandra node I'm running ganglia to collect measurements of 
various metrics such as incoming load, throughput, response latency and CPU 
usage.
Every Cassandra node has 2 vCPUs, an 80 GB HDD (commitlog NOT on a separate 
disk) and 6 GB RAM.


CLIENTS SETUP
I'm using the YCSB benchmark to produce READ load, using CQL clients 
(v2.1.8) with asynchronous calls.
21 YCSB client threads produce ~780 read requests/second each (~16380 
req/sec in total).
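
The arithmetic behind these figures can be checked with a short Python
sketch. The function and constant names are hypothetical, and the per-node
figure assumes data is spread evenly across the nodes, which is an
idealization.

```python
CLIENT_THREADS = 21
READS_PER_THREAD = 780      # approximate read requests/second per thread
DATA_GB = 2
REPLICATION_FACTOR = 3


def total_read_rate(threads, reads_per_thread):
    """Aggregate read rate offered by all client threads."""
    return threads * reads_per_thread


def replicated_data_gb(data_gb, replication_factor):
    """Total data stored in the cluster once every replica is counted."""
    return data_gb * replication_factor


def data_per_node_gb(data_gb, replication_factor, nodes):
    """Data held by each node, assuming a perfectly even distribution."""
    return replicated_data_gb(data_gb, replication_factor) / nodes


if __name__ == "__main__":
    print("total req/sec:", total_read_rate(CLIENT_THREADS, READS_PER_THREAD))
    print("replicated GB:", replicated_data_gb(DATA_GB, REPLICATION_FACTOR))
    for nodes in (8, 12):
        print(nodes, "nodes:", data_per_node_gb(DATA_GB, REPLICATION_FACTOR, nodes), "GB per node")
```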


EXPERIMENT 1
I'm keeping the load steady at ~16380 req/sec and I'm adding 1 node 
periodically. After adding a new node and after the bootstrapping is 
over, I expect both the response latency and the CPU usage to decrease; 
however, this is not the case. On every node addition CPU usage increases 
and response latency either stays steady or increases too.


MEASUREMENT LOGS
# nodes   LOAD (req/sec)   THROUGHPUT (req/sec)   LATENCY (ms)   CPU (%)
8         15668,81         15679,81               56,09          86,67
9         16177,45         16185,05               62,96          88,61
10        16353,36         16343,27               75,22          89,48
11        15723,14         15682,06               65,84          90,0
12        15348,97         15327,27               103,13         90,40


I moved from
8 to 9 nodes after 10 minutes,
9 to 10 after 15 minutes,
10 to 11 after 25 minutes,
11 to 12 after 25 minutes.
The bootstrapping took about 1,5 minutes on every addition. I didn't run 
nodetool cleanup at all.

The measurements of the table are averages.
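
To make the puzzle concrete, the following Python sketch divides the measured
throughput by the node count. It assumes requests are spread evenly across
nodes, which is an idealization, and the names are hypothetical. Under that
assumption the per-node request rate falls from about 1960 to about 1277
req/sec between 8 and 12 nodes, while the reported CPU usage stays between
roughly 87% and 90%.

```python
def parse_decimal(text):
    """Parse a number written with either a decimal comma or a decimal point."""
    return float(text.replace(",", "."))


# (nodes, load req/sec, throughput req/sec, latency ms, CPU %) as reported above.
MEASUREMENTS = [
    (8, "15668,81", "15679,81", "56,09", "86,67"),
    (9, "16177,45", "16185,05", "62,96", "88.61"),
    (10, "16353,36", "16343,27", "75,22", "89,48"),
    (11, "15723,14", "15682,06", "65,84", "90,0"),
    (12, "15348,97", "15327,27", "103,13", "90,40"),
]


def per_node_throughput(rows):
    """Return (nodes, throughput / nodes) pairs, assuming an even spread."""
    return [
        (nodes, parse_decimal(throughput) / nodes)
        for nodes, _load, throughput, _latency, _cpu in rows
    ]


if __name__ == "__main__":
    for nodes, per_node in per_node_throughput(MEASUREMENTS):
        print(f"{nodes:2d} nodes: {per_node:8.1f} req/sec per node")
```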

EXPERIMENT 2
I have also tried starting with 7 nodes (up to 12) and a lower incoming load 
of ~9600 req/sec. Both latency and CPU usage stay at the same level no matter 
the number of nodes (3-4 ms latency and 75% CPU load). I've also run 
nodetool cleanup, still with no decrease.


I've read somewhere that the benefits of adding nodes in Cassandra 
are linear. Am I missing something?


Thanks a lot!
Thanasis






Cassandra and GPU's...

2015-12-01 Thread Tony Anecito
Hi All,
Can Cassandra use GPU's? If not, can someone recommend an open source database 
that runs on GPU's? I am interested in seeing the performance difference of a 
database that is under 2GB when run on a GPU card such as an NVIDIA GTX 980.
Thanks,
-Tony


Re: Cassandra and GPU's...

2015-12-01 Thread Steve Robenalt
Hi Tony,

Somebody will likely prove me wrong on this (and I'd love to see it), but
I'm skeptical that there is much intersection between the set of things a
GPU is good at and the set of things a database needs to do. As such, I
don't expect there'd be much performance gain unless a way to exploit the
massive parallelism of the GPU effectively can be found.

Of course, GPU-driven analytics on the contents of the database opens up
all kinds of possibilities given the right kind of data...

Steve

On Tue, Dec 1, 2015 at 10:16 AM, Tony Anecito wrote:

> Hi All,
>
> Can Cassandra use GPU's? If not, can someone recommend an open source
> database that runs on GPU's? I am interested in seeing the performance
> difference of a database that is under 2GB when run on a GPU card such as
> an NVIDIA GTX 980.
>
> Thanks,
> -Tony
>



-- 
Steve Robenalt
Software Architect
sroben...@highwire.org 
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org

Technology for Scholarly Communication


Re: Cassandra and GPU's...

2015-12-01 Thread james anderson
good evening;

> On 2015-12-01, at 21:17, Steve Robenalt wrote:
> 
> Hi Tony,
> 
> Somebody will likely prove me wrong on this (and I'd love to see it), but I'm 
> skeptical that there is much intersection between the set of things a GPU is 
> good at and the set of things a database needs to do.

of course the context varies, but there are some demonstrated advantages:

https://www.blazegraph.com/product/gpu-accelerated/ 


> As such, I don't expect there'd be much performance gain unless a way to 
> exploit the massive parallelism of the GPU effectively can be found.
> 
> Of course, GPU-driven analytics on the contents of the database opens up all 
> kinds of possibilities given the right kind of data...
> 
> Steve






Re: Cassandra and GPU's...

2015-12-01 Thread Steve Robenalt
Hi James,

Yup, the analytics side of database usage is ripe with possibilities, as
the site at your link shows. In my original skepticism, I was referring not
to the analytics using the database, but to the database itself. In
Cassandra-specific terms, I would suggest that GPUs have more potential
impact on the Spark and/or Titan integration with Cassandra, rather than on
Cassandra itself.

Steve






Re: Transitioning to incremental repair

2015-12-01 Thread Bryan Cheng
Sorry if I misunderstood, but are you asking about the LCS case?

Based on our experience, I would absolutely recommend you continue with the
migration procedure. Even if the compaction strategy is the same, the
process of anticompaction is incredibly painful. We observed our test
cluster running 2.1.11 experiencing a dramatic increase in latency and not
responding to nodetool queries over JMX while anticompacting the largest
SSTables. This procedure also took several times longer than a standard
full repair.

If you absolutely cannot perform the migration procedure, I believe 2.2.x
contains the changes to automatically set the RepairedAt flags after a full
repair, so you may be able to do a full repair on 2.2.x and then transition
directly to incremental without migrating (can someone confirm?)


Re: Cassandra and GPU's...

2015-12-01 Thread james anderson
good morning;

> On 2015-12-01, at 21:53, Steve Robenalt wrote:
> 
> Hi James,
> 
> Yup, the analytics side of database usage is ripe with possibilities, as the 
> site at your link shows.

that case is not the “analytics”, but the database itself.
the graph navigation is a core aspect of that particular database form.

> In my original skepticism, I was referring not to the analytics using the 
> database, but to the database itself. In Cassandra-specific terms, I would 
> suggest that GPUs have more potential impact on the Spark and/or Titan 
> integration with Cassandra, rather than on Cassandra itself.
> 
> Steve



Re: Cassandra and GPU's...

2015-12-01 Thread Rock Zhang
unsubscribe
