Re: Replacing a dead node by deleting it and auto_bootstrap'ing a new node (Cassandra 2.0)

2014-12-05 Thread Omri Bahumi
I guess Cassandra is aware that it has some replicas not meeting the
replication factor. Wouldn't it be nice if a bootstrapping node picked those
up? It could make things much simpler from an operations point of view.

What do you think?

On Fri, Dec 5, 2014 at 8:31 AM, Jaydeep Chovatia wrote:
> As far as I know, if you have NOT explicitly specified
> "-Dcassandra.replace_address=old_node_ipaddress", then new tokens would be
> assigned (randomly) to the bootstrapping node instead of the dead node's tokens.
>
> -jaydeep
>
> On Thu, Dec 4, 2014 at 6:50 AM, Omri Bahumi  wrote:
>>
>> Hi,
>>
>> I was wondering, how would auto_bootstrap behave in this scenario:
>>
>> 1. I had a cluster with 3 nodes (RF=2)
>> 2. One node died, I deleted it with "nodetool removenode" (+ force)
>> 3. A new node launched with "auto_bootstrap: true"
>>
>> The question is: will the "right" vnodes go to the new node as if it
>> was bootstrapped with "-Dcassandra.replace_address=old_node_ipaddress"
>> ?
>>
>> Thanks,
>> Omri.
>
>


Cassandra memory & joining issues

2014-12-05 Thread farouk.umar
Hello,


A recent incident has brought to light that we potentially have two problems:
1. A node can start flapping up and down, possibly due to memory issues.
2. We can't bring in new nodes.


Here is an account of the incident.


A 3-node vnode cluster (A, B & C), running Cassandra version 2.0.10.


1. We get an alert that a node is down (SD alert at 12:33).
2. We turn off the app that uses Cassandra most heavily.
3. Node A is down, CPU is high, and it goes into repeating cycles of garbage
collection that take a long time:
    INFO [ScheduledTasks:1] 2014-12-01 12:22:05,691 GCInspector.java (line 116) GC for ParNew: 2160 ms for 2 collections, 2847691776 used; max is 3911188480
    INFO [ScheduledTasks:1] 2014-12-01 12:22:06,658 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 13545 ms for 1 collections, 2801612640 used; max is 3911188480
    INFO [ScheduledTasks:1] 2014-12-01 12:22:48,250 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 15891 ms for 1 collections, 3620884464 used; max is 3911188480
    INFO [ScheduledTasks:1] 2014-12-01 12:23:07,925 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 16789 ms for 1 collections, 3696864640 used; max is 3911188480
    INFO [ScheduledTasks:1] 2014-12-01 12:23:26,338 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 16777 ms for 1 collections, 3733452048 used; max is 3911188480
    INFO [ScheduledTasks:1] 2014-12-01 12:23:46,990 GCInspector.java (line 116) GC for ParNew: 2783 ms for 5 collections, 3782932912 used; max is 3911188480
    INFO [ScheduledTasks:1] 2014-12-01 12:23:46,990 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 17203 ms for 1 collections, 3783141880 used; max is 3911188480
    ..
    INFO [ScheduledTasks:1] 2014-12-01 12:30:35,256 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 30084 ms for 2 collections, 3892036536 used; max is 3911188480
4. DataStax OpsCenter reports it as down.
5. It keeps going down and up, with the logs showing it restarting and
immediately going into the GC cycles again.
6. Restarting the node manually did not help.


At this point we decide to replace A, so:
1. We bring up a new node (D) with Cassandra not yet started.
2. We stop node A.
3. Before we start Cassandra on node D, node B stops responding to 'nodetool
status'.
4. Node C reports B as up, and after a while reports it as down.
5. We turn node A back on; it still has high CPU but is no longer dropping out,
and the logs no longer show any long GC cycles.
6. We turn off node B.
7. We start Cassandra on node D and get this error on startup:
    ERROR [main] 2014-12-01 14:04:48,332 CassandraDaemon.java (line 513) Exception encountered during startup
    java.lang.IllegalStateException: unable to find sufficient sources for streaming range (2337155766868590732,2355076515890621387]
        at org.apache.cassandra.dht.RangeStreamer.getRangeFetchMap(RangeStreamer.java:201)
        at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:125)
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:72)
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:994)
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:797)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
    INFO [StorageServiceShutdownHook] 2014-12-01 14:04:48,338 Gossiper.java (line 1279) Announcing shutdown
8. We restart node B and it joins the cluster fine.


Any help/pointers so that we can understand what happened and prevent it from 
happening in the future would be appreciated.
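
One thing worth checking, going by the GCInspector lines above ("max is
3911188480", i.e. a roughly 4 GB heap that CMS can no longer keep ahead of), is
whether node A's heap is simply too small for its load. A rough sketch of how
to confirm and adjust it follows; the sizes shown are placeholders, not a
recommendation for this cluster.

    nodetool info | grep "Heap Memory"      # current used / max heap, in MB

    # In cassandra-env.sh, the heap is sized by these variables (values below are
    # only examples; they must fit within the node's RAM):
    #   MAX_HEAP_SIZE="8G"
    #   HEAP_NEWSIZE="800M"     # often ~100 MB per CPU core
    # Restart the node afterwards and watch the GCInspector lines again.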


Thanks,

Farouk

—
Sent from Mailbox

Re: Cassandra schema migrator

2014-12-05 Thread Ben Hood
On Tue, Nov 25, 2014 at 12:49 PM, Phil Wise  wrote:
> https://github.com/advancedtelematic/cql-migrate

Great to see these tools out there!

Just to add to the list

https://github.com/mattes/migrate

Might not be as C* specific as the other tools mentioned earlier in
this thread, but it does integrate with Cassandra.


Re: Cassandra schema migrator

2014-12-05 Thread Brian Sam-Bodden
There is also https://github.com/hsgubert/cassandra_migrations

On Fri, Dec 5, 2014 at 7:49 AM, Ben Hood <0x6e6...@gmail.com> wrote:

> On Tue, Nov 25, 2014 at 12:49 PM, Phil Wise wrote:
> > https://github.com/advancedtelematic/cql-migrate
>
> Great to see these tools out there!
>
> Just to add to the list
>
> https://github.com/mattes/migrate
>
> Might not be as C* specific as the other tools mentioned earlier in
> this thread, but it does integrate with Cassandra.
>


Re: Cassandra schema migrator

2014-12-05 Thread Phil Wise
I've added these as answers to a question I posted on Stack Overflow:

http://stackoverflow.com/questions/26460932/how-to-deploy-changes-to-a-cassandra-cql-schema/27013426

Thank you

Phil

On 05.12.2014 15:23, Brian Sam-Bodden wrote:
> There is also https://github.com/hsgubert/cassandra_migrations
> 
> On Fri, Dec 5, 2014 at 7:49 AM, Ben Hood <0x6e6...@gmail.com>
> wrote:
> 
>> On Tue, Nov 25, 2014 at 12:49 PM, Phil Wise wrote:
>>> https://github.com/advancedtelematic/cql-migrate
>> 
>> Great to see these tools out there!
>> 
>> Just to add to the list
>> 
>> https://github.com/mattes/migrate
>> 
>> Might not be as C* specific as the other tools mentioned earlier
>> in this thread, but it does integrate with Cassandra.
>> 
> 


Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Robert Wille
At the data modeling class at the Cassandra Summit, the instructor said that 
lots of small partitions are just fine. I’ve heard on this list that that is 
not true, and that it’s better to cluster small partitions into fewer, larger 
partitions. Given the conflicting information on this issue, I’d be interested 
in hearing people’s opinions.

For the sake of discussion, let’s compare two tables:

CREATE TABLE a (
    id INT,
    value INT,
    PRIMARY KEY (id)
)

CREATE TABLE b (
    bucket INT,
    id INT,
    value INT,
    PRIMARY KEY ((bucket), id)
)

And let’s say that bucket is computed as id / N. For analysis purposes, let’s 
assume I have 100 million ids to store.
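
(For concreteness, a point lookup against each table might look like the
following; the keyspace name ks, N = 1000, and the example id are arbitrary.)

    cqlsh -e "SELECT value FROM ks.a WHERE id = 1234567;"                    # table a
    cqlsh -e "SELECT value FROM ks.b WHERE bucket = 1234 AND id = 1234567;"  # table b, bucket = id / N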

Table a is obviously going to have a larger bloom filter. That’s a clear 
negative.

When I request a record, table a will have less data to load from disk, so that 
seems like a positive.

Table a will never have its columns scattered across multiple SSTables, but 
table b might. If I only want one row from a partition in table b, does 
fragmentation matter (I think probably not, but I’m not sure)?

It’s not clear to me which will fit more efficiently on disk, but I would guess 
that table a wins.

Smaller partitions mean sending less data during repair, but I suspect that 
when computing the Merkle tree for the table, more partitions might mean more 
overhead, though that’s only a guess. Which one repairs more efficiently?

In your opinion, which one is best and why? If you think table b is best, what 
would you choose N to be?

Robert



Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Tyler Hobbs
On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai  wrote:

> Sounds great! By the way, will you create a ticket for this, so we can
> follow the updates?


What would the ticket be for?  (I might have missed something in the
conversation.)


-- 
Tyler Hobbs
DataStax 


Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Tyler Hobbs
On Fri, Dec 5, 2014 at 11:14 AM, Robert Wille  wrote:

>
>  And lets say that bucket is computed as id / N. For analysis purposes,
> lets assume I have 100 million id’s to store.
>
>  Table a is obviously going to have a larger bloom filter. That’s a clear
> negative.
>

That's true, *but*, if you frequently ask for rows that do not exist, Table
B will have few BF false positives, while Table A will almost always get a
"hit" from the BF and have to look into the SSTable to see that the
requested row doesn't actually exist.


>
>  When I request a record, table a will have less data to load from disk,
> so that seems like a positive.
>

Correct.


>
>  Table a will never have its columns scattered across multiple SSTables,
> but table b might. If I only want one row from a partition in table b, does
> fragmentation matter (I think probably not, but I’m not sure)?
>

Yes, fragmentation can matter.  Cassandra knows the min and max clustering
column values for each SSTable, so it can use those to narrow down the set
of SSTables it needs to read if you request a specific clustering column
value.  However, in your example, this isn't likely to narrow things down
much, so it will have to check many more SSTables.


>
>  It’s not clear to me which will fit more efficiently on disk, but I
> would guess that table a wins.
>

They're probably close enough not to matter very much.


>
>  Smaller partitions means sending less data during repair, but I suspect
> that when computing the Merkle tree for the table, more partitions might
> mean more overhead, but that’s only a guess. Which one repairs more
> efficiently?
>

Table A repairs more efficiently by far.  Currently repair must repair
entire partitions when they differ.  It cannot repair individual rows
within a partition.


>
>  In your opinion, which one is best and why? If you think table b is
> best, what would you choose N to be?
>

Table A, hands down.  Here's why: you should model your tables to fit your
queries.  If you're doing a basic K/V lookup, model it like table A.
People recommend wide partitions because many (if not most) queries are
best served by that type of model, so if you're not using wide partitions,
it's a sign that something might be wrong.  However, there are certainly
plenty of use cases where single-row partitions are fine.


-- 
Tyler Hobbs
DataStax 


Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread DuyHai Doan
Another argument for table A is that it leverages the Bloom filter heavily for
fast lookups: if the check is negative there is no disk hit, otherwise at most
1 or 2 disk hits depending on the fp chance.

Compaction also works better on skinny partitions.

On Fri, Dec 5, 2014 at 6:33 PM, Tyler Hobbs  wrote:

>
> On Fri, Dec 5, 2014 at 11:14 AM, Robert Wille  wrote:
>
>>
>>  And lets say that bucket is computed as id / N. For analysis purposes,
>> lets assume I have 100 million id’s to store.
>>
>>  Table a is obviously going to have a larger bloom filter. That’s a
>> clear negative.
>>
>
> That's true, *but*, if you frequently ask for rows that do not exist,
> Table B will have few BF false positives, while Table A will almost always
> get a "hit" from the BF and have to look into the SSTable to see that the
> requested row doesn't actually exist.
>
>
>>
>>  When I request a record, table a will have less data to load from disk,
>> so that seems like a positive.
>>
>
> Correct.
>
>
>>
>>  Table a will never have its columns scattered across multiple SSTables,
>> but table b might. If I only want one row from a partition in table b, does
>> fragmentation matter (I think probably not, but I’m not sure)?
>>
>
> Yes, fragmentation can matter.  Cassandra knows the min and max clustering
> column values for each SSTable, so it can use those to narrow down the set
> of SSTables it needs to read if you request a specific clustering column
> value.  However, in your example, this isn't likely to narrow things down
> much, so it will have to check many more SSTables.
>
>
>>
>>  It’s not clear to me which will fit more efficiently on disk, but I
>> would guess that table a wins.
>>
>
> They're probably close enough not to matter very much.
>
>
>>
>>  Smaller partitions means sending less data during repair, but I suspect
>> that when computing the Merkle tree for the table, more partitions might
>> mean more overhead, but that’s only a guess. Which one repairs more
>> efficiently?
>>
>
> Table A repairs more efficiently by far.  Currently repair must repair
> entire partitions when they differ.  It cannot repair individual rows
> within a partition.
>
>
>>
>>  In your opinion, which one is best and why? If you think table b is
>> best, what would you choose N to be?
>>
>
> Table A, hands down.  Here's why: you should model your tables to fit your
> queries.  If you're doing a basic K/V lookup, model it like table A.
> People recommend wide partitions because many (if not most) queries are
> best served by that type of model, so if you're not using wide partitions,
> it's a sign that something might be wrong.  However, there are certainly
> plenty of use cases where single-row partitions are fine.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai

> On Dec 5, 2014, at 11:23 AM, Tyler Hobbs  wrote:
> 
> 
> On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai wrote:
> Sounds great! By the way, will you create a ticket for this, so we can follow 
> the updates?
> 
> What would the ticket be for?  (I might have missed something in the 
> conversation.)
> 

Sorry, there isn’t any ticket then. I just want to have a way to keep track of 
the progress. :)

- Dong

> 
> -- 
> Tyler Hobbs
> DataStax 



Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
What progress are you trying to be aware of? All of the features Tyler
discussed are implemented and can be used.

On Fri, Dec 5, 2014 at 2:41 PM, Dong Dai  wrote:

>
> On Dec 5, 2014, at 11:23 AM, Tyler Hobbs  wrote:
>
>
> On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai  wrote:
>
>> Sounds great! By the way, will you create a ticket for this, so we can
>> follow the updates?
>
>
> What would the ticket be for?  (I might have missed something in the
> conversation.)
>
>
> Sorry, there aren’t any tickets then. I just want to have a way to be
> aware of the progress. :)
>
> - Dong
>
>
> --
> Tyler Hobbs
> DataStax 
>
>
>


Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai
Err, am I misunderstanding something?
I thought Tyler was going to add some code to split unlogged batches and make
the batch insertion token aware.

Is it already done? Otherwise I can do it too.

thanks,
- Dong

> On Dec 5, 2014, at 2:06 PM, Philip Thompson wrote:
> 
> What progress are you trying to be aware of? All of the features Tyler 
> discussed are implemented and can be used.
> 
> On Fri, Dec 5, 2014 at 2:41 PM, Dong Dai wrote:
> 
>> On Dec 5, 2014, at 11:23 AM, Tyler Hobbs wrote:
>> 
>> 
>> On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai wrote:
>> Sounds great! By the way, will you create a ticket for this, so we can 
>> follow the updates?
>> 
>> What would the ticket be for?  (I might have missed something in the 
>> conversation.)
>> 
> 
> Sorry, there aren’t any tickets then. I just want to have a way to be aware 
> of the progress. :)
> 
> - Dong
> 
>> 
>> -- 
>> Tyler Hobbs
>> DataStax 
> 
> 



Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
Splitting the batches by partition key and inserting them with a TokenAware
policy is already possible with existing driver code, though you will have
to split the batches yourself.
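
(As a rough illustration of the grouping Philip describes, here is what a
single-partition unlogged batch looks like when issued from cqlsh; the
keyspace, table, and values are hypothetical. The token-aware routing itself is
a driver-side load-balancing policy and is not shown here.)

    # every statement shares pk = 42, so the whole batch targets one partition
    cqlsh -e "
    BEGIN UNLOGGED BATCH
      INSERT INTO ks.events (pk, seq, payload) VALUES (42, 1, 'a');
      INSERT INTO ks.events (pk, seq, payload) VALUES (42, 2, 'b');
    APPLY BATCH;"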

On Fri, Dec 5, 2014 at 3:12 PM, Dong Dai  wrote:

> Err, am i misunderstanding something?
> I thought Tyler is going to add some codes to split unlogged batch and
> make the batch insertion token aware.
>
> it is already done? or else i can do it too.
>
> thanks,
> - Dong
>
> On Dec 5, 2014, at 2:06 PM, Philip Thompson wrote:
>
> What progress are you trying to be aware of? All of the features Tyler
> discussed are implemented and can be used.
>
> On Fri, Dec 5, 2014 at 2:41 PM, Dong Dai  wrote:
>
>>
>> On Dec 5, 2014, at 11:23 AM, Tyler Hobbs  wrote:
>>
>>
>> On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai  wrote:
>>
>>> Sounds great! By the way, will you create a ticket for this, so we can
>>> follow the updates?
>>
>>
>> What would the ticket be for?  (I might have missed something in the
>> conversation.)
>>
>>
>> Sorry, there aren’t any tickets then. I just want to have a way to be
>> aware of the progress. :)
>>
>> - Dong
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>>
>>
>
>


Re: full gc too often

2014-12-05 Thread Robert Coli
On Thu, Dec 4, 2014 at 8:13 PM, Philo Yang  wrote:

> In each time Old Gen reduce only a little, Survivor Space will be clear
> but the heap is still full so there will be another full gc very soon then
> the node will down. If I restart the node, it will be fine without gc
> trouble.
>
> Can anyone help me to find out where is the problem that full gc can't
> reduce CMS Old Gen? Is it because there are too many objects in heap can't
> be recycled? I think review the table scheme designing and add new nodes
> into cluster is a good idea, but I still want to know if there is any other
> reason causing this trouble.
>

Yes. This state is what I call "GC pre-fail", because collections happen
constantly but reclaim too little heap.

In general, what you have to do is look for the sources of heap pressure that
put you into GC pre-fail and eliminate them, or tune the generations to make
them acceptable for your workloads. If this happens after extended runtime
under steady-state load, then growth in total stored data size, and the
increased heap consumption derived from it, is often to blame.
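
(One way to watch for that state from the shell is sketched below; the pgrep
pattern is an assumption about how the daemon was started, and the JVM flags
mentioned are the usual cassandra-env.sh knobs rather than specific
recommendations.)

    pid=$(pgrep -f CassandraDaemon)
    jstat -gcutil "$pid" 5000      # watch O (old gen %) and FGC: old gen pinned
                                   # near 100% across repeated CMS cycles is "pre-fail"
    # Generation sizes live in cassandra-env.sh (MAX_HEAP_SIZE, HEAP_NEWSIZE), and
    # CMS behaviour in JVM_OPTS (e.g. -XX:CMSInitiatingOccupancyFraction).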

=Rob


Re: Repair taking many snapshots per minute

2014-12-05 Thread Robert Coli
On Thu, Dec 4, 2014 at 7:19 AM, Robert Wille  wrote:

> Does anybody have any idea what might cause this? That it happens at all
> is bizarre, and that it happens on only three nodes is even more bizarre.
> Also, it really doesn’t seem to have difficulty creating snapshots, so the
> snapshot failure creation errors are quite a mystery.
>

I conjecture that the large number of snapshots relates to some automated
process accidentally running repair repeatedly?

Repair has been modified to use serial repair by default since early 2.0.
In order to do serial repair, it creates a snapshot.

https://issues.apache.org/jira/browse/CASSANDRA-5950

Is the ticket in which the Cassandra team (IMO) unreasonably and without
justification changed this default, resulting in lots of operators suddenly
experiencing dramatically different behavior on a minor point release.

If you, as an operator of Cassandra in production, don't like this kind of
surprise major change to defaults in a minor version without any
justification, your input is welcome on that JIRA, or on this one:

https://issues.apache.org/jira/browse/CASSANDRA-8177

The snapshotting is broken throughout 2.x, FWIW, and over-snapshots and
over-repairs as a result.

https://issues.apache.org/jira/browse/CASSANDRA-7024

> And while we’re talking repairs, I have some questions about monitoring
> them. Even when not running an explicit repair, I randomly see repair tasks
> in OpsCenter. They usually only last a few seconds, and the progress
> percentage often goes into the quadruple digits. When I run repair using
> nodetool, it takes several hours, but again, all I ever see in OpsCenter
> are these random, short-lived repair tasks. Is there any way to monitor
> repairs? I frequently see posts about stalled repairs. How do you know a
> repair has stalled when you can’t see it? And, how do you know if a repair
> actually succeeded or not?
>

I have no idea why OpsCenter would spawn random repair tasks.

https://issues.apache.org/jira/browse/CASSANDRA-5483

Is the work for improved tracing of repair sessions.
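
(Two related nodetool commands, for anyone hitting the snapshot pile-up from
sequential repair; the keyspace name is a placeholder.)

    nodetool repair -par my_keyspace      # parallel repair; avoids the per-repair snapshots
    nodetool clearsnapshot my_keyspace    # drop leftover snapshots and reclaim disk space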

=Rob


Re: Keyspace and table/cf limits

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 1:54 PM, Raj N  wrote:

> The question is more from a multi-tenancy point of view. We wanted to see
> if we can have a keyspace per client. Each keyspace may have 50 column
> families, but if we have 200 clients, that would be 10,000 column families.
> Do you think that's reasonable to support? I know that key cache capacity
> is reserved in heap still. Any plans to move it off-heap?
>

That's an order of magnitude more CFs than I would want to try to operate.

But then, I wouldn't want to operate Cassandra multi-tenant AT ALL, so
grain of salt.

=Rob
http://twitter.com/rcolidba


What companies are using Cassandra to serve customer-facing product?

2014-12-05 Thread jeremy p
Hey all,

So, I'm currently evaluating Cassandra + CQL as a solution for querying a
very large data set (think 60+ TB). We'd like to use it to directly power a
customer-facing product. My question is fourfold:

1) What companies use Cassandra to serve a customer-facing product? I'm not
interested in evaluations, experiments, or POC.  I'm also not interested in
offline BI or analytics.  I'm specifically interested in cases where
Cassandra serves as the data store for a customer-facing product.

2) Of the companies that use Cassandra to serve a customer-facing product,
which ones use it to query data sets of 60TB or more?

3) Of the companies that use Cassandra to query 60+ TB data sets and serve a
customer-facing product, how many employees are required to support their
Cassandra installation?  In other words, if I were to start a team
tomorrow, and their purpose was to maintain a 60+ TB Cassandra installation
for a customer-facing product, how many people should I hire?

4) Of the companies that use Cassandra to query 60+ TB data sets and serve a
customer-facing product, what kind of measures do they take for disaster
recovery?

If you can, please point me to articles, videos, and other materials.
Obviously, the larger the company, the better case it will make for
Cassandra.

Thank you!


Re: Recommissioned node is much smaller

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 10:10 AM, Robert Wille  wrote:

>  Load and ownership didn’t correlate nearly as well as I expected. I have
> lots and lots of very small records. I would expect very high correlation.
>
>  I think the moral of the story is that I shouldn’t delete the system
> directory. If I have issues with a node, I should recommission it properly.
>

If you always specify initial_token in cassandra.yaml, then you are
protected from some cases similar to the one that you seem to have just
encountered.

Wish I had actually managed to post this on a blog, but:


--- cut ---

example of why :

https://issues.apache.org/jira/browse/CASSANDRA-5571

11:22 < rcoli> but basically, explicit is better than implicit
11:22 < rcoli> the only reason ppl let cassandra pick tokens is that it's
semi-complex to do "right" with vnodes
11:22 < rcoli> but once it has picked tokens
11:22 < rcoli> you know what they are
11:22 < rcoli> why have a risky conf file that relies on implicit state?
11:23 < rcoli> just put the tokens in the conf file. done.
11:23 < rcoli> then you can use auto_bootstrap:false even if you lose
system keyspace, etc.

I plan to write a short blog post about this, but...

I recommend that anyone using Cassandra, vnodes or not, always explicitly
populate their initial_token line in cassandra.yaml. There are a number of
cases where you will lose if you do not do so, and AFAICT no cases where
you lose by doing so.

If one is using vnodes and wants to do this, the process goes like this:

1) set num_tokens to the desired number of vnodes
2) start node/bootstrap
3) use a one liner like jeffj's :

"
nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e
's/,$/\n/'
"

to get a comma-delimited list of the vnode tokens

4) insert this comma-delimited list in initial_token, and comment out
num_tokens (though leaving it is a NOOP)

 --- cut ---
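
(Steps 3 and 4 above can be collapsed into something like the following, run on
a node after its first bootstrap; the yaml path is distribution-dependent, and
editing cassandra.yaml by hand works just as well.)

    # capture this node's vnode tokens as a comma-separated list
    tokens=$(nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e 's/,$//')
    # pin them in cassandra.yaml and neutralize num_tokens (now a no-op)
    echo "initial_token: $tokens" >> /etc/cassandra/cassandra.yaml
    sed -i 's/^num_tokens:/#num_tokens:/' /etc/cassandra/cassandra.yaml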

=Rob


Re: Cassandra taking snapshots automatically?

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 10:46 AM, Robert Wille  wrote:

>  No. auto_snapshot is turned on, but snapshot_before_compaction is off.
>
>  Maybe this will shed some light on it. I tried running nodetool repair.
> I got several messages saying "Lost notification. You should check server
> log for repair status of keyspace test2_browse”.
>
>  I looked in system.log, and I have errors where repair is trying to
> create a snapshot. Not sure why repair is trying to create snapshots, or
> why it is failing. I also now have about 200 snapshots. One table has just
> one. Another table has 124.
>

As mentioned in another thread, this surprising behavior is likely the
result of sequential repair, which involves snapshotting.

I am somewhat redundantly pasting the relevant ticket here for future
googlers.

https://issues.apache.org/jira/browse/CASSANDRA-5950

=Rob


Re: nodetool repair exception

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 6:37 AM, Rafał Furmański wrote:

> I see “Too many open files” exception in logs, but I’m sure that my limit
> is now 150k.
> Should I increase it? What’s the reasonable limit of open files for
> cassandra?


Why provide any limit? ulimit allows "unlimited"?
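
(To double-check the limit the running JVM actually got, versus what ulimit
reports for your shell; the pgrep pattern and the limits.conf line are
assumptions about this particular install.)

    pid=$(pgrep -f CassandraDaemon)
    grep "open files" /proc/$pid/limits    # the limit the process really has
    ls /proc/$pid/fd | wc -l               # descriptors currently in use
    # raise it in /etc/security/limits.conf, e.g.:  cassandra - nofile 1048576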

=Rob


Re: What companies are using Cassandra to serve customer-facing product?

2014-12-05 Thread Tyler Hobbs
This page lists a lot of Cassandra users with descriptions of the use case:
http://planetcassandra.org/apache-cassandra-use-cases/

On Fri, Dec 5, 2014 at 3:33 PM, jeremy p wrote:

> Hey all,
>
> So, I'm currently evaluating Cassandra + CQL as a solution for querying a
> very large data set (think 60+ TB). We'd like to use it to directly power a
> customer-facing product. My question is threefold :
>
> 1) What companies use Cassandra to serve a customer-facing product? I'm
> not interested in evaluations, experiments, or POC.  I'm also not
> interested in offline BI or analytics.  I'm specifically interested in
> cases where Cassandra serves as the data store for a customer-facing
> product.
>
> 2) Of the companies that use Cassandra to serve a customer-facing product,
> which ones use it to query data sets of 60TB or more?
>
> 3) Of companies use Cassandra to query 60+ TB data sets and serve a
> customer-facing product, how many employees are required to support their
> Cassandra installation?  In other words, if I were to start a team
> tomorrow, and their purpose was to maintain a 60+ TB Cassandra installation
> for a customer-facing product, how many people should I hire?
>
> 4) Of companies use Cassandra to query 60+ TB data sets and serve a
> customer-facing product, what kind of measures do they take for disaster
> recovery?
>
> If you can, please point me to articles, videos, and other materials.
> Obviously, the larger the company, the better case it will make for
> Cassandra.
>
> Thank you!
>



-- 
Tyler Hobbs
DataStax 


Re: Replacing a dead node by deleting it and auto_bootstrap'ing a new node (Cassandra 2.0)

2014-12-05 Thread Jaydeep Chovatia
I think Cassandra gives us control over what we want to do:
a) If we want to replace a dead node, then we should specify
"-Dcassandra.replace_address=old_node_ipaddress".
b) If we are adding new nodes (no replacement), then we do not specify the
above option, and tokens get assigned randomly.

I can think of a scenario in which your dead node has tons of data and you
are hopeful about its recovery, so you do not always want to replace the dead
node. In the meantime you might just add a new node to meet the capacity
requirement until the dead node is fully recovered.

-jaydeep

On Thu, Dec 4, 2014 at 11:30 PM, Omri Bahumi  wrote:

> I guess Cassandra is aware that it has some replicas not meeting the
> replication factor. Wouldn't it be nice if a bootstrapping node would
> get those?
> Could make things much simpler in the Ops view.
>
> What do you think?
>
> On Fri, Dec 5, 2014 at 8:31 AM, Jaydeep Chovatia wrote:
> > As far as I know, if you have NOT explicitly specified
> > "-Dcassandra.replace_address=old_node_ipaddress", then new tokens would be
> > assigned (randomly) to the bootstrapping node instead of the dead node's tokens.
> >
> > -jaydeep
> >
> > On Thu, Dec 4, 2014 at 6:50 AM, Omri Bahumi  wrote:
> >>
> >> Hi,
> >>
> >> I was wondering, how would auto_bootstrap behave in this scenario:
> >>
> >> 1. I had a cluster with 3 nodes (RF=2)
> >> 2. One node died, I deleted it with "nodetool removenode" (+ force)
> >> 3. A new node launched with "auto_bootstrap: true"
> >>
> >> The question is: will the "right" vnodes go to the new node as if it
> >> was bootstrapped with "-Dcassandra.replace_address=old_node_ipaddress"
> >> ?
> >>
> >> Thanks,
> >> Omri.
> >
> >
>


How to model data to achieve specific data locality

2014-12-05 Thread Kai Wang
I have a data model question. I am trying to figure out how to model the data
to achieve the best data locality for analytic purposes. Our application
processes sequences. Each sequence has a unique key in the format of
[seq_id]_[seq_type]. For any given seq_id, there is an unlimited number of
seq_types. The typical read is to load a subset of the sequences with the same
seq_id. Naturally I would like all the sequences with the same seq_id to be
co-located on the same node(s).


However, I can't simply create one partition per seq_id and use seq_id as my
partition key. That's because:


1. There could be thousands of seq_types, or even more, for each seq_id. It's
not feasible to include all the seq_types in one table.

2. Each seq_id might have a different set of seq_types.

3. Each application only needs to access a subset of the seq_types for a
seq_id. Based on CASSANDRA-5762, selecting part of a row loads the whole row. I
prefer to touch only the data that's needed.


As per the above, I think I should use one partition per [seq_id]_[seq_type].
But how can I achieve the data locality on seq_id? One possible approach is to
override IPartitioner so that I use only part of the field (say 64 bytes) to
get the token (for location) while still using the whole field as the
partition key (for lookup). But before heading in that direction, I would like
to see if there are better options out there. Maybe any new or upcoming
features in C* 3.0?


Thanks.
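
(For concreteness, the two layouts being weighed might look like this; names
and types are hypothetical. The first co-locates everything for a seq_id but
runs into points 1-3 above; the second is the one-partition-per-sequence model,
which is what the custom IPartitioner idea would need in order to regain
seq_id locality.)

    cqlsh -e "
    -- co-located: one partition per seq_id, seq_type as a clustering column
    CREATE TABLE ks.sequences_by_id (
        seq_id   text,
        seq_type text,
        data     blob,
        PRIMARY KEY ((seq_id), seq_type)
    );
    -- independent: one partition per [seq_id]_[seq_type] key
    CREATE TABLE ks.sequences_by_key (
        seq_key  text,
        data     blob,
        PRIMARY KEY (seq_key)
    );"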


Re: Keyspace and table/cf limits

2014-12-05 Thread Kai Wang
On Fri, Dec 5, 2014 at 4:32 PM, Robert Coli  wrote:

> On Wed, Dec 3, 2014 at 1:54 PM, Raj N  wrote:
>
>> The question is more from a multi-tenancy point of view. We wanted to see
>> if we can have a keyspace per client. Each keyspace may have 50 column
>> families, but if we have 200 clients, that would be 10,000 column families.
>> Do you think that's reasonable to support? I know that key cache capacity
>> is reserved in heap still. Any plans to move it off-heap?
>>
>
> That's an order of magnitude more CFs than I would want to try to operate.
>
> But then, I wouldn't want to operate Cassandra multi-tenant AT ALL, so
> grain of salt.
>
> =Rob
> http://twitter.com/rcolidba
>
>
I don't know if it's still true, but Jonathan Ellis wrote in an old post that
there's a fixed overhead per CF. Here is the link:
http://dba.stackexchange.com/a/12413. Even if it has improved since C* 1.0, I
still don't feel comfortable scaling my system by creating CFs.