Re: Huge amounts of hinted handoffs for counter table

2015-09-24 Thread Björn Hachmann
Thank you for your time!

Our replication factor is 'DC1': '2', 'DC2': '2'.
Consistency is set to LOCAL_ONE for these queries.

Indeed timeouts might be a problem as some of the nodes in DC2 are under
high load from time to time.
Is there some counter (e.g. via JMX) I could monitor to verify this
assumption?
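(My current guess would be to check something like the following; the MBean
names are from memory and may need verifying against 2.1.7:)

    # dropped MUTATION / COUNTER_MUTATION counts on the busy DC2 nodes would point at timeouts
    nodetool tpstats

    # per-node JMX counters, e.g. via jconsole or a metrics reporter:
    #   org.apache.cassandra.metrics:type=Storage,name=TotalHints
    #   org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Timeouts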

>
>> - What could we do to investigate the cause of this issue deeper?
>>
>
> Are the hints being successfully delivered? It sounds like not..
>

No, I do not think so. Actually, we are not really interested in this data
at DC2; we only replicate it because this table is in that keyspace for
historic reasons.
It seems we need to migrate that table to a different keyspace, doesn't it?

Kind regards
Björn


2015-09-23 22:56 GMT+02:00 Robert Coli :

> On Wed, Sep 23, 2015 at 7:28 AM, Björn Hachmann <
> bjoern.hachm...@metrigo.de> wrote:
>
>> Today I realized that one of the nodes in our Cassandra cluster (2.1.7)
>> is storing a lot of hints (>80GB) and I fail to see a convincing way to
>> deal with them.
>> ...
>> We had a look into the table system.hints and from there we learnt that
>> most hints are for one of the nodes in our 2nd datacenter, and most of the
>> mutations are increments to one of our counter tables, which are very
>> frequent.
>>
>
> This is probably timeouts on the increment creating your hints.
>
>
>> We have several questions:
>> - What could be the reason that only one of the nodes has hints for only
>> one target node, although every other node should sometimes be the
>> coordinator for these queries as well?
>>
>
> That sounds unexpected, I don't have a good answer.
>
>
>> - Is there a way to turn off hinted handoff at the table level or at the
>> data center level?
>>
>
> No.
>
>
>> - What could we do to investigate the cause of this issue deeper?
>>
>
> Are the hints being successfully delivered? It sounds like not..
>
> =Rob
>
>


Re: Do vnodes need more memory?

2015-09-24 Thread Tom van den Berge
On Thu, Sep 24, 2015 at 12:45 AM, Robert Coli  wrote:

> On Wed, Sep 23, 2015 at 7:09 AM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> So it seems that Cassandra simply doesn't have enough memory. I'm trying
>> to understand if this can be caused by the use of vnodes? Is there a
>> sensible reason why vnodes would consume more memory than regular nodes? Or
>> does any of you have the same experience? If not, I might be barking up the
>> wrong tree here, and I would love to know it before upgrading my servers
>> with more memory.
>>
>
> Yes, range ownership has a RAM/heap cost per-range-owned. This cost is
> paid during many, but not all, operations. Owning 256 ranges > Owning 1
> range.
>
> I have not had the same experience but am not at all surprised to hear
> that vnodes increase heap consumption for otherwise identical
> configurations. I am surprised to hear that it makes a significant
> difference in GC time, but you might have been close enough to heap
> saturation that vnodes tip you over.
>

That's apparently exactly what's going on. We've just increased the memory
from 8 to 16 GB, and all is fine now. This seems to confirm that using
vnodes indeed increases heap consumption significantly. I think it would be
great if this could be called out in the documentation as a warning. From
the current documentation, it seems that vnodes don't come at any cost.

What's also interesting is this: before increasing the memory, we had been
changing our code to stop using our secondary indexes. We still had a
number of those, and we suspected them to be the cause of the increased
heap consumption. Removing them did not eliminate the problem, but it
definitely helped to bring the GC times down dramatically. I already knew
that secondary indexes are best avoided, but it seems that using them in
combination with vnodes makes things far worse.


> As an aside, one is likely to win very little net win from vnodes if one's
> cluster is not now and will never be more than approximately 15 nodes.
>

That's a very interesting observation, especially since vnodes have been
enabled by default for some time now and apparently come at a (heap) price.
And my guess is that a significant percentage of all clusters will never
exceed 15 nodes.

Thx,
Tom


Regarding " node tool move "

2015-09-24 Thread Rock Zhang
Hi All,

I want to manually move some tokens from host A to host B, but I do not
fully understand this command:


move - Move node on the token ring to a new token

Say host A has these tokens (I got them with nodetool info -T):
Token  : -9096422322933500933
Token  : -8988583730922232407
Token  : -8881261198286236893
Token  : -8811920524626612334

On host A, if I run the command "nodetool move -9096422322933500933", what is
going to happen? Does it move the data associated with token
"-9096422322933500933", and from where to where?

If I want to move the data for token "-8811920524626612334" to host B, what
should I do?


Thanks
Rock


RE: Deserialize the collection type data from the SSTable file

2015-09-24 Thread java8964
Hi, Daniel:
I didn't find any branch related to C* 2.1 in
https://github.com/coursera/aegisthus; is there one?
It looks like there are big changes to the collections API in C* 2.1. I just
want to know if there is a CQLMapper branch for C* 2.1. In the meantime, I
will also try to understand more about the new changes in C* 2.1.
Thanks
Yong

From: danc...@coursera.org
Date: Wed, 10 Jun 2015 09:11:02 -0700
Subject: Re: Deserialize the collection type data from the SSTable file
To: java8...@hotmail.com
CC: user@cassandra.apache.org

Hi Yong,
Glad the code was helpful. I believe it serializes maps using List> so that
it can store the key of the map as well.
Thanks for pointing out the edge case!

Thanks,
Daniel

On Wed, Jun 10, 2015 at 6:39 AM, java8964  wrote:



Thanks, Daniel.
I didn't realize that Cassandra serializes collection types one more way,
using List>. Reading your example code, I made it work.
From the link you gave me, using my test data, I found one issue though: in
any row where the collection column is NULL, I think the code will throw a
NullPointerException on line 148:

Line 145    private void addValue(GenericRecord record, CFDefinition.Name name, ColumnGroupMap group) {
Line 146        if (name.type.isCollection()) {
Line 147            List> collection = group.getCollection(name.name.key);
Line 148            ByteBuffer buffer = ((CollectionType) name.type).serialize(collection);
                    addCqlCollectionToRecord(record, name, buffer);

If the collection column in that row is NULL, then line 147 returns NULL,
which causes the following exception:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.cassandra.db.marshal.CollectionType.enforceLimit(CollectionType.java:113)
    at org.apache.cassandra.db.marshal.ListType.serialize(ListType.java:120)

So I need to add a check for that, as any regular column in Cassandra can
have a NULL value.
Thanks
Yong
From: danc...@coursera.org
Date: Mon, 8 Jun 2015 15:13:02 -0700
Subject: Re: Deserialize the collection type data from the SSTable file
To: user@cassandra.apache.org

I'm not sure why sstable2json doesn't work for collections, but if you're into 
reading raw sstables we use the following code with good success:
https://github.com/coursera/aegisthus/blob/77c73f6259f2a30d3d8ca64578be5c13ecc4e6f4/aegisthus-hadoop/src/main/java/org/coursera/mapreducer/CQLMapper.java#L85
Thanks,
Daniel

On Mon, Jun 8, 2015 at 1:22 PM, java8964  wrote:



Hi, Cassandra users:
I have a question about how to deserialize the new collection type data in
Cassandra 2.x (the exact version is C* 2.0.10).
I created the following example table in cqlsh (the other columns can be
ignored in this case):

CREATE TABLE coupon (
  account_id bigint,
  campaign_id uuid,
  ...,
  discount_info map,
  ...,
  PRIMARY KEY (account_id, campaign_id)
)

Then I inserted one test row like this:

insert into coupon (account_id, campaign_id, discount_info) values (111, uuid(),
{'test_key':'test_value'});
After this, I got the SSTable files. I used sstable2json to check the output:

$ ./resources/cassandra/bin/sstable2json /xxx/test-coupon-jb-1-Data.db
[{"key": "006f",
  "columns": [
    ["0336e50d-21aa-4b3a-9f01-989a8c540e54:","",1433792922055000],
    ["0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info","0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:!",1433792922054999,"t",1433792922],
    ["0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579","746573745f76616c7565",1433792922055000]]}]
What I want is to get the {"test_key": "test_value"} key/value pair that I
put into the "discount_info" column. I followed the sstable2json code and
tried to deserialize the data by myself, but to my surprise I cannot make it
work; I tried several ways and kept getting exceptions.
From what I researched, I know that Cassandra stores "campaign_id" +
"discount_info" + another ByteBuffer as a composite column name in this case.
When I deserialize this column name, I get the following dumped out as a String:

"0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579"

It includes 3 parts: the first part is the uuid for the campaign_id; the 2nd
part is "discount_info", the static name I defined in the table; the 3rd part
is a byte array of length 46, which I am not sure what it is.
The corresponding value part of this composite column is another byte array of
length 10, hex "746573745f76616c7565" if I dump it out.
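(As a quick sanity check, decoding those two hex strings as UTF-8 gives back
the map entry, so the last composite component appears to carry the map key
and the cell value the map value:

746573745f6b6579     -> "test_key"
746573745f76616c7565 -> "test_value")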
Now, here is what I did, and I am not sure why it doesn't work. First, I
assumed the value part stores the real value I put into the map, so I did the
following:

ByteBuffer value = ByteBufferUtil.clone(column.value());
MapType result = MapType.getInstance(UTF8Type.instance, UTF8Type.instance);
Map output = result.compose(value);
// it gave me the following exception: org.apache.cassandra.serializers.MarshalException:

Re: What is your backup strategy for Cassandra?

2015-09-24 Thread Luigi Tagliamonte
Since I'm running on AWS, we wrote a script that takes a snapshot for each
column family and syncs it to S3; at the end of the script I also grab the
node tokens and store them on S3.
In case of a restore I will use this procedure.
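
A rough sketch of the idea (bucket, paths and keyspace names below are just
placeholders, not our real ones):

    TAG="backup-$(date +%Y%m%d%H%M)"
    nodetool snapshot -t "$TAG" my_keyspace
    # snapshots end up under <data_dir>/my_keyspace/<table>/snapshots/$TAG/
    aws s3 sync /var/lib/cassandra/data/my_keyspace "s3://my-bucket/$(hostname)/$TAG/" \
        --exclude "*" --include "*/snapshots/$TAG/*"
    # keep the node's tokens next to the snapshot for a later restore
    nodetool info -T | grep '^Token' > tokens.txt
    aws s3 cp tokens.txt "s3://my-bucket/$(hostname)/$TAG/tokens.txt"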

On Mon, Sep 21, 2015 at 9:23 PM, Sanjay Baronia <
sanjay.baro...@triliodata.com> wrote:

> John,
>
> Yes, the Trilio solution is private, and today it is for Cassandra running
> in VMware and OpenStack environments. AWS support is on the roadmap. I will
> reach out separately to give you a demo after the summit.
>
> Thanks,
>
> Sanjay
>
> _
>
>
>
> *Sanjay Baronia VP of Product & Solutions Management Trilio Data *(c)
> 508-335-2306
> sanjay.baro...@triliodata.com
>
>
>
>
> From: John Wong 
> Reply-To: Cassandra Maillist 
> Date: Friday, September 18, 2015 at 8:02 PM
> To: Cassandra Maillist 
> Subject: Re: What is your backup strategy for Cassandra?
>
>
>
> On Fri, Sep 18, 2015 at 3:02 PM, Sanjay Baronia <
> sanjay.baro...@triliodata.com> wrote:
>
>>
>> Will be at the Cassandra summit next week if any of you would like a demo.
>>
>>
>>
>
> Sanjay, is Trilio Data's work private? Unfortunately I will not attend the
> Summit, but maybe Trilio can also talk about this in, say, a Cassandra
> Planet blog post? I'd like to see a demo or get a little more technical.
> It would be cool if it were open source.
>
> I didn't implement our solution, but the current solution is based on full
> snapshot copies to a remote server for storage using rsync (only transfers
> what is needed). On our remote server we have a complete backup of every
> hour, so if you cd into the data directory you can get every node's exact
> moment-in-time data like you are browsing on the actual nodes.
>
> We are an AWS shop, so we can further optimize our cost by using EBS
> snapshots and reduce the provisioned volume size (currently we provisioned
> 4000GB, which is too much). Anyway, we tried S3, and it is an okay solution;
> the downsides are performance and the ability to quickly go back in time.
> With EBS I can create a dozen volumes from the same snapshot, attach one to
> each of my nodes, and cp -r the files over.
>
> John
>
>>
>> From: Maciek Sakrejda 
>> Reply-To: Cassandra Maillist 
>> Date: Friday, September 18, 2015 at 2:09 PM
>> To: Cassandra Maillist 
>> Subject: Re: What is your backup strategy for Cassandra?
>>
>> On Thu, Sep 17, 2015 at 7:46 PM, Marc Tamsky  wrote:
>>
>>> This seems like an apt time to quote [1]:
>>>
>>> > Remember that you get 1 point for making a backup and 10,000 points
>>> for restoring one.
>>>
>>> Restoring from backups is my goal.
>>>
>>> The commonly recommended tools (tablesnap, cassandra_snapshotter) all
>>> seem to leave the restore operation as a pretty complicated exercise for
>>> the operator.
>>>
>>> Do any include a working way to restore, on a different host, all of
>>> node X's data from backups to the correct directories, such that the
>>> restored files are in the proper places and the node restart method [2]
>>> "just works"?
>>>
>>
>> As someone getting started with Cassandra, I'm very much interested in
>> this as well. It seems that, for the most part, folks rely on replication
>> and node replacement to recover from failures, and perhaps this is a
>> testament to how well that works, but as long as we're hauling out
>> aphorisms, "RAID is not a backup" seems to (partially) apply here too.
>>
>> I'd love to hear more about how the community does restores, too. This
>> isn't complaining about shoddy tooling: this is trying to understand--and
>> hopefully, in time, improve--the status quo re: disaster recovery. E.g.,
>> given that tableslurp operates on a single table at a time, do people
>> normally just restore single tables? Is that used when there's filesystem
>> or disk corruption? Bugs? Other issues? Looking forward to learning more.
>>
>> Thanks,
>> Maciek
>>
>
>


-- 
Luigi
---
“The only way to get smarter is by playing a smarter opponent.”


Re: Regarding " node tool move "

2015-09-24 Thread Robert Coli
On Thu, Sep 24, 2015 at 1:29 AM, Rock Zhang  wrote:

> on host A, if i run command  "nodetool move  -9096422322933500933" ,  what
> gonna happen ?
> Move data associated with token "-9096422322933500933" from where to where
> ?
>

When you change the token of a node, you change the "range" it is a primary
replica for.

Replicas for the range being "lost" on that node and adjacent nodes stream
their data to nodes "gaining" that data/range.

Be sure to run nodetool cleanup after any move operation.
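
A minimal sketch of that sequence for the example token above, assuming
single-token nodes (nodetool move refuses to run when a node owns multiple
vnode tokens) and escaping the leading minus so it is not parsed as a flag:

    nodetool -h hostA move \\-8811920524626612334   # hostA takes on the new token
    nodetool -h hostA cleanup                        # then clean up the nodes whose ranges shrank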

=Rob


Re: Seeing null pointer exception 2.0.14 after purging gossip state

2015-09-24 Thread Robert Coli
On Mon, Sep 14, 2015 at 7:53 PM, K F  wrote:

> I have Cassandra 2.0.14 deployed and, after following the method described
> in Apache Cassandra™ 2.0 to clear the gossip state of the node in one of
> the DCs of my cluster,
>

Why did you need to do this?

> I see a weird exception on the nodes (not many, but a few per hour) for
> nodes that have already been successfully decommissioned from the cluster;
> you can see from the exception below that 10.0.0.1 has already been
> decommissioned. Below is the exception snippet.
>

Have you done :

nodetool gossipinfo |grep SCHEMA |sort | uniq -c | sort -n

and checked for schema agreement... ?

=Rob


How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?

2015-09-24 Thread Benyi Wang
I have a Cassandra cluster that provides data to a web service, and there is
a daily batch load writing data into the cluster.

   - Without the batch loading, the service’s Latency 99thPercentile is
   3ms. But during the load, it jumps to 90ms.
   - I checked the Cassandra keyspace’s ReadLatency.99thPercentile, which
   jumps from 600 microseconds to 1ms.
   - The service’s Cassandra Java driver request 99thPercentile was 90ms
   during the load.

The Java driver accounts for most of the time. I know the Cassandra servers
are busy writing, but I want to know what kinds of metrics can identify where
the bottleneck is, so that I can tune it.

I’m using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
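
For reference, this is roughly how I read the driver-side number: a minimal
sketch assuming the driver's built-in Codahale metrics (the contact point is a
placeholder, and I am assuming the request timer records nanoseconds):

    import com.codahale.metrics.Snapshot;
    import com.datastax.driver.core.Cluster;

    public class DriverLatencyProbe {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            cluster.connect();
            // ... run the workload, then sample the driver's request timer.
            // It covers queueing for a connection, the network round trip and server
            // processing, so comparing its p99 with the server-side
            // ReadLatency.99thPercentile shows how much time is spent outside Cassandra.
            Snapshot snap = cluster.getMetrics().getRequestsTimer().getSnapshot();
            System.out.printf("driver p99 = %.2f ms%n", snap.get99thPercentile() / 1e6);
            cluster.close();
        }
    }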


Re: How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?

2015-09-24 Thread Gerard Maas
How are you loading the data? I mean, what insert method are you using?

On Thu, Sep 24, 2015 at 9:58 PM, Benyi Wang  wrote:

> I have a cassandra cluster provides data to a web service. And there is a
> daily batch load writing data into the cluster.
>
>- Without the batch loading, the service’s Latency 99thPercentile is
>3ms. But during the load, it jumps to 90ms.
>- I checked cassandra keyspace’s ReadLatency.99thPercentile, which
>jumps to 1ms from 600 microsec.
>- The service’s cassandra java driver request 99thPercentile was 90ms
>during the load
>
> The java driver took the most time. I knew the Cassandra servers are busy
> in writing, but I want to know what kinds of metrics can identify where is
> the bottleneck so that I can tune it.
>
> I’m using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
>
>


Re: Throttling Cassandra Load

2015-09-24 Thread Anuj Wadehra
Would Robert or anyone else like to take this up and express their views on
our throttling approach?


As I said, we will be moving to the CQL driver incrementally, but that will
take several months. So, for now we are trying to throttle Cassandra load
based on the RPC Thrift interface and Hector. Later we can think of applying
similar throttling with the native protocol. Yes, the CQL driver may provide
us some advanced properties for tuning connection pooling and timing out idle
connections.



Thanks

Anuj

Sent from Yahoo Mail on Android

From:"Anuj Wadehra" 
Date:Wed, 23 Sep, 2015 at 9:03 am
Subject:Re: Throttling Cassandra Load

Hi Robert,


We will be moving to CQL incrementally, but that will take some time (at
least 6 months). Till then we need a solution for throttling load. Can you
comment on the approach mentioned, and do you have any better ideas for
achieving it?




Thanks

Anuj

Sent from Yahoo Mail on Android

From:"Robert Coli" 
Date:Wed, 23 Sep, 2015 at 2:43 am
Subject:Re: Throttling Cassandra Load

On Tue, Sep 22, 2015 at 1:06 PM, Anuj Wadehra  wrote:

We are using Cassandra 2.0.14 with Hector 1.1.4. Each node in cluster has an 
application using Hector and a Cassandra instance.


Why are you using Hector?


=Rob

 



Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

2015-09-24 Thread Robert Coli
On Tue, Sep 15, 2015 at 9:42 AM, Nate McCall  wrote:

> Either way, you are going to have to run nodetool scrub. I'm not sure if
> it's better to do this from 2.1.8 or from 2.1.9 with "disk_failure_policy:
> ignore"
>

A node which has lost an SSTable also needs to be repaired immediately. If
it is not repaired before being brought back into the cluster, there are
cases where it can poison consistency on other nodes. For example, perhaps
the SSTable you lost contained the only copy of a tombstone, and the row is
now unmasked.

=Rob


Re: Corrupt sstables when upgrading from 2.1.8 to 2.1.9

2015-09-24 Thread Robert Coli
On Thu, Sep 24, 2015 at 3:00 PM, Robert Coli  wrote:

> A node which has lost a SSTable also needs to be repaired immediately.
>

Forgot to mention, you can repair via this technique :

https://issues.apache.org/jira/browse/CASSANDRA-6961
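
In short, the idea from that ticket is something like this (flag and command
names are worth double-checking for your version):

    # start the node without joining the ring (cassandra-env.sh or command line)
    JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"
    # repair it while it is not serving reads
    nodetool repair
    # once it is consistent, let it join
    nodetool join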

=Rob


Re: To batch or not to batch: A question for fast inserts

2015-09-24 Thread Eric Stevens
> I side-tracked some punctual benchmarks and stumbled on the observations
of unlogged inserts being *A LOT* faster than the async counterparts.

My own testing agrees very strongly with this.  When this topic came up on
this list before, there was a concern that batch coordination produces GC
pressure in your cluster because you're involving nodes which aren't *strictly
speaking* necessary to be involved.

Our own testing shows some small impact on this front, but really
lightweight GC tuning mitigated the effects by putting a little more room
in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what
we run in production) we weren't able to measure a difference.

Our testing shows data loads being as much as 5x to 8x faster when using
small concurrent batches over using single statements concurrently.  We
tried three different concurrency models.

To save on coordinator overhead, we group the statements in our "batch" by
replica (using the functionality exposed by the DataStax Java driver), and
do essentially token aware batching.  This still has a *small* amount of
additional coordinator overhead (since the data size of the unit of work is
larger, and sits in memory in the coordinator longer).  We've been running
this way successfully for months with *sustained* rates north of 50,000
mutates per second.  We burst *much* higher.

Through trial and error we determined we got diminishing returns in the
realm of 100 statements per token-aware batch.  It looks like your own data
bears that out as well.  I'm sure that's workload dependent though.

I've been disagreed with on this topic in this list in the past despite the
numbers I was able to post.  Nobody has shown me numbers (nor anything else
concrete) that contradict my position though, so I stand by it.  There's no
question in my mind, if your mutates are of any significant volume and you
care about the performance of them, token aware unlogged batching is the
right strategy.  When we reduce our batch sizes or switch to single async
statements, we fall over immediately.
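
A stripped-down sketch of that grouping (not our production code; it assumes
the 2.1-era DataStax Java driver API, and the keyspace name and statement
handling are simplified):

    import java.nio.ByteBuffer;
    import java.util.*;
    import com.datastax.driver.core.*;

    static void writeTokenAware(Cluster cluster, Session session,
                                List<BoundStatement> statements, String keyspace) {
        // group statements by one of their replicas so each unlogged batch is
        // sent to a coordinator that already owns the data
        Map<Host, List<Statement>> byReplica = new HashMap<>();
        for (BoundStatement stmt : statements) {
            ByteBuffer key = stmt.getRoutingKey();
            Set<Host> replicas = cluster.getMetadata().getReplicas(keyspace, key);
            Host pick = replicas.iterator().next();
            byReplica.computeIfAbsent(pick, h -> new ArrayList<>()).add(stmt);
        }
        for (List<Statement> group : byReplica.values()) {
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            group.forEach(batch::add);
            session.executeAsync(batch);   // small unlogged batch per replica group
        }
    }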

On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas  wrote:

> General advice advocates for individual async inserts as the fastest way
> to insert data into Cassandra. Our insertion mechanism is based on that
> model and recently we have been evaluating performance, looking to measure
> and optimize our ingestion rate.
>
> I side-tracked some punctual benchmarks and stumbled on the observations
> of unlogged inserts being *A LOT* faster than the async counterparts.
>
> In our tests, unlogged batch shows increased throughput and lower cluster
> CPU usage, so I'm wondering where the tradeoff might be.
>
> I compiled those observations in this document that I'm sharing and
> opening up for comments.  Are we observing some artifact or should we set
> the record straight for unlogged batches to achieve better insertion
> throughput?
>
>
> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI
>
> Let me know.
>
> Kind regards,
>
> Gerard.
>


Re: How to tune Cassandra or Java Driver to get lower latency when there are a lot of writes?

2015-09-24 Thread Benyi Wang
I use Spark and spark-cassandra-connector with a customized Cassandra
writer (spark-cassandra-connector doesn’t support DELETE). Basically the
writer works as follows:

   - Bind each row in the Spark RDD with either an INSERT or DELETE PreparedStatement
   - Create a BatchStatement for multiple rows
   - Write to Cassandra.

I know using CQLBulkOutputFormat would be better, but it doesn't support
DELETE.

On Thu, Sep 24, 2015 at 1:27 PM, Gerard Maas  wrote:

> How are you loading the data? I mean, what insert method are you using?
>
> On Thu, Sep 24, 2015 at 9:58 PM, Benyi Wang  wrote:
>
>> I have a cassandra cluster provides data to a web service. And there is a
>> daily batch load writing data into the cluster.
>>
>>- Without the batch loading, the service’s Latency 99thPercentile is
>>3ms. But during the load, it jumps to 90ms.
>>- I checked cassandra keyspace’s ReadLatency.99thPercentile, which
>>jumps to 1ms from 600 microsec.
>>- The service’s cassandra java driver request 99thPercentile was 90ms
>>during the load
>>
>> The java driver took the most time. I knew the Cassandra servers are busy
>> in writing, but I want to know what kinds of metrics can identify where is
>> the bottleneck so that I can tune it.
>>
>> I’m using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
>>
>>
>
>


Re: Unable to remove dead node from cluster.

2015-09-24 Thread Dikang Gu
@Jeff, I just use JMX to connect to one node, run unsafeAssassinateEndpoint,
and pass in the "10.210.165.55" IP address.

Yes, we have hundreds of other nodes in the nodetool status output as well.

On Tue, Sep 22, 2015 at 11:31 PM, Jeff Jirsa 
wrote:

> When you run unsafeAssassinateEndpoint, to which host are you connected,
> and what argument are you passing?
>
> Are there other nodes in the ring that you’re not including in the
> ‘nodetool status’ output?
>
>
> From: Dikang Gu
> Reply-To: "user@cassandra.apache.org"
> Date: Tuesday, September 22, 2015 at 10:09 PM
> To: cassandra
> Cc: "d...@cassandra.apache.org"
> Subject: Re: Unable to remove dead node from cluster.
>
> ping.
>
> On Mon, Sep 21, 2015 at 11:51 AM, Dikang Gu  wrote:
>
>> I have tried all of them; none of them worked.
>> 1. decommission: the host had a hardware issue, and I cannot connect to it.
>> 2. remove: there is no Host ID, so removenode did not work.
>> 3. unsafeAssassinateEndpoint: it throws an NPE, as I pasted before; can
>> we fix it?
>>
>> Thanks
>> Dikang.
>>
>> On Mon, Sep 21, 2015 at 11:11 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> Order is decommission, remove, assassinate.
>>>
>>> Which have you tried?
>>> On Sep 21, 2015 10:47 AM, "Dikang Gu"  wrote:
>>>
 Hi there,

 I have a dead node in our cluster, which is in a weird state right now,
 and cannot be removed from the cluster.

 nodetool status shows:
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens  OwnsHost ID
   Rack
 DN  10.210.165.55?  256 ?   null
r1

 I tried unsafeAssassinateEndpoint, but got an exception like:
 2015-09-18_23:21:40.79760 INFO  23:21:40 InetAddress /10.210.165.55 is
 now DOWN
 2015-09-18_23:21:40.80667 ERROR 23:21:40 Exception in thread
 Thread[GossipStage:1,5,main]
 2015-09-18_23:21:40.80668 java.lang.NullPointerException: null
 2015-09-18_23:21:40.80669   at
 org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1584)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80669   at
 org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1592)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80670   at
 org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1822)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80671   at
 org.apache.cassandra.service.StorageService.onChange(StorageService.java:1495)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80671   at
 org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2121)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80672   at
 org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80673   at
 org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1113)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80673   at
 org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80673   at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
 ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
 2015-09-18_23:21:40.80674   at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 ~[na:1.7.0_45]
 2015-09-18_23:21:40.80674   at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 ~[na:1.7.0_45]
 2015-09-18_23:21:40.80674   at
 java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_45]
 2015-09-18_23:21:40.85812 WARN  23:21:40 Not marking nodes down due to
 local pause of 10852378435 > 50

 Any suggestions about how to remove it?
 Thanks.

 --
 Dikang


>>
>>
>> --
>> Dikang
>>
>>
>
>
> --
> Dikang
>
>


-- 
Dikang