Thanks Mehmet and Erick,
I don't have any monitoring other than nodetool, but I managed to see that
some disk errors were causing exceptions.
I replaced the faulty disk and performance is OK now.
Regards,
Osman
On Sun, 5 Apr 2020 at 03:17, Erick Ramirez wrote:
With only 2 replicas per DC, it means you're likely writing with a
consistency level of either ONE or LOCAL_ONE. Every time you hit the
problematic node, the write performance drops. All other configurations
being equal, this indicates an issue with the commitlog disk on the node.
Get your sysadmin
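For anyone who lands on this thread later, this is roughly how to confirm a bad
commitlog disk on the suspect node (a minimal sketch, not from the thread; the
device name and log path are assumptions for a typical Linux package install):
  dmesg -T | grep -iE 'i/o error|ata[0-9]|sd[a-z]' | tail -20   # kernel-level disk errors
  iostat -x 5 3                               # watch await/%util on the commitlog device
  sudo smartctl -H /dev/sdb                   # SMART health (device name is a placeholder)
  grep -iE 'corrupt|FSReadError|FSWriteError' /var/log/cassandra/system.log | tail -20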
Hi Osman,
Do you use any monitoring solution such as Prometheus on your cluster?
If yes, you should install and use the Cassandra exporter from the link below and
examine some more detailed metrics: https://github.com/criteo/cassandra_exporter
Sent from Yahoo Mail on Android
On 4 Apr 2020 at 15:53
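For reference, getting the Criteo exporter running is roughly the following (a
sketch only; the jar name, config file and port are assumptions, so check the
project README for the exact values):
  # download the release jar from https://github.com/criteo/cassandra_exporter/releases
  java -jar cassandra_exporter.jar config.yml &
  curl -s localhost:8080/metrics | head        # verify metrics are being exposed
Point a Prometheus scrape job at that endpoint and you get the JMX metrics
(latencies, dropped messages, GC) without touching nodetool.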
Hello,
I manage a cluster with 2 DCs, 7 nodes each, and the replication factor is 2:2.
My insert performance has dropped somehow.
I restarted the nodes one by one and found that one node degrades performance.
I verified it was this node after the problem occurred a couple of times.
How can I continue to investigate?
Regards,
Hello,
I'm using the JMX metric
org_apache_cassandra_net_failuredetector_downendpointcount to monitor the
number of Cassandra nodes that are down. When we decommission a Cassandra node
for any reason (e.g. an AWS scheduled retirement), this metric shows the node as
down for 72 hours until gossip is cleared. We want
9, 2019 10:56 AM
To: user@cassandra.apache.org
Subject: Re: Jmx metrics shows node down
Is there a workaround to shorten the 72 hours to something shorter? (You said
"by default"; wondering if one can set a non-default value?)
Thanks,
Yuping
On Jul 29, 2019, at 7:28 AM, Oleksandr Shulgin wrote:
> On Mon, Jul 29, 2019 at 1:21 PM Rahul Reddy wrote:
>
> Decommissioned 2 nodes from clust
We have the same issue. We observed the JMX only cleared after exactly 72 hours
too.
On Jul 29, 2019, at 11:23 AM, Rahul Reddy wrote:
And also, the system.peers table doesn't have the information on the old nodes;
only the ghost nodes appear to be there in JMX.
On Mon, Jul 29, 2019, 7:39 AM Rahul Reddy wrote:
We have removed nodes from a cluster many times but never seen the JMX down
metric stay for 72 hours. So the node has to be completely removed from gossip to
show the metric as expected? This would be a problem for using the metric for
on-call alerting.
On Mon, Jul 29, 2019, 7:28 AM Oleksandr Shulgin wrote:
On Mon, Jul 29, 2019 at 1:21 PM Rahul Reddy
wrote:
Hello,
I decommissioned 2 nodes from the cluster. nodetool status doesn't list the nodes,
as expected, but the JMX metrics still show those 2 nodes as down. nodetool
gossipinfo shows the 2 nodes in LEFT state. Why does my JMX still show those
nodes as down even after 24 hours? Cassandra version 3.11.3. Anything
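If it helps anyone searching the archives, a sketch of what to check on 3.11.x
(the IP below is a placeholder):
  nodetool status                       # decommissioned nodes should no longer appear here
  nodetool gossipinfo | grep -B3 LEFT   # LEFT entries are kept in gossip for ~72 hours by design
  nodetool assassinate 10.0.0.12        # last resort: forcibly evict the ghost endpoint
Only use assassinate once you are sure the node is gone for good; normally the
LEFT entry simply expires on its own.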
Being one of our largest and unfortunately heaviest multi-tenant clusters,
and our last 2.1 prod cluster, we are encountering "not enough replica"
errors (need 2, only found 1) after bringing down only 1 node. It is a 90-node
cluster, 30 per DC; the DCs are in Europe, Asia, and the US, on AWS.
Are there bugs for erroneous
The CQL COPY command will not work if you are trying to copy from all
nodes, because the COPY command checks that all N nodes are in UP and RUNNING status.
If you want it to complete, you have 2 options:
1) Remove the DOWN node from the COPY command
2) Bring it back to UP and NORMAL status.
On Mon, Jul 2, 2018 at 9:15
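For completeness, this is the shape of the workaround (a sketch; the keyspace,
table and file names are placeholders, and 10.0.0.154 is just the live node
mentioned in the next message):
  cqlsh 10.0.0.154 -e "COPY my_ks.my_table TO '/tmp/my_table.csv' WITH HEADER = TRUE"
  cqlsh 10.0.0.154 -e "COPY my_ks.my_table FROM '/tmp/my_table.csv' WITH HEADER = TRUE"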
Hi,
The error shows that the cqlsh connection to the down node failed,
so you should debug why that happened.
Although you have mentioned another node in the cqlsh command ('10.0.0.154'),
my guess is that the down node was present in the connection pool, and hence a
connection to it was attempted.
Ideally the avail
Hello!
I have a Cassandra cluster with 5 nodes.
There is a (relatively small) keyspace X with RF=5.
One node goes down.
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load      Tokens  Owns (effective)  Host ID  Rack
UN  10.0.0.82   253.64 M
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster
If you think that will fix the problem, maybe you could add a little more
memory to each machine as a short-term fix.
From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com]
Sent: Wednesday, March 28, 2018 5:24 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes
Brotman
Sent: March 28, 2018 20:16
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster
David,
Did you figure out what to do about the data model problem? It could be that
your data files finally grew to the point that the data model problem caused
the Java heap
model.
Kenneth Brotman
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com]
Sent: Wednesday, March 28, 2018 4:46 AM
To: 'user@cassandra.apache.org'
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster
Was any change to hardware done around the time the problem star
-dt.com]
Sent: Wednesday, March 28, 2018 4:40 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster
Hi Kenneth,
The cluster has been running for 4 months,
The problem has been occurring since last week,
Best Regards,
倪项菲/ David Ni
中移德电网络科技有限公司
From: Kenneth Brotman
Sent: March 28, 2018 19:34
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster
David,
How long has the cluster been operating?
How long has the problem been occurring?
Kenneth Brotman
From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, March 27, 2018 7:00 PM
To: Xiangfei Ni
Cc: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6
Virtue Intelligent Network Ltd, co.
>
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>
> From: Jeff Jirsa
> Sent: March 27, 2018 11:03
> To: user@cassandra.apache.org
> Subject: Re: A node down every day in
,Wuhan,HuBei
Mob: +86 13797007811|Tel: + 86 27 5024 2516
From: Xiangfei Ni
Sent: March 28, 2018 9:45
To: Jeff Jirsa
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster
Hi Jeff,
Today another node was shut down. I have attached the exception log
file; could you
: + 86 27 5024 2516
From: Jeff Jirsa
Sent: March 27, 2018 11:50
To: Xiangfei Ni
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster
Only one node having the problem is suspicious. It may be that your application is
improperly pooling connections, or you have a hardware
m/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html
Kenneth Brotman
From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com]
Sent: Tuesday, March 27, 2018 5:44 AM
To: user@cassandra.apache.org
Subject: Re: RE: Re: A node down every day in a 6 nodes cluster
Thanks Kenneth, this is a production database, and it is
[mailto:xiangfei...@cm-dt.com]
Sent: Tuesday, March 27, 2018 3:27 AM
To: Jeff Jirsa
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster
Thanks Jeff,
So your suggestion is to first resolve the data model issue which
causes the wide partitions, right?
Best Regards
David,
Can you replace the misbehaving node to see if that resolves the problem?
Kenneth Brotman
From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com]
Sent: Tuesday, March 27, 2018 3:27 AM
To: Jeff Jirsa
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster
Only one node having the problem is suspicious. It may be that your application is
improperly pooling connections, or you have a hardware problem.
I don't see anything in nodetool that explains it, though you certainly have a
data
27 5024 2516
From: daemeon reiydelle
Sent: March 27, 2018 11:42
To: user
Subject: Re: Re: A node down every day in a 6 nodes cluster
Look for errors on your network interface. I think you have periodic errors in
your network connectivity
<==>
"Who do you think made the first stone
+86 13797007811 | Tel: +86 27 5024 2516
>
> From: Jeff Jirsa
> Sent: March 27, 2018 11:03
> To: user@cassandra.apache.org
> Subject: Re: A node down every day in a 6 nodes cluster
>
> That warning isn’t sufficient to understand why the node is going down
>
That warning isn’t sufficient to understand why the node is going down
Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is
likely a good idea
Are the nodes coming up on their own? Or are you restarting them?
Paste the output of nodetool tpstats and nodetool cfstats
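For anyone following along, these are the commands in question plus a couple of
quick filters (a sketch; the keyspace name is a placeholder, and on 3.x cfstats
is an alias for tablestats):
  nodetool tpstats                               # check the Pending, Blocked and Dropped columns
  nodetool cfstats my_keyspace | grep -iE 'partition maximum|tombstone'
  nodetool compactionstats                       # long-running compactions can also explain pauses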
Hi Cassandra experts,
I am facing an issue: a node goes down every day in a 6-node cluster. The cluster
is in just one DC.
Every node has 4 cores and 16 GB RAM, and the heap configuration is MAX_HEAP_SIZE=8192M
HEAP_NEWSIZE=512M. Every node holds about 200 GB of data, the RF for the business CF is
3, and a node goes down one tim
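For context, those two values live in conf/cassandra-env.sh. A sketch with only
the commonly cited CMS rules of thumb (not a recommendation from this thread, and
the right numbers depend on your workload):
  MAX_HEAP_SIZE="8G"      # often 1/4 to 1/2 of RAM, capped around 8G for CMS
  HEAP_NEWSIZE="400M"     # often sized at about 100 MB per CPU core (4 cores here)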
Hi John,
The other main source of STW pauses in the JVM is the safepoint mechanism:
http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html
If you turn on full GC logging in your cassandra-env.sh file, you will find
lines like this:
2017-10-09T20:13:42.462+: 4.890: Total time for wh
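In case it saves someone a search, the flags involved look roughly like this in
conf/cassandra-env.sh (a sketch, assuming Java 8 / HotSpot; the log path is a
placeholder):
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
  JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
  JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
  JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
The "Total time for which application threads were stopped" lines then cover both
GC pauses and other safepoint pauses.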
I have a small, two-node cluster running Cassandra 2.2.1. I am seeing a lot
of these messages in both logs:
WARN 07:23:16 Not marking nodes down due to local pause of 7219277694 >
50
I am fairly certain that they are not due to GC. I am not seeing a whole lot of
GC being logged and nothing o
On Tue, Dec 23, 2014 at 12:29 AM, Jiri Horky wrote:
Hi,
just a follow-up. We've seen this behavior multiple times now. It seems
that the receiving node loses connectivity to the cluster and thus
thinks that it is the sole online node, whereas the rest of the cluster
thinks that it is the only offline node, really just after the streaming
is over. I
Hi list,
we added a new node to an existing 8-node cluster with C* 1.2.9 without
vnodes, and because we are almost totally out of space, we are shuffling
the tokens one node after another (not in parallel). During one of these
move operations, the receiving node died and thus the streaming failed:
W
> The number of in flight hints is greater than…
>
>    private static volatile int maxHintsInProgress = 1024 *
>        Runtime.getRuntime().availableProcessors();
>
> You may be able to work around this by reducing the max_hint_window_in_ms
> in the yaml file so that hints are recorded if, say, the node has been down
> for more than 1 minute.
>
> Anyways I would say your test showed that the current cluster does not
> have sufficie
Anyways I would say your test showed that the current cluster does not have
sufficient capacity to handle the write load with one node down and HH enabled
at the current level. You can either add more nodes, use nodes with more cores,
adjust the HH settings, or reduce the throughput.
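The knobs being discussed live in cassandra.yaml; a sketch of how to check them
(the path is the usual package default, and the values shown are just the shipped
defaults):
  grep -E 'hinted_handoff_enabled|max_hint_window_in_ms|max_hints_delivery_threads' \
      /etc/cassandra/cassandra.yaml
  # hinted_handoff_enabled: true
  # max_hint_window_in_ms: 10800000    # 3 hours; after this, hints stop being recorded for a dead node
  # max_hints_delivery_threads: 2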
On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir wrote:
> Do you have a suggestion as to what could be a better fit for counters?
> Something that can also replicate across DCs and survive link breakdown
> between nodes (across DCs)? (and no, I don't need 100.00% precision
> (although it would be ni
> We wanted to test what happens if one node goes down, so we brought one node
> down in DC1 (i.e. the node that was handling half of the incoming writes).
> ...
> This led to a complete explosion of logs on the remaining alive node in DC1.
I agree, this level of exception logging dur
backup). In total there's 100 separate clients executing 1-2 batch updates
per second.
We wanted to test what happens if one node goes down, so we brought one node
down in DC1 (i.e. the node that was handling half of the incoming writes).
This led to a complete explosion of logs on the remaining
ring view". Can it be that this stored ring view was out
of sync with the actual (gossip) situation?
Thanks!
Rene
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, 1 February 2012 21:03
To: user@cassandra.apache.org
Subject: Re: Node down
Without knowing too much more inf
Without knowing too much more information I would try this…
* Restart each node in turn, and watch the logs to see what it says about the
other.
* If that restart did not fix it, try using the
-Dcassandra.load_ring_state=false JVM option when starting the node. That will
tell it to ignore it'
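For reference, one way to pass that flag for a single restart is via
conf/cassandra-env.sh (a sketch; remember to remove it again once the ring view
is consistent):
  JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"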
I have a cluster with seven nodes.
If I run the nodetool ring command on all nodes, I see the following:
Node1 says that node2 is down.
Node 2 says that node1 is down.
All other nodes say that everyone is up.
Is this normal behavior?
I see no network related problems. Also no problems between
> Thank you for your explanations. Even with a RF=1 and one node down I don't
> understand why I can't at least read the data in the nodes that are still
> up?
You will be able to read data for row keys that do not live on the
node that is down. But for any request to a row w
Hi Peter,
Thank you for your explanations. Even with RF=1 and one node down, I don't
understand why I can't at least read the data on the nodes that are still
up. Also, why can't I at least perform writes with consistency level ANY and
the failover policy ON_FAIL_TRY_ALL_AVAILABLE?
> If you want to survive node failures, use an RF above 1. And then make
> sure to use an appropriate consistency level.
To elaborate a bit: RF, or replication factor, is the *total* number
of copies of any piece of data in the cluster. So with only one copy,
the data will not be available when a
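To make that concrete on a CQL-era cluster (a sketch; the keyspace and DC names
are placeholders, and after raising RF you must repair so the new replicas
actually receive the data):
  cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"
  nodetool repair my_ks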
> took a node down to see how it behaves. All of a sudden I couldn't write or
[snip]
> me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be
[snip]
> Default replication factor = 1
So you have an RF=1 cluster (only one copy of data) and you bring a
n
Hi guys,
It's interesting to see this thread. I recently discovered a similar
problem on my 3 node Cassandra 0.8.5 cluster. It was working fine, then I
took a node down to see how it behaves. All of a sudden I couldn't write or
read because of this exception being thrown:
Exception
I'm currently having a similar problem with a 2-node cluster. When I
shut down one of the nodes, the other isn't responding any more.
Did you find a solution for your problem?
/I'm new to mailing lists, if it's inappropriate to reply here, please let
me know../
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html
Thank you very much Jake! It solved the problem. All reads and writes are
working now.
Have a nice day!
--
http://twitter.com/tjake
I'm reading with: cassandra_ConsistencyLevel::ANY (phpcassa lib). Is there
any way to verify that all the nodes know that they are RF=2 ?
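On a modern cluster you can verify what every node believes the RF to be with
cqlsh (a sketch; on the 0.8-era cluster discussed here the equivalent was
cassandra-cli's describe keyspace):
  cqlsh -e "DESCRIBE KEYSPACE my_ks;" | grep replication
  cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"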
The error I currently see when I take down node B:
Error performing get_indexed_slices on NODE A IP:9160: exception
'cassandra_UnavailableException'
82254385124880979556330753059704699
> IP-Of-Node-A  datacenter1  rack1  Up  Normal  2.73 MB  55.00%  167057712653383445280042298172156091026
And fixed! A co-worker put in a bad host line entry last night that threw it
all off :( Thanks for the assist, guys.
--
Ray Slakinski
On Wednesday, July 13, 2011 at 1:32 PM, Ray Slakinski wrote:
> Was all working before, but we ran out of file handles and ended up
> restarting the nodes. No
Was all working before, but we ran out of file handles and ended up restarting
the nodes. No yaml changes have occurred.
Ray Slakinski
On 2011-07-13, at 12:55 PM, Sasha Dolgy wrote:
> any firewall changes? ping is fine ... but if you can't get from
> node(a) to nodes(n) on the specific ports
any firewall changes? ping is fine ... but if you can't get from
node(a) to nodes(n) on the specific ports...
On Wed, Jul 13, 2011 at 6:47 PM, samal wrote:
> Check seed ip is same in all node and should not be loopback ip on cluster.
>
> On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski
> wrote:
>
Check that the seed IP is the same on all nodes and is not a loopback IP on the cluster.
On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski wrote:
> One of our nodes, which happens to be the seed thinks its Up and all the
> other nodes are down. However all the other nodes thinks the seed is down
> instead. The l
One of our nodes, which happens to be the seed, thinks it's up and all the other
nodes are down. However, all the other nodes think the seed is down instead.
The logs for the seed node show everything is running as it should be. I've
tried restarting the node, turning gossip and thrift on/off, and
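A sketch of the checks implied above (the yaml path is the usual package default,
and the gossip/thrift toggles assume a reasonably recent nodetool):
  grep -A4 'seed_provider' /etc/cassandra/cassandra.yaml   # the seeds list should be identical on every node
  nodetool disablegossip && nodetool enablegossip          # bounce gossip without a full restart
  nodetool disablethrift && nodetool enablethrift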
@cassandra.apache.org
Subject: Re: Reboot, now node down 0.8rc1
You could have removed the affected commit log file and then run a
nodetool repair after the node had started.
It would be handy to have some more context for the problem. Was this an
upgrade from 0.7 or a fresh install?
If you are
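Spelled out, that recovery path looks roughly like this (a sketch; the commitlog
path is the common default, and the segment name is a placeholder; only remove
the specific segment the startup error names):
  sudo service cassandra stop
  mv /var/lib/cassandra/commitlog/CommitLog-<bad-segment>.log /tmp/
  sudo service cassandra start
  nodetool repair
Any writes that were only in that segment are lost locally, which is why the
repair afterwards matters.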
n the node and bring it back? Or am I missing completely
what the commitlog is?
Scott
-----Original Message-----
From: Scott McPheeters [mailto:smcpheet...@healthx.com]
Sent: Monday, May 23, 2011 2:18 PM
To: user@cassandra.apache.org
Subject: Reboot, now node down 0.8rc1
I have a test node s
I have a test node system running release 0.8rc1. I rebooted node3 and
now Cassandra is failing on startup.
Any ideas? I am not sure where to begin.
Debian 6, plenty of disk space, Cassandra 0.8rc1
INFO 13:48:58,192 Creating new commitlog segment
/home/cassandra/commitlog/CommitLog-130617293
If the node is crashing with OutOfMemory, it will be in the Cassandra logs.
Search them for "ERROR". Alternatively, if you've installed a package, stdout
and stderr may be redirected to a file called something like output.log in the
same location as the log file.
You can change the logging usi
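A sketch of where to look first, assuming packaged default locations:
  grep -iE 'ERROR|OutOfMemory' /var/log/cassandra/system.log | tail -50
  dmesg -T | grep -i 'killed process'    # the kernel OOM killer leaves nothing in system.log
If the JVM was killed by the kernel rather than crashing, only dmesg (or
/var/log/syslog) will show it.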
I have a test cluster with 3 nodes; earlier I installed OpsCenter to
watch my cluster. Every day I see that the same node goes down (at a
different time, but every day). Then I just run `service cassandra start` to
fix the problem. system.log doesn't show me anything strange. What are the
st
It seems to be about 15 seconds after killing a node before the other nodes
report it being down.
We are running a 9-node cluster with RF=3, all reads and writes at quorum.
I was making the same assumption you are, that an operation would complete
fine at quorum with only one node down, since the other two nodes would be able
to respond.
Justin
On Wed, Sep 29, 2010 at 5:
Ah, that was not exactly what you were after. I do not know how long it takes the gossip / failure detector to detect a down node. In your case what is the CF you're using for reads and what is your RF? The hope would be that taking one node down at a time would leave enough servers running to
:15 AM, Justin Sanders wrote:
I looked through the documentation but couldn't find anything. I was
wondering if there is a way to manually mark a node "down" in the cluster
instead of killing the cassandra process and letting the other nodes figure
out the node is no longer up.
The reason I ask is because w
Coordination in a distributed system is difficult. I don't think we
can fix HH's existing edge cases without introducing other more
complicated edge cases.
So weekly-or-so repair will remain a common maintenance task for the
foreseeable future.
On Wed, Jul 14, 2010 at 4:17 PM, B. Todd Burruss w
thx, but disappointing :)
is this just something we have to live with and periodically "repair"
the nodes? or is there future work to tighten up the window?
thx
On Wed, 2010-07-14 at 12:13 -0700, Jonathan Ellis wrote:
> On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss wrote:
> > there is a wi
On Wed, Jul 14, 2010 at 1:43 PM, B. Todd Burruss wrote:
> there is a window of time from when a node goes down and when the rest
> of the cluster actually realizes that it is down.
>
> what happens to writes during this time frame? does hinted handoff
> record these writes and then "handoff" when
there is a window of time from when a node goes down and when the rest
of the cluster actually realizes that it is down.
what happens to writes during this time frame? does hinted handoff
record these writes and then "handoff" when the down node returns? or
does hinted handoff not kick in until
Oops. I meant to say that I'm reading with CL.ONE.
J.
Sent from my iPhone.
On 2010-07-01, at 1:39 AM, Benjamin Black wrote:
> .QUORUM or .ALL (they are the same with RF=2).
>
> On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote:
>> 4 nodes, RF=2, 1 node down.
.QUORUM or .ALL (they are the same with RF=2).
On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote:
> 4 nodes, RF=2, 1 node down.
> How can I get an UnavailableException in that scenario?
> - J.
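To spell out the arithmetic behind that: QUORUM needs floor(RF/2) + 1 replicas,
so with RF=2 that is floor(2/2) + 1 = 2, i.e. every replica, and ALL is also 2.
Take one node down and any key whose two replicas include that node cannot meet
either level, which is why an UnavailableException at QUORUM or ALL is expected
here; a read at ONE should still succeed as long as the surviving replica is up.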
4 nodes, RF=2, 1 node down.
How can I get an UnavailableException in that scenario?
- J.