James,
Thanks for sharing. Anyway, good to know there's one more thing to add to
the checklist.
On Sun, Jan 17, 2016 at 12:23 PM, James Griffin <james.grif...@idioplatform.com> wrote:
> Hi all,
>
> Just to let you know, we finally figured this out on Friday. It turns out
> the new nodes had an older version of the kernel installed.
Hi all,
Just to let you know, we finally figured this out on Friday. It turns out
the new nodes had an older version of the kernel installed. Upgrading the
kernel solved our issues. For reference, the "bad" kernel was
3.2.0-75-virtual, upgrading to 3.2.0-86-virtual resolved the issue. We
still don
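For anyone else hitting this, a rough sketch of how one might verify and upgrade the kernel on an Ubuntu 12.04-style host (the package name below is an assumption - adjust for your distribution and image flavour):
$ uname -r                                            # confirm the running kernel
3.2.0-75-virtual
$ sudo apt-get update
$ sudo apt-get install linux-image-3.2.0-86-virtual   # assumed package name for the fixed kernel
$ sudo reboot
$ uname -r                                            # should now report 3.2.0-86-virtual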
James,
I may be missing something. You mentioned your cluster has RF=3, so why does
"nodetool status" show each node owning only 1/3 of the data, especially after
a full repair?
On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <james.grif...@idioplatform.com> wrote:
> Hi Kai,
>
> Below - nothing going on that I can see
Hi Kai,
Well observed - running `nodetool status` without specifying a keyspace does
report ~33% ownership on each node. We have two keyspaces on this cluster; if I
specify either of them, the ownership reported by each node is 100%, so I
believe the repair completed successfully.
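For anyone following along, the difference looks roughly like this (addresses and figures are illustrative, not our actual output):
$ nodetool status                          # no keyspace: "Owns" column is raw token ownership
--  Address    ...  Owns              ...
UN  10.0.0.1   ...  33.3%             ...
UN  10.0.0.2   ...  33.3%             ...
UN  10.0.0.3   ...  33.4%             ...
$ nodetool status production_analytics    # with a keyspace: "Owns (effective)" accounts for RF=3
--  Address    ...  Owns (effective)  ...
UN  10.0.0.1   ...  100.0%            ...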
Best wishes,
Griff
Hi Kai,
Below - nothing going on that I can see
$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0
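Since mutation drops have come up in this thread, the thread-pool view from the same node may be worth a look as well; a sketch of the command (values elided):
$ nodetool tpstats
Pool Name            Active   Pending   Completed   Blocked   All time blocked
MutationStage           ...       ...         ...       ...                ...
...
Message type         Dropped
MUTATION                 ...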
James,
Can you post the result of "nodetool netstats" on the bad node?
On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <james.grif...@idioplatform.com> wrote:
> A summary of what we've done this morning:
>
>- Noted that there are no GCInspector lines in system.log on bad node
>(there are GCInspector logs on other healthy nodes)
A summary of what we've done this morning:
- Noted that there are no GCInspector lines in system.log on the bad node
(there are GCInspector logs on the other, healthy nodes)
- Turned on GC logging (flags sketched below); noted that we had logs stating our total
time for which application threads were stopped was
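For reference, GC logging of this sort is typically enabled with flags along these lines in cassandra-env.sh on 2.0.x (a sketch; the log path and exact flag set are assumptions, not necessarily what we used):
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"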
Ok. I saw dropped mutations on your cluster, and full GC is a common cause of
that. Can you search for the word GCInspector in system.log and share the
frequency of minor and full GCs? Also, are you printing promotion failures
in the GC logs? Why is full GC getting triggered? Promotion failures
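A quick way to pull those frequencies (a sketch, assuming default package-install log locations):
$ grep -c GCInspector /var/log/cassandra/system.log                              # total GC pauses logged
$ grep GCInspector /var/log/cassandra/system.log | grep -c ParNew                # minor (young-gen) collections
$ grep GCInspector /var/log/cassandra/system.log | grep -c ConcurrentMarkSweep   # old-gen collections
$ grep -ci "promotion failed" /var/log/cassandra/gc.log                          # needs GC logging enabled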
I think I was incorrect in assuming GC wasn't an issue due to the lack of
logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
differences, though comparing the startup flags on the two machines shows
the GC config is identical:
$ jstat -gcutil
S0 S1 E O P Y
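(For anyone repeating the comparison, the full invocation looks roughly like this; the PID lookup and sampling interval are illustrative:)
$ jstat -gcutil $(pgrep -f CassandraDaemon) 1000 10    # sample every 1000 ms, 10 samples
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
The columns to watch are O (old-gen occupancy) and FGC/FGCT (full GC count and time).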
Node 2 has slightly more data, but that should be OK. Not sure how read ops
are so high when no I/O-intensive activity such as repair or compaction is
running on node 3. Maybe you can try investigating the logs to see what's happening.
Others on the mailing list could also share their views on the situation.
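A few quick checks that might narrow down where node 3's reads are coming from (a sketch; the log path assumes a package install):
$ iostat -x 5                       # which device is busy, and whether it is reads or writes
$ sudo iotop -o -b -n 3             # which processes are actually doing the I/O
$ grep -i -e "Compacting" -e "Stream" /var/log/cassandra/system.log | tail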
Hi Anuj,
Below is the output of nodetool status. The nodes were replaced following
the instructions in the Datastax documentation for replacing running nodes,
since the nodes themselves were running fine; the problem was that the servers
had been incorrectly initialised and thus had less disk space. The status below
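For context on "replacing running nodes": as I understand that procedure, it amounts to bootstrapping the new node normally and then decommissioning the old one, roughly (a sketch, not the exact commands we ran):
# on the new node: let it bootstrap (auto_bootstrap defaults to true), then wait for it to show UN in
$ nodetool status
# on the old node being retired:
$ nodetool decommission    # streams its ranges to the remaining replicas and leaves the ring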
Hi,
Revisiting the thread, I can see that nodetool status showed both good and bad
nodes at the same time. How did you replace the nodes? When you say "bad node", I
understand that the node is no longer usable even though Cassandra is UP - is that
correct?
If a node is in bad shape and not working, adding new nod
Hi all,
We’ve spent a few days running things but are in the same position. To add
some more flavour:
- We have a 3-node ring, replication factor = 3. We’ve been running in
this configuration for a few years without any real issues
- Nodes 2 & 3 are much newer than node 1. These two no
Hi Vickrum,
I would have proceeded with the diagnosis as follows:
1. Analyse the sar reports to check system health - CPU, memory, swap, disk, etc.
The system seems to be overloaded; this is evident from the mutation drops. (See
the sar sketch after this list.)
2. Make sure that all recommended Cassandra production settings available at the
Datastax site
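For point 1, the sar views I would start with (a sketch, assuming sysstat is installed and collecting):
$ sar -u 5 5       # CPU utilisation, including %iowait
$ sar -r 5 5       # memory utilisation
$ sar -S 5 5       # swap usage
$ sar -d -p 5 5    # per-device disk activity
$ sar -n DEV 5 5   # per-interface network throughput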
# nodetool compactionstats
pending tasks: 22
   compaction type              keyspace           table     completed           total    unit   progress
        Compaction  production_analytics    interactions     240410213    161172668724   bytes      0.15%
        Compaction  production_decisions     decisions.d
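With 22 tasks pending, one thing worth checking (a sketch; the yaml path assumes a package install, and this is not a suggestion to change anything blindly) is whether compaction is being throttled:
$ grep compaction_throughput /etc/cassandra/cassandra.yaml
compaction_throughput_mb_per_sec: 16
$ nodetool setcompactionthroughput 0    # 0 = unthrottled; revert to the configured value once the backlog clears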
What’s your output of `nodetool compactionstats`?
> On Jan 6, 2016, at 7:26 AM, Vickrum Loi wrote:
>
> Hi,
>
> We recently added a new node to our cluster in order to replace a node that
> died (hardware failure we believe). For the next two weeks it had high disk
> and network activity. We replaced the server, but it's happened again.
I should probably have mentioned that we're on Cassandra 2.0.10.
On 6 January 2016 at 15:26, Vickrum Loi wrote:
> Hi,
>
> We recently added a new node to our cluster in order to replace a node
> that died (hardware failure we believe). For the next two weeks it had high
> disk and network activity. We replaced the server, but it's happened again.
Hi,
We recently added a new node to our cluster in order to replace a node that
died (hardware failure we believe). For the next two weeks it had high disk
and network activity. We replaced the server, but it's happened again.
We've looked into memory allowances, disk performance, number of
connec