Re: Re: Re: Re: unconfigured table logtabl

2020-04-04 Thread David Ni
Thank you very much for your friendly note.


ERROR [AntiEntropyStage:1] 2020-04-04 13:57:09,614 
RepairMessageVerbHandler.java:177 - Table with id 
21a3fa90-74c7-11ea-978a-b556b0c3a5ea was dropped during prepare phase of repair



cassandra@cqlsh:system_schema> select keyspace_name,table_name,id from tables 
where keyspace_name='oapi_dev' and table_name='logtabl';

 keyspace_name | table_name | id
---------------+------------+--------------------------------------
      oapi_dev |    logtabl | 830028a0-7584-11ea-a277-bdf3d1289bdd

The table ID in the error does not match the ID from system_schema.tables.


How do I fix it?

At 2020-04-04 14:44:16, "Erick Ramirez"  wrote:

Is it possible someone else dropped then recreated the logtabl table? Also, did 
you confirm that the missing table ID matches the ID of logtabl?


On a friendly note, there are a number of users here like me who respond to 
questions on the go. I personally find it difficult to read screenshots on my 
phone so if it isn't too much trouble, it would be preferable if you pasted the 
text here instead. Cheers!

one node down and cluster works better

2020-04-04 Thread Osman Yozgatlıoğlu
Hello,

I manage a cluster with 2 DCs, 7 nodes each, and a replication factor of 2:2.
My insertion performance has dropped somehow.
I restarted the nodes one by one and found that one node degrades performance.
I verified this after the problem occurred a couple of times.
How can I continue to investigate?

Regards,
Osman

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: one node down and cluster works better

2020-04-04 Thread mehmet bursali
Hi Osman,

Do you use any monitoring solution such as Prometheus on your cluster? If yes,
you should install and use the Cassandra exporter from the link below and
examine some detailed metrics.

https://github.com/criteo/cassandra_exporter
Sent from Yahoo Mail on Android
 


OOM only on one datacenter nodes

2020-04-04 Thread Surbhi Gupta
Hi,

We have two datacenters with 5 nodes each and a replication factor of 3.
We have traffic on DC1; DC2 is just for disaster recovery and receives no
direct traffic.
We are using machines with 24 CPUs and 128GB RAM.
On DC1, where we have live traffic, we don't see any issues; however, on
DC2, where we don't have live traffic, we see lots of OOM (Out of Memory)
errors and nodes go down (only DC2 nodes).

We were using a 16GB heap with G1GC in both DC1 and DC2.
As the DC2 nodes were OOMing, we increased the heap from 16GB to 24GB and
then to 32GB, but the DC2 nodes still go down with OOM, though obviously
not as frequently as when the heap was 16GB.
The DC1 nodes are still on a 16GB heap and none of them goes down.

We are on open source 3.11.0.
We are using materialized views.
We see lots of pending hints on DC2 nodes, and hint replay is very slow on
DC2 nodes compared to DC1 nodes.

Other than the heap sizes mentioned above, all configs are the same on all
nodes in the cluster.
We are using the JRE and can't collect a heap dump.

Any idea, what can be the cause ?

Currently disk_access_mode is not set, hence it is auto in our env. Will
setting disk_access_mode to mmap_index_only help?
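For reference, the corresponding cassandra.yaml fragment (the setting is valid in 3.11 but absent from the default yaml template; whether it helps with these OOMs is exactly the open question):

```yaml
# cassandra.yaml: mmap only the index files; data files are then read
# with standard buffered I/O instead of being memory-mapped.
disk_access_mode: mmap_index_only
```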

My question is: "*Why do the DC2 nodes OOM while the DC1 nodes don't?*"

Thanks
Surbhi


Re: one node down and cluster works better

2020-04-04 Thread Erick Ramirez
With only 2 replicas per DC, it means you're likely writing with a
consistency level of either ONE or LOCAL_ONE. Every time you hit the
problematic node, the write performance drops. All other configurations
being equal, this indicates an issue with the commitlog disk on the node.

Get your sysadmin to check for any issues with the disk, perhaps there's a
problem with the hardware and failure is impending. As a side note, it's
probably a good time to ensure that cluster repairs are in order in case
the node fails. Cheers!

GOT QUESTIONS? Apache Cassandra experts from the community and DataStax
have answers! Share your expertise on https://community.datastax.com/.


Re: OOM only on one datacenter nodes

2020-04-04 Thread Erick Ramirez
In the absence of a heap dump to analyse, my hypothesis is that your DC2
nodes are taking on traffic (from some client somewhere) but you're just
not aware of it. The hints replay is just a side-effect of the nodes
getting overloaded.

To rule out my hypothesis in the first instance, my recommendation is to
monitor the incoming connections to the nodes in DC2. If you don't have
monitoring in place, you could simply run netstat at regular intervals and
go from there. Cheers!
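As a sketch of that suggestion (a hypothetical helper, not something from the thread): parse `netstat -tn` output and count established connections per remote host on the CQL port, 9042 by default:

```python
from collections import Counter

def client_connection_counts(netstat_output: str, port: int = 9042) -> Counter:
    """Tally ESTABLISHED connections per remote host on the given local port,
    given the output of `netstat -tn`."""
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # netstat -tn rows: proto recv-q send-q local-address foreign-address state
        if len(fields) >= 6 and fields[3].endswith(f":{port}") and fields[5] == "ESTABLISHED":
            remote_host = fields[4].rsplit(":", 1)[0]
            counts[remote_host] += 1
    return counts

# Example: feed it live output (run on each DC2 node at intervals):
#   counts = client_connection_counts(subprocess.run(["netstat", "-tn"],
#       capture_output=True, text=True).stdout)
```

Any remote IP that shows up here and isn't a known client or cluster node points at the traffic source.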



Re: Re: Re: Re: unconfigured table logtabl

2020-04-04 Thread Erick Ramirez
This is confirmation that you have a schema disagreement in your cluster:

   - 21a3fa90-74c7-11ea-978a-b556b0c3a5ea = Friday, April 3 05:07:44 PT
   - 830028a0-7584-11ea-a277-bdf3d1289bdd = Friday, April 3 01:24:18 PT

The schema on the node where you ran that query has an older version of the
table (created on Friday 1am PT) versus the expected table ID (created on
Friday 5am PT). Try running a nodetool resetlocalschema to force the node
to get the latest version from other nodes. Check the docs for details on
the command if you need to. Cheers!
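Table IDs are version-1 (time-based) UUIDs, so the creation timestamps embedded in them can be decoded directly; a minimal sketch for comparing the two IDs:

```python
import datetime
import uuid

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_datetime(s: str) -> datetime.datetime:
    """Decode the creation time embedded in a version-1 (time-based) UUID."""
    ticks = uuid.UUID(s).time - UUID_EPOCH_OFFSET  # 100 ns units since Unix epoch
    return datetime.datetime.fromtimestamp(ticks / 1e7, tz=datetime.timezone.utc)

dropped_id = "21a3fa90-74c7-11ea-978a-b556b0c3a5ea"  # ID in the repair error
current_id = "830028a0-7584-11ea-a277-bdf3d1289bdd"  # ID in system_schema.tables
print(timeuuid_to_datetime(dropped_id), timeuuid_to_datetime(current_id))
```

Comparing the two decoded datetimes shows which table version is older without consulting an external decoder.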


Re: OOM only on one datacenter nodes

2020-04-04 Thread Reid Pinchback
Surbhi:

If you aren’t seeing connection activity in DC2, I’d check to see if the 
operations hitting DC1 are quorum ops instead of local quorum.  That still 
wouldn’t explain DC2 nodes going down, but would at least explain them doing 
more work than might be on your radar right now.

The hint replay being slow to me sounds like you could be fighting GC.

You mentioned bumping the DC2 nodes to 32GB. You might have already been doing
this, but if not, be sure to stay under 32GB, like 31GB. Otherwise you're using
larger object pointers and could actually have less effective ability to
allocate memory.
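The pointer-size effect can be illustrated with rough arithmetic (the header size and field count below are assumptions for illustration, not JVM measurements; only the ~32GB CompressedOops cutoff is the real mechanism):

```python
GiB = 2**30

def objects_that_fit(heap_bytes: int, refs_per_object: int = 4) -> int:
    """Rough count of objects fitting in a heap, assuming each object is
    a header plus `refs_per_object` reference fields, 8-byte aligned."""
    compressed = heap_bytes < 32 * GiB  # CompressedOops cutoff (~32 GB)
    header = 12 if compressed else 16   # assumed object header size
    ref = 4 if compressed else 8        # reference (pointer) size
    size = header + refs_per_object * ref
    size += (-size) % 8                 # round up to 8-byte alignment
    return heap_bytes // size

print(objects_that_fit(31 * GiB), objects_that_fit(32 * GiB))
```

Under these assumptions, the 32GB heap holds noticeably fewer reference-heavy objects than the 31GB one despite the extra gigabyte of raw space.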

As the problem is only happening in DC2, then there has to be a thing that is 
true in DC2 that isn’t true in DC1.  A difference in hardware, a difference in 
O/S version, a difference in networking config or physical infrastructure, a 
difference in client-triggered activity, or a difference in how repairs are 
handled. Somewhere, there is a difference.  I’d start with focusing on that.

