Well, then you could try to replace this node as soon as you have more nodes
available. I would use this procedure, as I believe it is the most efficient
one:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
It is not always the same node; it is always one of the seven nodes in the
cluster that has the high load, but not always the same one.
Regarding the question about the hardware (this is from one of the nodes; all
of them have the same configuration):
Disk:
- We use SSD disks
- Output from iostat -mx 5 100:
avg-cpu:  %user  %nice %system %iowait %steal  %idle
           1,00   0,00    0,40    0,03   0,00  98,57

Device: rrqm/s wrqm/s   r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda       0,00   0,00  0,00  0,20   0,00   0,00     8,00     0,00   0,00   0,00   0,00
sdb       0,00   0,00  0,00  0,00   0,00   0,00     0,00     0,00   0,00   0,00   0,00
sdc       0,00   0,00  0,00  0,00   0,00   0,00     0,00     0,00   0,00   0,00   0,00
sdd       0,00   0,20  0,00  0,40   0,00   0,00    12,00     0,00   2,50   2,50   0,10
- Logs: I do not see anything in the messages log except this:
Apr 3 03:07:01 GT-cassandra7 rsyslogd: [origin software="rsyslogd"
swVersion="5.8.10" x-pid="1504" x-info="http://www.rsyslog.com"] rsyslogd was
HUPed
Apr 3 18:24:55 GT-cassandra7 ntpd[1847]: 0.0.0.0 06a8 08 no_sys_peer
Apr 4 06:56:18 GT-cassandra7 ntpd[1847]: 0.0.0.0 06b8 08 no_sys_peer
CPU:
- General use: 1 – 4%
- Worst case: 98%. That is when the problem appears: running massive
deletes (even on a different machine from the one receiving the deletes) or
running a repair.
RAM:
- We are using CMS.
- Each node has 16 GB, and we dedicate to Cassandra:
o MAX_HEAP_SIZE="10G"
o HEAP_NEWSIZE="800M"
Regarding the rest of the questions you mention:
- Clients: we use the datastax java driver with this configuration:
//Get contact points
String[] contactPoints = this.environment.getRequiredProperty(CASSANDRA_CLUSTER_URL).split(",");

cluster = com.datastax.driver.core.Cluster.builder()
        .addContactPoints(contactPoints)
        //.addContactPoint(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_URL))
        .withCredentials(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_USERNAME),
                this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PASSWORD))
        .withQueryOptions(new QueryOptions()
                .setConsistencyLevel(ConsistencyLevel.QUORUM))
        //.withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy(CASSANDRA_PRIMARY_CLUSTER)))
        .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
        //.withLoadBalancingPolicy(new TokenAwarePolicy((LoadBalancingPolicy) new RoundRobinBalancingPolicy()))
        .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
        .withPort(Integer.parseInt(this.environment.getRequiredProperty(CASSANDRA_CLUSTER_PORT)))
        .build();
So requests should be evenly distributed.
- Deletes are contained in a CQL file, and I am using cqlsh to execute
them. I will try to run the deletes in small batches and against separate
nodes, but the same problem appears when running repairs.
I think the problem is related to one specific column family:
CREATE TABLE snpaware.snpsearch (
idline1 bigint,
idline2 bigint,
partid int,
id uuid,
alleles int,
coverage int,
distancetonext int,
distancetonextbyline int,
distancetoprev int,
distancetoprevbyline int,
frequency double,
idindividual bigint,
idindividualmorph bigint,
idreferencebuild bigint,
isinexon boolean,
isinorf boolean,
max_length int,
morphid bigint,
position int,
qualityflag int,
ranking int,
referencebuildlength int,
snpsearchid uuid,
synonymous boolean,
PRIMARY KEY ((idline1, idline2, partid), id)
) WITH CLUSTERING ORDER BY (id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = 'KEYS_ONLY'
AND comment = 'Table with the snp between lines'
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND index_interval = 128
AND memtable_flush_period_in_ms = 0
AND populate_io_cache_on_flush = false
AND read_repair_chance = 0.1
AND replicate_on_write = true
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX snpsearch_morphid ON snpaware.snpsearch (morphid);
This table holds a lot of data. It is normally a column family that is mostly
read, but sometimes updated and deleted, and I think the problem is there. I
wanted to change the compaction strategy, but that would trigger a compaction
and then the timeouts would appear again, so I cannot do that on the live
cluster right now.
I will try to bring a snapshot of the cf to a test cluster and test the repair
there (I cannot fully snapshot the data from the live cluster because it does
not fit in our test cluster). Following your recommendation I will postpone
the upgrade of the cluster (though the partial repair in version 2.1 looks
like a good fit for my situation, to decrease the pressure on the nodes when
running compactions).
Anyway, I have ordered two new nodes, because maybe that will help. The
problem is that adding a new node requires running cleanup on all nodes; does
the cleanup imply a compaction? If so, the timeouts will appear again.
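If cleanup does behave like a compaction, one way to limit its impact might
be to throttle compaction throughput first and run it one node at a time
rather than cluster-wide; a sketch (the throughput value is an assumption):

```shell
# Sketch: throttle compaction throughput, then run cleanup node by node
# instead of on the whole cluster at once. 8 MB/s is an assumed value.
for host in 172.31.7.232 172.31.7.233 172.31.7.243 172.31.7.244 \
            172.31.7.245 172.31.7.246 172.31.7.247; do
    nodetool -h "$host" setcompactionthroughput 8   # throttle (MB/s)
    nodetool -h "$host" cleanup
done
```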
From: Alain RODRIGUEZ [mailto:[email protected]]
Sent: Tuesday, 5 April 2016 15:11
To: [email protected]
Subject: Re: all the nost are not reacheable when running massive deletes
Overusing the cluster was one thing I was thinking about, and I have
requested two new nodes (it was something already planned anyway). But the
pattern of high CPU load is only visible on one or two of the nodes; the
rest are working correctly. That made me think that adding two new nodes
may not help.
Well, then you could try to replace this node as soon as you have more nodes
available. I would use this procedure, as I believe it is the most efficient
one:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
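The key step of that procedure is starting the replacement node with the dead
node's address; roughly (the IP below is just an example from your cluster,
standing in for whichever node you replace):

```shell
# cassandra-env.sh on the *replacement* node (sketch of the linked 2.0
# replace procedure; 172.31.7.232 stands in for the node being replaced):
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=172.31.7.232"
```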
Yet I believe it might not be a hardware or cluster throughput issue, and if
it is a hardware issue you probably want to dig into it, as this machine is
yours and not a virtual one. You might want to reuse it anyway.
Some questions about the machines and their usage:
Disk:
What disk hardware and configuration do you use?
What does "iostat -mx 5 100" give you? How is iowait?
Any error in the system / kernel logs?
CPU:
How much are the CPUs used in general / in the worst cases?
What is the load average / max, and how many cores do the CPUs have?
RAM:
You are using a 10 GB heap and CMS, right? You seem to say that GC activity
looks OK, can you confirm?
How much total RAM are the machines using?
The point here is to see if we can spot the bottleneck. If there is none,
Cassandra is probably badly configured at some point.
when running “massive deletes” on one of the nodes
Running the deletes slower, at a constant pace, sounds good and I will
definitely try that.
Are clients and queries well configured to use all the nodes evenly? Are
deletes well balanced also? If not, balancing the usage of the nodes will
probably alleviate things.
The upgrade of Cassandra is a good point, but I am afraid that if I start the
upgrade right now the timeout problems will appear again. Are compactions
executed during an upgrade? If not, I think it is safe to upgrade the cluster.
I do not recommend you to upgrade right now indeed. Yet I would do it asap (=
as soon as the cluster is ready and clients are compatible with the new
version). You should always start operations with a healthy cluster or you
might end up in a worse situation. Compactions will run normally. Make sure
not to run any streaming process (repairs / bootstrap / node removal) during
the upgrade and while you have not yet run "nodetool upgradesstables". There
is a lot of information out there about upgrades.
C*heers,
-----------------------
Alain Rodriguez - [email protected]
France
The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
2016-04-05 10:32 GMT+02:00 Paco Trujillo <[email protected]>:
Hi daemeon
We have checked the network and it is OK; in fact, the nodes are connected to
each other through a dedicated network.
From: daemeon reiydelle [mailto:[email protected]]
Sent: Monday, 4 April 2016 18:42
To: [email protected]
Subject: Re: all the nost are not reacheable when running massive deletes
Network issues. Could be jumbo frames not consistent or other.
sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Apr 4, 2016 5:34 AM, "Paco Trujillo" <[email protected]> wrote:
Hi everyone
We are having problems with our cluster (7 nodes, version 2.0.17) when
running “massive deletes” on one of the nodes (via the cql command line). At
the beginning everything is fine, but after a while we start getting constant
NoHostAvailableExceptions from the datastax driver:
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried:
/172.31.7.243:9042 (com.datastax.driver.core.exceptions.DriverException:
Timeout while trying to acquire available connection (you may want to
increase the driver number of per-host connections)),
/172.31.7.245:9042 (com.datastax.driver.core.exceptions.DriverException:
Timeout while trying to acquire available connection (you may want to
increase the driver number of per-host connections)),
/172.31.7.246:9042 (com.datastax.driver.core.exceptions.DriverException:
Timeout while trying to acquire available connection (you may want to
increase the driver number of per-host connections)),
/172.31.7.247:9042, /172.31.7.232:9042, /172.31.7.233:9042, /172.31.7.244:9042
[only showing errors of first 3 hosts, use getErrors() for more details])
All the nodes are running:
UN  172.31.7.244  152.21 GB  256  14.5%  58abea69-e7ba-4e57-9609-24f3673a7e58  RAC1
UN  172.31.7.245  168.4 GB   256  14.5%  bc11b4f0-cf96-4ca5-9a3e-33cc2b92a752  RAC1
UN  172.31.7.246  177.71 GB  256  13.7%  8dc7bb3d-38f7-49b9-b8db-a622cc80346c  RAC1
UN  172.31.7.247  158.57 GB  256  14.1%  94022081-a563-4042-81ab-75ffe4d13194  RAC1
UN  172.31.7.243  176.83 GB  256  14.6%  0dda3410-db58-42f2-9351-068bdf68f530  RAC1
UN  172.31.7.233  159 GB     256  13.6%  01e013fb-2f57-44fb-b3c5-fd89d705bfdd  RAC1
UN  172.31.7.232  166.05 GB  256  15.0%  4d009603-faa9-4add-b3a2-fe24ec16a7c1
but two of them have high CPU load, especially 172.31.7.232, because I am
running a lot of deletes using cqlsh on that node.
I know that deletes generate tombstones, but with 7 nodes in the cluster I do
not think it is normal that all the hosts become unreachable.
We have a replication factor of 3, and for the deletes I am not setting any
consistency level (so it is using the default, ONE).
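A sketch of how an explicit consistency level could be set for the delete
session (the file name is an assumption):

```shell
# Sketch: run the deletes with an explicit consistency level instead of
# cqlsh's default of ONE. deletes.cql is an assumed file name.
cqlsh 172.31.7.232 -e "CONSISTENCY QUORUM; SOURCE 'deletes.cql';"
```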
I checked the nodes with high CPU (near 96%) and the GC activity remains at
1.6% (using only 3 GB of the 10 assigned). But looking at the thread pool
stats, the pending column of the mutation stage grows without stopping; could
that be the problem?
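For reference, this is roughly how I am watching the pending mutations on the
loaded node:

```shell
# Sketch: sample the mutation stage every 5 seconds; a steadily growing
# Pending count means writes arrive faster than the node can apply them.
watch -n 5 'nodetool -h 172.31.7.232 tpstats | egrep -i "Pool|Mutation"'
```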
I cannot find the reason for the timeouts. I have already increased the
timeout settings, but I do not think that is a solution because the timeouts
indicate another type of error. Does anyone have a tip to help determine
where the problem is?
Thanks in advance