Hi Daniel,

It seems like the driver isn't detecting that the node went down, which is probably due to the way the node is being killed. If I remember correctly, in some cases the Netty transport is still up on the client side, which still allows queries to be sent without ever getting a response back: https://datastax-oss.atlassian.net/browse/JAVA-1346
Eventually, the node gets discarded when the heartbeat system catches up. It's also possible that the stuck queries then eat up all the available slots in the driver, preventing any other query from being sent from that JVM.
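To illustrate which settings map to those symptoms, here is a minimal sketch assuming the Java driver 3.x API (the contact point, heartbeat interval, request cap and timeout values below are placeholders, not tuning advice): the heartbeat interval that eventually flags connections to the killed node, the per-connection request limit whose slots stuck queries can exhaust, and the client-side read timeout that makes hanging requests fail.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class DriverTimeoutSketch {
    public static void main(String[] args) {
        // Heartbeat interval (default 30s): how often idle connections are
        // probed, which is what eventually detects connections to a killed node.
        PoolingOptions pooling = new PoolingOptions()
                .setHeartbeatIntervalSeconds(10)
                // Cap on in-flight requests per connection; queries that never
                // get a response occupy these slots until they time out.
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 1024);

        // Client-side read timeout (default 12000 ms): requests sent to a dead
        // node fail after this delay instead of hanging indefinitely.
        SocketOptions socket = new SocketOptions()
                .setReadTimeoutMillis(5000);

        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1") // placeholder contact point
                .withPoolingOptions(pooling)
                .withSocketOptions(socket)
                // Token-aware over DC-aware round robin, a common default choice.
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
                .build();

        Session session = cluster.connect();
        try {
            session.execute("SELECT release_version FROM system.local");
        } finally {
            cluster.close(); // also closes the session
        }
    }
}

None of this is meant as a fix, only to show which driver settings are involved in the behaviour described above.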
Which version of the DataStax driver are you using for your tests? How is it configured (load balancing policies, etc.)? Do you have some debug logs on the client side that could help?

Thanks,

On Fri, Nov 16, 2018 at 1:19 PM Daniel Seybold <daniel.seyb...@uni-ulm.de> wrote:

> Hi Sean,
>
> thanks for your comments, find below some more details with respect to (1) the VM sizing and (2) the replication factor:
>
> (1) VM sizing:
>
> We selected the small VMs as the initial setup to run our experiments. We have also executed the same experiments (5 nodes) on larger VMs with 6 cores and 12GB memory (where 6GB was allocated to Cassandra).
>
> We use the default CMS garbage collector (with default settings), and the debug.log and system.log do not show any suspicious GC messages.
>
> (2) Replication factor
>
> We set the RF to 5 as we want to emulate a scenario which is able to survive multiple node failures. We have also tried an RF of 3 (in the 5-node cluster), but the downtime in case of a node failure persists.
>
> I also attached two plots which show the results with the downtimes when using the larger VMs and setting the RF to 3.
>
> Any further comments much appreciated,
> Cheers,
> Daniel
>
> Am 09.11.2018 um 19:04 schrieb Durity, Sean R:
>
> The VMs’ memory (4 GB) seems pretty small for Cassandra. What heap size are you using? Which garbage collector? Are you seeing long GC times on the nodes? The basic rule of thumb is to give the Cassandra heap 50% of the RAM on the host. 2 GB isn’t very much.
>
> Also, I wouldn’t set the replication factor to 5 (the number of nodes). If RF is always equal to the number of nodes, you can’t really scale beyond the size of the disk on any one node (all data is on each node). A replication factor of 3 would be more like a typical production set-up.
>
> Sean Durity
>
> *From:* Daniel Seybold <daniel.seyb...@uni-ulm.de>
> *Sent:* Friday, November 09, 2018 5:49 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure
>
> Hi Apache Cassandra experts,
>
> we are running a set of availability evaluations under write/read/update workloads with Apache Cassandra and experience some unexpected results, i.e. 0 ops/s over a period of up to 100s.
>
> In order to provide a clear picture, find below the details of (1) the setup and (2) the evaluation workflow.
>
> *1. Setup:*
>
> Cassandra version: 3.11.2
> Cluster size: 5 nodes
> Replication factor: 5
> Each node runs in the same private OpenStack-based cloud, within the same availability zone, and uses the private network.
> Each node runs Ubuntu 16.04 server and has 2 cores, 4GB RAM and 50GB disk.
>
> Workload:
> Yahoo Cloud Serving Benchmark 0.12
> W1: 100% write
> W2: 100% read
> W3: 100% update
>
> *2. Evaluation Workflow:*
>
> 1. allocate 5 VMs & deploy DBMS cluster
> 2. start a YCSB workload (only one of W1-3) which runs up to 30 minutes
> 3. wait for 200s
> 4. trigger the selection of a random node in the cluster and delete the VM without stopping Cassandra first
> 5. analyze throughput time series over the evaluation
>
> *3. (Unexpected) Results*
>
> We expected to see a (slight) drop in the throughput as soon as the VM was deleted.
> But the throughput results show that there are periods of ~10s - 150s (not deterministic) where no operations are executed (all metrics are collected on the client side).
> Yet, there are no timeout exceptions on the client side, and the logs on the cluster side do not show anything that explains this behaviour.
>
> I attached a series of plots which show the throughput and the downtimes over the evaluation runs.
>
> Do you have any explanations for this behaviour or recommendations on how to reduce the potential "downtime"?
>
> Thanks in advance for any help and recommendations,
>
> Cheers,
> Daniel
>
> --
> M.Sc. Daniel Seybold
>
> Universität Ulm
> Institut Organisation und Management
> von Informationssystemen (OMI)
> Albert-Einstein-Allee 43
> 89081 Ulm
> Phone: +49 (0)731 50-28 799
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com