Hi, an update: I am pretty sure it is a problem with insufficient bandwidth. I can’t be sure because Cassandra does not seem to provide debug information on hint creation (only when replaying hints). When the bandwidth issue is solved I will try to reproduce the accumulation of hints by artificially limiting the bandwidth.
Best regards,
Jens

On 3. Apr 2019, at 01:48, Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:

Hi Jens,

I am reading "Cassandra: The Definitive Guide". Chapter 9, "Reading and Writing Data", has a section "The Cassandra Write Path" that contains this sentence:

If a replica does not respond within the timeout, it is presumed to be down and a hint is stored for the write.

So your node might actually be fine eventually, but it just cannot cope with the load and replies too late, after the coordinator already has sufficient replies from other replicas. The coordinator then stores a hint for that write and that node.

I am not sure how this is related to turning off handoffs completely. I can do some tests locally, if time allows, to investigate various scenarios. There might be some subtle differences ...

On Wed, 3 Apr 2019 at 07:19, Jens Fischer <j.fisc...@sonnen.de> wrote:

Yes, Apache Cassandra 3.11.2 (no DSE).

On 2. Apr 2019, at 19:40, sankalp kohli <kohlisank...@gmail.com> wrote:

Are you using OSS C*?

On Fri, Mar 29, 2019 at 1:49 AM Jens Fischer <j.fisc...@sonnen.de> wrote:

Hi,

I have a Cassandra setup with multiple data centres. The vast majority of writes are LOCAL_ONE writes to data centre DC-A. One node in DC-A (let's call it A1) has accumulated a large amount of hint files (~100 GB). In the logs of this node I see lots of messages like the following:

INFO [HintsDispatcher:26] 2019-03-28 01:49:25,217 HintsDispatchExecutor.java:289 - Finished hinted handoff of file db485ac6-8acd-4241-9e21-7a2b540459de-1553419324363-1.hints to endpoint /10.10.2.55: db485ac6-8acd-4241-9e21-7a2b540459de

The node 10.10.2.55 is in DC-B; let's call it B1. There is no indication whatsoever that B1 was down: nothing in our monitoring, nothing in the logs of B1, nothing in the logs of A1.
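To get a rough picture of which endpoints the hints are being replayed to, and how old the hint files are, log lines like the one above can be tallied with a small script. This is only a minimal sketch in Python: the regular expressions are fitted to the single log line quoted in this thread, the function names are made up for illustration, and the reading of the file name as <host-id>-<creation time in ms>-<version>.hints is an assumption rather than something confirmed in this thread.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Fitted to the one "Finished hinted handoff" line quoted in this thread
# (IPv4 endpoints only).
HANDOFF_RE = re.compile(
    r"Finished hinted handoff of file (?P<file>\S+\.hints) "
    r"to endpoint /(?P<endpoint>[0-9.]+): (?P<host_id>[0-9a-f-]+)"
)

# ASSUMPTION: file names look like <host-id>-<creation time in ms>-<version>.hints
HINT_FILE_RE = re.compile(
    r"^(?P<host_id>[0-9a-f-]{36})-(?P<millis>\d+)-(?P<version>\d+)\.hints$"
)


def parse_handoff_line(line):
    """Return a dict describing one finished handoff, or None if the line does not match."""
    match = HANDOFF_RE.search(line)
    if match is None:
        return None
    record = {
        "file": match.group("file"),
        "endpoint": match.group("endpoint"),
        "host_id": match.group("host_id"),
        "created": None,
    }
    file_match = HINT_FILE_RE.match(record["file"])
    if file_match is not None:
        # Interpret the middle component as a Unix timestamp in milliseconds (assumption).
        seconds = int(file_match.group("millis")) / 1000
        record["created"] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return record


def count_handoffs_by_endpoint(lines):
    """Count finished hint-file handoffs per target endpoint."""
    counts = Counter()
    for line in lines:
        record = parse_handoff_line(line)
        if record is not None:
            counts[record["endpoint"]] += 1
    return counts
```

Feeding the lines of A1's system log to count_handoffs_by_endpoint would show whether the replayed hints go only to B1 or to other nodes as well. If the file name assumption holds, comparing "created" with the log timestamp gives an idea of how long hints sat on disk before being delivered.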
Are there any other situations where hints to B1 are stored at A1, other than A1's failure detection marking B1 as down? For example, could the reason for the hints be that B1 is overloaded and cannot handle the intake from A1? Or that the network connection between DC-A and DC-B is too slow?

While researching this I also found the following information on Stack Overflow from Ben Slater regarding hints and multi-DC replication:

Another factor here is the consistency level you are using - a LOCAL_* consistency level will only require writes to be written to the local DC for the operation to be considered a success (and hints will be stored for replication to the other DC). (…) The hints are the records of writes that have been made in one DC that are not yet replicated to the other DC (or even nodes within a DC). I think your options to avoid them are:
(1) write with ALL or QUORUM (not LOCAL_*) consistency - this will slow down your writes but will ensure writes go into both DCs before the op completes
(2) Don't replicate the data to the second DC (by setting the replication factor to 0 for the second DC in the keyspace definition)
(3) Increase the capacity of the second DC so it can keep up with the writes
(4) Slow down your writes so the second DC can keep up.

Source: https://stackoverflow.com/a/37382726

This reads as if hints are used for "normal" (async) replication between data centres, i.e. hints could show up without any nodes being down whatsoever. This could explain what I am seeing. Does anyone know more about this? Does that mean I will see hints even if I disable hinted handoff?

Any pointers or help are greatly appreciated!

Thanks in advance
Jens

Managing directors: Christoph Ostermann (CEO), Oliver Koch, Steffen Schneider, Hermann Schweizer. Amtsgericht Kempten/Allgäu (district court), registration number: 10655, tax number 127/137/50792, VAT ID (USt.-IdNr.)
DE272208908
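The explanation quoted from the book in this thread (a replica that does not answer within the timeout gets a hint, even if it is not actually down) can be illustrated with a toy model. The sketch below is not Cassandra code: the Replica class, the coordinate_local_one_write function and the simulated response times are all made up for illustration, and the 2000 ms default is an assumption meant to stand in for the configured write timeout.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    datacenter: str
    response_ms: float  # simulated time until this replica acknowledges the write


def coordinate_local_one_write(replicas, local_dc, timeout_ms=2000.0):
    """Toy model of a LOCAL_ONE write as described in this thread.

    Returns (write_succeeded, names_of_replicas_that_get_a_hint).
    The write succeeds as soon as one replica in the local DC acknowledges
    in time; every replica slower than the timeout is treated as down and
    gets a hint, regardless of which DC it is in.
    """
    local_acks = [
        r for r in replicas
        if r.datacenter == local_dc and r.response_ms <= timeout_ms
    ]
    write_succeeded = len(local_acks) >= 1
    hinted = [r.name for r in replicas if r.response_ms > timeout_ms]
    return write_succeeded, hinted


if __name__ == "__main__":
    # B1 is alive but slow, e.g. because of an overloaded node or a saturated link.
    replicas = [Replica("A1", "DC-A", 5.0), Replica("B1", "DC-B", 3000.0)]
    print(coordinate_local_one_write(replicas, local_dc="DC-A"))
```

In this model the client sees a successful write while a hint for B1 piles up on the coordinator, which would match the symptom described above if the link between DC-A and DC-B cannot keep up. It deliberately says nothing about what happens when hinted handoff is disabled, since that question is still open in the thread.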