Hi Mike,

Unlike normal EBS volumes, which you don't need to pre-warm, I think you do need to pre-warm an EBS volume restored from a snapshot. Have a look at this AWS doc:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html

It says:

"However, storage blocks on volumes that were restored from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block. This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed. For most applications, amortizing this cost over the lifetime of the volume is acceptable. Performance is restored after the data is accessed once."
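If you'd rather pay that first-access cost up front than while the node is serving reads, the same doc shows how to initialize the volume by reading every block once with dd or fio before starting Cassandra. A rough sketch, assuming the restored data volume is attached as /dev/xvdf (the device name will differ on your instances):

    # sequential read of every block, discarding the data
    sudo dd if=/dev/xvdf of=/dev/null bs=1M

    # or read blocks in parallel for a faster warm-up
    sudo fio --filename=/dev/xvdf --rw=read --bs=1M --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize

Once the whole device has been read, every block has been pulled down from S3, so read latency should be back to normal by the time the node takes traffic.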
I hope it helps :)

Cheers,

Christophe

On 28 March 2018 at 06:24, Mike Torra <mto...@salesforce.com> wrote:

> Hi There -
>
> I have noticed an issue where I consistently see high p999 read latency on
> a node for a few hours after replacing the node. Before replacing the node,
> the p999 read latency is ~30ms, but after it increases to 1-5s. I am
> running C* 3.11.2 in EC2.
>
> I am testing out using EBS snapshots of the /data disk as a backup, so
> that I can replace nodes without having to fully bootstrap the replacement.
> This seems to work ok, except for the latency issue. Some things I have
> noticed:
>
> - `nodetool netstats` doesn't show any 'Completed' Large Messages, only
> 'Dropped', while this is going on. There are only a few of these.
> - the logs show warnings like this:
>
> WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-03-27 18:57:15,655
> NoSpamLogger.java:94 - Out of 84 commit log syncs over the past 297.28s
> with average duration of 235.88ms, 86 have exceeded the configured commit
> interval by an average of 113.66ms
>
> and I can see some slow queries in debug.log, but I can't figure out
> what is causing it
> - gc seems normal
>
> Could this have something to do with starting the node with the EBS
> snapshot of the /data directory? My first thought was that this is related
> to the EBS volumes, but it seems too consistent to be actually caused by
> that. The problem is consistent across multiple replacements, and multiple
> EC2 regions.
>
> I appreciate any suggestions!
>
> - Mike

--
Christophe Schmitz - VP Consulting
AU: +61 4 03751980 / FR: +33 7 82022899

Read our latest technical blog posts here: https://www.instaclustr.com/blog/

This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attachments may contain confidential and legally privileged information. If you are not the intended recipient, do not copy or disclose its content, but please reply to this email immediately and highlight the error to the sender and then immediately delete the message.