Hi Ilya,

Regarding the throttling question, I have not yet looked at thread dumps; the behaviour has been observed in production metrics and logging. What would you expect a thread dump to show in this case?

Given my description of the sizes of the data regions and the number of pages being updated in a checkpoint, would you expect any throttling behaviour?

Thanks,
Raymond.

On Mon, Dec 28, 2020 at 11:53 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:

> Hello!
>
> 1. If we knew the specific circumstances in which a specific setting value
> will yield the most benefit, we would've already set it to that value. A
> setting means that you may tune it and get better results, or not. But in
> general we can't promise you anything. I did see improvements from
> increasing this setting in a very specific setup, but in general you may
> leave it as is.
>
> 2. More frequent checkpoints mean increased write amplification. So
> reducing this value may overwhelm your system with load that it was able to
> handle previously. You can set this setting to an arbitrarily small value,
> meaning that checkpoints will be purely sequential without any pauses
> between them.
>
> 3. I don't think that the default throttling mechanism will emit any
> warnings. What do you see in thread dumps?
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Wed, Dec 23, 2020 at 12:48, Raymond Wilson <raymond_wil...@trimble.com>:
>
>> Hi,
>>
>> We have been investigating some issues which appear to be related to
>> checkpointing. We currently use IA 2.8.1 with the C# client.
>>
>> I have been trying to gain clarity on how certain aspects of the Ignite
>> configuration relate to the checkpointing process:
>>
>> 1. Number of checkpointing threads. This defaults to 4, but I don't
>> understand how it applies to the checkpointing process. Are more threads
>> generally better (e.g. because it makes the disk IO parallel across the
>> threads), or does it only have a positive effect if you have many data
>> storage regions? Or something else? If this could be clarified in the
>> documentation (or a pointer to it which Google has not yet found), that
>> would be good.
>>
>> 2. Checkpoint frequency. This defaults to 180 seconds. I was thinking
>> that reducing this time would result in smaller, less disruptive
>> checkpoints. Setting it to 60 seconds seems pretty safe, but is there a
>> practical lower limit that should be used for use cases with new data
>> constantly being added, e.g. 5 seconds, 10 seconds?
>>
>> 3. Write exclusivity constraints during checkpointing. I understand that
>> while a checkpoint is occurring, ongoing writes will be supported into the
>> caches being checkpointed, and if those are writes to existing pages then
>> those will be duplicated into the checkpoint buffer. If this buffer becomes
>> full or stressed then Ignite will throttle, and perhaps block, writes until
>> the checkpoint is complete. If this is the case then Ignite will emit
>> logging (warning or informational?) that writes are being throttled.
>>
>> We have cases where simple puts to caches (a few requests per second) are
>> taking up to 90 seconds to execute when there is an active checkpoint
>> occurring, where the checkpoint has been triggered by the checkpoint
>> timer. When a checkpoint is not occurring the time to do this is usually in
>> the milliseconds. The checkpoints themselves can take 90 seconds or longer,
>> and are updating up to 30,000-40,000 pages, across a pair of data storage
>> regions, one with 4 GB in-memory space allocated (which should be roughly
>> 1,000,000 pages at the standard 4 KB page size), and one small region with
>> 128 MB. There is no 'throttling' logging being emitted that we can tell, so
>> the checkpoint buffer (which should be 1 GB for the first data region and
>> 256 MB for the second smaller region in this case) does not look like it
>> can fill up during the checkpoint.
>>
>> It seems like the checkpoint is affecting the put operations, but I don't
>> understand why that may be given the documented checkpointing process, and
>> the checkpoint itself (at least via Informational logging) is not
>> advertising any restrictions.
>>
>> Thanks,
>> Raymond.
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Solution Architect, Civil Construction Software Systems (CCSS)

--
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
+64-21-2013317 Mobile
raymond_wil...@trimble.com
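
P.S. For reference, here is a rough sketch of the arithmetic behind the figures quoted in this thread. It is only a back-of-envelope calculation, not a description of how Ignite accounts for pages internally. It assumes binary units (4 GB = 4 * 1024^3 bytes, 4 KB = 4096 bytes), ignores any per-page overhead, and attributes all 40,000 dirty pages to the larger region as an upper bound. The names `pages_in_region` and `checkpoint_summary` are made up for illustration.

```python
# Back-of-envelope arithmetic for the checkpoint figures quoted in the thread.
# Assumptions (not taken from Ignite itself): binary units, no per-page
# overhead, and all dirty pages counted against the larger data region.

PAGE_SIZE = 4 * 1024   # bytes, the "standard 4 KB page size" from the message
MIB = 1024 ** 2
GIB = 1024 ** 3


def pages_in_region(region_bytes, page_size=PAGE_SIZE):
    """Number of whole pages that fit in a region of the given size."""
    return region_bytes // page_size


def checkpoint_summary(dirty_pages, duration_s, region_bytes, page_size=PAGE_SIZE):
    """Summarise how much data one checkpoint writes and how fast on average."""
    total_pages = pages_in_region(region_bytes, page_size)
    written_mib = dirty_pages * page_size / MIB
    return {
        "total_pages": total_pages,
        "dirty_fraction": dirty_pages / total_pages,
        "written_mib": written_mib,
        "avg_mib_per_s": written_mib / duration_s,
    }


# Upper end of the reported range: 40,000 pages in a 90 second checkpoint.
large = checkpoint_summary(dirty_pages=40_000, duration_s=90, region_bytes=4 * GIB)

print(f"pages in 4 GB region:   {large['total_pages']}")
print(f"pages in 128 MB region: {pages_in_region(128 * MIB)}")
print(f"dirty fraction:         {large['dirty_fraction']:.1%}")
print(f"data written:           {large['written_mib']:.2f} MiB")
print(f"average write rate:     {large['avg_mib_per_s']:.2f} MiB/s")
```

Under these assumptions a checkpoint of 40,000 pages touches under 4% of the larger region and writes on the order of 150 MiB, which is well below the 1 GB checkpoint buffer mentioned above. That is why I would not expect the buffer to fill.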