Hello everyone,

We are currently using Cassandra 4.1.3 in a two-data-center cluster.
Recently, we observed cross-node latency spikes of 3-4 seconds in one of
our data centers. Below are the relevant logs from all three nodes in this
DC:

DEBUG [ScheduledTasks:1] 2024-10-20 02:46:43,164
MonitoringTask.java:174 - 413 operations were slow in the last 5001
msecs:
<SELECT ItemDetails, ItemDocuments, ItemISQDetails, ItemMappings,
LastModified, ItemImages, ItemTitles, ItemCategories, ItemRating,
ApprovalStatus, LocalName, UserIdentifier, IsDisplayed, VariantOptions
FROM product_data.item_table WHERE item_table_display_id =
2854462277448 LIMIT 5000 ALLOW FILTERING>, time 3400 msec - slow
timeout 500 msec
<SELECT AlternateMasterCategoryData, MasterCategoryData,
MasterGroupData, MasterSubCategoryData, MasterParentCategoryData FROM
product_data.taxonomy_table WHERE master_id = 6402 LIMIT 5000 ALLOW
FILTERING>, time 2309 msec - slow timeout 500 msec/cross-node
<SELECT ItemDetails, ItemDocuments, ItemISQDetails, ItemMappings,
LastModified, ItemImages, ItemTitles, ItemCategories, ItemRating,
ApprovalStatus, LocalName, UserIdentifier, IsDisplayed, VariantOptions
FROM product_data.item_table WHERE item_table_display_id = 24279823548
LIMIT 5000 ALLOW FILTERING>, time 3287 msec - slow timeout 500
msec/cross-node
<SELECT ItemDetails, ItemDocuments, ItemISQDetails, ItemMappings,
LastModified, ItemImages, ItemTitles, ItemCategories, ItemRating,
ApprovalStatus, LocalName, UserIdentifier, IsDisplayed, VariantOptions
FROM product_data.item_table WHERE item_table_display_id =
2854486264330 LIMIT 5000 ALLOW FILTERING>, time 2878 msec - slow
timeout 500 msec/cross-node
<SELECT AlternateMasterCategoryData, MasterCategoryData,
MasterGroupData, MasterSubCategoryData, MasterParentCategoryData FROM
product_data.taxonomy_table WHERE master_id = 27245 LIMIT 5000 ALLOW
FILTERING>, time 3056 msec - slow timeout 500 msec/cross-node
<SELECT AlternateMasterCategoryData, MasterCategoryData,
MasterGroupData, MasterSubCategoryData, MasterParentCategoryData FROM
product_data.taxonomy_table WHERE master_id = 32856 LIMIT 5000 ALLOW
FILTERING>, time 2353 msec - slow timeout 500 msec/cross-node
<SELECT AlternateMasterCategoryData, MasterCategoryData,
MasterGroupData, MasterSubCategoryData, MasterParentCategoryData FROM
product_data.taxonomy_table WHERE master_id = 95589 LIMIT 5000 ALLOW
FILTERING>, time 2224 msec - slow timeout 500 msec/cross-node
<SELECT ItemDetails, ItemDocuments, ItemISQDetails, ItemMappings,
LastModified, ItemImages, ItemTitles, ItemCategories, ItemRating,
ApprovalStatus, LocalName, UserIdentifier, IsDisplayed, VariantOptions
FROM product_data.item_table WHERE item_table_display_id =
2854514159012 LIMIT 5000 ALLOW FILTERING>, time 3396 msec - slow
timeout 500 msec
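
To see how the slow operations split between local and cross-node reads, the
entries above can be tallied per table. The following is only a minimal
Python sketch, not something shipped with Cassandra: the function name
summarize and both regular expressions are our own assumptions, written
against the single-occurrence entry format shown in this excerpt. It also
assumes the logged query text contains no '>' character.

```python
import re
from collections import defaultdict

# One slow-operation entry in the format shown in the excerpt above, e.g.
#   <SELECT ... FROM ks.tbl WHERE ...>, time 3400 msec - slow timeout 500 msec/cross-node
# Assumption: the query text itself contains no '>' character.
ENTRY = re.compile(
    r"<(?P<query>[^>]*)>,\s*time\s+(?P<ms>\d+)\s+msec\s+-\s+slow\s+timeout\s+"
    r"\d+\s+msec(?P<cross>/cross-node)?"
)
# Pulls the keyspace.table name out of the logged SELECT statement.
TABLE = re.compile(r"\bFROM\s+([\w.]+)", re.IGNORECASE)


def summarize(log_text):
    """Return {(table, is_cross_node): (count, max_ms)} for the slow entries."""
    # Collapse all whitespace so entries wrapped across lines match again.
    flat = " ".join(log_text.split())
    stats = defaultdict(lambda: [0, 0])
    for m in ENTRY.finditer(flat):
        table = TABLE.search(m.group("query"))
        key = (table.group(1) if table else "?", m.group("cross") is not None)
        stats[key][0] += 1
        stats[key][1] = max(stats[key][1], int(m.group("ms")))
    return {key: tuple(value) for key, value in stats.items()}
```

Calling summarize on the text of the log excerpt gives one (count, max_ms)
pair per table and per local/cross-node flag, which makes it easier to check
whether the spikes are limited to cross-node reads or to a single table.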

Upon investigation, we found no GC pauses at the time of the latency spikes,
and CPU and memory utilization on all nodes appeared normal. The latency
metrics in Grafana also showed normal values for the same period.

Given these observations, we are trying to identify the potential causes of
this latency. Any insights or suggestions from the community would be
greatly appreciated!

Thank you!
