[
https://issues.apache.org/jira/browse/CASSANDRA-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Konstantinov updated CASSANDRA-21083:
--------------------------------------------
Attachment: CASSANDRA-21088_draft_ci_summary.htm
> Optimize memtable flush logic
> -----------------------------
>
> Key: CASSANDRA-21083
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21083
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.x
>
> Attachments: CASSANDRA-21083.html,
> CASSANDRA-21088_draft_ci_summary.htm,
> CASSANDRA-21088_draft_results_details.tar.xz, profiles.zip
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Memtable flushing to disk impacts write performance and can be a limiting
> factor for write throughput:
> * If we cannot flush fast enough we have to limit writes to memtables due to
> lack of available memory for them
> * flushing logic can be CPU-intensive and compete with writing threads for
> CPU by stealing 1-2 cores (or even more if memtable_flush_writers is set to a
> higher value)
> Suggested optimisations:
> # invoke MetadataCollector.updateClusteringValues only for first and last
> clustering key in a partition, not for every row
> ([link|https://github.com/apache/cassandra/commit/df2df1d0eefc8b603eafa87f42ed1975dfc46143])
> # split call sites in the Cell.Serializer serialize logic to avoid
> megamorphic calls + make cell.isCounterCell check cheaper (avoid megamorphic
> call + pre-calculate isCounterColumn info)
> ([link|https://github.com/apache/cassandra/commit/7653194932e9ffb966c0c1c1f76fbcf532f222a7])
> # check if Guardrails enabled at the beginning of writing, not per row,
> avoid hidden auto-boxing for logging of primitive parameters
> ([link|https://github.com/apache/cassandra/commit/f17e835108ad6f282257e95992084a33e9d47b52])
> # add fast return for BTreeRow.hasComplexDeletion if there were no deletions,
> avoid column.name.bytes.hashCode if not needed, avoid capturing lambda
> allocation in UnfilteredSerializer.serializeRowBody
> ([link|https://github.com/apache/cassandra/commit/8c2d0f5a24a6e25832d7fae6668f01fbbccc285a])
> # reduce allocations during serialization of NativeClustering
> ([link|https://github.com/apache/cassandra/commit/457f2803efd2af1a919dbe56fe958627e5652fc2])
> # do not re-map columns in serializeRowBody if they haven't changed
> ([link|https://github.com/apache/cassandra/commit/c0f08ee437f5d468f83ff6a1c952182bc5b156a4])
> # add flushing iterator without column filtering
> ([link|https://github.com/apache/cassandra/commit/0f77206334320815dc80622122b40dc2f5c3f6fd])
> # split call sites in MetadataCollector.update(Cell<?> cell) to improve cell
> methods inlining and use monomorphic calls
> ([link|https://github.com/apache/cassandra/commit/7fa2bf3ba4e11a2c4e7be421aba1295fb6738f18])
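
The "split call sites" idea mentioned in two of the items above can be illustrated with a minimal, self-contained sketch. It does not reproduce the linked commits: SketchCell and its three implementations are hypothetical stand-ins, not Cassandra classes. The assumption is that a JIT such as HotSpot typically stops inlining an interface call once a single call site has observed more than two receiver types, so dispatching on the concrete type first gives each branch its own call site with a single receiver type.

```java
import java.util.List;

// Illustrative sketch only; all types here are hypothetical, not Cassandra's.
class CallSiteSplitSketch {

    interface SketchCell {
        int serializedSize();
    }

    static final class ArraySketchCell implements SketchCell {
        private final byte[] value;
        ArraySketchCell(byte[] value) { this.value = value; }
        @Override public int serializedSize() { return value.length; }
    }

    static final class FixedSketchCell implements SketchCell {
        @Override public int serializedSize() { return 8; }
    }

    static final class EmptySketchCell implements SketchCell {
        @Override public int serializedSize() { return 0; }
    }

    // One shared call site: it may observe all three receiver types,
    // which a JIT typically treats as megamorphic and does not inline.
    static long totalSizeShared(List<SketchCell> cells) {
        long total = 0;
        for (SketchCell cell : cells) {
            total += cell.serializedSize();
        }
        return total;
    }

    // Split call sites: the first two branches call through a final class,
    // so each of those call sites has exactly one possible target.
    static long totalSizeSplit(List<SketchCell> cells) {
        long total = 0;
        for (SketchCell cell : cells) {
            if (cell instanceof ArraySketchCell) {
                total += ((ArraySketchCell) cell).serializedSize();
            } else if (cell instanceof FixedSketchCell) {
                total += ((FixedSketchCell) cell).serializedSize();
            } else {
                total += cell.serializedSize(); // fallback for any other type
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<SketchCell> cells = List.of(
            new ArraySketchCell(new byte[3]),
            new FixedSketchCell(),
            new EmptySketchCell());
        // Both variants compute the same result; only the dispatch shape differs.
        System.out.println(totalSizeShared(cells) == totalSizeSplit(cells));
    }
}
```

Both methods return the same total. Whether the split form is actually faster depends on the JVM, the type mix seen at run time, and the surrounding code, so it would need to be confirmed with profiling or a benchmark.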