[
https://issues.apache.org/jira/browse/CASSANDRA-16360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771183#comment-17771183
]
Maxim Muzafarov commented on CASSANDRA-16360:
---------------------------------------------
[~samt] Hello,
I have prepared a patch to replace the CRC32 algorithm with the more efficient
CRC32C. This will only be used for a message payload in encoders and decoders.
I suggest keeping the CRC24 algorithm for headers as it is used now (see the
details below). New benchmarks have also been added to prove that the changes
are worthwhile.
The implementation consists of the following parts:
- CASSANDRA-18780 with a fix for ConnectionBurnTest which is used for the
patch verification;
- and a code that changes CRC32 to CRC32C;
Can you take a look at these changes?
[https://github.com/apache/cassandra/pull/2647]
h3. Implementation Plan
As you mentioned above and in the [Drift backwards compatibility
#2|https://lists.apache.org/thread/2h0xtmf6ygqyf89s5hjy65185pplkh43] thread on
the dev-list the difference between server-server and client-server protocols,
so this patch focuses only on the internode communication protocol, which seems
easier for us to change. Replacing the CRC algorithm in the client-server
communication will probably require a bump to up the native protocol version,
or as I suggested [Drift backwards compatibility
#1|https://lists.apache.org/thread/t977l7b4x78zkcr24rn16j8v0jmhrmk5] we can try
to implement the feature flags for it, but still, this thing needs a discussion
and it is better to move it to another issue.
So, my plan here is:
- We replace the CRC for the internode messaging protocol in this issue;
- We create an issue to investigate possible solutions to fix the exact
replacement in the SimpleClient and other drivers (this is related to
client-server communication that we have now);
h3. Current solution: Replace CRC32 for the payload only
Benedict mentioned that we can also try to get rid of CRC24 and replace it with
CRC32C. I think this is not possible and it is not worth it for the following
reasons.
h4. Benchmarks haven't shown any valuable difference.
I have implemented the prototype
[cassandra-16360-poc|https://github.com/apache/cassandra/compare/trunk...Mmuzaf:cassandra:cassandra-16360-poc#diff-3e4dbb556bbe3df9b3061a15c96c6be539727cb8788d5274e597425b71c0c58aR62]
that replaces both CRC24 and CRC32 in an encoder and its corresponding decoder.
So, I have prepared and measured
[here|https://gist.github.com/Mmuzaf/349c5f8dbdfada82e5199f36cbfc15b4] a few
combinations of encoder-decoder pairs:
- CRC24 + CRC32 (current)
- CRC24 + CRC32C (change for payload)
- CRC32C + CRC32C (a new implementation)
So, according to benchmarks, we can see a valuable difference between pairs
[CRC24 + CRC32] and [CRC32C + CRC32C] and no difference between [CRC24 +
CRC32C] and [CRC32C + CRC32C]. This leads me to think that calculating CRC24
over an 8-byte header makes a really small contribution to the total time.
||CRC24 + CRC32||CRC32C + CRC32C||
|1109.830 ± 111.170 ns/op|863.869 ± 20.206 ns/op|
||CRC24 + CRC32C||CRC32C + CRC32C||
|1039.133 ± 61.430 ns/op|1090.281 ± 97.314 ns/op|
h4. Replacing CRC24 makes the header longer than 8 bytes
So, for the LZ4 encoder and decoder replacing the CRC24 algorithm (3 bytes)
with the CRC32C algorithm (4 bytes) makes the header itself longer than 8
bytes. Currently, we use one hop ({{{}readLong{}}}) to read the whole message
header. If the message header is longer than 8 bytes, then the efficiency of
the replacement itself is not worthwhile because, as it requires two reads
instead of one for almost no good reason.
h4. Patch complexity and code duplication
It is worth noting that adding another implementation of frame decoders and
frame encoders is much more difficult and time-consuming to verify on the
reviewer's side, thus reducing the chance that the patch will be reviewed :)
h3. Benchmarks
I have included the FramingBench, which measures encoding and decoding
operations, in the patch to verify that we are on the right track.
{code:java}
[java] Benchmark (payloadSize) (type) Mode Cnt Score
Error Units
[java] FramingBench.decode 2037 CRC avgt 4 382.764
± 75.327 ns/op
[java] FramingBench.decode 2037 CRC32C avgt 4 272.388
± 63.618 ns/op
[java] FramingBench.decode 2037 LZ4 avgt 4 377.231
± 20.023 ns/op
[java] FramingBench.decode 2037 LZ4_CRC32C avgt 4 278.301
± 28.004 ns/op
[java] FramingBench.decode 16373 CRC avgt 4 1324.664
± 39.001 ns/op
[java] FramingBench.decode 16373 CRC32C avgt 4 1318.600
± 102.559 ns/op
[java] FramingBench.decode 16373 LZ4 avgt 4 1292.971
± 66.381 ns/op
[java] FramingBench.decode 16373 LZ4_CRC32C avgt 4 1283.954
± 87.084 ns/op
[java] FramingBench.encode 2037 CRC avgt 4 382.130
± 12.318 ns/op
[java] FramingBench.encode 2037 CRC32C avgt 4 311.646
± 17.114 ns/op
[java] FramingBench.encode 2037 LZ4 avgt 4 2126.181
± 287.391 ns/op
[java] FramingBench.encode 2037 LZ4_CRC32C avgt 4 2026.204
± 219.598 ns/op
[java] FramingBench.encode 16373 CRC avgt 4 1311.842
± 102.518 ns/op
[java] FramingBench.encode 16373 CRC32C avgt 4 1310.921
± 139.089 ns/op
[java] FramingBench.encode 16373 LZ4 avgt 4 6818.232
± 200.489 ns/op
[java] FramingBench.encode 16373 LZ4_CRC32C avgt 4 6499.657
± 618.240 ns/op
{code}
[https://gist.github.com/Mmuzaf/a7ac6f5759c16ee24683856704e7c941]
> CRC32 is inefficient on x86
> ---------------------------
>
> Key: CASSANDRA-16360
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16360
> Project: Cassandra
> Issue Type: Improvement
> Components: Messaging/Client
> Reporter: Avi Kivity
> Assignee: Maxim Muzafarov
> Priority: Normal
> Labels: protocolv6
> Fix For: 5.x
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The client/server protocol specifies CRC24 and CRC32 as the checksum
> algorithm (cql_protocol_V5_framing.asc). Those however are expensive to
> compute; this affects both the client and the server.
>
> A better checksum algorithm is CRC32C, which has hardware support on x86 (as
> well as other modern architectures).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]