I've been sketching on an efficient solution to this. What about something along the way below? I've run it with the combinations of: even buf, even len even buf, odd len odd buf, even len odd buf, odd len
and it seems to give the same results as the older version of __rte_raw_cksum, before 21.03. I ran it without optimizations and such to ensure the compiler didn't insert vector instructions and such so the results were comparable. static inline uint32_t __rte_raw_cksum_newest(const void *buf, size_t len, uint32_t sum) { const uint8_t *end = buf + len; uint32_t sum_even = 0; for (const uint8_t *p = buf + 1; p < end; p += 2) { sum_even += *p; } sum += sum_even << 8; uint32_t sum_odd = 0; for (const uint8_t *p = buf; p < end; p += 2) { sum_odd += *p; } sum += sum_odd; return sum; } /Emil -----Original Message----- From: Emil Berg Sent: den 16 juni 2022 07:45 To: Morten Brørup <m...@smartsharesystems.com>; bugzi...@dpdk.org Cc: dev@dpdk.org Subject: RE: [Bug 1035] __rte_raw_cksum() crash with misaligned pointer Hi! We want the B option, i.e. the 6 bytes option. Perhaps adding alignment detection to __rte_raw_cksum() is a good idea. A minor comment but I think buf & 1 won't work since buf isn't an integral type, but something along that way. I'm starting to think about an efficient way to do this. Thank you! -----Original Message----- From: Morten Brørup <m...@smartsharesystems.com> Sent: den 15 juni 2022 16:41 To: Emil Berg <emil.b...@ericsson.com>; bugzi...@dpdk.org Cc: dev@dpdk.org Subject: RE: [Bug 1035] __rte_raw_cksum() crash with misaligned pointer > From: bugzi...@dpdk.org [mailto:bugzi...@dpdk.org] > Sent: Wednesday, 15 June 2022 09.16 > > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-45444 > 5555731-2e92ae6bf759c0c5&q=1&e=b3fc70af-5d37-4ffb-b34d-9a51927f5f6d&u= > https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D1035 > > Bug ID: 1035 > Summary: __rte_raw_cksum() crash with misaligned pointer > Product: DPDK > Version: 21.11 > Hardware: All > OS: All > Status: UNCONFIRMED > Severity: normal > Priority: Normal > Component: ethdev > Assignee: dev@dpdk.org > Reporter: emil.b...@ericsson.com > Target Milestone: --- > > See rte_raw_cksum() in rte_ip.h, which is part of the public API. See > also the subfunction __rte_raw_cksum(). > > _rte_raw_cksum assumes that the buffer over which the checksum is > calculated is an even address (divisible by two). See for example this > stack overflow > post: > https://stackoverflow.com/questions/46790550/c-undefined-behavior- > strict-aliasing-rule-or-incorrect-alignment > > The post explains that there is undefined behavior in C11 when > "conversion between two pointer types produces a result that is > incorrectly aligned". When the buf argument starts on an odd address > we thus have undefined behavior, since a pointer is cast from void* to > uint16_t*. > > In most cases (at least on x86) that isn't a problem, but with higher > optimization levels it may break due to vector instructions. This new > function seems to be easier to optimize by the compiler, resulting in > a crash when the buf argument is odd. Please note that the undefined > behavior is present in earlier versions of dpdk as well. > > Now you're probably thinking: "Just align your buffers". The problem > is that we have a packet buffer which is aligned. The checksum is > calculated on a subset of that aligned packet buffer, and that > sometimes lies on odd addresses. > > The question remains if this is an issue with dpdk or not. I can imagine other systems doing what you describe too. So it needs to be addressed. Off the top of my head, an easy fix would be updating __rte_raw_cksum() like this: static inline uint32_t __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) { if (likely((buf & 1) == 0)) { /* The buffer is 16 bit aligned. */ Keep the existing, optimized implementation here. } else { /* The buffer is not 16 bit aligned. */ Add a new odd-buf tolerant implementation here. } } However, I'm not sure that it covers your scenario! The checksum is 16 bit wide, so if you calculate the checksum of e.g. 4 bytes of memory starting at offset 1 in a 6 byte packet buffer, the memory block can be treated as either 4 or 6 bytes relative to the data covered by the checksum, i.e.: A: XX [01 02] [03 04] XX --> cksum = [04 06] B: [XX 01] [02 03] [04 XX] --> cksum = [06 04] Which one do you need? Perhaps an additional function is required to support your use case, and the documentation for rte_raw_cksum() and __rte_raw_cksum() needs to reflect that the buffer must be 16 bit aligned. Or the rte_raw_cksum() function can be modified to support an odd buffer pointer as outlined above, with documentation added about alignment of the running checksum.