Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

Kuan-Wei Chiu Sun, 23 Mar 2025 08:16:52 -0700

On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote:
> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.no...@gmail.com> 
> > > > wrote:
> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight 
> > > > >> > <david.laight.li...@gmail.com> wrote:
> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
> > > > >> > >"H. Peter Anvin" <h...@zytor.com> wrote:
> > > > >> > >
> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper 
> > > > >> > >> <andrew.coop...@citrix.com> wrote:
> > > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
> > > > >> > >> >
> > > > >> > >> >That's not technically correct any more.
> > > > >> > >> >
> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit 
> > > > >> > >> >patterns
> > > > >> > >> >other than 0 and 1.
> > > > >> > >> >
> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
> > > > >> > >> >
> > > > >> > >> >~Andrew  
> > > > >> > >> 
> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux 
> > > > >> > >> kernel using them) but
> > > > >> > >> for compiler-generated conversions that's still a given, or 
> > > > >> > >> the language isn't C
> > > > >> > >> or anything even remotely like it.
> > > > >> > >> 
> > > > >> > >
> > > > >> > >The whole idea of 'bool' is pretty much broken by design.
> > > > >> > >The underlying problem is that values other than 'true' and 
> > > > >> > >'false' can
> > > > >> > >always get into 'bool' variables.
> > > > >> > >
> > > > >> > >Once that has happened it is all fubar.
> > > > >> > >
> > > > >> > >Trying to sanitise a value with (say):
> > > > >> > >int f(bool v)
> > > > >> > >{
> > > > >> > >  return (int)v & 1;
> > > > >> > >}    
> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> > > > >> > >
> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
> > > > >> > >What happens if the value is wrong? A trap or exception? Good 
> > > > >> > >luck recovering
> > > > >> > >from that.
> > > > >> > >
> > > > >> > >  David
> > > > >> > 
> > > > >> > Did you just discover GIGO?
> > > > >> 
> > > > >> Thanks for all the suggestions.
> > > > >> 
> > > > >> I don't have a strong opinion on the naming or return type. I'm 
> > > > >> still a
> > > > >> bit confused about whether I can assume that casting bool to int 
> > > > >> always
> > > > >> results in 0 or 1.
> > > > >> 
> > > > >> If that's the case, since most people prefer bool over int as the
> > > > >> return type and some are against introducing u1, my current plan is 
> > > > >> to
> > > > >> use the following in the next version:
> > > > >> 
> > > > >> bool parity_odd(u64 val);
> > > > >> 
> > > > >> This keeps the bool return type, renames the function for better
> > > > >> clarity, and avoids extra maintenance burden by having just one
> > > > >> function.
> > > > >> 
> > > > >> If I can't assume that casting bool to int always results in 0 or 1,
> > > > >> would it be acceptable to keep the return type as int?
> > > > >> 
> > > > >> Would this work for everyone?
> > > > >
> > > > >Alright, it's clearly a split opinion. So what I would do myself in
> > > > >such case is to look at existing code and see what people who really
> > > > >need parity invent in their drivers:
> > > > >
> > > > >                                     bool      parity_odd
> > > > >static inline int parity8(u8 val)       -               -
> > > > >static u8 calc_parity(u8 val)           -               -
> > > > >static int odd_parity(u8 c)             -               +
> > > > >static int saa711x_odd_parity           -               +
> > > > >static int max3100_do_parity            -               -
> > > > >static inline int parity(unsigned x)    -               -
> > > > >static int bit_parity(u32 pkt)          -               -
> > > > >static int oa_tc6_get_parity(u32 p)     -               -
> > > > >static u32 parity32(__le32 data)        -               -
> > > > >static u32 parity(u32 sample)           -               -
> > > > >static int get_parity(int number,       -               -
> > > > >                      int size)
> > > > >static bool i2cr_check_parity32(u32 v,  +               -
> > > > >                        bool parity)
> > > > >static bool i2cr_check_parity64(u64 v)  +               -
> > > > >static int sw_parity(__u64 t)           -               -
> > > > >static bool parity(u64 value)           +               -
> > > > >
> > > > >Now you can refer to that table and say that int parity(uXX) is what
> > > > >people want to see in their drivers.
> > > > >
> > > > >Whichever interface you choose, please discuss its pros and cons.
> > > > >What does bloat-o-meter say for each option? What's the maintenance
> > > > >burden? Perf test? Look at generated code?
> > > > >
> > > > >I'm personally for a macro returning a boolean, something like what I
> > > > >proposed at the very beginning.
> > > > >
> > > > >Thanks,
> > > > >Yury
> > > > 
> > > > Also, please at least provide a way for an arch to opt in to using the 
> > > > builtins, which seem to produce as good results or better at least on 
> > > > some architectures like x86 and probably with CPU options that imply 
> > > > fast popcnt is available.
> > > 
> > > Yeah. And because linux/bitops.h already includes asm/bitops.h
> > > the simplest way would be wrapping generic implementation with
> > > the #ifndef parity, similarly to how we handle find_next_bit case.
> > > 
> > > So:
> > > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
> > > 2. This may, and probably should, be a separate follow-up series,
> > >    likely created by corresponding arch experts.
> > > 
> > I saw discussions in the previous email thread about both
> > __builtin_parity and x86-specific implementations. However, from the
> > discussion, I learned that before considering any optimization, we
> > should first ask: which driver or subsystem actually cares about parity
> > efficiency? If someone does, I can help with a micro-benchmark to
> > provide performance numbers, but I don't have enough domain knowledge
> > to identify hot paths where parity efficiency matters.
> > 
> IMHO,
> 
> If parity is never used in any hot path and we don't care about its efficiency:
> 
> Then benchmarking its performance seems meaningless. In this case, a
> function with a u64 argument would suffice, and we might not even need
> a macro to optimize for different types—especially since the macro
> requires special hacks to avoid compiler warnings. Also, I don't think
> code size matters here. If it does, we should first consider making
> parity a non-inline function in a .c file rather than an inline
> function/macro in a header.
> 
> If parity is used in a hot path:
> 
> We need different handling for different type sizes. As previously
> discussed, x86 assembly might use different instructions for u8 and
> u16. This may sound stubborn, but I want to ask again: should we
> consider using parity8/16/32/64 interfaces? Like in the i3c driver
> example, if we only have a single parity macro that selects an
> implementation based on type size, users must explicitly cast types.
> If future users also need parity in a hot path, they might not be aware
> of this requirement and end up generating suboptimal code. Since we
> care about efficiency and generated code, why not follow hweight() and
> provide separate implementations for different sizes?
> 
It seems no one will reply to my two emails. So, I have summarized
different interface approaches. If there is a next version, I will send
it after the merge window closes.


Interface 1: Single Function
Description: bool parity_odd(u64)
Pros: Minimal maintenance cost
Cons: Difficult to integrate with architecture-specific implementations
      due to the inability to optimize for different argument sizes
Opinions: Jiri supports this approach
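
For illustration only, a minimal userspace sketch of what Interface 1
could look like. The u64 typedef and the selftest helper are stand-ins
I made up for this sketch; the XOR-fold plus 0x6996 nibble table is one
common way to compute parity, not necessarily what the next version
will use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's u64 (assumption for this sketch). */
typedef uint64_t u64;

/* Return true if val has an odd number of set bits. */
static inline bool parity_odd(u64 val)
{
	/* Fold the upper half onto the lower half until 4 bits remain. */
	val ^= val >> 32;
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	/* 0x6996 is a 16-entry table: bit n holds the parity of nibble n. */
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Hypothetical self-check: narrower arguments are widened implicitly. */
static inline void parity_odd_selftest(void)
{
	uint8_t byte = 0x07;		/* three bits set */
	uint32_t word = 0x80000001u;	/* two bits set */

	assert(parity_odd(byte));
	assert(!parity_odd(word));
	assert(parity_odd(1ULL << 63));
}
```

Every caller pays for the full 64-bit fold here, which is the
"inability to optimize for different argument sizes" mentioned above.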

Interface 2: Single Macro
Description: parity_odd() macro
Pros: Allows type-specific implementation
Cons: Requires hacks to avoid warnings; users may need explicit
      casting; potential sub-optimal code on 32-bit x86
Opinions: Yury supports this approach
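
A rough userspace sketch of Interface 2, to make the trade-offs
concrete. This is not the macro proposed earlier in the thread; the
helper name parity_odd_bits(), the typedefs and the selftest are
hypothetical. Widening to u64 up front is the trick used here to avoid
shift-count warnings for narrow types; I have not measured what that
costs on 32-bit targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types (assumption for this sketch). */
typedef uint8_t u8;
typedef uint64_t u64;

/* Hypothetical helper: fold only as many bits as the argument type has. */
static inline bool parity_odd_bits(u64 v, unsigned int bits)
{
	switch (bits) {
	case 64:
		v ^= v >> 32;
		/* fall through */
	case 32:
		v ^= v >> 16;
		/* fall through */
	case 16:
		v ^= v >> 8;
		/* fall through */
	default:	/* 8-bit types; other widths are not expected */
		v ^= v >> 4;
	}
	/* Bit n of 0x6996 is the parity of nibble n. */
	return (0x6996 >> (v & 0xf)) & 1;
}

/* The width is taken from the type of the expression passed in. */
#define parity_odd(val) \
	parity_odd_bits((u64)(val), (unsigned int)(8 * sizeof(val)))

/* Hypothetical self-check showing the explicit-cast pitfall. */
static inline void parity_odd_selftest(void)
{
	u8 a = 0x0f, b = 0x01;

	/* a ^ b is promoted to int, so the macro sees a 32-bit type. */
	assert(sizeof(a ^ b) == sizeof(int));
	/* Same result, but only the cast selects the 8-bit path. */
	assert(parity_odd((u8)(a ^ b)) == parity_odd(a ^ b));
}
```

The result is the same either way, but without the cast the wider path
is taken, which is the "users may need explicit casting" point above.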

Interface 3: Multiple Functions
Description: bool parity_odd8/16/32/64()
Pros: No need for explicit casting; easy to integrate
      architecture-specific optimizations; except for parity8(), all
      functions are one-liners with no significant code duplication
Cons: More functions may increase maintenance burden
Opinions: Only I support this approach
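
A minimal userspace sketch of Interface 3, assuming the #ifndef
override style Yury suggested earlier in the thread. The typedefs and
the selftest are stand-ins for this sketch, and the exact guard names
are my assumption: an arch header would provide its own function plus
a same-named #define to suppress the generic one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for kernel types (assumption for this sketch). */
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

#ifndef parity_odd8
/* Only the 8-bit helper does real work. */
static inline bool parity_odd8(u8 val)
{
	val ^= val >> 4;
	/* Bit n of 0x6996 is the parity of nibble n. */
	return (0x6996 >> (val & 0xf)) & 1;
}
#endif

#ifndef parity_odd16
static inline bool parity_odd16(u16 val)
{
	return parity_odd8((u8)(val ^ (val >> 8)));
}
#endif

#ifndef parity_odd32
static inline bool parity_odd32(u32 val)
{
	return parity_odd16((u16)(val ^ (val >> 16)));
}
#endif

#ifndef parity_odd64
static inline bool parity_odd64(u64 val)
{
	return parity_odd32((u32)(val ^ (val >> 32)));
}
#endif

/* Hypothetical self-check: callers pick the width by name, no casts. */
static inline void parity_odd_selftest(void)
{
	assert(parity_odd8(0x07));
	assert(!parity_odd16(0x0101));
	assert(parity_odd32(0x80000000u));
	assert(!parity_odd64(UINT64_MAX));
}
```

Each wider helper folds once and delegates, so an arch that overrides
only one width still gets the generic versions of the others.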

Regards,
Kuan-Wei

Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
