[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 11:44, Chris Wilson wrote:
> 
> Now I feel silly. Looking at the .s, there is no difference with the
> addition of the barrier to clflush_cache_range(). And sure enough
> letting the test run for longer, we see a failure. I fell for a placebo.
> 
> The failing assertion is always on the last cacheline and is always one
> value behind. Oh well, back to wondering where we miss the flush.
> -Chris
> 

Could you include the assembly here?

-hpa



[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 13:54, Chris Wilson wrote:
> 
> Whilst you are looking at this asm, note that we reload
> boot_cpu_data.x86_cflush_size everytime around the loop. That's a small
> but noticeable extra cost (especially when we are only flushing a single
> cacheline).
> 

I did notice that; I don't know if this is a compiler failure to do an
obvious hoisting (which should then be merged with the other load) or a
consequence of the volatile.  It might be useful to report this to the
gcc bugzilla to let them look at it.

Either way, the diff looks good and you can add my:

Acked-by: H. Peter Anvin 

However, I see absolutely nothing wrong with the assembly other than
minor optimization possibilities: since gcc generates an early-out test
as well as a late-out test anyway, we could add an explicit:

if (p < vend)
return;

before the first mb() at no additional cost (assuming gcc is smart
enough to skip the second test; otherwise that can easily be done
manually by replacing the for loop with a do { } while loop.

I would be very interested in knowing if replacing the final clflushopt
with a clflush would resolve your problems (in which case the last mb()
shouldn't be necessary either.)

Something like:

void clflush_cache_range(void *vaddr, unsigned int size)
{
unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
char *vend = (char *)vaddr + size;
char *p = (char *)((unsigned long)vaddr & ~(clflush_size - 1);

if (p >= vend)
return;

mb();

for (; p < vend - clflush_size; p += clflush_size)
clflushopt(p);

clflush(p); /* Serializing and thus a barrier */
}



[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread H. Peter Anvin
On 01/07/16 14:29, H. Peter Anvin wrote:
> 
> I would be very interested in knowing if replacing the final clflushopt
> with a clflush would resolve your problems (in which case the last mb()
> shouldn't be necessary either.)
> 

Nevermind.  CLFLUSH is not ordered with regards to CLFLUSHOPT to the
same cache line.

Could you add a sync_cpu(); call to the end (can replace the final mb())
and see if that helps your case?

-hpa




[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-08 Thread H. Peter Anvin
On 01/07/16 14:32, H. Peter Anvin wrote:
>
> Nevermind.  CLFLUSH is not ordered with regards to CLFLUSHOPT to the
> same cache line.
> 

*Except* to the same cache line, dammit.

-hpa




[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread H. Peter Anvin
On January 11, 2016 3:28:01 AM PST, Chris Wilson  
wrote:
>On Sat, Jan 09, 2016 at 02:36:03PM -0800, Andy Lutomirski wrote:
>> On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson
> wrote:
>> > On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
>> >> On 01/07/16 14:29, H. Peter Anvin wrote:
>> >> >
>> >> > I would be very interested in knowing if replacing the final
>clflushopt
>> >> > with a clflush would resolve your problems (in which case the
>last mb()
>> >> > shouldn't be necessary either.)
>> >> >
>> >>
>> >> Nevermind.  CLFLUSH is not ordered with regards to CLFLUSHOPT to
>the
>> >> same cache line.
>> >>
>> >> Could you add a sync_cpu(); call to the end (can replace the final
>mb())
>> >> and see if that helps your case?
>> >
>> > s/sync_cpu()/sync_core()/
>> >
>> > No. I still see failures on Baytrail and Braswell (Pineview is not
>> > affected) with the final mb() replaced with sync_core(). I can
>reproduce
>> > failures on Pineview by tweaking the clflush_cache_range()
>parameters,
>> > so I am fairly confident that it is validating the current code.
>> >
>> > iirc sync_core() is cpuid, a heavy serialising instruction, an
>> > alternative to mfence.  Is there anything that else I can infer
>about
>> > the nature of my bug from this result?
>> 
>> No clue, but I don't know much about the underlying architecture.
>> 
>> Can you try clflush_cache_ranging one cacheline less and then
>manually
>> doing clflushopt; mb on the last cache line, just to make sure that
>> the helper is really doing the right thing?  You could also try
>> clflush instead of clflushopt to see if that makes a difference.
>
>I had looked at increasing the range over which clflush_cache_range()
>runs (using roundup/rounddown by cache lines), but it took something
>like +/- 256 bytes to pass all the tests. And also did
>s/clflushopt/clflush/ to confirm that made no differnce.
>
>Bizarrely,
>
>diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>index 6000ad7..cf074400 100644
>--- a/arch/x86/mm/pageattr.c
>+++ b/arch/x86/mm/pageattr.c
>@@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned int
>size)
>for (; p < vend; p += clflush_size)
>clflushopt(p);
> 
>+   clflushopt(vend-1);
>mb();
> }
> EXPORT_SYMBOL_GPL(clflush_cache_range);
>
>works like a charm.
>-Chris

That clflushopt touches a cache line already touched and therefore serializes 
with it.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.


[REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-19 Thread H. Peter Anvin
On 06/19/2014 01:01 PM, Greg KH wrote:
> On Thu, Jun 19, 2014 at 09:15:36PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 19, 2014 at 7:00 PM, Greg KH  
>> wrote:
>> + BUG_ON(f1->context != f2->context);
>
> Nice, you just crashed the kernel, making it impossible to debug or
> recover :(

 agreed, that should probably be 'if (WARN_ON(...)) return NULL;'

 (but at least I wouldn't expect to hit that under console_lock so you
 should at least see the last N lines of the backtrace on the screen
 ;-))
>>>
>>> Lots of devices don't have console screens :)
>>
>> Aside: This is a pet peeve of mine and recently I've switched to
>> rejecting all patch that have a BUG_ON, period.
> 
> Please do, I have been for a few years now as well for the same reasons
> you cite.
> 

I'm actually concerned about this trend.  Downgrading things to WARN_ON
can allow a security bug in the kernel to continue to exist, for
example, or make the error message disappear.

I am wondering if the right thing here isn't to have a user (command
line?) settable policy as to how to proceed on an assert violation,
instead of hardcoding it at compile time.

-hpa




[PATCH] drm: Missed clflushopt in drm_clflush_virt_range

2014-05-15 Thread H. Peter Anvin
On 05/15/2014 05:38 AM, Daniel Vetter wrote:
> On Wed, May 14, 2014 at 09:41:12AM -0600, Ross Zwisler wrote:
>> With this commit:
>>
>> 2a0788dc9bc4 x86: Use clflushopt in drm_clflush_virt_range
>>
>> If clflushopt is available on the system, we use it instead of clflush
>> in drm_clflush_virt_range.  There were two calls to clflush in this
>> function, but only one was changed to clflushopt.  This patch changes
>> the other clflush call to clflushopt.
>>
>> Signed-off-by: Ross Zwisler 
>> Reported-by: Matthew Wilcox 
>>
>> Cc: David Airlie 
>> Cc: dri-devel at lists.freedesktop.org
>> Cc: H Peter Anvin 
>> Cc: Ingo Molnar 
>> Cc: Thomas Gleixner 
> 
> Picked to my topic/core-stuff drm branch so it doesn't get lost.
> -Daniel
> 

Does this mean you're picking this up, or do you want me to put it into
-tip?

-hpa




Re: [PATCH v6 7/7] x86/vmware: Add TDX hypercall support

2024-01-22 Thread H. Peter Anvin
On January 22, 2024 8:32:22 AM PST, Dave Hansen  wrote:
>On 1/9/24 00:40, Alexey Makhalov wrote:
>> +#ifdef CONFIG_INTEL_TDX_GUEST
>> +unsigned long vmware_tdx_hypercall(unsigned long cmd,
>> +   struct tdx_module_args *args)
>> +{
>> +if (!hypervisor_is_type(X86_HYPER_VMWARE))
>> +return ULONG_MAX;
>> +
>> +if (cmd & ~VMWARE_CMD_MASK) {
>> +pr_warn_once("Out of range command %lx\n", cmd);
>> +return ULONG_MAX;
>> +}
>> +
>> +args->r10 = VMWARE_TDX_VENDOR_LEAF;
>> +args->r11 = VMWARE_TDX_HCALL_FUNC;
>> +args->r12 = VMWARE_HYPERVISOR_MAGIC;
>> +args->r13 = cmd;
>> +args->r15 = 0; /* CPL */
>> +
>> +__tdx_hypercall(args);
>> +
>> +return args->r12;
>> +}
>> +EXPORT_SYMBOL_GPL(vmware_tdx_hypercall);
>> +#endif
>
>This is the kind of wrapper that I was hoping for.  Thanks.
>
>Acked-by: Dave Hansen 
>

I'm slightly confused by this TBH.

Why are the arguments passed in as a structure, which is modified by the 
wrapper to boot? This is analogous to a system call interface.

Furthermore, this is an out-of-line function; it should never be called with 
!X86_HYPER_VMWARE or you are introducing overhead for other hypervisors; I 
believe a pr_warn_once() is in order at least, just as you have for the 
out-of-range test.





Re: [PATCH v6 7/7] x86/vmware: Add TDX hypercall support

2024-01-22 Thread H. Peter Anvin
On January 22, 2024 4:04:33 PM PST, Alexey Makhalov 
 wrote:
>
>
>On 1/22/24 10:28 AM, H. Peter Anvin wrote:
>> On January 22, 2024 8:32:22 AM PST, Dave Hansen  
>> wrote:
>>> On 1/9/24 00:40, Alexey Makhalov wrote:
>>>> +#ifdef CONFIG_INTEL_TDX_GUEST
>>>> +unsigned long vmware_tdx_hypercall(unsigned long cmd,
>>>> + struct tdx_module_args *args)
>>>> +{
>>>> +  if (!hypervisor_is_type(X86_HYPER_VMWARE))
>>>> +  return ULONG_MAX;
>>>> +
>>>> +  if (cmd & ~VMWARE_CMD_MASK) {
>>>> +  pr_warn_once("Out of range command %lx\n", cmd);
>>>> +  return ULONG_MAX;
>>>> +  }
>>>> +
>>>> +  args->r10 = VMWARE_TDX_VENDOR_LEAF;
>>>> +  args->r11 = VMWARE_TDX_HCALL_FUNC;
>>>> +  args->r12 = VMWARE_HYPERVISOR_MAGIC;
>>>> +  args->r13 = cmd;
>>>> +  args->r15 = 0; /* CPL */
>>>> +
>>>> +  __tdx_hypercall(args);
>>>> +
>>>> +  return args->r12;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vmware_tdx_hypercall);
>>>> +#endif
>>> 
>>> This is the kind of wrapper that I was hoping for.  Thanks.
>>> 
>>> Acked-by: Dave Hansen 
>>> 
>> 
>> I'm slightly confused by this TBH.
>> 
>> Why are the arguments passed in as a structure, which is modified by the 
>> wrapper to boot? This is analogous to a system call interface.
>> 
>> Furthermore, this is an out-of-line function; it should never be called with 
>> !X86_HYPER_VMWARE or you are introducing overhead for other hypervisors; I 
>> believe a pr_warn_once() is in order at least, just as you have for the 
>> out-of-range test.
>> 
>
>This patch series introduces vmware_hypercall family of functions similar to 
>kvm_hypercall. Similarity: both vmware and kvm implementations are static 
>inline functions and both of them use __tdx_hypercall (global not exported 
>symbol). Difference: kvm_hypercall functions are used _only_ within the 
>kernel, but vmware_hypercall are also used by modules.
>Exporting __tdx_hypercall function is an original Dave's concern.
>So we ended up with exporting wrapper, not generic, but VMware specific with 
>added checks against arbitrary use.
>vmware_tdx_hypercall is not designed for !X86_HYPER_VMWARE callers. But such a 
>calls are not forbidden.
>Arguments in a structure is an API for __tdx_hypercall(). Input and output 
>argument handling are done by vmware_hypercall callers, while VMware specific 
>dress up is inside the wrapper.
>
>Peter, do you think code comments are required to make it clear for the reader?
>
>

TBH that explanation didn't make much sense to me...


Re: [PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-23 Thread H. Peter Anvin
We should clear this bit presumably on switching either from or to 512-char 
mode, since the bit doesn't really make sense either way.

Dave Airlie  wrote:

>From: Dave Airlie 
>
>When we switch from 256->512 byte font rendering mode, it means the
>current contents of the screen is being reinterpreted. The bit that
>holds
>the high bit of the 9-bit font, may have been previously set, and thus
>the new font misrenders.
>
>The problem case we see is grub2 writes spaces with the bit set, so it
>ends up with data like 0x820, which gets reinterpreted into 0x120 char
>which the font translates into G with a circumflex. This flashes up on
>screen at boot and is quite ugly.
>
>A current side effect of this patch though is that any rendering on the
>screen changes color to a slightly darker color, but at least the
>screen
>no longer corrupts.
>
>Signed-off-by: Dave Airlie 
>---
> drivers/tty/vt/vt.c|  2 +-
> drivers/video/console/vgacon.c | 19 ---
> include/linux/vt_kern.h|  1 +
> 3 files changed, 14 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
>index 8fd8968..c8067ae 100644
>--- a/drivers/tty/vt/vt.c
>+++ b/drivers/tty/vt/vt.c
>@@ -638,7 +638,7 @@ static inline void save_screen(struct vc_data *vc)
>  *Redrawing of screen
>  */
> 
>-static void clear_buffer_attributes(struct vc_data *vc)
>+void clear_buffer_attributes(struct vc_data *vc)
> {
>   unsigned short *p = (unsigned short *)vc->vc_origin;
>   int count = vc->vc_screenbuf_size / 2;
>diff --git a/drivers/video/console/vgacon.c
>b/drivers/video/console/vgacon.c
>index d449a74..271b5d0 100644
>--- a/drivers/video/console/vgacon.c
>+++ b/drivers/video/console/vgacon.c
>@@ -1064,7 +1064,7 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
>   unsigned short video_port_status = vga_video_port_reg + 6;
>   int font_select = 0x00, beg, i;
>   char *charmap;
>-  
>+  bool clear_attribs = false;
>   if (vga_video_type != VIDEO_TYPE_EGAM) {
>   charmap = (char *) VGA_MAP_MEM(colourmap, 0);
>   beg = 0x0e;
>@@ -1169,12 +1169,6 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
> 
>   /* if 512 char mode is already enabled don't re-enable it. */
>   if ((set) && (ch512 != vga_512_chars)) {
>-  /* attribute controller */
>-  for (i = 0; i < MAX_NR_CONSOLES; i++) {
>-  struct vc_data *c = vc_cons[i].d;
>-  if (c && c->vc_sw == &vga_con)
>-  c->vc_hi_font_mask = ch512 ? 0x0800 : 0;
>-  }
>   vga_512_chars = ch512;
>   /* 256-char: enable intensity bit
>  512-char: disable intensity bit */
>@@ -1185,8 +1179,19 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
>  it means, but it works, and it appears necessary */
>   inb_p(video_port_status);
>   vga_wattr(state->vgabase, VGA_AR_ENABLE_DISPLAY, 0);
>+  clear_attribs = true;
>   }
>   raw_spin_unlock_irq(&vga_lock);
>+
>+  if (clear_attribs) {
>+  for (i = 0; i < MAX_NR_CONSOLES; i++) {
>+  struct vc_data *c = vc_cons[i].d;
>+  if (c && c->vc_sw == &vga_con) {
>+  clear_buffer_attributes(c);
>+  c->vc_hi_font_mask = ch512 ? 0x0800 : 0;
>+  }
>+  }
>+  }
>   return 0;
> }
> 
>diff --git a/include/linux/vt_kern.h b/include/linux/vt_kern.h
>index 50ae7d0..1f55665 100644
>--- a/include/linux/vt_kern.h
>+++ b/include/linux/vt_kern.h
>@@ -47,6 +47,7 @@ int con_set_cmap(unsigned char __user *cmap);
> int con_get_cmap(unsigned char __user *cmap);
> void scrollback(struct vc_data *vc, int lines);
> void scrollfront(struct vc_data *vc, int lines);
>+void clear_buffer_attributes(struct vc_data *vc);
>void update_region(struct vc_data *vc, unsigned long start, int count);
> void redraw_screen(struct vc_data *vc, int is_switch);
> #define update_screen(x) redraw_screen(x, 0)

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
The characters will morph anyway, it is just a matter off having them randomly 
scream with the intensity bit.

512-character mode is definitely useful... we get much wider language coverage 
with 512 than with 256, which is why most distros use a 512 console font.

Dave Airlie  wrote:

>On Thu, Jan 24, 2013 at 3:20 PM, H. Peter Anvin  wrote:
>> We should clear this bit presumably on switching either from or to
>512-char mode, since the bit doesn't really make sense either way.
>
>Yeah the only problem going from 512-char is that chars above 256 will
>morph into garbage chars below 256, like ctrl chars, I don't really
>want to face into doing some sort of safe translation out of 512-char
>mode, from reading around the net, 512 char mode doesn't seem all that
>brilliantly useful.
>
>Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
It would be nice to support more than 512 characters when using a graphical 
console.  We should be able to support up to 2048 at least.

Dave Airlie  wrote:

>On Thu, Jan 24, 2013 at 10:18 PM, H. Peter Anvin  wrote:
>> The characters will morph anyway, it is just a matter off having them
>randomly scream with the intensity bit.
>>
>> 512-character mode is definitely useful... we get much wider language
>coverage with 512 than with 256, which is why most distros use a 512
>console font.
>
>Yeah really use a graphics console :-)
>
>But I've sent a v2 patch that should clear the attribute space going
>both directions.
>
>Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread H. Peter Anvin
On 03/07/2013 09:28 PM, Tejun Heo wrote:
> On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu  wrote:
>> They are not using memblock_find_in_range(), so 1ULL<< will not help.
>>
>> Really hope i915 drm guys could clean that hacks.
> 
> The code isn't being used.  Just leave it alone.  Maybe add a comment.
>  The change is just making things more confusing.
> 

Indeed, but...

Daniel: can you guys clean this up or can we just remove the #if 0 clause?

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-11 Thread H. Peter Anvin
The problem is that the code will be broken, and so it makes no sense.  The #if 
0 is really confusing.

Daniel Vetter  wrote:

>On Thu, Mar 07, 2013 at 10:09:26PM -0800, H. Peter Anvin wrote:
>> On 03/07/2013 09:28 PM, Tejun Heo wrote:
>> > On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu 
>wrote:
>> >> They are not using memblock_find_in_range(), so 1ULL<< will not
>help.
>> >>
>> >> Really hope i915 drm guys could clean that hacks.
>> > 
>> > The code isn't being used.  Just leave it alone.  Maybe add a
>comment.
>> >  The change is just making things more confusing.
>> > 
>> 
>> Indeed, but...
>> 
>> Daniel: can you guys clean this up or can we just remove the #if 0
>clause?
>
>I guess we could just put this into a comment explaining where stolen
>memory for the gfx devices is at on gen2. But tbh I don't mind if we
>just
>keep the #if 0 code around. For all newer platforms we can get at that
>offset through mch bar registers, so I don't really care.
>-Daniel

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


MTRR use in drivers

2013-06-20 Thread H. Peter Anvin
An awful lot of drivers, mostly DRI drivers, are still mucking with
MTRRs directly as opposed to using ioremap_wc() or similar interfaces.
In addition to the architecture dependency, this is really undesirable
because MTRRs are a limited resource, whereas page table attributes are not.

Furthermore, this perpetuates the need for the horrific hack known as
"MTRR cleanup".

What, if anything, can we do to clean up this mess?

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
Why do you care about performance when PAT is disabled?

Brice Goglin  wrote:

>Le 21/06/2013 07:00, H. Peter Anvin a écrit :
>> An awful lot of drivers, mostly DRI drivers, are still mucking with
>> MTRRs directly as opposed to using ioremap_wc() or similar
>interfaces.
>> In addition to the architecture dependency, this is really
>undesirable
>> because MTRRs are a limited resource, whereas page table attributes
>are not.
>>
>> Furthermore, this perpetuates the need for the horrific hack known as
>> "MTRR cleanup".
>>
>> What, if anything, can we do to clean up this mess?
>>
>>  -hpa
>>
>
>The first network driver that used ioremap_wc() back in 2008 (myri10ge)
>had to keep using MTRR because ioremap_wc() silently falls back to
>ioremap_nocache() when PAT is disabled.
>
>I asked about this in https://lkml.org/lkml/2008/5/31/42 and there was
>some talk about putting the MTRR addition in the nocache fallback path
>but I guess nobody implemented the idea.
>
>Brice

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 12:29 PM, Henrique de Moraes Holschuh wrote:
> On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> Why do you care about performance when PAT is disabled?
> 
> It will regress already slow boxes.  We blacklist a LOT of P4s, PMs, etc and
> nobody ever took the pain to track down which ones of those actually have
> PAT+MTRR aliasing bugs.
> 
> These boxes have boards like the Radeon X300, which needs either PAT or MTRR
> to not become unusable...
> 

We're talking hardware which is now many years old, but this is causing
very serious problems on real, modern hardware.  As far as I understand
it, too, the blacklisting was precautionary (the only bug that I
personally know about is a performance bug, where WC would be
incorrectly converted to UC.)

We need a way forward here.  If it is the only way I think we would have
to sacrifice the old machines, but perhaps something can be worked out
(e.g. if PAT is disabled, fall back to MTRRs if available for ioremap_wc()).

-hpa

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:30 PM, Dave Airlie wrote:
 Why do you care about performance when PAT is disabled?
> 
> breaking old boxes just because, is just going to get reverted when I
> get the first regression report that you broke old boxes.
> 

Not "just because", but *if* the choice is between breaking old boxes
and breaking new boxes I'll take the latter.

> Andy Lutomirski just submitted a bunch of patches to clean up the DRM
> usage of mtrrs, they are in drm-next, afaik we no longer add them on
> PAT systems.

Fantastic news.  No issue, then, and no need to break anything.

The only problem I see with having ioremap_wc() installing an MTRR on
non-PAT, rather than pushing that into the drivers which is clearly not
the right thing, is that we will need a hook to uninstall it when the
mapping is destroyed.

-hpa



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:54 PM, Dave Airlie wrote:
>>> breaking old boxes just because, is just going to get reverted when I
>>> get the first regression report that you broke old boxes.
>>>
>>
>> Not "just because", but *if* the choice is between breaking old boxes
>> and breaking new boxes I'll take the latter.
>>
> 
> But Linus won't so your choice doesn't matter.

I hate to break it to you, but we regress on ancient hardware all the
time.  Optimization work gets done on modern machines, so the sweet spot
keeps moving.  In particular, if supporting ancient hardware means
leaving a lot of performance on modern hardware on the table, we may
have to take that penalty.

Fortunately, most of the time we don't have to.

-hpa



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
> 
> And as far as I could find from Intel's not-that-complete public
> "specification updates", we are applying the errata workaround to a few more
> processors than strictly required, but since I have no idea how to write a
> test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor can I get
> ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and report
> back.
> 

Which specific erratum are you referring to, here?  The "WC becomes UC"
erratum?  I don't think there is a sane testcase for it since it needs a
very complicated setup to trigger.

-hpa

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
The aliasing doesn't matter for Linux because we map the high and low half the 
same.

Henrique de Moraes Holschuh  wrote:

>On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
>> > 
>> > And as far as I could find from Intel's not-that-complete public
>> > "specification updates", we are applying the errata workaround to a
>few more
>> > processors than strictly required, but since I have no idea how to
>write a
>> > test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor
>can I get
>> > ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and
>report
>> > back.
>> 
>> Which specific erratum are you referring to, here?  The "WC becomes
>UC"
>> erratum?  I don't think there is a sane testcase for it since it
>needs a
>> very complicated setup to trigger.
>
>There are at least two different nasty PAT issues that are not always
>critical, and one that outright hangs the processor (if the unsupported
>aliasing of WB with UC/WC happens).
>
>Interestingly enough, most of the P4-Xeons and P4 do not appear to have
>the
>"WC becomes UC" errata.
>
>However, LOTS of P4, M-P4, Xeon PIII, Xeon, and Pentium M have a bug
>where
>the four highest entries in the PAT table are inactive (aliased to the
>four
>lowest entries) in mode B (PSE) and mode C (PAE) for 4k pages.  They
>work
>fine for large pages.
>
>Also, lots of them can hang if you ever alias WB with UC or WC (which
>is
>apparently an unsupported configuration anyway, or so it says in the
>errata).
>
>There are other weird aliasing nasties, such as one where you get
>memory
>corruption if you alias WB data with code (being accessed as UC or WC)
>in
>the same cacheline, and some stuff such as weirdness should the page
>table
>be on WC memory...
>
>I can track down most of the CPUIDs involved if you want, but someone
>from
>Intel would be better (I assume they actually have access to the errata
>documentation in some less idiotic way than reading a ton of badly
>indexed
>PDFs that take forever to find in their site).

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2 00/14] Platform Framebuffers and SimpleDRM

2013-07-04 Thread H. Peter Anvin

On 07/04/2013 05:25 AM, David Herrmann wrote:

  - What FB formats are common on x86 that we should add to SIMPLEFB_FORMATS?
(other than ARGB/XRGB32)


The common pixel formats on x86 are:

- Palettized 4-bit planar (bigendian, i.e. MSB to the left)
- Palettized 8-bit packed (one byte per pixel)
- 16-bit RGB555 (16-bit littleendian words with R=14:10, G=9:5, B=4:0)
- 16-bit RGB565 (16-bit littleendian words with R=15:11, G=10:5, B=4:0)
- 24-bit RGB888 in littleendian order (first byte in memory is B,
  second is G, third is R)
- 32-bit ARGB (first byte in memory is B, second G, third R, fourth
  unused in the framebuffer proper)
- 32-bit RGB10:10:10 (I *believe* 32-bit littleendian words with
  R=29:20, G=19:10, B=9:0)

-hpa

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> 
> First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
> only a couple of patches and merged v2.6.38-rc4 in at every step. There
> was no failure found.
> Then I tried this again, but this time I merged v2.6.38-rc5 at every
> step and was successful. The bad commit in this branch turned out to be
> 
>   1a4a678b12c84db9ae5dce424e0e97f0559bb57c
> 
> which is related to memblock.
> 
> Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
> is needed to trigger the failure, so I used f005fe12b90c as a base,
> bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
> into the base and tested. Here the bad commit turned out to be
> 
>   e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
> 
> which is related to gart. It turned out that the gart aperture on that
> box is on another position with these patches. Before it was as
> 0xa400 and now it is at 0xa000. It seems like this has something
> to do with the root-cause.
> 
> Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
> problem btw. and booting with iommu=soft also works, but I have no idea
> yet why the aperture at that address is a problem (with the patch
> reverted the aperture lands at 0x8000).
> 

Does reverting e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 solve the
problem for you?

1a4a678b12c84db9ae5dce424e0e97f0559bb57c is a memory-allocation-order
patch, which have a nasty tendency to unmask bugs elsewhere in the
kernel.  However, e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 looks
positively strange (and it doesn't exactly help that the description is
written in Yinghai-ese and is therefore nearly impossible to decode,
never mind tell if it is remotely correct.)

-hpa


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
>> Could you please send the before/after bootlog (in particular all memory 
>> init 
>> messages included) and your .config?
>>
>>  before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out 
>> of init_memory_mapping()
>>   after:  d2137d5af425: Merge branch 'linus' into x86/bootmem
>>
>> I've Cc:-ed more people who might have an idea about it.
> 
> Okay, I have done some more bisecting and debugging today.
> 

First of all, *huge* thanks for this effort.  At least we need to track
down the bits that need to be reverted -- it is past rc3, and it's time
to see what we should revert and tell the submitter to try again next cycle.

This looks to be the same issue as in bugzilla 33012:

https://bugzilla.kernel.org/show_bug.cgi?id=33012

... so it would be good if we could keep the information in there.

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>> -addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> 
> Btw, while looking at this code I wondered why the 512M goal is enforced
> by the alignment. Start could be set to 512M instead and the alignment
> can be aper_size as it should. Any reason for such a big alignment?
> 
>   Joerg
> 
> P.S.: The box is still in the office, I will try this debug-patch
>   tomorrow.

The only reason that I can think of is that the aperture itself can be
huge, and perhaps 512 MiB is the biggest such known.  512ULL<<21 is of
course a particularly moronic way to write 1 GiB, but it was a debug patch.

The value 512 MiB apparently comes from
7677b2ef6c0c4fddc84f6473f3863f40eb71821b, which is apparently totally ad
hoc; effectively it tries to prevent a collision with kexec by
hardcoding the kdump allocation as it sat at that point in time in the
GART assignment rules.

Yeah.  Brilliant.

-hpa

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:59 PM, Yinghai Lu wrote:
> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>> -   addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>> +   addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>
>> Btw, while looking at this code I wondered why the 512M goal is enforced
>> by the alignment. Start could be set to 512M instead and the alignment
>> can be aper_size as it should. Any reason for such a big alignment?
>>
> 
> when using bootmem, try to use big alignment (512M ), so we could avoid take 
> ram range below 512M.
> 

Yes, his question was why on Earth are you using 0 as start if that is
the purpose.

On top of that, where the hell does the magic 512 MiB come from?  It
looks like it is either completly ad hoc, or it has something to do with
where the kexec kernel was allocated once upon a time.

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 03:22 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
>> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>>> -  addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>>> +  addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>>
>>> Btw, while looking at this code I wondered why the 512M goal is enforced
>>> by the alignment. Start could be set to 512M instead and the alignment
>>> can be aper_size as it should. Any reason for such a big alignment?
>>>
>>> Joerg
>>>
>>> P.S.: The box is still in the office, I will try this debug-patch
>>>   tomorrow.
>>
>> The only reason that I can think of is that the aperture itself can be
>> huge, and perhaps 512 MiB is the biggest such known. 
> 
> Well, that would work as well by just using aper_size as alignment, the
> aperture needs to be aligned on its size anyway. This code only runs
> when Linux allocates the aperture itself and if I am mistaken is uses
> always 64MB when doing this.

Yes, I would agree with that.  The sane thing would be to set the base
to whatever address needs to be guarded against (WHICH SHOULD BE
MOTIVATED), and use aper_size as alignment, *unless* we are only using
the initial portion of a much larger hardware structure that needs
natural alignment (which isn't clear to me, I do know we sometimes use
only a fraction of the GART, but that doesn't mean we need to
naturally-align the entire thing, nor that 512 MiB is sufficient to do so.)

-hpa


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 12:14 PM, Yinghai Lu wrote:
> 
> so those two patches uncover some problems.
> 
> [0.00] Checking aperture...
> [0.00] No AGP bridge found
> [0.00] Node 0: aperture @ a000 size 32 MB
> [0.00] Aperture pointing to e820 RAM. Ignoring.
> [0.00] Your BIOS doesn't leave a aperture memory hole
> [0.00] Please enable the IOMMU option in the BIOS setup
> [0.00] This costs you 64 MB of RAM
> [0.00] memblock_x86_reserve_range: [0xa000-0xa3ff]   
> aperture64
> [0.00] Mapping aperture over 65536 KB of RAM @ a000
> 
> so kernel try to reallocate apperture. because BIOS allocated is pointed to 
> RAM or size is too small.
> 
> but your radeon does use [0xa000, 0xbfff)
> 
> [4.281993] radeon :01:05.0: VRAM: 320M 0xC000 - 
> 0xD3FF (320M used)
> [4.290672] radeon :01:05.0: GTT: 512M 0xA000 - 
> 0xBFFF
> [4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
> [4.309857] [drm] RAM width 32bits DDR
> [4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
> [4.320379] [TTM] Initializing pool allocator.
> [4.324948] [drm] radeon: 320M of VRAM memory ready
> [4.329832] [drm] radeon: 512M of GTT memory ready.
> 
> and the one seems working:
> 
> [0.00] Checking aperture...
> [0.00] No AGP bridge found
> [0.00] Node 0: aperture @ a000 size 32 MB
> [0.00] Aperture pointing to e820 RAM. Ignoring.
> [0.00] Your BIOS doesn't leave a aperture memory hole
> [0.00] Please enable the IOMMU option in the BIOS setup
> [0.00] This costs you 64 MB of RAM
> [0.00] memblock_x86_reserve_range: [0x8000-0x83ff]   
> aperture64
> [0.00] Mapping aperture over 65536 KB of RAM @ 8000
> [0.00] memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]
>   BOOTMEM
> 
> will use different position...
> 
> [4.250159] radeon :01:05.0: VRAM: 320M 0xC000 - 
> 0xD3FF (320M used)
> [4.258830] radeon :01:05.0: GTT: 512M 0xA000 - 
> 0xBFFF
> [4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
> [4.271549] [drm] RAM width 32bits DDR
> [4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
> [4.282066] [TTM] Initializing pool allocator.
> [4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
> [4.293076] [drm] radeon: 320M of VRAM memory ready
> [4.298277] [drm] radeon: 512M of GTT memory ready.
> [4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> [4.309854] [drm] Driver supports precise vblank timestamp query.
> [4.315970] [drm] radeon: irq initialized.
> [4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072
> 
> So question is why radeon is using the address [0xa000 - 0xc00], and 
> in E820 it is RAM 
> 
> [0.00]  BIOS-e820: 0010 - acb8d000 (usable)
> [0.00]  BIOS-e820: acb8d000 - acb8f000 (reserved)
> [0.00]  BIOS-e820: acb8f000 - afce9000 (usable)
> [0.00]  BIOS-e820: afce9000 - afd21000 (reserved)
> [0.00]  BIOS-e820: afd21000 - afd4f000 (usable)
> [0.00]  BIOS-e820: afd4f000 - afdcf000 (reserved)
> [0.00]  BIOS-e820: afdcf000 - afecf000 (ACPI NVS)
> [0.00]  BIOS-e820: afecf000 - afeff000 (ACPI data)
> [0.00]  BIOS-e820: afeff000 - aff0 (usable)
> 
> so looks bios program wrong address to the radon card?
> 

Okay, staring at this, it definitely seems toxic to overlay the GART
over memory areas reserved by the BIOS.  If I were to guess, I would say
that the problem here seems to be that the kernel thinks it is
overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.

Alex D., could you comment on the "num cpu pages" bit?

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 04:39 PM, Linus Torvalds wrote:
> 
>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
> 
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
> 
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.
> 

Yes.  However, even if we *do* revert (and the time is running short on
not reverting) I would like to understand this particular one, simply
because I think it may very well be a problem that is manifesting itself
in other ways on other systems.

The other thing that this has uncovered is that we already have a bunch
of complete b*llsh*t magic numbers in this path, some of which are
trivially shown to be wrong or at least completely arbitrary, so there
are more issues here :(

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 07:07 PM, Dave Airlie wrote:
>>
>> Okay, staring at this, it definitely seems toxic to overlay the GART
>> over memory areas reserved by the BIOS.  If I were to guess, I would say
>> that the problem here seems to be that the kernel thinks it is
>> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
>> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
>>
>> Alex D., could you comment on the "num cpu pages" bit?
> 
> These are not CPU addresses. I think we've stated that already. Not the
> droids.
> 
> the num cpu pages is how many CPU pages would be needed to fill the GPU
> GTT, for those crazy cases where CPU pagesize != GPU pagesize.
> 

OK, well, something is still weird.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-14 Thread H. Peter Anvin
On 04/14/2011 02:11 AM, Ingo Molnar wrote:
> 
> I'd strongly suggest we revert back to the old and proven allocation order, 
> as 
> long as it results in valid layouts. Even if we figure out this particular 
> GART/GTT assumption there might be a dozen others in other types of hardware.
> 

Yes, but we might also be hiding a real bug which bites other hardware.
 We have found real and very serious bugs in the kernel this way before
-- things where drivers scribble over random memory and allocation order
exposed the failure in a predictable way, as opposed to random crashes.

    -hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux 2.6.39-rc3

2011-04-15 Thread H. Peter Anvin
On 04/15/2011 12:18 PM, Yinghai Lu wrote:
> On 04/15/2011 12:06 PM, Ingo Molnar wrote:
> 
>>
>> Joerg, mind submitting it with a changelog that includes everything we 
>> learned 
>> about this bug and all the Tested-by's in place?
>>
>> Is anyone of the opinion that we should try to revert the allocation 
>> order/alignment changes in addition to this fix?
> 
> We should figure out what is written to 0xa0001000 (main memory) by GPU 
> before internal GART is setup.
> 
> Joerg,
> can you insert some dump code in the drm/radon code to find out which 
> function cause the problem?
> 

Yes, I would like to make sure we don't just paper over a real bug
(again).  I think we still should talk Joerg's patch since it seems to
be the right thing to do anyway, but I do want to make sure we don't
have a memory-overwrite bug in the kernel that we're papering over.

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: (Short?) merge window reminder

2011-05-23 Thread H. Peter Anvin
On 05/23/2011 04:17 PM, Ted Ts'o wrote:
> On Mon, May 23, 2011 at 01:33:48PM -0700, Linus Torvalds wrote:
>> On Mon, May 23, 2011 at 12:20 PM, Ingo Molnar  wrote:
>>>
>>> I really hope there's also a voice that tells you to wait until .42 before
>>> cutting 3.0.0! :-)
>>
>> So I'm toying with 3.0 (and in that case, it really would be "3.0",
>> not "3.0.0" - the stable team would get the third digit rather than
>> the fourth one.
> 
> If we change from 2.6.X to 3.X, then if we don't change anything else,
> then successive stable release will cause the LINUX_VERSION_CODE to be
> incremented.  This isn't necessary bad, but it would be a different
> from what we have now.
> 

That sounds like a good thing.

-hpa
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: (Short?) merge window reminder

2011-05-24 Thread H. Peter Anvin
On 05/24/2011 08:07 AM, jonsm...@gmail.com wrote:
> On Tue, May 24, 2011 at 10:43 AM, Alan Cox  wrote:
>> Can we drop most of MCA, EISA and ISA bus if we are going to have a big
>> version change ? A driver spring clean is much overdue and it's all in
>> git in case someone wishes to sneak out at midnight and bring some crawly
>> horror back from the dead.
> 
> 2.8 could mark the beginning of the great cleanup
>   --- work out the details of what needs to be cleaned and set a goal
>   --- remove old buses/driver, switch to device tree, graphics, 32/64
> merges, etc
> 3.0 would mark its completion
> 

I think this whole discussion misses the essence of the new development
model, which is that we no longer do these kinds of feature-based major
milestones.  If we want to to deprecate lots of drivers (which I
personally would advocate against -- I have built systems specifically
to run a real floppy drive since the Linux floppy driver is amazingly
flexible and can read/write a lot of formats that nothing else can,
including USB floppies) then we should do that in the normal course of
action, incrementally, and listed in feature-removal-schedule.txt, not
all at once due to some arbitrary milestone.

We have found it works better this way.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> 
> First of all, I bisected between v2.6.37-rc2..f005fe12b90c which where
> only a couple of patches and merged v2.6.38-rc4 in at every step. There
> was no failure found.
> Then I tried this again, but this time I merged v2.6.38-rc5 at every
> step and was successful. The bad commit in this branch turned out to be
> 
>   1a4a678b12c84db9ae5dce424e0e97f0559bb57c
> 
> which is related to memblock.
> 
> Then I tried to find out which change between 2.6.38-rc4 and 2.6.38-rc5
> is needed to trigger the failure, so I used f005fe12b90c as a base,
> bisected between v2.6.38-rc4..v2.6.38-rc5 and merged every bisect step
> into the base and tested. Here the bad commit turned out to be
> 
>   e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20
> 
> which is related to gart. It turned out that the gart aperture on that
> box is on another position with these patches. Before it was as
> 0xa400 and now it is at 0xa000. It seems like this has something
> to do with the root-cause.
> 
> Reverting commit 1a4a678b12c84db9ae5dce424e0e97f0559bb57c fixes the
> problem btw. and booting with iommu=soft also works, but I have no idea
> yet why the aperture at that address is a problem (with the patch
> reverted the aperture lands at 0x8000).
> 

Does reverting e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 solve the
problem for you?

1a4a678b12c84db9ae5dce424e0e97f0559bb57c is a memory-allocation-order
patch, which have a nasty tendency to unmask bugs elsewhere in the
kernel.  However, e6d2e2b2b1e1455df16d68a78f4a3874c7b3ad20 looks
positively strange (and it doesn't exactly help that the description is
written in Yinghai-ese and is therefore nearly impossible to decode,
never mind tell if it is remotely correct.)

-hpa




Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 10:21 AM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 08:46:09AM +0200, Ingo Molnar wrote:
>> Could you please send the before/after bootlog (in particular all memory 
>> init 
>> messages included) and your .config?
>>
>>  before:  f005fe12b90c: x86-64: Move out cleanup higmap [_brk_end, _end) out 
>> of init_memory_mapping()
>>   after:  d2137d5af425: Merge branch 'linus' into x86/bootmem
>>
>> I've Cc:-ed more people who might have an idea about it.
> 
> Okay, I have done some more bisecting and debugging today.
> 

First of all, *huge* thanks for this effort.  At least we need to track
down the bits that need to be reverted -- it is past rc3, and it's time
to see what we should revert and tell the submitter to try again next cycle.

This looks to be the same issue as in bugzilla 33012:

https://bugzilla.kernel.org/show_bug.cgi?id=33012

... so it would be good if we could keep the information in there.

-hpa


Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:50 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>> -addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>> +addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
> 
> Btw, while looking at this code I wondered why the 512M goal is enforced
> by the alignment. Start could be set to 512M instead and the alignment
> can be aper_size as it should. Any reason for such a big alignment?
> 
>   Joerg
> 
> P.S.: The box is still in the office, I will try this debug-patch
>   tomorrow.

The only reason that I can think of is that the aperture itself can be
huge, and perhaps 512 MiB is the biggest such known.  512ULL<<21 is of
course a particularly moronic way to write 1 GiB, but it was a debug patch.

The value 512 MiB apparently comes from
7677b2ef6c0c4fddc84f6473f3863f40eb71821b, which is apparently totally ad
hoc; effectively it tries to prevent a collision with kexec by
hardcoding the kdump allocation as it sat at that point in time in the
GART assignment rules.

Yeah.  Brilliant.

-hpa



Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 02:59 PM, Yinghai Lu wrote:
> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>> -   addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>> +   addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>
>> Btw, while looking at this code I wondered why the 512M goal is enforced
>> by the alignment. Start could be set to 512M instead and the alignment
>> can be aper_size as it should. Any reason for such a big alignment?
>>
> 
> when using bootmem, try to use big alignment (512M ), so we could avoid take 
> ram range below 512M.
> 

Yes, his question was why on Earth are you using 0 as start if that is
the purpose.

On top of that, where the hell does the magic 512 MiB come from?  It
looks like it is either completly ad hoc, or it has something to do with
where the kexec kernel was allocated once upon a time.

-hpa


Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 03:22 PM, Joerg Roedel wrote:
> On Wed, Apr 13, 2011 at 03:01:10PM -0700, H. Peter Anvin wrote:
>> On 04/13/2011 02:50 PM, Joerg Roedel wrote:
>>> On Wed, Apr 13, 2011 at 01:48:48PM -0700, Yinghai Lu wrote:
>>>> -  addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
>>>> +  addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<21);
>>>
>>> Btw, while looking at this code I wondered why the 512M goal is enforced
>>> by the alignment. Start could be set to 512M instead and the alignment
>>> can be aper_size as it should. Any reason for such a big alignment?
>>>
>>> Joerg
>>>
>>> P.S.: The box is still in the office, I will try this debug-patch
>>>   tomorrow.
>>
>> The only reason that I can think of is that the aperture itself can be
>> huge, and perhaps 512 MiB is the biggest such known. 
> 
> Well, that would work as well by just using aper_size as alignment, the
> aperture needs to be aligned on its size anyway. This code only runs
> when Linux allocates the aperture itself and if I am mistaken is uses
> always 64MB when doing this.

Yes, I would agree with that.  The sane thing would be to set the base
to whatever address needs to be guarded against (WHICH SHOULD BE
MOTIVATED), and use aper_size as alignment, *unless* we are only using
the initial portion of a much larger hardware structure that needs
natural alignment (which isn't clear to me, I do know we sometimes use
only a fraction of the GART, but that doesn't mean we need to
naturally-align the entire thing, nor that 512 MiB is sufficient to do so.)

-hpa




Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 12:14 PM, Yinghai Lu wrote:
> 
> so those two patches uncover some problems.
> 
> [0.00] Checking aperture...
> [0.00] No AGP bridge found
> [0.00] Node 0: aperture @ a000 size 32 MB
> [0.00] Aperture pointing to e820 RAM. Ignoring.
> [0.00] Your BIOS doesn't leave a aperture memory hole
> [0.00] Please enable the IOMMU option in the BIOS setup
> [0.00] This costs you 64 MB of RAM
> [0.00] memblock_x86_reserve_range: [0xa000-0xa3ff]   
> aperture64
> [0.00] Mapping aperture over 65536 KB of RAM @ a000
> 
> so kernel try to reallocate apperture. because BIOS allocated is pointed to 
> RAM or size is too small.
> 
> but your radeon does use [0xa000, 0xbfff)
> 
> [4.281993] radeon :01:05.0: VRAM: 320M 0xC000 - 
> 0xD3FF (320M used)
> [4.290672] radeon :01:05.0: GTT: 512M 0xA000 - 
> 0xBFFF
> [4.298550] [drm] Detected VRAM RAM=320M, BAR=256M
> [4.309857] [drm] RAM width 32bits DDR
> [4.313748] [TTM] Zone  kernel: Available graphics memory: 1896524 kiB.
> [4.320379] [TTM] Initializing pool allocator.
> [4.324948] [drm] radeon: 320M of VRAM memory ready
> [4.329832] [drm] radeon: 512M of GTT memory ready.
> 
> and the one seems working:
> 
> [0.00] Checking aperture...
> [0.00] No AGP bridge found
> [0.00] Node 0: aperture @ a000 size 32 MB
> [0.00] Aperture pointing to e820 RAM. Ignoring.
> [0.00] Your BIOS doesn't leave a aperture memory hole
> [0.00] Please enable the IOMMU option in the BIOS setup
> [0.00] This costs you 64 MB of RAM
> [0.00] memblock_x86_reserve_range: [0x8000-0x83ff]   
> aperture64
> [0.00] Mapping aperture over 65536 KB of RAM @ 8000
> [0.00] memblock_x86_reserve_range: [0xacb6bdc0-0xacb6bddf]
>   BOOTMEM
> 
> will use different position...
> 
> [4.250159] radeon :01:05.0: VRAM: 320M 0xC000 - 
> 0xD3FF (320M used)
> [4.258830] radeon :01:05.0: GTT: 512M 0xA000 - 
> 0xBFFF
> [4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
> [4.271549] [drm] RAM width 32bits DDR
> [4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
> [4.282066] [TTM] Initializing pool allocator.
> [4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
> [4.293076] [drm] radeon: 320M of VRAM memory ready
> [4.298277] [drm] radeon: 512M of GTT memory ready.
> [4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
> [4.309854] [drm] Driver supports precise vblank timestamp query.
> [4.315970] [drm] radeon: irq initialized.
> [4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072
> 
> So question is why radeon is using the address [0xa000 - 0xc00], and 
> in E820 it is RAM 
> 
> [0.00]  BIOS-e820: 0010 - acb8d000 (usable)
> [0.00]  BIOS-e820: acb8d000 - acb8f000 (reserved)
> [0.00]  BIOS-e820: acb8f000 - afce9000 (usable)
> [0.00]  BIOS-e820: afce9000 - afd21000 (reserved)
> [0.00]  BIOS-e820: afd21000 - afd4f000 (usable)
> [0.00]  BIOS-e820: afd4f000 - afdcf000 (reserved)
> [0.00]  BIOS-e820: afdcf000 - afecf000 (ACPI NVS)
> [0.00]  BIOS-e820: afecf000 - afeff000 (ACPI data)
> [0.00]  BIOS-e820: afeff000 - aff0 (usable)
> 
> so looks bios program wrong address to the radon card?
> 

Okay, staring at this, it definitely seems toxic to overlay the GART
over memory areas reserved by the BIOS.  If I were to guess, I would say
that the problem here seems to be that the kernel thinks it is
overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.

Alex D., could you comment on the "num cpu pages" bit?

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 04:39 PM, Linus Torvalds wrote:
> 
>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
> 
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
> 
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.
> 

Yes.  However, even if we *do* revert (and the time is running short on
not reverting) I would like to understand this particular one, simply
because I think it may very well be a problem that is manifesting itself
in other ways on other systems.

The other thing that this has uncovered is that we already have a bunch
of complete b*llsh*t magic numbers in this path, some of which are
trivially shown to be wrong or at least completely arbitrary, so there
are more issues here :(

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


Linux 2.6.39-rc3

2011-04-13 Thread H. Peter Anvin
On 04/13/2011 07:07 PM, Dave Airlie wrote:
>>
>> Okay, staring at this, it definitely seems toxic to overlay the GART
>> over memory areas reserved by the BIOS.  If I were to guess, I would say
>> that the problem here seems to be that the kernel thinks it is
>> overlaying 64 MiB of memory, but the actual GART is in fact 512 MiB in
>> size -- 131072 CPU pages -- which now overlaps the BIOS reserved areas.
>>
>> Alex D., could you comment on the "num cpu pages" bit?
> 
> These are not CPU addresses. I think we've stated that already. Not the
> droids.
> 
> the num cpu pages is how many CPU pages would be needed to fill the GPU
> GTT, for those crazy cases where CPU pagesize != GPU pagesize.
> 

OK, well, something is still weird.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Linux 2.6.39-rc3

2011-04-14 Thread H. Peter Anvin
On 04/14/2011 02:11 AM, Ingo Molnar wrote:
> 
> I'd strongly suggest we revert back to the old and proven allocation order, 
> as 
> long as it results in valid layouts. Even if we figure out this particular 
> GART/GTT assumption there might be a dozen others in other types of hardware.
> 

Yes, but we might also be hiding a real bug which bites other hardware.
 We have found real and very serious bugs in the kernel this way before
-- things where drivers scribble over random memory and allocation order
exposed the failure in a predictable way, as opposed to random crashes.

    -hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Linux 2.6.39-rc3

2011-04-15 Thread H. Peter Anvin
On 04/15/2011 12:18 PM, Yinghai Lu wrote:
> On 04/15/2011 12:06 PM, Ingo Molnar wrote:
> 
>>
>> Joerg, mind submitting it with a changelog that includes everything we 
>> learned 
>> about this bug and all the Tested-by's in place?
>>
>> Is anyone of the opinion that we should try to revert the allocation 
>> order/alignment changes in addition to this fix?
> 
> We should figure out what is written to 0xa0001000 (main memory) by GPU 
> before internal GART is setup.
> 
> Joerg,
> can you insert some dump code in the drm/radon code to find out which 
> function cause the problem?
> 

Yes, I would like to make sure we don't just paper over a real bug
(again).  I think we still should talk Joerg's patch since it seems to
be the right thing to do anyway, but I do want to make sure we don't
have a memory-overwrite bug in the kernel that we're papering over.

-hpa


(Short?) merge window reminder

2011-05-23 Thread H. Peter Anvin
On 05/23/2011 04:17 PM, Ted Ts'o wrote:
> On Mon, May 23, 2011 at 01:33:48PM -0700, Linus Torvalds wrote:
>> On Mon, May 23, 2011 at 12:20 PM, Ingo Molnar  wrote:
>>>
>>> I really hope there's also a voice that tells you to wait until .42 before
>>> cutting 3.0.0! :-)
>>
>> So I'm toying with 3.0 (and in that case, it really would be "3.0",
>> not "3.0.0" - the stable team would get the third digit rather than
>> the fourth one.
> 
> If we change from 2.6.X to 3.X, then if we don't change anything else,
> then successive stable release will cause the LINUX_VERSION_CODE to be
> incremented.  This isn't necessary bad, but it would be a different
> from what we have now.
> 

That sounds like a good thing.

-hpa


(Short?) merge window reminder

2011-05-24 Thread H. Peter Anvin
On 05/24/2011 08:07 AM, jonsmirl at gmail.com wrote:
> On Tue, May 24, 2011 at 10:43 AM, Alan Cox  
> wrote:
>> Can we drop most of MCA, EISA and ISA bus if we are going to have a big
>> version change ? A driver spring clean is much overdue and it's all in
>> git in case someone wishes to sneak out at midnight and bring some crawly
>> horror back from the dead.
> 
> 2.8 could mark the beginning of the great cleanup
>   --- work out the details of what needs to be cleaned and set a goal
>   --- remove old buses/driver, switch to device tree, graphics, 32/64
> merges, etc
> 3.0 would mark its completion
> 

I think this whole discussion misses the essence of the new development
model, which is that we no longer do these kinds of feature-based major
milestones.  If we want to to deprecate lots of drivers (which I
personally would advocate against -- I have built systems specifically
to run a real floppy drive since the Linux floppy driver is amazingly
flexible and can read/write a lot of formats that nothing else can,
including USB floppies) then we should do that in the normal course of
action, incrementally, and listed in feature-removal-schedule.txt, not
all at once due to some arbitrary milestone.

We have found it works better this way.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Info: mapping multiple BARs. Your kernel is fine.

2014-02-25 Thread H. Peter Anvin
On 02/24/2014 12:19 PM, Borislav Petkov wrote:
> Btw,
> 
> I don't know whether the following observation is related or not, but it
> so happens that after resume from suspend-to-disk, I see the booting up
> of the resume kernel on the console but when it is time for the original
> kernel to take over and switch to graphics, the screen remains black but
> the machine is responsive over the network.
> 
> And this doesn't happen on every resume but only sporadically.
> 
> And yep, -rc3 was fine.
> 
> On Mon, Feb 24, 2014 at 05:24:00PM +0100, Borislav Petkov wrote:
>> This started happening this morning after booting -rc4+tip, let's
>> add *everybody* to CC :-)
>>
>> We have intel_uncore_init, snb_uncore_imc_init_box, uncore_pci_probe and
>> other goodies on the stack.
>>

snb_uncore_imc_init_box() is introduced new in tip:perf/core, and is a
relatively recent commit (b9e1ab6d4c0582cad97699285a6b3cf992251b00), so
I suspect that that wasn't in whatever -rc3 mix you were testing.

I am wondering if backing/disabling out that support (perhaps by
removing the relevant PCI ID) fixes the problem?

-hpa



[PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-23 Thread H. Peter Anvin
We should clear this bit presumably on switching either from or to 512-char 
mode, since the bit doesn't really make sense either way.

Dave Airlie  wrote:

>From: Dave Airlie 
>
>When we switch from 256->512 byte font rendering mode, it means the
>current contents of the screen is being reinterpreted. The bit that
>holds
>the high bit of the 9-bit font, may have been previously set, and thus
>the new font misrenders.
>
>The problem case we see is grub2 writes spaces with the bit set, so it
>ends up with data like 0x820, which gets reinterpreted into 0x120 char
>which the font translates into G with a circumflex. This flashes up on
>screen at boot and is quite ugly.
>
>A current side effect of this patch though is that any rendering on the
>screen changes color to a slightly darker color, but at least the
>screen
>no longer corrupts.
>
>Signed-off-by: Dave Airlie 
>---
> drivers/tty/vt/vt.c|  2 +-
> drivers/video/console/vgacon.c | 19 ---
> include/linux/vt_kern.h|  1 +
> 3 files changed, 14 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
>index 8fd8968..c8067ae 100644
>--- a/drivers/tty/vt/vt.c
>+++ b/drivers/tty/vt/vt.c
>@@ -638,7 +638,7 @@ static inline void save_screen(struct vc_data *vc)
>  *Redrawing of screen
>  */
> 
>-static void clear_buffer_attributes(struct vc_data *vc)
>+void clear_buffer_attributes(struct vc_data *vc)
> {
>   unsigned short *p = (unsigned short *)vc->vc_origin;
>   int count = vc->vc_screenbuf_size / 2;
>diff --git a/drivers/video/console/vgacon.c
>b/drivers/video/console/vgacon.c
>index d449a74..271b5d0 100644
>--- a/drivers/video/console/vgacon.c
>+++ b/drivers/video/console/vgacon.c
>@@ -1064,7 +1064,7 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
>   unsigned short video_port_status = vga_video_port_reg + 6;
>   int font_select = 0x00, beg, i;
>   char *charmap;
>-  
>+  bool clear_attribs = false;
>   if (vga_video_type != VIDEO_TYPE_EGAM) {
>   charmap = (char *) VGA_MAP_MEM(colourmap, 0);
>   beg = 0x0e;
>@@ -1169,12 +1169,6 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
> 
>   /* if 512 char mode is already enabled don't re-enable it. */
>   if ((set) && (ch512 != vga_512_chars)) {
>-  /* attribute controller */
>-  for (i = 0; i < MAX_NR_CONSOLES; i++) {
>-  struct vc_data *c = vc_cons[i].d;
>-  if (c && c->vc_sw == &vga_con)
>-  c->vc_hi_font_mask = ch512 ? 0x0800 : 0;
>-  }
>   vga_512_chars = ch512;
>   /* 256-char: enable intensity bit
>  512-char: disable intensity bit */
>@@ -1185,8 +1179,19 @@ static int vgacon_do_font_op(struct vgastate
>*state,char *arg,int set,int ch512)
>  it means, but it works, and it appears necessary */
>   inb_p(video_port_status);
>   vga_wattr(state->vgabase, VGA_AR_ENABLE_DISPLAY, 0);
>+  clear_attribs = true;
>   }
>   raw_spin_unlock_irq(&vga_lock);
>+
>+  if (clear_attribs) {
>+  for (i = 0; i < MAX_NR_CONSOLES; i++) {
>+  struct vc_data *c = vc_cons[i].d;
>+  if (c && c->vc_sw == &vga_con) {
>+  clear_buffer_attributes(c);
>+  c->vc_hi_font_mask = ch512 ? 0x0800 : 0;
>+  }
>+  }
>+  }
>   return 0;
> }
> 
>diff --git a/include/linux/vt_kern.h b/include/linux/vt_kern.h
>index 50ae7d0..1f55665 100644
>--- a/include/linux/vt_kern.h
>+++ b/include/linux/vt_kern.h
>@@ -47,6 +47,7 @@ int con_set_cmap(unsigned char __user *cmap);
> int con_get_cmap(unsigned char __user *cmap);
> void scrollback(struct vc_data *vc, int lines);
> void scrollfront(struct vc_data *vc, int lines);
>+void clear_buffer_attributes(struct vc_data *vc);
>void update_region(struct vc_data *vc, unsigned long start, int count);
> void redraw_screen(struct vc_data *vc, int is_switch);
> #define update_screen(x) redraw_screen(x, 0)

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


[PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
The characters will morph anyway, it is just a matter off having them randomly 
scream with the intensity bit.

512-character mode is definitely useful... we get much wider language coverage 
with 512 than with 256, which is why most distros use a 512 console font.

Dave Airlie  wrote:

>On Thu, Jan 24, 2013 at 3:20 PM, H. Peter Anvin  wrote:
>> We should clear this bit presumably on switching either from or to
>512-char mode, since the bit doesn't really make sense either way.
>
>Yeah the only problem going from 512-char is that chars above 256 will
>morph into garbage chars below 256, like ctrl chars, I don't really
>want to face into doing some sort of safe translation out of 512-char
>mode, from reading around the net, 512 char mode doesn't seem all that
>brilliantly useful.
>
>Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


[PATCH] vgacon/vt: clear buffer attributes when we load a 512 character font

2013-01-24 Thread H. Peter Anvin
It would be nice to support more than 512 characters when using a graphical 
console.  We should be able to support up to 2048 at least.

Dave Airlie  wrote:

>On Thu, Jan 24, 2013 at 10:18 PM, H. Peter Anvin  wrote:
>> The characters will morph anyway, it is just a matter off having them
>randomly scream with the intensity bit.
>>
>> 512-character mode is definitely useful... we get much wider language
>coverage with 512 than with 256, which is why most distros use a 512
>console font.
>
>Yeah really use a graphics console :-)
>
>But I've sent a v2 patch that should clear the attribute space going
>both directions.
>
>Dave.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


[PATCH v2 00/14] Platform Framebuffers and SimpleDRM

2013-07-04 Thread H. Peter Anvin
On 07/04/2013 05:25 AM, David Herrmann wrote:
>   - What FB formats are common on x86 that we should add to SIMPLEFB_FORMATS?
> (other than ARGB/XRGB32)

The common pixel formats on x86 are:

- Palettized 4-bit planar (bigendian, i.e. MSB to the left)
- Palettized 8-bit packed (one byte per pixel)
- 16-bit RGB555 (16-bit littleendian words with R=14:10, G=9:5, B=4:0)
- 16-bit RGB565 (16-bit littleendian words with R=15:11, G=10:5, B=4:0)
- 24-bit RGB888 in littleendian order (first byte in memory is B,
   second is G, third is R)
- 32-bit ARGB (first byte in memory is B, second G, third R, fourth
   unused in the framebuffer proper)
- 32-bit RGB10:10:10 (I *believe* 32-bit littleendian words with
   R=29:20, G=19:10, B=9:0)

-hpa



MTRR use in drivers

2013-06-20 Thread H. Peter Anvin
An awful lot of drivers, mostly DRI drivers, are still mucking with
MTRRs directly as opposed to using ioremap_wc() or similar interfaces.
In addition to the architecture dependency, this is really undesirable
because MTRRs are a limited resource, whereas page table attributes are not.

Furthermore, this perpetuates the need for the horrific hack known as
"MTRR cleanup".

What, if anything, can we do to clean up this mess?

-hpa


MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
Why do you care about performance when PAT is disabled?

Brice Goglin  wrote:

>Le 21/06/2013 07:00, H. Peter Anvin a ?crit :
>> An awful lot of drivers, mostly DRI drivers, are still mucking with
>> MTRRs directly as opposed to using ioremap_wc() or similar
>interfaces.
>> In addition to the architecture dependency, this is really
>undesirable
>> because MTRRs are a limited resource, whereas page table attributes
>are not.
>>
>> Furthermore, this perpetuates the need for the horrific hack known as
>> "MTRR cleanup".
>>
>> What, if anything, can we do to clean up this mess?
>>
>>  -hpa
>>
>
>The first network driver that used ioremap_wc() back in 2008 (myri10ge)
>had to keep using MTRR because ioremap_wc() silently falls back to
>ioremap_nocache() when PAT is disabled.
>
>I asked about this in https://lkml.org/lkml/2008/5/31/42 and there was
>some talk about putting the MTRR addition in the nocache fallback path
>but I guess nobody implemented the idea.
>
>Brice

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 12:29 PM, Henrique de Moraes Holschuh wrote:
> On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> Why do you care about performance when PAT is disabled?
> 
> It will regress already slow boxes.  We blacklist a LOT of P4s, PMs, etc and
> nobody ever took the pain to track down which ones of those actually have
> PAT+MTRR aliasing bugs.
> 
> These boxes have boards like the Radeon X300, which needs either PAT or MTRR
> to not become unusable...
> 

We're talking hardware which is now many years old, but this is causing
very serious problems on real, modern hardware.  As far as I understand
it, too, the blacklisting was precautionary (the only bug that I
personally know about is a performance bug, where WC would be
incorrectly converted to UC.)

We need a way forward here.  If it is the only way I think we would have
to sacrifice the old machines, but perhaps something can be worked out
(e.g. if PAT is disabled, fall back to MTRRs if available for ioremap_wc()).

-hpa



MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:30 PM, Dave Airlie wrote:
 Why do you care about performance when PAT is disabled?
> 
> breaking old boxes just because, is just going to get reverted when I
> get the first regression report that you broke old boxes.
> 

Not "just because", but *if* the choice is between breaking old boxes
and breaking new boxes I'll take the latter.

> Andy Lutomirski just submitted a bunch of patches to clean up the DRM
> usage of mtrrs, they are in drm-next, afaik we no longer add them on
> PAT systems.

Fantastic news.  No issue, then, and no need to break anything.

The only problem I see with having ioremap_wc() installing an MTRR on
non-PAT, rather than pushing that into the drivers which is clearly not
the right thing, is that we will need a hook to uninstall it when the
mapping is destroyed.

-hpa





MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 01:54 PM, Dave Airlie wrote:
>>> breaking old boxes just because, is just going to get reverted when I
>>> get the first regression report that you broke old boxes.
>>>
>>
>> Not "just because", but *if* the choice is between breaking old boxes
>> and breaking new boxes I'll take the latter.
>>
> 
> But Linus won't so your choice doesn't matter.

I hate to break it to you, but we regress on ancient hardware all the
time.  Optimization work gets done on modern machines, so the sweet spot
keeps moving.  In particular, if supporting ancient hardware means
leaving a lot of performance on modern hardware on the table, we may
have to take that penalty.

Fortunately, most of the time we don't have to.

-hpa





MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
> 
> And as far as I could find from Intel's not-that-complete public
> "specification updates", we are applying the errata workaround to a few more
> processors than strictly required, but since I have no idea how to write a
> test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor can I get
> ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and report
> back.
> 

Which specific erratum are you referring to, here?  The "WC becomes UC"
erratum?  I don't think there is a sane testcase for it since it needs a
very complicated setup to trigger.

-hpa



MTRR use in drivers

2013-06-23 Thread H. Peter Anvin
The aliasing doesn't matter for Linux because we map the high and low half the 
same.

Henrique de Moraes Holschuh  wrote:

>On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
>> > 
>> > And as far as I could find from Intel's not-that-complete public
>> > "specification updates", we are applying the errata workaround to a
>few more
>> > processors than strictly required, but since I have no idea how to
>write a
>> > test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor
>can I get
>> > ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and
>report
>> > back.
>> 
>> Which specific erratum are you referring to, here?  The "WC becomes
>UC"
>> erratum?  I don't think there is a sane testcase for it since it
>needs a
>> very complicated setup to trigger.
>
>There are at least two different nasty PAT issues that are not always
>critical, and one that outright hangs the processor (if the unsupported
>aliasing of WB with UC/WC happens).
>
>Interestingly enough, most of the P4-Xeons and P4 do not appear to have
>the
>"WC becomes UC" errata.
>
>However, LOTS of P4, M-P4, Xeon PIII, Xeon, and Pentium M have a bug
>where
>the four highest entries in the PAT table are inactive (aliased to the
>four
>lowest entries) in mode B (PSE) and mode C (PAE) for 4k pages.  They
>work
>fine for large pages.
>
>Also, lots of them can hang if you ever alias WB with UC or WC (which
>is
>apparently an unsupported configuration anyway, or so it says in the
>errata).
>
>There are other weird aliasing nasties, such as one where you get
>memory
>corruption if you alias WB data with code (being accessed as UC or WC)
>in
>the same cacheline, and some stuff such as weirdness should the page
>table
>be on WC memory...
>
>I can track down most of the CPUIDs involved if you want, but someone
>from
>Intel would be better (I assume they actually have access to the errata
>documentation in some less idiotic way than reading a ton of badly
>indexed
>PDFs that take forever to find in their site).

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


[PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-07 Thread H. Peter Anvin
On 03/07/2013 09:28 PM, Tejun Heo wrote:
> On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu  wrote:
>> They are not using memblock_find_in_range(), so 1ULL<< will not help.
>>
>> Really hope i915 drm guys could clean that hacks.
> 
> The code isn't being used.  Just leave it alone.  Maybe add a comment.
>  The change is just making things more confusing.
> 

Indeed, but...

Daniel: can you guys clean this up or can we just remove the #if 0 clause?

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



[PATCH 01/14] x86, ACPI, mm: Kill max_low_pfn_mapped

2013-03-11 Thread H. Peter Anvin
The problem is that the code will be broken, and so it makes no sense.  The #if 
0 is really confusing.

Daniel Vetter  wrote:

>On Thu, Mar 07, 2013 at 10:09:26PM -0800, H. Peter Anvin wrote:
>> On 03/07/2013 09:28 PM, Tejun Heo wrote:
>> > On Thu, Mar 7, 2013 at 9:27 PM, Yinghai Lu 
>wrote:
>> >> They are not using memblock_find_in_range(), so 1ULL<< will not
>help.
>> >>
>> >> Really hope i915 drm guys could clean that hacks.
>> > 
>> > The code isn't being used.  Just leave it alone.  Maybe add a
>comment.
>> >  The change is just making things more confusing.
>> > 
>> 
>> Indeed, but...
>> 
>> Daniel: can you guys clean this up or can we just remove the #if 0
>clause?
>
>I guess we could just put this into a comment explaining where stolen
>memory for the gfx devices is at on gen2. But tbh I don't mind if we
>just
>keep the #if 0 code around. For all newer platforms we can get at that
>offset through mch bar registers, so I don't really care.
>-Daniel

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()

2025-02-24 Thread H. Peter Anvin



On 2/24/25 14:08, Uros Bizjak wrote:

On Mon, Feb 24, 2025 at 10:56 PM H. Peter Anvin  wrote:


On 2/24/25 07:24, Uros Bizjak wrote:



On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:

Refactor parity calculations to use the standard parity8() helper. This
change eliminates redundant implementations and improves code
efficiency.


The patch improves parity assembly code in bootflag.o from:

58:89 demov%ebx,%esi
5a:b9 08 00 00 00   mov$0x8,%ecx
5f:31 d2xor%edx,%edx
61:89 f0mov%esi,%eax
63:89 d7mov%edx,%edi
65:40 d0 ee shr%sil
68:83 e0 01 and$0x1,%eax
6b:31 c2xor%eax,%edx
6d:83 e9 01 sub$0x1,%ecx
70:75 efjne61 
72:39 c7cmp%eax,%edi
74:74 7fje f5 
76:

to:

54:89 d8mov%ebx,%eax
56:ba 96 69 00 00   mov$0x6996,%edx
5b:c0 e8 04 shr$0x4,%al
5e:31 d8xor%ebx,%eax
60:83 e0 0f and$0xf,%eax
63:0f a3 c2 bt %eax,%edx
66:73 64jaecc 
68:

which is faster and smaller (-10 bytes) code.



Of course, on x86, parity8() and parity16() can be implemented very simply:

(Also, the parity functions really ought to return bool, and be flagged
__attribute_const__.)

static inline __attribute_const__ bool _arch_parity8(u8 val)
{
 bool parity;
 asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));


asm("test %0,%0" : "=@ccnp" (parity) : "q" (val));

because we are interested only in flags.



Also, needs to be %1,%1 (my mistake, thought flags outputs didn't count.)

Finally, this is kind of an obvious improvement:

 static void __init sbf_write(u8 v)
 {
unsigned long flags;

if (sbf_port != -1) {
-   v &= ~SBF_PARITY;
if (!parity(v))
-   v |= SBF_PARITY;
+   v ^= SBF_PARITY;

-hpa



Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()

2025-02-24 Thread H. Peter Anvin

On 2/24/25 07:24, Uros Bizjak wrote:



On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:

Refactor parity calculations to use the standard parity8() helper. This
change eliminates redundant implementations and improves code
efficiency.


The patch improves parity assembly code in bootflag.o from:

   58:    89 de    mov    %ebx,%esi
   5a:    b9 08 00 00 00   mov    $0x8,%ecx
   5f:    31 d2    xor    %edx,%edx
   61:    89 f0    mov    %esi,%eax
   63:    89 d7    mov    %edx,%edi
   65:    40 d0 ee shr    %sil
   68:    83 e0 01 and    $0x1,%eax
   6b:    31 c2    xor    %eax,%edx
   6d:    83 e9 01 sub    $0x1,%ecx
   70:    75 ef    jne    61 
   72:    39 c7    cmp    %eax,%edi
   74:    74 7f    je f5 
   76:

to:

   54:    89 d8    mov    %ebx,%eax
   56:    ba 96 69 00 00   mov    $0x6996,%edx
   5b:    c0 e8 04 shr    $0x4,%al
   5e:    31 d8    xor    %ebx,%eax
   60:    83 e0 0f and    $0xf,%eax
   63:    0f a3 c2 bt %eax,%edx
   66:    73 64    jae    cc 
   68:

which is faster and smaller (-10 bytes) code.



Of course, on x86, parity8() and parity16() can be implemented very simply:

(Also, the parity functions really ought to return bool, and be flagged 
__attribute_const__.)


static inline __attribute_const__ bool _arch_parity8(u8 val)
{
bool parity;
asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
return parity;
}

static inline __attribute_const__ bool _arch_parity16(u16 val)
{
bool parity;
asm("xor %h0,%b0" : "=@ccnp" (parity), "+Q" (val));
return parity;
}

In the generic algorithm, you probably should implement parity16() in 
terms of parity8(), parity32() in terms of parity16() and so on:


static inline __attribute_const__ bool parity16(u16 val)
{
#ifdef ARCH_HAS_PARITY16
if (!__builtin_const_p(val))
return _arch_parity16(val);
#endif
return parity8(val ^ (val >> 8));
}

This picks up the architectural versions when available.

Furthermore, if a popcnt instruction is known to exist, then the parity 
is simply popcnt(x) & 1.


-hpa



Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()

2025-02-24 Thread H. Peter Anvin
On February 24, 2025 2:08:05 PM PST, Uros Bizjak  wrote:
>On Mon, Feb 24, 2025 at 10:56 PM H. Peter Anvin  wrote:
>>
>> On 2/24/25 07:24, Uros Bizjak wrote:
>> >
>> >
>> > On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
>> >> Refactor parity calculations to use the standard parity8() helper. This
>> >> change eliminates redundant implementations and improves code
>> >> efficiency.
>> >
>> > The patch improves parity assembly code in bootflag.o from:
>> >
>> >58:89 demov%ebx,%esi
>> >5a:b9 08 00 00 00   mov$0x8,%ecx
>> >5f:31 d2xor%edx,%edx
>> >61:89 f0mov%esi,%eax
>> >63:89 d7mov%edx,%edi
>> >65:40 d0 ee shr%sil
>> >68:83 e0 01 and$0x1,%eax
>> >6b:31 c2xor%eax,%edx
>> >6d:83 e9 01 sub$0x1,%ecx
>> >70:75 efjne61 
>> >72:39 c7cmp%eax,%edi
>> >74:74 7fje f5 
>> >76:
>> >
>> > to:
>> >
>> >54:89 d8mov%ebx,%eax
>> >56:ba 96 69 00 00   mov$0x6996,%edx
>> >5b:c0 e8 04 shr$0x4,%al
>> >5e:31 d8xor%ebx,%eax
>> >60:83 e0 0f and$0xf,%eax
>> >63:0f a3 c2 bt %eax,%edx
>> >66:73 64jaecc 
>> >68:
>> >
>> > which is faster and smaller (-10 bytes) code.
>> >
>>
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>>
>> (Also, the parity functions really ought to return bool, and be flagged
>> __attribute_const__.)
>>
>> static inline __attribute_const__ bool _arch_parity8(u8 val)
>> {
>> bool parity;
>> asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
>
>asm("test %0,%0" : "=@ccnp" (parity) : "q" (val));
>
>because we are interested only in flags.
>
>Uros.
>

Same thing, really, but yes, using test is cleaner.


Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()

2025-02-24 Thread H. Peter Anvin
On February 24, 2025 2:17:29 PM PST, Yury Norov  wrote:
>On Mon, Feb 24, 2025 at 01:55:28PM -0800, H. Peter Anvin wrote:
>> On 2/24/25 07:24, Uros Bizjak wrote:
>> > 
>> > 
>> > On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
>> > > Refactor parity calculations to use the standard parity8() helper. This
>> > > change eliminates redundant implementations and improves code
>> > > efficiency.
>> > 
>> > The patch improves parity assembly code in bootflag.o from:
>> > 
>> >    58:    89 de    mov    %ebx,%esi
>> >    5a:    b9 08 00 00 00   mov    $0x8,%ecx
>> >    5f:    31 d2    xor    %edx,%edx
>> >    61:    89 f0    mov    %esi,%eax
>> >    63:    89 d7    mov    %edx,%edi
>> >    65:    40 d0 ee shr    %sil
>> >    68:    83 e0 01 and    $0x1,%eax
>> >    6b:    31 c2    xor    %eax,%edx
>> >    6d:    83 e9 01 sub    $0x1,%ecx
>> >    70:    75 ef    jne    61 
>> >    72:    39 c7    cmp    %eax,%edi
>> >    74:    74 7f    je f5 
>> >    76:
>> > 
>> > to:
>> > 
>> >    54:    89 d8    mov    %ebx,%eax
>> >    56:    ba 96 69 00 00   mov    $0x6996,%edx
>> >    5b:    c0 e8 04 shr    $0x4,%al
>> >    5e:    31 d8    xor    %ebx,%eax
>> >    60:    83 e0 0f and    $0xf,%eax
>> >    63:    0f a3 c2 bt %eax,%edx
>> >    66:    73 64    jae    cc 
>> >    68:
>> > 
>> > which is faster and smaller (-10 bytes) code.
>> > 
>> 
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>> 
>> (Also, the parity functions really ought to return bool, and be flagged
>> __attribute_const__.)
>
>There was a discussion regarding return type when parity8() was added.
>The integer type was taken over bool with a sort of consideration that
>bool should be returned as an answer to some question, like parity_odd().
>
>To me it's not a big deal. We can switch to boolean and describe in
>comment what the 'true' means for the parity() function.

Bool is really the single-bit type, and gives the compiler more information. 
You could argue that the function really should be called parity_odd*() in 
general, but that's kind of excessive IMO.


Re: [PATCH 02/17] bitops: Add generic parity calculation for u64

2025-02-25 Thread H. Peter Anvin
On February 25, 2025 1:43:27 PM PST, Andrew Cooper  
wrote:
>> Incidentally, in all of this, didn't anyone notice __builtin_parity()?
>
>Yes.  It it has done sane for a decade on x86, yet does things such as
>emitting a library call on other architectures.
>
>https://godbolt.org/z/6qG3noebq
>
>~Andrew

And not even a smart one at that.


Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()

2025-02-25 Thread H. Peter Anvin
On February 25, 2025 2:46:23 PM PST, David Laight 
 wrote:
>On Mon, 24 Feb 2025 13:55:28 -0800
>"H. Peter Anvin"  wrote:
>
>> On 2/24/25 07:24, Uros Bizjak wrote:
>> > 
>> > 
>> > On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:  
>> >> Refactor parity calculations to use the standard parity8() helper. This
>> >> change eliminates redundant implementations and improves code
>> >> efficiency.  
>...
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>> 
>> (Also, the parity functions really ought to return bool, and be flagged 
>> __attribute_const__.)
>> 
>> static inline __attribute_const__ bool _arch_parity8(u8 val)
>> {
>>  bool parity;
>>  asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
>>  return parity;
>> }
>> 
>> static inline __attribute_const__ bool _arch_parity16(u16 val)
>> {
>>  bool parity;
>>  asm("xor %h0,%b0" : "=@ccnp" (parity), "+Q" (val));
>>  return parity;
>> }
>
>The same (with fixes) can be done for parity64() on 32bit.
>
>> 
>> In the generic algorithm, you probably should implement parity16() in 
>> terms of parity8(), parity32() in terms of parity16() and so on:
>> 
>> static inline __attribute_const__ bool parity16(u16 val)
>> {
>> #ifdef ARCH_HAS_PARITY16
>>  if (!__builtin_const_p(val))
>>  return _arch_parity16(val);
>> #endif
>>  return parity8(val ^ (val >> 8));
>> }
>> 
>> This picks up the architectural versions when available.
>
>Not the best way to do that.
>Make the name in the #ifdef the same as the function and define
>a default one if the architecture doesn't define one.
>So:
>
>static inline parity16(u16 val)
>{
>   return __builtin_const_p(val) ? _parity_const(val) : _parity16(val);
>}
>
>#ifndef _parity16
>static inline _parity16(u15 val)
>{
>   return _parity8(val ^ (val >> 8));
>}
>#endif
>
>You only need one _parity_const().
>
>> 
>> Furthermore, if a popcnt instruction is known to exist, then the parity 
>> is simply popcnt(x) & 1.
>
>Beware that some popcnt instructions are slow.
>
>   David
>
>> 
>>  -hpa
>> 
>> 
>

Seems more verbose than just #ifdef _arch_parity8 et al since the const and 
generic code cases are the same (which they aren't always.)

But that part is a good idea, especially since on at least *some* architectures 
like x86 doing: 

#define _arch_parity8(x) __builtin_parity(x)

... etc is entirely reasonable and lets gcc use an already available parity 
flag should one be available.

The inline wrapper, of course, takes care of the type mangling.


Re: [PATCH 02/17] bitops: Add generic parity calculation for u64

2025-02-25 Thread H. Peter Anvin
On February 24, 2025 5:34:31 AM PST, David Laight 
 wrote:
>On Mon, 24 Feb 2025 08:09:43 +0100
>Jiri Slaby  wrote:
>
>> On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
>> > Several parts of the kernel open-code parity calculations using
>> > different methods. Add a generic parity64() helper implemented with the
>> > same efficient approach as parity8().
>> > 
>> > Co-developed-by: Yu-Chun Lin 
>> > Signed-off-by: Yu-Chun Lin 
>> > Signed-off-by: Kuan-Wei Chiu 
>> > ---
>> >   include/linux/bitops.h | 22 ++
>> >   1 file changed, 22 insertions(+)
>> > 
>> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> > index fb13dedad7aa..67677057f5e2 100644
>> > --- a/include/linux/bitops.h
>> > +++ b/include/linux/bitops.h
>> > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
>> >return (0x6996 >> (val & 0xf)) & 1;
>> >   }
>> >   
>> > +/**
>> > + * parity64 - get the parity of an u64 value
>> > + * @value: the value to be examined
>> > + *
>> > + * Determine the parity of the u64 argument.
>> > + *
>> > + * Returns:
>> > + * 0 for even parity, 1 for odd parity
>> > + */
>> > +static inline int parity64(u64 val)
>> > +{
>> > +  /*
>> > +   * One explanation of this algorithm:
>> > +   * https://funloop.org/codex/problem/parity/README.html
>> > +   */
>> > +  val ^= val >> 32;  
>> 
>> Do we need all these implementations? Can't we simply use parity64() for 
>> any 8, 16 and 32-bit values too? I.e. have one parity().
>
>I'm not sure you can guarantee that the compiler will optimise away
>the unnecessary operations.
>
>But:
>static inline int parity64(u64 val)
>{
>   return parity32(val ^ (val >> 32))
>}
>
>should be ok.
>It will also work on x86-32 where parity32() can just check the parity flag.
>Although you are unlikely to manage to use the the PF the xor sets.
>
>   David
>
>> 
>> > +  val ^= val >> 16;
>> > +  val ^= val >> 8;
>> > +  val ^= val >> 4;
>> > +  return (0x6996 >> (val & 0xf)) & 1;
>> > +}
>> > +
>> >   /**
>> >* __ffs64 - find first set bit in a 64 bit word
>> >* @word: The 64 bit word  
>> 
>> 
>

Sure you can; you do need an 8- and a 16-bit arch implementation though (the 16 
bit one being xor %rh,%rl)


Re: [PATCH 02/17] bitops: Add generic parity calculation for u64

2025-02-25 Thread H. Peter Anvin
On February 24, 2025 5:34:31 AM PST, David Laight 
 wrote:
>On Mon, 24 Feb 2025 08:09:43 +0100
>Jiri Slaby  wrote:
>
>> On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
>> > Several parts of the kernel open-code parity calculations using
>> > different methods. Add a generic parity64() helper implemented with the
>> > same efficient approach as parity8().
>> > 
>> > Co-developed-by: Yu-Chun Lin 
>> > Signed-off-by: Yu-Chun Lin 
>> > Signed-off-by: Kuan-Wei Chiu 
>> > ---
>> >   include/linux/bitops.h | 22 ++
>> >   1 file changed, 22 insertions(+)
>> > 
>> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> > index fb13dedad7aa..67677057f5e2 100644
>> > --- a/include/linux/bitops.h
>> > +++ b/include/linux/bitops.h
>> > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
>> >return (0x6996 >> (val & 0xf)) & 1;
>> >   }
>> >   
>> > +/**
>> > + * parity64 - get the parity of an u64 value
>> > + * @value: the value to be examined
>> > + *
>> > + * Determine the parity of the u64 argument.
>> > + *
>> > + * Returns:
>> > + * 0 for even parity, 1 for odd parity
>> > + */
>> > +static inline int parity64(u64 val)
>> > +{
>> > +  /*
>> > +   * One explanation of this algorithm:
>> > +   * https://funloop.org/codex/problem/parity/README.html
>> > +   */
>> > +  val ^= val >> 32;  
>> 
>> Do we need all these implementations? Can't we simply use parity64() for 
>> any 8, 16 and 32-bit values too? I.e. have one parity().
>
>I'm not sure you can guarantee that the compiler will optimise away
>the unnecessary operations.
>
>But:
>static inline int parity64(u64 val)
>{
>   return parity32(val ^ (val >> 32))
>}
>
>should be ok.
>It will also work on x86-32 where parity32() can just check the parity flag.
>Although you are unlikely to manage to use the the PF the xor sets.
>
>   David
>
>> 
>> > +  val ^= val >> 16;
>> > +  val ^= val >> 8;
>> > +  val ^= val >> 4;
>> > +  return (0x6996 >> (val & 0xf)) & 1;
>> > +}
>> > +
>> >   /**
>> >* __ffs64 - find first set bit in a 64 bit word
>> >* @word: The 64 bit word  
>> 
>> 
>

Incidentally, in all of this, didn't anyone notice __builtin_parity()?


Re: [PATCH 02/17] bitops: Add generic parity calculation for u64

2025-02-27 Thread H. Peter Anvin
On February 27, 2025 1:57:41 PM PST, David Laight 
 wrote:
>On Thu, 27 Feb 2025 13:05:29 -0500
>Yury Norov  wrote:
>
>> On Wed, Feb 26, 2025 at 10:29:11PM +, David Laight wrote:
>> > On Mon, 24 Feb 2025 14:27:03 -0500
>> > Yury Norov  wrote:
>> >   
>> > > +#define parity(val) \
>> > > +({  \
>> > > +u64 __v = (val);\
>> > > +int __ret;  \
>> > > +switch (BITS_PER_TYPE(val)) {   \
>> > > +case 64:\
>> > > +__v ^= __v >> 32;   \
>> > > +fallthrough;\
>> > > +case 32:\
>> > > +__v ^= __v >> 16;   \
>> > > +fallthrough;\
>> > > +case 16:\
>> > > +__v ^= __v >> 8;\
>> > > +fallthrough;\
>> > > +case 8: \
>> > > +__v ^= __v >> 4;\
>> > > +__ret =  (0x6996 >> (__v & 0xf)) & 1;   \
>> > > +break;  \
>> > > +default:\
>> > > +BUILD_BUG();\
>> > > +}   \
>> > > +__ret;  \
>> > > +})
>> > > +  
>> > 
>> > You really don't want to do that!
>> > gcc makes a right hash of it for x86 (32bit).
>> > See https://www.godbolt.org/z/jG8dv3cvs  
>> 
>> GCC fails to even understand this. Of course, the __v should be an
>> __auto_type. But that way GCC fails to understand that case 64 is
>> a dead code for all smaller type and throws a false-positive 
>> Wshift-count-overflow. This is a known issue, unfixed for 25 years!
>
>Just do __v ^= __v >> 16 >> 16
>
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210
>>  
>> > You do better using a __v32 after the 64bit xor.  
>> 
>> It should be an __auto_type. I already mentioned. So because of that,
>> we can either do something like this:
>> 
>>   #define parity(val)\
>>   ({ \
>>   #ifdef CLANG  \
>>  __auto_type __v = (val);\
>>   #else /* GCC; because of this and that */ \
>>  u64 __v = (val);\
>>   #endif\
>>  int __ret;  \
>> 
>> Or simply disable Wshift-count-overflow for GCC.
>
>For 64bit values on 32bit it is probably better to do:
>int p32(unsigned long long x)
>{
>unsigned int lo = x;
>lo ^= x >> 32;
>lo ^= lo >> 16;
>lo ^= lo >> 8;
>lo ^= lo >> 4;
>return (0x6996 >> (lo & 0xf)) & 1;
>}
>That stops the compiler doing 64bit shifts (ok on x86, but probably not 
>elsewhere).
>It is likely to be reasonably optimal for most 64bit cpu as well.
>(For x86-64 it probably removes a load of REX prefix.)
>(It adds an extra instruction to arm because if its barrel shifter.)
>
>
>> 
>> > Even the 64bit version is probably sub-optimal (both gcc and clang).
>> > The whole lot ends up being a bit single register dependency chain.
>> > You want to do:  
>> 
>> No, I don't. I want to have a sane compiler that does it for me.
>> 
>> >mov %eax, %edx
>> >shrl $n, %eax
>> >xor %edx, %eax
>> > so that the 'mov' and 'shrl' can happen in the same clock
>> > (without relying on the register-register move being optimised out).
>> > 
>> > I dropped in the arm64 for an example of where the magic shift of 6996
>> > just adds an extra instruction.  
>> 
>> It's still unclear to me that this parity thing is used in hot paths.
>> If that holds, it's unclear that your hand-made version is better than
>> what's generated by GCC.
>
>I wasn't seriously considering doing that optimisation.
>Perhaps just hoping is might make a compiler person think :-)
>
>   David
>
>> 
>> Do you have any perf test?
>> 
>> Thanks,
>> Yury
>

What the compiler people need to do is to not make __builtin_parity*() generate 
crap.


Re: [PATCH v3 01/16] bitops: Change parity8() return type to bool

2025-03-07 Thread H. Peter Anvin
On March 7, 2025 11:30:08 AM PST, Yury Norov  wrote:
>On Fri, Mar 07, 2025 at 04:14:34AM -0800, H. Peter Anvin wrote:
>> On March 7, 2025 4:13:26 AM PST, Ingo Molnar  wrote:
>> >
>> >* Jiri Slaby  wrote:
>> >
>> >> On 07. 03. 25, 12:38, Ingo Molnar wrote:
>> >> > 
>> >> > * Jiri Slaby  wrote:
>> >> > 
>> >> > > On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:
>> >> > > > Change return type to bool for better clarity. Update the kernel doc
>> >> > > > comment accordingly, including fixing "@value" to "@val" and 
>> >> > > > adjusting
>> >> > > > examples. Also mark the function with __attribute_const__ to allow
>> >> > > > potential compiler optimizations.
>> >> > > > 
>> >> > > > Co-developed-by: Yu-Chun Lin 
>> >> > > > Signed-off-by: Yu-Chun Lin 
>> >> > > > Signed-off-by: Kuan-Wei Chiu 
>> >> > > > ---
>> >> > > >include/linux/bitops.h | 10 +-
>> >> > > >1 file changed, 5 insertions(+), 5 deletions(-)
>> >> > > > 
>> >> > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> >> > > > index c1cb53cf2f0f..44e5765b8bec 100644
>> >> > > > --- a/include/linux/bitops.h
>> >> > > > +++ b/include/linux/bitops.h
>> >> > > > @@ -231,26 +231,26 @@ static inline int 
>> >> > > > get_count_order_long(unsigned long l)
>> >> > > >/**
>> >> > > > * parity8 - get the parity of an u8 value
>> >> > > > - * @value: the value to be examined
>> >> > > > + * @val: the value to be examined
>> >> > > > *
>> >> > > > * Determine the parity of the u8 argument.
>> >> > > > *
>> >> > > > * Returns:
>> >> > > > - * 0 for even parity, 1 for odd parity
>> >> > > > + * false for even parity, true for odd parity
>> >> > > 
>> >> > > This occurs somehow inverted to me. When something is in parity means 
>> >> > > that
>> >> > > it has equal number of 1s and 0s. I.e. return true for even 
>> >> > > distribution.
>> >> > > Dunno what others think? Or perhaps this should be dubbed 
>> >> > > odd_parity() when
>> >> > > bool is returned? Then you'd return true for odd.
>> >> > 
>> >> > OTOH:
>> >> > 
>> >> >   - '0' is an even number and is returned for even parity,
>> >> >   - '1' is an odd  number and is returned for odd  parity.
>> >> 
>> >> Yes, that used to make sense for me. For bool/true/false, it no longer 
>> >> does.
>> >> But as I wrote, it might be only me...
>> >
>> >No strong opinion on this from me either, I'd guess existing practice 
>> >with other parity functions should probably control. (If a coherent 
>> >praxis exists.).
>> >
>> >Thanks,
>> >
>> >Ingo
>> 
>> Instead of "bool" think of it as "bit" and it makes more sense
>
>So, to help people thinking that way we can introduce a corresponding
>type:
>typedef unsigned _BitInt(1) u1;
>
>It already works for clang, and GCC is going to adopt it with std=c23.
>We can make u1 an alias to bool for GCC for a while. If you guys like
>it, I can send a patch.
>
>For clang it prints quite a nice overflow warning:
>
>tst.c:59:9: warning: implicit conversion from 'int' to 'u1' (aka 'unsigned 
>_BitInt(1)') changes value from 2 to 0 [-Wconstant-conversion]
>   59 | u1 r = 2;
>  |~   ^
>
>Thanks,
>Yury

No, for a whole bunch of reasons.


Re: [PATCH v3 01/16] bitops: Change parity8() return type to bool

2025-03-07 Thread H. Peter Anvin
On March 7, 2025 4:13:26 AM PST, Ingo Molnar  wrote:
>
>* Jiri Slaby  wrote:
>
>> On 07. 03. 25, 12:38, Ingo Molnar wrote:
>> > 
>> > * Jiri Slaby  wrote:
>> > 
>> > > On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:
>> > > > Change return type to bool for better clarity. Update the kernel doc
>> > > > comment accordingly, including fixing "@value" to "@val" and adjusting
>> > > > examples. Also mark the function with __attribute_const__ to allow
>> > > > potential compiler optimizations.
>> > > > 
>> > > > Co-developed-by: Yu-Chun Lin 
>> > > > Signed-off-by: Yu-Chun Lin 
>> > > > Signed-off-by: Kuan-Wei Chiu 
>> > > > ---
>> > > >include/linux/bitops.h | 10 +-
>> > > >1 file changed, 5 insertions(+), 5 deletions(-)
>> > > > 
>> > > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> > > > index c1cb53cf2f0f..44e5765b8bec 100644
>> > > > --- a/include/linux/bitops.h
>> > > > +++ b/include/linux/bitops.h
>> > > > @@ -231,26 +231,26 @@ static inline int get_count_order_long(unsigned 
>> > > > long l)
>> > > >/**
>> > > > * parity8 - get the parity of an u8 value
>> > > > - * @value: the value to be examined
>> > > > + * @val: the value to be examined
>> > > > *
>> > > > * Determine the parity of the u8 argument.
>> > > > *
>> > > > * Returns:
>> > > > - * 0 for even parity, 1 for odd parity
>> > > > + * false for even parity, true for odd parity
>> > > 
>> > > This occurs somehow inverted to me. When something is in parity means 
>> > > that
>> > > it has equal number of 1s and 0s. I.e. return true for even distribution.
>> > > Dunno what others think? Or perhaps this should be dubbed odd_parity() 
>> > > when
>> > > bool is returned? Then you'd return true for odd.
>> > 
>> > OTOH:
>> > 
>> >   - '0' is an even number and is returned for even parity,
>> >   - '1' is an odd  number and is returned for odd  parity.
>> 
>> Yes, that used to make sense for me. For bool/true/false, it no longer does.
>> But as I wrote, it might be only me...
>
>No strong opinion on this from me either, I'd guess existing practice 
>with other parity functions should probably control. (If a coherent 
>praxis exists.).
>
>Thanks,
>
>   Ingo

Instead of "bool" think of it as "bit" and it makes more sense


Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-06 Thread H. Peter Anvin
On March 6, 2025 8:25:25 AM PST, Kuan-Wei Chiu  wrote:
>Several parts of the kernel contain redundant implementations of parity
>calculations for 16/32/64-bit values. Introduces generic
>parity16/32/64() helpers in bitops.h, providing a standardized
>and optimized implementation. 
>
>Subsequent patches refactor various kernel components to replace
>open-coded parity calculations with the new helpers, reducing code
>duplication and improving maintainability.
>
>Co-developed-by: Yu-Chun Lin 
>Signed-off-by: Yu-Chun Lin 
>Signed-off-by: Kuan-Wei Chiu 
>---
>In v3, I use parityXX() instead of the parity() macro since the
>parity() macro may generate suboptimal code and requires special hacks
>to make GCC happy. If anyone still prefers a single parity() macro,
>please let me know.
>
>Additionally, I changed parityXX() << y users to !!parityXX() << y
>because, unlike C++, C does not guarantee that true casts to int as 1.
>
>Changes in v3:
>- Avoid using __builtin_parity.
>- Change return type to bool.
>- Drop parity() macro.
>- Change parityXX() << y to !!parityXX() << y.
>
>
>Changes in v2:
>- Provide fallback functions for __builtin_parity() when the compiler
>  decides not to inline it
>- Use __builtin_parity() when no architecture-specific implementation
>  is available
>- Optimize for constant folding when val is a compile-time constant
>- Add a generic parity() macro
>- Drop the x86 bootflag conversion patch since it has been merged into
>  the tip tree
>
>v1: https://lore.kernel.org/lkml/20250223164217.2139331-1-visitor...@gmail.com/
>v2: https://lore.kernel.org/lkml/20250301142409.2513835-1-visitor...@gmail.com/
>
>Kuan-Wei Chiu (16):
>  bitops: Change parity8() return type to bool
>  bitops: Add parity16(), parity32(), and parity64() helpers
>  media: media/test_drivers: Replace open-coded parity calculation with
>parity8()
>  media: pci: cx18-av-vbi: Replace open-coded parity calculation with
>parity8()
>  media: saa7115: Replace open-coded parity calculation with parity8()
>  serial: max3100: Replace open-coded parity calculation with parity8()
>  lib/bch: Replace open-coded parity calculation with parity32()
>  Input: joystick - Replace open-coded parity calculation with
>parity32()
>  net: ethernet: oa_tc6: Replace open-coded parity calculation with
>parity32()
>  wifi: brcm80211: Replace open-coded parity calculation with parity32()
>  drm/bridge: dw-hdmi: Replace open-coded parity calculation with
>parity32()
>  mtd: ssfdc: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity64()
>  Input: joystick - Replace open-coded parity calculation with
>parity64()
>  nfp: bpf: Replace open-coded parity calculation with parity64()
>
> drivers/fsi/fsi-master-i2cr.c | 18 ++-
> .../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +--
> drivers/input/joystick/grip_mp.c  | 17 +-
> drivers/input/joystick/sidewinder.c   | 24 ++---
> drivers/media/i2c/saa7115.c   | 12 +
> drivers/media/pci/cx18/cx18-av-vbi.c  | 12 +
> .../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +--
> drivers/mtd/ssfdc.c   | 20 ++-
> drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
> drivers/net/ethernet/oa_tc6.c | 19 ++-
> .../broadcom/brcm80211/brcmsmac/dma.c | 16 +-
> drivers/tty/serial/max3100.c  |  3 +-
> include/linux/bitops.h| 52 +--
> lib/bch.c | 14 +
> 14 files changed, 77 insertions(+), 153 deletions(-)
>

(int)true most definitely is guaranteed to be 1.


Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-06 Thread H. Peter Anvin
On March 6, 2025 8:25:25 AM PST, Kuan-Wei Chiu  wrote:
>Several parts of the kernel contain redundant implementations of parity
>calculations for 16/32/64-bit values. Introduces generic
>parity16/32/64() helpers in bitops.h, providing a standardized
>and optimized implementation. 
>
>Subsequent patches refactor various kernel components to replace
>open-coded parity calculations with the new helpers, reducing code
>duplication and improving maintainability.
>
>Co-developed-by: Yu-Chun Lin 
>Signed-off-by: Yu-Chun Lin 
>Signed-off-by: Kuan-Wei Chiu 
>---
>In v3, I use parityXX() instead of the parity() macro since the
>parity() macro may generate suboptimal code and requires special hacks
>to make GCC happy. If anyone still prefers a single parity() macro,
>please let me know.
>
>Additionally, I changed parityXX() << y users to !!parityXX() << y
>because, unlike C++, C does not guarantee that true casts to int as 1.
>
>Changes in v3:
>- Avoid using __builtin_parity.
>- Change return type to bool.
>- Drop parity() macro.
>- Change parityXX() << y to !!parityXX() << y.
>
>
>Changes in v2:
>- Provide fallback functions for __builtin_parity() when the compiler
>  decides not to inline it
>- Use __builtin_parity() when no architecture-specific implementation
>  is available
>- Optimize for constant folding when val is a compile-time constant
>- Add a generic parity() macro
>- Drop the x86 bootflag conversion patch since it has been merged into
>  the tip tree
>
>v1: https://lore.kernel.org/lkml/20250223164217.2139331-1-visitor...@gmail.com/
>v2: https://lore.kernel.org/lkml/20250301142409.2513835-1-visitor...@gmail.com/
>
>Kuan-Wei Chiu (16):
>  bitops: Change parity8() return type to bool
>  bitops: Add parity16(), parity32(), and parity64() helpers
>  media: media/test_drivers: Replace open-coded parity calculation with
>parity8()
>  media: pci: cx18-av-vbi: Replace open-coded parity calculation with
>parity8()
>  media: saa7115: Replace open-coded parity calculation with parity8()
>  serial: max3100: Replace open-coded parity calculation with parity8()
>  lib/bch: Replace open-coded parity calculation with parity32()
>  Input: joystick - Replace open-coded parity calculation with
>parity32()
>  net: ethernet: oa_tc6: Replace open-coded parity calculation with
>parity32()
>  wifi: brcm80211: Replace open-coded parity calculation with parity32()
>  drm/bridge: dw-hdmi: Replace open-coded parity calculation with
>parity32()
>  mtd: ssfdc: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity64()
>  Input: joystick - Replace open-coded parity calculation with
>parity64()
>  nfp: bpf: Replace open-coded parity calculation with parity64()
>
> drivers/fsi/fsi-master-i2cr.c | 18 ++-
> .../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +--
> drivers/input/joystick/grip_mp.c  | 17 +-
> drivers/input/joystick/sidewinder.c   | 24 ++---
> drivers/media/i2c/saa7115.c   | 12 +
> drivers/media/pci/cx18/cx18-av-vbi.c  | 12 +
> .../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +--
> drivers/mtd/ssfdc.c   | 20 ++-
> drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
> drivers/net/ethernet/oa_tc6.c | 19 ++-
> .../broadcom/brcm80211/brcmsmac/dma.c | 16 +-
> drivers/tty/serial/max3100.c  |  3 +-
> include/linux/bitops.h| 52 +--
> lib/bch.c | 14 +
> 14 files changed, 77 insertions(+), 153 deletions(-)
>

!!x is used with a value that is not necessary booleanized already, and is 
exactly equivalent to (x ? true : false). It is totally redundant on a value 
known to be bool.

If (int)true wasn't inherently 1, then !! wouldn't work either. 

There was a time when some code would use as a temporary hack: 

typedef enum { false, true } bool;

... when compiling on pre-C99 compilers; in that case a (bool) case wouldn't 
necessarily work as expected, whereas !! would. Furthermore, unlike (bool), !! 
works in the preprocessor.


Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-07 Thread H. Peter Anvin
On March 7, 2025 11:53:10 AM PST, David Laight  
wrote:
>On Fri, 07 Mar 2025 11:30:35 -0800
>"H. Peter Anvin"  wrote:
>
>> On March 7, 2025 10:49:56 AM PST, Andrew Cooper  
>> wrote:
>> >> (int)true most definitely is guaranteed to be 1.  
>> >
>> >That's not technically correct any more.
>> >
>> >GCC has introduced hardened bools that intentionally have bit patterns
>> >other than 0 and 1.
>> >
>> >https://gcc.gnu.org/gcc-14/changes.html
>> >
>> >~Andrew  
>> 
>> Bit patterns in memory maybe (not that I can see the Linux kernel using 
>> them) but
>> for compiler-generated conversations that's still a given, or the manager 
>> isn't C
>> or anything even remotely like it.
>> 
>
>The whole idea of 'bool' is pretty much broken by design.
>The underlying problem is that values other than 'true' and 'false' can
>always get into 'bool' variables.
>
>Once that has happened it is all fubar.
>
>Trying to sanitise a value with (say):
>int f(bool v)
>{
>   return (int)v & 1;
>}
>just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>
>I really don't see how using (say) 0xaa and 0x55 helps.
>What happens if the value is wrong? a trap or exception?, good luck recovering
>from that.
>
>   David

Did you just discover GIGO?


Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-07 Thread H. Peter Anvin
On March 7, 2025 10:49:56 AM PST, Andrew Cooper  
wrote:
>> (int)true most definitely is guaranteed to be 1.
>
>That's not technically correct any more.
>
>GCC has introduced hardened bools that intentionally have bit patterns
>other than 0 and 1.
>
>https://gcc.gnu.org/gcc-14/changes.html
>
>~Andrew

Bit patterns in memory maybe (not that I can see the Linux kernel using them) 
but for compiler-generated conversations that's still a given, or the manager 
isn't C or anything even remotely like it.


Re: [PATCH v3 01/16] bitops: Change parity8() return type to bool

2025-03-07 Thread H. Peter Anvin
On March 7, 2025 11:36:43 AM PST, David Laight  
wrote:
>On Fri, 7 Mar 2025 12:42:41 +0100
>Jiri Slaby  wrote:
>
>> On 07. 03. 25, 12:38, Ingo Molnar wrote:
>> > 
>> > * Jiri Slaby  wrote:
>> >   
>> >> On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:  
>> >>> Change return type to bool for better clarity. Update the kernel doc
>> >>> comment accordingly, including fixing "@value" to "@val" and adjusting
>> >>> examples. Also mark the function with __attribute_const__ to allow
>> >>> potential compiler optimizations.
>> >>>
>> >>> Co-developed-by: Yu-Chun Lin 
>> >>> Signed-off-by: Yu-Chun Lin 
>> >>> Signed-off-by: Kuan-Wei Chiu 
>> >>> ---
>> >>>include/linux/bitops.h | 10 +-
>> >>>1 file changed, 5 insertions(+), 5 deletions(-)
>> >>>
>> >>> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> >>> index c1cb53cf2f0f..44e5765b8bec 100644
>> >>> --- a/include/linux/bitops.h
>> >>> +++ b/include/linux/bitops.h
>> >>> @@ -231,26 +231,26 @@ static inline int get_count_order_long(unsigned 
>> >>> long l)
>> >>>/**
>> >>> * parity8 - get the parity of an u8 value
>> >>> - * @value: the value to be examined
>> >>> + * @val: the value to be examined
>> >>> *
>> >>> * Determine the parity of the u8 argument.
>> >>> *
>> >>> * Returns:
>> >>> - * 0 for even parity, 1 for odd parity
>> >>> + * false for even parity, true for odd parity  
>> >>
>> >> This occurs somehow inverted to me. When something is in parity means that
>> >> it has equal number of 1s and 0s. I.e. return true for even distribution.
>> >> Dunno what others think? Or perhaps this should be dubbed odd_parity() 
>> >> when
>> >> bool is returned? Then you'd return true for odd.  
>> > 
>> > OTOH:
>> > 
>> >   - '0' is an even number and is returned for even parity,
>> >   - '1' is an odd  number and is returned for odd  parity.  
>> 
>> Yes, that used to make sense for me. For bool/true/false, it no longer 
>> does. But as I wrote, it might be only me...
>
>No me as well, I've made the same comment before.
>When reading code I don't want to have to look up a function definition.
>There is even scope for having parity_odd() and parity_even().
>And, with the version that shifts a constant right you want to invert
>the constant!
>
>   David
>
>
>
>

Of course, for me, if I saw "parity_odd()" I would think of it as a function 
that caused the parity to become odd, i.e.

if (!parity(x))
  x ^= 1 << 7;


Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-09 Thread H. Peter Anvin
On March 9, 2025 8:48:26 AM PDT, Kuan-Wei Chiu  wrote:
>On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
>> On March 7, 2025 11:53:10 AM PST, David Laight 
>>  wrote:
>> >On Fri, 07 Mar 2025 11:30:35 -0800
>> >"H. Peter Anvin"  wrote:
>> >
>> >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper 
>> >>  wrote:
>> >> >> (int)true most definitely is guaranteed to be 1.  
>> >> >
>> >> >That's not technically correct any more.
>> >> >
>> >> >GCC has introduced hardened bools that intentionally have bit patterns
>> >> >other than 0 and 1.
>> >> >
>> >> >https://gcc.gnu.org/gcc-14/changes.html
>> >> >
>> >> >~Andrew  
>> >> 
>> >> Bit patterns in memory maybe (not that I can see the Linux kernel using 
>> >> them) but
>> >> for compiler-generated conversations that's still a given, or the manager 
>> >> isn't C
>> >> or anything even remotely like it.
>> >> 
>> >
>> >The whole idea of 'bool' is pretty much broken by design.
>> >The underlying problem is that values other than 'true' and 'false' can
>> >always get into 'bool' variables.
>> >
>> >Once that has happened it is all fubar.
>> >
>> >Trying to sanitise a value with (say):
>> >int f(bool v)
>> >{
>> >return (int)v & 1;
>> >}
>> >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>> >
>> >I really don't see how using (say) 0xaa and 0x55 helps.
>> >What happens if the value is wrong? a trap or exception?, good luck 
>> >recovering
>> >from that.
>> >
>> >David
>> 
>> Did you just discover GIGO?
>
>Thanks for all the suggestions.
>
>I don't have a strong opinion on the naming or return type. I'm still a
>bit confused about whether I can assume that casting bool to int always
>results in 0 or 1.
>
>If that's the case, since most people prefer bool over int as the
>return type and some are against introducing u1, my current plan is to
>use the following in the next version:
>
>bool parity_odd(u64 val);
>
>This keeps the bool return type, renames the function for better
>clarity, and avoids extra maintenance burden by having just one
>function.
>
>If I can't assume that casting bool to int always results in 0 or 1,
>would it be acceptable to keep the return type as int?
>
>Would this work for everyone?
>
>Regards,
>Kuan-Wei

You *CAN* safely assume that bool is an integer type which always has the value 
0 or 1.


Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-11 Thread H. Peter Anvin
On March 11, 2025 3:01:30 PM PDT, Yury Norov  wrote:
>On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
>> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
>> > On March 7, 2025 11:53:10 AM PST, David Laight 
>> >  wrote:
>> > >On Fri, 07 Mar 2025 11:30:35 -0800
>> > >"H. Peter Anvin"  wrote:
>> > >
>> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper 
>> > >>  wrote:
>> > >> >> (int)true most definitely is guaranteed to be 1.  
>> > >> >
>> > >> >That's not technically correct any more.
>> > >> >
>> > >> >GCC has introduced hardened bools that intentionally have bit patterns
>> > >> >other than 0 and 1.
>> > >> >
>> > >> >https://gcc.gnu.org/gcc-14/changes.html
>> > >> >
>> > >> >~Andrew  
>> > >> 
>> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using 
>> > >> them) but
>> > >> for compiler-generated conversations that's still a given, or the 
>> > >> manager isn't C
>> > >> or anything even remotely like it.
>> > >> 
>> > >
>> > >The whole idea of 'bool' is pretty much broken by design.
>> > >The underlying problem is that values other than 'true' and 'false' can
>> > >always get into 'bool' variables.
>> > >
>> > >Once that has happened it is all fubar.
>> > >
>> > >Trying to sanitise a value with (say):
>> > >int f(bool v)
>> > >{
>> > >  return (int)v & 1;
>> > >}
>> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>> > >
>> > >I really don't see how using (say) 0xaa and 0x55 helps.
>> > >What happens if the value is wrong? a trap or exception?, good luck 
>> > >recovering
>> > >from that.
>> > >
>> > >  David
>> > 
>> > Did you just discover GIGO?
>> 
>> Thanks for all the suggestions.
>> 
>> I don't have a strong opinion on the naming or return type. I'm still a
>> bit confused about whether I can assume that casting bool to int always
>> results in 0 or 1.
>> 
>> If that's the case, since most people prefer bool over int as the
>> return type and some are against introducing u1, my current plan is to
>> use the following in the next version:
>> 
>> bool parity_odd(u64 val);
>> 
>> This keeps the bool return type, renames the function for better
>> clarity, and avoids extra maintenance burden by having just one
>> function.
>> 
>> If I can't assume that casting bool to int always results in 0 or 1,
>> would it be acceptable to keep the return type as int?
>> 
>> Would this work for everyone?
>
>Alright, it's clearly a split opinion. So what I would do myself in
>such case is to look at existing code and see what people who really
>need parity invent in their drivers:
>
> bool  parity_odd
>static inline int parity8(u8 val)   -   -
>static u8 calc_parity(u8 val)   -   -
>static int odd_parity(u8 c) -   +
>static int saa711x_odd_parity   -   +
>static int max3100_do_parity-   -
>static inline int parity(unsigned x)-   -
>static int bit_parity(u32 pkt)  -   -
>static int oa_tc6_get_parity(u32 p) -   -
>static u32 parity32(__le32 data)-   -
>static u32 parity(u32 sample)   -   -
>static int get_parity(int number,   -   -
>  int size)
>static bool i2cr_check_parity32(u32 v,  +   -
>bool parity)
>static bool i2cr_check_parity64(u64 v)  +   -
>static int sw_parity(__u64 t)   -   -
>static bool parity(u64 value)   +   -
>
>Now you can refer to that table say that int parity(uXX) is what
>people want to see in their drivers.
>
>Whichever interface you choose, please discuss it's pros and cons.
>What bloat-o-meter says for each option? What's maintenance burden?
>Perf test? Look at generated code?
>
>I personally for a macro returning boolean, something like I
>proposed at the very beginning.
>
>Thanks,
>Yury

Also, please at least provide a way for an arch to opt in to using the 
builtins, which seem to produce as good results or better at least on some 
architectures like x86 and probably with CPU options that imply fast popcnt is 
available.


Re: [PATCH v3 01/16] bitops: Change parity8() return type to bool

2025-03-13 Thread H. Peter Anvin
On March 13, 2025 9:24:38 AM PDT, Yury Norov  wrote:
>On Wed, Mar 12, 2025 at 05:09:16PM -0700, H. Peter Anvin wrote:
>> On March 12, 2025 4:56:31 PM PDT, Jacob Keller  
>> wrote:
>
>[...]
>
>> >This is really a question of whether you expect odd or even parity as
>> >the "true" value. I think that would depend on context, and we may not
>> >reach a good consensus.
>> >
>> >I do agree that my brain would jump to "true is even, false is odd".
>> >However, I also agree returning the value as 0 for even and 1 for odd
>> >kind of made sense before, and updating this to be a bool and then
>> >requiring to switch all the callers is a bit obnoxious...
>> 
>> Odd = 1 = true is the only same definition. It is a bitwise XOR, or sum mod 
>> 1.
>
>The x86 implementation will be "popcnt(val) & 1", right? So if we
>choose to go with odd == false, we'll have to add an extra negation.
>So because it's a purely conventional thing, let's just pick a simpler
>one?
>
>Compiler's builtin parity() returns 1 for odd.
>
>Thanks,
>Yury

The x86 implementation, no, but there will be plenty of others having that 
exact definition.


Re: [PATCH v3 01/16] bitops: Change parity8() return type to bool

2025-03-12 Thread H. Peter Anvin
On March 12, 2025 4:56:31 PM PDT, Jacob Keller  wrote:
>
>
>On 3/7/2025 11:36 AM, David Laight wrote:
>> On Fri, 7 Mar 2025 12:42:41 +0100
>> Jiri Slaby  wrote:
>> 
>>> On 07. 03. 25, 12:38, Ingo Molnar wrote:

 * Jiri Slaby  wrote:
   
> On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:  
>> Change return type to bool for better clarity. Update the kernel doc
>> comment accordingly, including fixing "@value" to "@val" and adjusting
>> examples. Also mark the function with __attribute_const__ to allow
>> potential compiler optimizations.
>>
>> Co-developed-by: Yu-Chun Lin 
>> Signed-off-by: Yu-Chun Lin 
>> Signed-off-by: Kuan-Wei Chiu 
>> ---
>>include/linux/bitops.h | 10 +-
>>1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> index c1cb53cf2f0f..44e5765b8bec 100644
>> --- a/include/linux/bitops.h
>> +++ b/include/linux/bitops.h
>> @@ -231,26 +231,26 @@ static inline int get_count_order_long(unsigned 
>> long l)
>>/**
>> * parity8 - get the parity of an u8 value
>> - * @value: the value to be examined
>> + * @val: the value to be examined
>> *
>> * Determine the parity of the u8 argument.
>> *
>> * Returns:
>> - * 0 for even parity, 1 for odd parity
>> + * false for even parity, true for odd parity  
>
> This occurs somehow inverted to me. When something is in parity means that
> it has equal number of 1s and 0s. I.e. return true for even distribution.
> Dunno what others think? Or perhaps this should be dubbed odd_parity() 
> when
> bool is returned? Then you'd return true for odd.  

 OTOH:

   - '0' is an even number and is returned for even parity,
   - '1' is an odd  number and is returned for odd  parity.  
>>>
>>> Yes, that used to make sense for me. For bool/true/false, it no longer 
>>> does. But as I wrote, it might be only me...
>> 
>> No me as well, I've made the same comment before.
>> When reading code I don't want to have to look up a function definition.
>> There is even scope for having parity_odd() and parity_even().
>> And, with the version that shifts a constant right you want to invert
>> the constant!
>> 
>>  David
>
>This is really a question of whether you expect odd or even parity as
>the "true" value. I think that would depend on context, and we may not
>reach a good consensus.
>
>I do agree that my brain would jump to "true is even, false is odd".
>However, I also agree returning the value as 0 for even and 1 for odd
>kind of made sense before, and updating this to be a bool and then
>requiring to switch all the callers is a bit obnoxious...

Odd = 1 = true is the only same definition. It is a bitwise XOR, or sum mod 1.


Re: [PATCH v3 01/16] bitops: Change parity8() return type to bool

2025-03-15 Thread H. Peter Anvin
On March 14, 2025 12:06:04 PM PDT, David Laight  
wrote:
>On Thu, 13 Mar 2025 14:09:24 -0700
>Jacob Keller  wrote:
>
>> On 3/13/2025 9:36 AM, H. Peter Anvin wrote:
>> > On March 13, 2025 9:24:38 AM PDT, Yury Norov  wrote: 
>> >  
>> >> On Wed, Mar 12, 2025 at 05:09:16PM -0700, H. Peter Anvin wrote:  
>> >>> On March 12, 2025 4:56:31 PM PDT, Jacob Keller 
>> >>>  wrote:  
>> >>
>> >> [...]
>> >>  
>> >>>> This is really a question of whether you expect odd or even parity as
>> >>>> the "true" value. I think that would depend on context, and we may not
>> >>>> reach a good consensus.
>> >>>>
>> >>>> I do agree that my brain would jump to "true is even, false is odd".
>> >>>> However, I also agree returning the value as 0 for even and 1 for odd
>> >>>> kind of made sense before, and updating this to be a bool and then
>> >>>> requiring to switch all the callers is a bit obnoxious...  
>> >>>
>> >>> Odd = 1 = true is the only same definition. It is a bitwise XOR, or sum 
>> >>> mod 1.  
>> >>
>> >> The x86 implementation will be "popcnt(val) & 1", right? So if we
>> >> choose to go with odd == false, we'll have to add an extra negation.
>> >> So because it's a purely conventional thing, let's just pick a simpler
>> >> one?
>> >>
>> >> Compiler's builtin parity() returns 1 for odd.
>> >>
>> >> Thanks,
>> >> Yury  
>> > 
>> > The x86 implementation, no, but there will be plenty of others having that 
>> > exact definition.  
>> 
>> Makes sense to stick with that existing convention then. Enough to
>> convince me.
>
>There is the possibility that the compiler will treat the builtin as having
>an 'int' result without the constraint of it being zero or one.
>In which case the conversion to bool will be a compare.
>This doesn't happen on x86-64 (gcc or clang) - but who knows elsewhere.
>
>For x86 popcnt(val) & 1 is best (except for parity8) but requires a 
>non-archaic cpu.
>(Probably Nehelem or K10 or later - includes Sandy bridge and all the 'earth 
>movers'.)
>Since performance isn't critical the generic C code is actually ok.
>(The 'parity' flag bit is only set on the low 8 bits.)
>
>   David
>
>

You seem confused. We have already established that the built-in didn't 
currently produce good code on some cpus, but it does on others, with very 
little in between, so it would make sense to use the builtins on an opt-in 
basis.

On x86 8- or 16-bit parity is best don't with test or xor respectively; 32- or 
64-bit parity may use popcnt; test or by reducing down to a parity16 xor.


Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-23 Thread H. Peter Anvin
On March 23, 2025 8:16:24 AM PDT, Kuan-Wei Chiu  wrote:
>On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote:
>> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
>> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
>> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
>> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov  
>> > > > wrote:
>> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
>> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
>> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight 
>> > > > >> >  wrote:
>> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
>> > > > >> > >"H. Peter Anvin"  wrote:
>> > > > >> > >
>> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper 
>> > > > >> > >>  wrote:
>> > > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
>> > > > >> > >> >
>> > > > >> > >> >That's not technically correct any more.
>> > > > >> > >> >
>> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit 
>> > > > >> > >> >patterns
>> > > > >> > >> >other than 0 and 1.
>> > > > >> > >> >
>> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
>> > > > >> > >> >
>> > > > >> > >> >~Andrew  
>> > > > >> > >> 
>> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux 
>> > > > >> > >> kernel using them) but
>> > > > >> > >> for compiler-generated conversations that's still a given, or 
>> > > > >> > >> the manager isn't C
>> > > > >> > >> or anything even remotely like it.
>> > > > >> > >> 
>> > > > >> > >
>> > > > >> > >The whole idea of 'bool' is pretty much broken by design.
>> > > > >> > >The underlying problem is that values other than 'true' and 
>> > > > >> > >'false' can
>> > > > >> > >always get into 'bool' variables.
>> > > > >> > >
>> > > > >> > >Once that has happened it is all fubar.
>> > > > >> > >
>> > > > >> > >Trying to sanitise a value with (say):
>> > > > >> > >int f(bool v)
>> > > > >> > >{
>> > > > >> > > return (int)v & 1;
>> > > > >> > >}
>> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>> > > > >> > >
>> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
>> > > > >> > >What happens if the value is wrong? a trap or exception?, good 
>> > > > >> > >luck recovering
>> > > > >> > >from that.
>> > > > >> > >
>> > > > >> > > David
>> > > > >> > 
>> > > > >> > Did you just discover GIGO?
>> > > > >> 
>> > > > >> Thanks for all the suggestions.
>> > > > >> 
>> > > > >> I don't have a strong opinion on the naming or return type. I'm 
>> > > > >> still a
>> > > > >> bit confused about whether I can assume that casting bool to int 
>> > > > >> always
>> > > > >> results in 0 or 1.
>> > > > >> 
>> > > > >> If that's the case, since most people prefer bool over int as the
>> > > > >> return type and some are against introducing u1, my current plan is 
>> > > > >> to
>> > > > >> use the following in the next version:
>> > > > >> 
>> > > > >> bool parity_odd(u64 val);
>> > > > >> 
>> > &g

Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

2025-03-25 Thread H. Peter Anvin

On 3/23/25 08:16, Kuan-Wei Chiu wrote:


Interface 3: Multiple Functions
Description: bool parity_odd8/16/32/64()
Pros: No need for explicit casting; easy to integrate
   architecture-specific optimizations; except for parity8(), all
   functions are one-liners with no significant code duplication
Cons: More functions may increase maintenance burden
Opinions: Only I support this approach



OK, so I responded to this but I can't find my reply or any of the 
followups, so let me go again:


I prefer this option, because:

a. Virtually all uses of parity is done in contexts where the sizes of 
the items for which parity is to be taken are well-defined, but it is 
*really* easy for integer promotion to cause a value to be extended to 
32 bits unnecessarily (sign or zero extend, although for parity it 
doesn't make any difference -- if the compiler realizes it.)


b. It makes it easier to add arch-specific implementations, notably 
using __builtin_parity on architectures where that is known to generate 
good code.


c. For architectures where only *some* parity implementations are 
fast/practical, the generic fallbacks will either naturally synthesize 
them from components via shift-xor, or they can be defined to use a 
larger version; the function prototype acts like a cast.


d. If there is a reason in the future to add a generic version, it is 
really easy to do using the size-specific functions as components; this 
is something we do literally all over the place, using a pattern so 
common that it, itself, probably should be macroized:


#define parity(x)   \
({  \
typeof(x) __x = (x);\
bool __y;   \
switch (sizeof(__x)) {  \
case 1: \
__y = parity8(__x); \
break;  \
case 2: \
__y = parity16(__x);\
break;  \
case 4: \
__y = parity32(__x);\
break;  \
case 8: \
__y = parity64(__x);\
break;  \
default:\
BUILD_BUG();\
break;  \
}   \
__y;\
})



[PATCH] drm/bochs: add Bochs PCI ID for Simics model

2021-09-09 Thread H. Peter Anvin (Intel)
Current (and older) Simics models for the Bochs VGA used the wrong PCI
vendor ID (0x4321 instead of 0x1234).  Although this can hopefully be
fixed in the future, it is a problem for users of the current version,
not the least because to update the device ID the BIOS has to be
rebuilt in order to see BIOS output.

Add support for the 4321: device number in addition to the
1234: one.

Signed-off-by: H. Peter Anvin (Intel) 
---
 drivers/gpu/drm/tiny/bochs.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
index 73415fa9ae0f..2ce3bd903b70 100644
--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -63,6 +63,7 @@ MODULE_PARM_DESC(defy, "default y resolution");
 
 enum bochs_types {
BOCHS_QEMU_STDVGA,
+   BOCHS_SIMICS,
BOCHS_UNKNOWN,
 };
 
@@ -695,6 +696,13 @@ static const struct pci_device_id bochs_pci_tbl[] = {
.subdevice   = PCI_ANY_ID,
.driver_data = BOCHS_UNKNOWN,
},
+   {
+   .vendor  = 0x4321,
+   .device  = 0x,
+   .subvendor   = PCI_ANY_ID,
+   .subdevice   = PCI_ANY_ID,
+   .driver_data = BOCHS_SIMICS,
+   },
{ /* end of list */ }
 };
 
-- 
2.31.1