Re: [PATCH 0/7] Serialise oopses, BUGs, WARNs, dump_stack, soft lockups and hard lockups
>> Some architectures already have their own recursive >> locking for oopses and we have another version for >> serialising dump_stack. >> >> Create a common version and use it everywhere (oopses, >> BUGs, WARNs, dump_stack, soft lockups and hard lockups). > > Dunno. I've had cases where the simultaneity of the oopses > (i.e. their garbled nature) gave me the clue about the type > of race to expect. > one of the question is if you want to serialize, or if you just want to label. If you take a cookie (could just be a monotonic increasing number) at the start of the oops and then prefix/postfix the stack printing with that number, you don't serialize (risk of locking up), but you can pretty trivially see which line came from where.. if you do the monotonic increasing number approach, you even get an ordering out of it. it does mean changing the dump_stack() and co function fingerprint to take an extra argument, but that is not TOO insane. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Boot failure with next-20120208
On 2/12/2012 7:04 PM, Michael Neuling wrote: >> Just a quick note to say I got a boot OOPs with next-20120208 and 9 on a >> Power7 blade (my other PowerPC boot tests are ok. I'll investigate this >> further on Monday. >> >> The line referenced below is: >> >> BUG_ON(!kobj || !kobj->sd || !attr); >> >> in sysfs_create_file(). >> >> calling .topology_init+0x0/0x1ac @ 1 >> initcall 7_.async_cpu_up+0x0/0x40 returned 0 after 9765 usecs >> async_continuing @ 20 after 9765 usec >> [ cut here ] >> kernel BUG at fs/sysfs/file.c:573! >> Oops: Exception in kernel mode, sig: 5 [#1] >> SMP NR_CPUS=32 NUMA pSeries >> Modules linked in: >> NIP: c024a35c LR: c04ee050 CTR: c083ca24 >> REGS: c003fd9e7560 TRAP: 0700 Not tainted (3.3.0-rc2-autokern1) >> MSR: 80029032 CR: 88002082 XER: 000f >> CFAR: c024a370 >> TASK = c003fd9e8000[20] 'kworker/u:6' THREAD: c003fd9e4000 CPU: 0 >> GPR00: 0001 c003fd9e77e0 c0d19bb8 >> GPR04: c0bf37a8 0008 82096400 >> GPR08: c0f80028 c0d52bd8 >> GPR12: 48002088 cf33b000 01affa78 009aa000 >> GPR16: 00e1f3c8 02d517f0 01aff984 0060 >> GPR20: c0c45128 >> GPR24: 0008 c0c44200 >> GPR28: c0f80028 0008 c0c85038 0002 >> NIP [c024a35c] .sysfs_create_file+0x1c/0x40 >> LR [c04ee050] .device_create_file+0x20/0x40 >> Call Trace: >> [c003fd9e77e0] [c003fd9e78a0] 0xc003fd9e78a0 (unreliable) >> [c003fd9e7850] [c083c9a4] .register_cpu_online+0x1d0/0x250 >> [c003fd9e7900] [c083ca8c] .sysfs_cpu_notify+0x68/0x28c >> [c003fd9e79b0] [c083769c] .notifier_call_chain+0x9c/0x100 >> [c003fd9e7a50] [c00a5878] .__cpu_notify+0x38/0x80 >> [c003fd9e7ad0] [c083e124] ._cpu_up+0x10c/0x178 >> [c003fd9e7b90] [c083e2c8] .cpu_up+0x138/0x164 >> [c003fd9e7c20] [c0ba46d0] .async_cpu_up+0x28/0x40 >> [c003fd9e7ca0] [c00d81ec] .async_run_entry_fn+0xbc/0x1f0 >> [c003fd9e7d50] [c00c7cbc] .process_one_work+0x19c/0x590 >> [c003fd9e7e10] [c00c8618] .worker_thread+0x188/0x4b0 >> [c003fd9e7ed0] [c00ce57c] .kthread+0xbc/0xd0 >> [c003fd9e7f90] [c0021448] .kernel_thread+0x54/0x70 >> Instruction dump: >> 7fa3eb78 ebe1fff8 eba1ffe8 7c0803a6 4e800020 2c23 41820024 e8630030 >> 7c800074 7800d182 2fa3 419e0014 <0b00> 38a2 4bfffebc e8630030 >> ---[ end trace 31fd0ba7d8756001 ]--- >> initcall .topology_init+0x0/0x1ac returned 0 after 0 usecs >> calling .pcibios_init+0x0/0xe8 @ 1 >> PCI: Probing PCI hardware >> PCI: Probing PCI hardware done >> initcall .pcibios_init+0x0/0xe8 returned 0 after 0 usecs >> calling .add_system_ram_resources+0x0/0x140 @ 1 >> initcall .add_system_ram_resources+0x0/0x140 returned 0 after 0 usecs >> calling >> .__machine_initcall_powermac_pmac_i2c_create_platform_devices+0x0/0xc8 @ 1 >> initcall >> .__machine_initcall_powermac_pmac_i2c_create_platform_devices+0x0/0xc8 >> returned 0 after 0 usecs >> calling .opal_init+0x0/0x1cc @ 1 >> opal: Node not found >> initcall .opal_init+0x0/0x1cc returned -19 after 0 usecs >> calling .__machine_initcall_pseries_ioei_init+0x0/0xa0 @ 1 > > Reverting "smp: start up non-boot CPUs asynchronously" (8de7a96405 from > next-20120208) fixes this problem for me. > if that fixes it, it means PPC has a race somewhere in the cpu hotplug code, since all the patch does is hotplug the cpus one by one (which the normal kernel also does, just not in parallel with other work) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Boot failure with next-20120208
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 2/13/2012 12:05 PM, Andrew Morton wrote: > On Mon, 13 Feb 2012 06:18:34 -0800 Arjan van de Ven > wrote: > >> On 2/12/2012 7:04 PM, Michael Neuling wrote: >>>> Just a quick note to say I got a boot OOPs with next-20120208 >>>> and 9 on a Power7 blade (my other PowerPC boot tests are ok. >>>> I'll investigate this further on Monday. > > Thanks for testing linux-next. Very useful. > >>>> The line referenced below is: >>>> >>>> BUG_ON(!kobj || !kobj->sd || !attr); >>>> >>>> in sysfs_create_file(). > > Yes, this is exactly why we should never use BUG_ON(a || b). We > don't know which of those three expressions triggered. > >>>> calling .topology_init+0x0/0x1ac @ 1 initcall >>>> 7_.async_cpu_up+0x0/0x40 returned 0 after 9765 usecs >>>> async_continuing @ 20 after 9765 usec [ cut here >>>> ] kernel BUG at fs/sysfs/file.c:573! Oops: >>>> Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=32 NUMA >>>> pSeries Modules linked in: NIP: c024a35c LR: >>>> c04ee050 CTR: c083ca24 REGS: c003fd9e7560 >>>> TRAP: 0700 Not tainted (3.3.0-rc2-autokern1) MSR: >>>> 80029032 CR: 88002082 XER: >>>> 000f CFAR: c024a370 TASK = c003fd9e8000[20] >>>> 'kworker/u:6' THREAD: c003fd9e4000 CPU: 0 GPR00: >>>> 0001 c003fd9e77e0 c0d19bb8 >>>> GPR04: c0bf37a8 0008 >>>> 82096400 GPR08: >>>> c0f80028 c0d52bd8 GPR12: >>>> 48002088 cf33b000 01affa78 >>>> 009aa000 GPR16: 00e1f3c8 02d517f0 >>>> 01aff984 0060 GPR20: >>>> c0c45128 GPR24: >>>> 0008 >>>> c0c44200 GPR28: c0f80028 0008 >>>> c0c85038 0002 NIP [c024a35c] >>>> .sysfs_create_file+0x1c/0x40 LR [c04ee050] >>>> .device_create_file+0x20/0x40 Call Trace: [c003fd9e77e0] >>>> [c003fd9e78a0] 0xc003fd9e78a0 (unreliable) >>>> [c003fd9e7850] [c083c9a4] >>>> .register_cpu_online+0x1d0/0x250 [c003fd9e7900] >>>> [c083ca8c] .sysfs_cpu_notify+0x68/0x28c >>>> [c003fd9e79b0] [c083769c] >>>> .notifier_call_chain+0x9c/0x100 [c003fd9e7a50] >>>> [c00a5878] .__cpu_notify+0x38/0x80 [c003fd9e7ad0] >>>> [c083e124] ._cpu_up+0x10c/0x178 [c003fd9e7b90] >>>> [c083e2c8] .cpu_up+0x138/0x164 [c003fd9e7c20] >>>> [c0ba46d0] .async_cpu_up+0x28/0x40 [c003fd9e7ca0] >>>> [c00d81ec] .async_run_entry_fn+0xbc/0x1f0 >>>> [c003fd9e7d50] [c00c7cbc] >>>> .process_one_work+0x19c/0x590 [c003fd9e7e10] >>>> [c00c8618] .worker_thread+0x188/0x4b0 >>>> [c003fd9e7ed0] [c00ce57c] .kthread+0xbc/0xd0 >>>> [c003fd9e7f90] [c0021448] >>>> .kernel_thread+0x54/0x70 Instruction dump: 7fa3eb78 ebe1fff8 >>>> eba1ffe8 7c0803a6 4e800020 2c23 41820024 e8630030 >>>> 7c800074 7800d182 2fa3 419e0014 <0b00> 38a2 >>>> 4bfffebc e8630030 ---[ end trace 31fd0ba7d8756001 ]--- >>>> initcall .topology_init+0x0/0x1ac returned 0 after 0 usecs >>>> calling .pcibios_init+0x0/0xe8 @ 1 PCI: Probing PCI >>>> hardware PCI: Probing PCI hardware done initcall >>>> .pcibios_init+0x0/0xe8 returned 0 after 0 usecs calling >>>> .add_system_ram_resources+0x0/0x140 @ 1 initcall >>>> .add_system_ram_resources+0x0/0x140 returned 0 after 0 usecs >>>> calling >>>> .__machine_initcall_powermac_pmac_i2c_create_platform_devices+0x0/0xc8 >>>> @ 1 initcall >>>> .__machine_initcall_powermac_pmac_i2c_create_platform_devices+0x0/0xc8 >>>> returned 0 after 0 usecs calling .opal_init+0x0/0x1cc @ 1 >>>> opal: Node not found initcall .opal_init+0x0/0x1cc returned >>>> -19 after 0 usecs calling >>>> .__machine_initcall_pseries_ioei_init+0x0/0xa0 @ 1 >>> >>> Reverting "smp: start up non-boot CPUs asynchronously" >>> (8de7a96405 fro
Re: smp: Start up non-boot CPUs asynchronously
one coments; will comment more when I get to work On Tue, Feb 14, 2012 at 1:48 AM, Srivatsa S. Bhat 7. And whichever code between smp_init() and async_synchronize_full() didn't > > care about CPU hotplug till today but depended on all cpus being online > must > suddenly start worrying about CPU Hotplug. They must register a cpu > notifier > and handle callbacks etc etc.. Or if they are not worth that complexity, > they > should atleast be redesigned or moved around - like the print statements > that > tell how many cpus came up, for example. > > frankly, such code HAS to worry about cpus going online and offline even today; the firmware, at least on X86, can start taking cores offline/online once ACPI is initialized (as controlled by a data center manager from outside the box, usually done based on thermal or power conditions on a datacenter level). Now, no doubt that we have bugs in this space, since this only happened very rarely before. Question is what to do from a longer term strategy: Either we declare the number of online CPUs invariant during a certain phase of the boot (and make ACPI and co honor this as well somehow) or We decide to go about fixing these (maybe with the help of lockdep?) In addition to this, the reality is that the whole "bring cpus up" sequence needs to be changed; the current one is very messy and requires the hotplug lock for the whole bring up of each individual cpu... which is a very unfortunate design; a much better design would be to only take the lock for the actual registration of the newly brought up CPU to the kernel, while running the physical bringup without the global lock. If/when that change gets made, we can do the physical bring up in parallel (with each other, but also with the rest of the kernel boot), and do the registration en-mass at some convenient time in the boot, potentially late. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: smp: Start up non-boot CPUs asynchronously
Its more than acpi ... machine checks can do it too etc -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Peter Zijlstra wrote: On Tue, 2012-02-14 at 06:31 -0800, Arjan van de Ven wrote: > > frankly, such code HAS to worry about cpus going online and offline > even today; the firmware, at least on X86, can start taking cores > offline/online once ACPI is initialized (as controlled by a data > center manager from outside the box, usually done based on thermal or > power conditions on a datacenter level). Now, no doubt that we have > bugs in this space, since this only happened very rarely before. Which frankly is an utter piece of crap, that ACPI spec is total garbage and completely useless. You might have noticed that the ACPI code supporting that failure carries a big nacked-by from me. That's not to say we shouldn't try to fix hotplug, but bringing that ACPI nonsense to the table makes me care less, not more. I mean, really, that spec is broken, the support is worse. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: smp: Start up non-boot CPUs asynchronously
On 2/14/2012 11:57 AM, Srivatsa S. Bhat wrote: >> In addition to this, the reality is that the whole "bring cpus up" >> sequence needs to be changed; the current one is very messy and requires >> the hotplug lock for the whole bring up of each individual cpu... which >> is a very unfortunate design; a much better design would be to only take >> the lock for the actual registration of the newly brought up CPU to the >> kernel, while running the physical bringup without the global lock. >> If/when that change gets made, we can do the physical bring up in >> parallel (with each other, but also with the rest of the kernel boot), >> and do the registration en-mass at some convenient time in the boot, >> potentially late. >> > > > Sounds like a good idea, but how will we take care of CPU_UP_PREPARE and > CPU_STARTING callbacks then? Because, CPU_UP_PREPARE callbacks are run > before bringing up the cpu and CPU_STARTING is called from the cpu that is > coming up. Also, CPU_UP_PREPARE callbacks can be failed, which can lead > to that particular cpu boot getting aborted. With the "late commissioning > of CPUs" idea you proposed above, retaining such semantics could become > very challenging. some of these callbacks may need to be redesigned as well; or at least, we may need to decouple the "physical" state of the CPU that's getting brought up from the "logical" OS visible one. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: Boot failure with next-20120208
On 3/23/2012 12:22 PM, Andrew Morton wrote: > On Mon, 13 Feb 2012 12:16:41 -0800 > Arjan van de Ven wrote: > >> -BEGIN PGP SIGNED MESSAGE- >> Hash: SHA1 >> >>> The bug looks pretty generic, nothing very PPC-specific there. It >>> might affect other architectures - we won't know until we find out >>> wht caused it. >> >> well one half of the race looks pretty generic... >> . doesn't mean the other half of the race is though >> >> >>> >>> Ho hum, I suppose I should pull the patch out of linux-next, to >>> avoid disrupting other testing. This means it's going to be hard >>> to get the bug fixed. >> >> it means losing this one big PPC machine indeed until they hit >> that same race some other way with regular real cpu hotplug ;-( > > So we're kinda stuck with this. As I can't merge it, I guess I'll make > smp-start-up-non-boot-cpus-asynchronously.patch disappear. well yeah, PPC is throwing things in the spanner we're now working on an x86-only patch with basically the same improvement, but done in a way that does not touch the other architectures so by all means drop the patch ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
On Thu, 24 Sep 2009 13:33:07 +0200 Peter Zijlstra wrote: > On Thu, 2009-09-24 at 18:38 +1000, Benjamin Herrenschmidt wrote: > > On Thu, 2009-09-24 at 09:51 +0200, Peter Zijlstra wrote: > > > > I don't quite follow your logic here. This is useful for more > > > > than just hypervisors. For example, take the HV out of the > > > > picture for a moment and imagine that the HW has the ability to > > > > offline CPU in various power levels, with varying latencies to > > > > bring them back. > > > > > > cpu-hotplug is an utter slow path, anybody saying latency and > > > hotplug in the same sentence doesn't seem to grasp either or both > > > concepts. > > > > Let's forget about latency then. Let's imagine I want to set a CPU > > offline to save power, vs. setting it offline -and- opening the back > > door of the machine to actually physically replace it :-) > > If the hardware is capable of physical hotplug, then surely powering > the socket down saves most power and is the preferred mode? btw just to take away a perception that generally powering down sockets help; it does not help for all cpus. Some cpus are so efficient in idle that the incremental gain one would get by "offlining" a core is just not worth it (in fact, in x86, it's the same thing) I obviously can't speak for p-series cpus, just wanted to point out that there is no universal truth about "offlining saves power". -- Arjan van de VenIntel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
On Thu, 24 Sep 2009 10:42:41 +0530 Arun R Bharadwaj wrote: > * Arun R Bharadwaj [2009-09-22 16:55:27]: > > Hi Len, (or other acpi folks), > > I had a question regarding ACPI-cpuidle interaction in the current > implementation. > > Currently, every cpu (i.e. acpi_processor) registers to cpuidle as > a cpuidle_device. So every cpu has to go through the process of > setting up the idle states and then registering as a cpuidle device. > > What exactly is the reason behind this? > technically a BIOS can opt to give you C states via ACPI on some cpus, but not on others. in practice when this happens it tends to be a bug.. but it's technically a valid configuration -- Arjan van de VenIntel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 0/2] cpu: pseries: Offline state framework.
On Fri, 25 Sep 2009 12:55:49 +0530 Vaidyanathan Srinivasan wrote: > > I obviously can't speak for p-series cpus, just wanted to point out > > that there is no universal truth about "offlining saves power". > > Hi Arjan, > > As you have said, on some cpus the extra effort of offlining does not > save us any extra power, and the state will be same as idle. The > assertion that offlining saves power is still valid, it could be same > as idle or better depending on the architecture and implementation. > > On x86 we still need the code (Venki posted) to take cpus to C6 on > offline to save power or else offlining consumes more power than idle > due to C1/hlt state. This framework can help here as well if we have > any apprehension on making lowest sleep state as default on x86 and > want the administrator to decide. even with Venki's patch, all our measurements indicate that taking cores away is damage on x86. Don't let that stop what you do for powerpc, but for x86 it's not a win. Linux is good at keeping cores in C6 long enough that the downside of offlining is bigger... -- Arjan van de VenIntel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v6 PATCH 0/7]: cpuidle/x86/POWER: Cleanup idle power management code in x86, cleanup drivers/cpuidle/cpuidle.c and introduce cpuidle to POWER.
On Fri, 25 Sep 2009 10:54:24 +0200 Peter Zijlstra wrote: > On Fri, 2009-09-25 at 12:36 +0530, Vaidyanathan Srinivasan wrote: > > * Arjan van de Ven [2009-09-24 14:22:28]: > > > > > On Thu, 24 Sep 2009 10:42:41 +0530 > > > Arun R Bharadwaj wrote: > > > > > > > * Arun R Bharadwaj [2009-09-22 > > > > 16:55:27]: > > > > > > > > Hi Len, (or other acpi folks), > > > > > > > > I had a question regarding ACPI-cpuidle interaction in the > > > > current implementation. > > > > > > > > Currently, every cpu (i.e. acpi_processor) registers to cpuidle > > > > as a cpuidle_device. So every cpu has to go through the process > > > > of setting up the idle states and then registering as a cpuidle > > > > device. > > > > > > > > What exactly is the reason behind this? > > > > > > > > > > technically a BIOS can opt to give you C states via ACPI on some > > > cpus, but not on others. > > > > > > in practice when this happens it tends to be a bug.. but it's > > > technically a valid configuration > > > > So we will need to keep the per-cpu registration as of now because > > we may have such buggy BIOS in the field and we don't want the > > cpuidle framework to malfunction there. > > If the BIOS doesn't mention a certain C state on a cpu, and you try to > set it anyway, does that go boom? > > This whole per-cpu registration thing is horridly ugly, can't you > have a per-cpu C state exception mask and leave it at that -- if its > really needed? the real solution is to make the acpi code always know about C1, even if the bios doesn't That's one for Len :) (C1 is just "hlt", what we do in the other idle loop ;-) -- Arjan van de VenIntel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [stable] [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
On Sat, 28 Feb 2009 09:46:01 -0800 Greg KH wrote: > On Sat, Feb 28, 2009 at 09:23:36AM -0800, Linus Torvalds wrote: > > On Fri, 27 Feb 2009, Roland McGrath wrote: > > > But here is the patch you asked for. > > > > Yes, this looks much more straightforward. > > > > And I guess the seccomp interaction means that this is potentially > > a 2.6.29 thing. Not that I know whether anybody actually _uses_ > > seccomp. It does seem to be enabled in at least Fedora kernels, but > > it might not be used anywhere. > > It's enabled in SuSE kernels. > but are there users of it? I thought Andrea stopped the cpushare thing that was the only user of this -- Arjan van de VenIntel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 1/3] Fix Unlikely(x) == y
On Sat, 16 Feb 2008 17:08:01 +0100 Roel Kluin <[EMAIL PROTECTED]> wrote: > The patch below was not yet tested. If it's correct as it is, please > comment. --- > Fix Unlikely(x) == y > you found a great set of bugs.. but to be honest... I suspect it's just best to remove unlikely altogether for these cases; unlikely() is almost a go-faster-stripes thing, and if you don't know how to use it you shouldn't be using it... so just removing it for all wrong cases is actually the best thing to do imo. -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 1/3] Fix Unlikely(x) == y
On Sat, 16 Feb 2008 18:58:49 +0100 > > If you think unlikely() means something else, we should fix what it > > maps to towards gcc ;) (to.. be empty ;) > > eventhough the gcc docs say it's just a hint to help the compiler > optimize the branch it takes by default, I too have noticed that it > more often does bad than good. Code gets completely reordered and > even sometimes partially duplicated (especially when the branch is a > return). > > Last but not least, gcc 4 tends to emit stupid checks, to the point > that I have replaced unlikely(x) with (x) in my code when gcc >= 4 is > detected. What I observe is that the following code : > > if (unlikely(p == NULL)) ... this is pure bad since GCC already assumes that NULL checks are unlikely, and with the unlikely() code needing to normalize stuff... it will generate worse code for sure yes. > > often gets coded like this : > > reg1 = (p == NULL) > if (reg1 != 0) ... > > ... which clobbers reg1 for nothing and performs a double test. > > But yes, I assumed that the author considered its use to be > legitimate (I've not looked at the code). Maybe you're right and it > should be removed, but in this case we would need a large audit of > the abuses of unlikely()... no argument.. how about we start with all the cases where the author just got it entirely wrong ... like the ones from this patch ;) -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 1/3] Fix Unlikely(x) == y
On Sat, 16 Feb 2008 10:31:26 -0800 Geoff Levand <[EMAIL PROTECTED]> wrote: > On 02/16/2008 09:42 AM, Arjan van de Ven wrote: > > On Sat, 16 Feb 2008 18:33:16 +0100 > > Willy Tarreau <[EMAIL PROTECTED]> wrote: > > > >> On Sat, Feb 16, 2008 at 09:25:52AM -0800, Arjan van de Ven wrote: > >> > On Sat, 16 Feb 2008 17:08:01 +0100 > >> > Roel Kluin <[EMAIL PROTECTED]> wrote: > >> > > >> > > The patch below was not yet tested. If it's correct as it is, > >> > > please comment. --- > >> > > Fix Unlikely(x) == y > >> > > > >> > > >> > you found a great set of bugs.. > >> > but to be honest... I suspect it's just best to remove unlikely > >> > altogether for these cases; unlikely() is almost a > >> > go-faster-stripes thing, and if you don't know how to use it you > >> > shouldn't be using it... so just removing it for all wrong cases > >> > is actually the best thing to do imo. > >> > >> Well, eventhough the author may not know how to use it, "unlikely" > >> at least indicates the intention of the author, or his knowledge > >> of what should happen here. I'd suggest leaving it where it is > >> because the authot of this code is in best position to know that > >> this branch is unlikely to happen, eventhough he does not > >> correctly use the macro. > >> > > > > you have more faith in the authors knowledge of how his code > > actually behaves than I think is warranted :) Or faith in that he > > knows what "unlikely" means. I should write docs about this; but > > unlikely() means: 1) It happens less than 0.01% of the cases. > > 2) The compiler couldn't have figured this out by itself > >(NULL pointer checks are compiler done already, same for some > > other conditions) 3) It's a hot codepath where shaving 0.5 cycles > > (less even on x86) matters (and the author is ok with taking a 500 > > cycles hit if he's wrong) > > > > If you think unlikely() means something else, we should fix what it > > maps to towards gcc ;) (to.. be empty ;) > > Well, I didn't consider what today's compiler does, but used it as a > general indicator, because I think that code will be around a long > time. If you show me some test results that prove it causes harm I > might consider removing it. for mordern (last 10 years) x86 cpus... the cpu branchpredictor is better than the coder in general. Same for most other architectures. unlikely() creates bigger code as well. Now... we're talking about your super duper hotpath function here right? One where you care about 0.5 cycle speed improvement? (less on modern systems ;) Because if not, the bigger code and general "compiler second guessing" is just more harmful than good, shown already here by the fact that the code was just incorrect as a virtue of having the unlikely() in. -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 1/3] Fix Unlikely(x) == y
On Sat, 16 Feb 2008 18:33:16 +0100 Willy Tarreau <[EMAIL PROTECTED]> wrote: > On Sat, Feb 16, 2008 at 09:25:52AM -0800, Arjan van de Ven wrote: > > On Sat, 16 Feb 2008 17:08:01 +0100 > > Roel Kluin <[EMAIL PROTECTED]> wrote: > > > > > The patch below was not yet tested. If it's correct as it is, > > > please comment. --- > > > Fix Unlikely(x) == y > > > > > > > you found a great set of bugs.. > > but to be honest... I suspect it's just best to remove unlikely > > altogether for these cases; unlikely() is almost a > > go-faster-stripes thing, and if you don't know how to use it you > > shouldn't be using it... so just removing it for all wrong cases is > > actually the best thing to do imo. > > Well, eventhough the author may not know how to use it, "unlikely" at > least indicates the intention of the author, or his knowledge of what > should happen here. I'd suggest leaving it where it is because the > authot of this code is in best position to know that this branch is > unlikely to happen, eventhough he does not correctly use the macro. > you have more faith in the authors knowledge of how his code actually behaves than I think is warranted :) Or faith in that he knows what "unlikely" means. I should write docs about this; but unlikely() means: 1) It happens less than 0.01% of the cases. 2) The compiler couldn't have figured this out by itself (NULL pointer checks are compiler done already, same for some other conditions) 3) It's a hot codepath where shaving 0.5 cycles (less even on x86) matters (and the author is ok with taking a 500 cycles hit if he's wrong) If you think unlikely() means something else, we should fix what it maps to towards gcc ;) (to.. be empty ;) -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 1/3] Fix Unlikely(x) == y
On Mon, 18 Feb 2008 13:11:06 -0500 [EMAIL PROTECTED] wrote: > On Mon, 18 Feb 2008 14:27:10 GMT, David Howells said: > > > __builtin_expect() is useful on FRV where you _have_ to give each > > branch and conditional branch instruction a measure of probability > > whether the branch will be taken. > > What does gcc do the 99.998% of the time we don't have > likely/unlikely coded? see Andi's email. It gets the exact same hints that 95%+ of the kernels unlikely/likely get you, because the heuristics in it are usually the same as the kernel programmers heuristics. -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [Patch 0/2] powerpc: avoid userspace poking to legacy ioports
On Mon, 18 Feb 2008 21:58:42 +0100 Jean Delvare <[EMAIL PROTECTED]> wrote: > On Tue, 19 Feb 2008 07:42:03 +1100, Benjamin Herrenschmidt wrote: > > > > > Maybe Christian's patch can be improved to not do the check on > > > these? As long as /dev/port exists, it seems reasonable that the > > > kernel should behave, no matter what I/O ports are accessed from > > > user-space. > > > > nonsense. > > > > /dev/mem exists for example, but you are still not supposed to go > > bang all over the place in it. > > You should at least be able to read from it without crashing the > machine. Of course writing is a different story. keep dreaming. This is not how /dev/mem works today, not on x86 and very likely not on ppc either. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH 1/3] Fix Unlikely(x) == y
On Tue, 19 Feb 2008 13:33:53 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > > Actually one thing I don't like about gcc is that I think it still > emits cmovs for likely/unlikely branches, which is silly (the gcc > developers seem to be in love with that instruction). If that goes > away, then branch hints may be even better. only for -Os and only if the result is smaller afaik. (cmov tends to be a performance loss most of the time so for -O2 and such it isn't used as far as I know.. it does make for nice small code however ;-) -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: the printk problem
Linus Torvalds wrote: On Fri, 4 Jul 2008, Linus Torvalds wrote: Still all happily untested, of course. And still with no actual users converted. Ok, it's tested, and here's an example usage conversion. The diffstat pretty much says it all. It _does_ change the format of the stack trace entry a bit, but I don't think it's for the worse (unless it breaks things like the oops tracker - Arjan?) It changes the symbol-in-module format from :ext3:add_dirent_to_buf+0x6c/0x26c to add_dirent_to_buf+0x6c/0x26c [ext3] but quite frankly, the latter was the standard format anyway (it's what "sprint_symbol()" gives you), and traps_64.c was the odd man out. This won't break anything for me actually; I already deal with either case. $ cat oopsparse.pl | wc -l 1252 the kernel is so inconsistent with oops formats (over time/across architectures) that once you deal with what there is today... you pretty much deal with everything. I also like the improvement; I wished something like this existed several times already in the last few months so for sure Acked-by: Arjan van de Ven <[EMAIL PROTECTED]> ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: dma_alloc_coherent() on PPC32: physical addresses above 2G possible?
On Sun, 20 Jul 2008 20:36:23 +0200 Stefan Richter <[EMAIL PROTECTED]> wrote: > PS: I don't want to set the DMA mask of this device to > DMA_31BIT_MASK because that would be detrimental to other functions > of the device. It's a TI TSB43AB22A FireWire controller. Hi, just want to mention that you can set the coherent mask separately from the generic mask... is that sufficient for your load? (you can even set it just for this allocation..) -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: dma_alloc_coherent() on PPC32: physical addresses above 2G possible?
On Sun, 20 Jul 2008 21:25:51 +0200 Stefan Richter <[EMAIL PROTECTED]> wrote: > > Later on: > > if (dev->needs_dma_mask_workaround) > pci_set_consistent_dma_mask(pdev, DMA_31BIT_MASK); > allocate_something_special; > if (dev->needs_dma_mask_workaround) > pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK); > > Or is there a variant of dma_alloc_coherent() which directly accepts a > mask argument? something like this. But realistically, how many consistent/coherent allocations do you have? some ring buffers and other one time stuff surely... but not after that? -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: dma_alloc_coherent() on PPC32: physical addresses above 2G possible?
On Sun, 20 Jul 2008 21:25:51 +0200 Stefan Richter <[EMAIL PROTECTED]> wrote: > Arjan van de Ven wrote: > > On Sun, 20 Jul 2008 20:36:23 +0200 > > Stefan Richter <[EMAIL PROTECTED]> wrote: > > > >> PS: I don't want to set the DMA mask of this device to > >> DMA_31BIT_MASK because that would be detrimental to other functions > >> of the device. It's a TI TSB43AB22A FireWire controller. > > > > Hi, > > > > just want to mention that you can set the coherent mask separately > > from the generic mask... is that sufficient for your load? > > (you can even set it just for this allocation..) > > Hmm. Would that be done this way? > During probe: > > if (chip_is_tsb43ab22a) { > if (dma_supported(dev, DMA_31BIT_MASK)) > chip->needs_dma_mask_workaround = 1; > else > chip->needs_some_other_workaround = 1; > } btw it might be nicer to make this chip->something_special_mask = DMA_31BIT_MASK; then you can just use the mask from this struct rather than another check -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [BUG] linux-next: Tree for August 26 - Badness at kernel/notifier.c:25
Kamalesh Babulal wrote: Hi Stephen, Badness warning is seen, while booting up the next-20080825/26 kernels on the powerpc boxes this is fixed in the patch I sent to Ingo earlier today (attached again for reference) >From eafa461d187448998b1f66c9134e66b125db9531 Mon Sep 17 00:00:00 2001 From: Arjan van de Ven <[EMAIL PROTECTED]> Date: Tue, 26 Aug 2008 09:01:06 -0700 Subject: [PATCH] debug: add notifier chain debugging during some development we suspected a case where we left something in a notifier chain that was from a module that was unloaded already... and that sort of thing is rather hard to track down. This patch adds a very simple sanity check (which isn't all that expensive) to make sure the notifier we're about to call is actually from either the kernel itself of from a still-loaded module, avoiding a hard-to-chase-down crash. Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> Acked-by: Tony Luck <[EMAIL PROTECTED]> --- include/linux/kernel.h |3 +++ kernel/extable.c | 16 kernel/notifier.c |6 ++ lib/vsprintf.c |2 +- 4 files changed, 26 insertions(+), 1 deletions(-) diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 2651f80..4e1366b 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -187,6 +187,9 @@ extern unsigned long long memparse(char *ptr, char **retptr); extern int core_kernel_text(unsigned long addr); extern int __kernel_text_address(unsigned long addr); extern int kernel_text_address(unsigned long addr); +extern int func_ptr_is_kernel_text(void *ptr); +extern void *dereference_function_descriptor(void *ptr); + struct pid; extern struct pid *session_of_pgrp(struct pid *pgrp); diff --git a/kernel/extable.c b/kernel/extable.c index a26cb2e..adf0cc9 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -66,3 +66,19 @@ int kernel_text_address(unsigned long addr) return 1; return module_text_address(addr) != NULL; } + +/* + * On some architectures (PPC64, IA64) function pointers + * are actually only tokens to some data that then holds the + * real function address. As a result, to find if a function + * pointer is part of the kernel text, we need to do some + * special dereferencing first. + */ +int func_ptr_is_kernel_text(void *ptr) +{ + unsigned long addr; + addr = (unsigned long) dereference_function_descriptor(ptr); + if (core_kernel_text(addr)) + return 1; + return module_text_address(addr) != NULL; +} diff --git a/kernel/notifier.c b/kernel/notifier.c index 823be11..522277c 100644 --- a/kernel/notifier.c +++ b/kernel/notifier.c @@ -82,6 +82,12 @@ static int __kprobes notifier_call_chain(struct notifier_block **nl, while (nb && nr_to_call) { next_nb = rcu_dereference(nb->next); + if (!func_ptr_is_kernel_text(nb->notifier_call)) { + WARN(1, "Invalid notifier called!"); + nb = next_nb; + continue; + } + ret = nb->notifier_call(nb, val, v); if (nr_calls) diff --git a/lib/vsprintf.c b/lib/vsprintf.c index d8d1d11..f5e5ffb 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -513,7 +513,7 @@ static char *string(char *buf, char *end, char *s, int field_width, int precisio return buf; } -static inline void *dereference_function_descriptor(void *ptr) +void *dereference_function_descriptor(void *ptr) { #if defined(CONFIG_IA64) || defined(CONFIG_PPC64) void *p; -- 1.5.5.1 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [BUG] linux-next: Tree for August 26 - Badness at kernel/notifier.c:25
Kamalesh Babulal wrote: Thanks for reference of the patch, After replacing the patch with the latest one above on the powerpc, the warning still remains Badness at kernel/notifier.c:86 sadly you have something going on that doesn't list the modules loaded etc... is this during boot or way later? (because if it's the later, you might be hitting a legitimate bug ;-) NIP: c0081470 LR: c0081494 CTR: c005a2d0 REGS: c021ce0bfaf0 TRAP: 0700 Not tainted (2.6.27-rc4-next-20080826-autotest) MSR: 80029032 CR: 24008042 XER: 0005 TASK = c015de08[1] 'swapper' THREAD: c021ce0bc000 CPU: 0 GPR00: c0081494 c021ce0bfd70 c081e940 c0749c38 GPR04: 0003 0001 c021ce0bfe90 GPR08: c04fd9f0 c04fd9f0 GPR12: 2442 c089c300 02307ef0 c06332a0 GPR16: c0631f28 c0633388 018bf8b0 0270 GPR20: c070b07c c0707ef0 c0708160 c0631c58 GPR24: 0003 0001 c021ce0bfe90 GPR28: c0749c20 c07bf338 c0749c38 NIP [c0081470] .notifier_call_chain+0x70/0x140 LR [c0081494] .notifier_call_chain+0x94/0x140 Call Trace: [c021ce0bfd70] [c0081494] .notifier_call_chain+0x94/0x140 (unreliable) [c021ce0bfe20] [c04fe3fc] .cpu_up+0x10c/0x200 [c021ce0bfee0] [c06cdcc0] .kernel_init+0x1b0/0x440 [c021ce0bff90] [c00299cc] .kernel_thread+0x4c/0x68 Instruction dump: e863 2fa3 419e00f0 2fa6 419e00e8 2e27 7c7f1b78 3b60 4828 6000 6000 6000 <0fe0> 2fbd 2f3c 7fbfeb78 ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: powerpc allmodconfig
On Thu, 16 Oct 2008 12:49:23 -0700 (PDT) David Miller <[EMAIL PROTECTED]> wrote: > #endif > #define __WARN() warn_on_slowpath(__FILE__, __LINE__) > #define __WARN_printf(arg...) warn_slowpath(__FILE__, __LINE__, arg) > #else > #define __WARN_printf(arg...) __WARN() the easiest way I suppose would be to do #define __WARN_printf(arg..) do { printk(arg); __WARN(); } while (0) any obvious problems with this ? -- Arjan van de VenIntel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH] bootgraph: fix for use with dot symbols
Michael Neuling wrote: powerpc has dot symbols, so the dmesg output looks like: <4>[0.327310] calling .migration_init+0x0/0x9c @ 1 <4>[0.327595] initcall .migration_init+0x0/0x9c returned 1 after 0 usecs The below fixes bootgraph.pl so it handles this correctly. question for the ppc folks why does the "print symbol" magic format string thing print a dot symbol and not the real function name? Should that be fixed instead? ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev