RE: linux-next: manual merge of the ia64 tree with Linus' tree
> I fixed it up (see below) and can carry the fix as necessary. I rebased the series onto 3.7-rc7 (using the same merge fix that you did) ... so you shouldn't see the merge error next time. -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2 0/5] Add movablecore_map boot option
> 1. use firmware information > According to ACPI spec 5.0, SRAT table has memory affinity structure > and the structure has Hot Pluggable Field. See "5.2.16.2 Memory > Affinity Structure". If we use the information, we might be able to > specify movable memory by firmware. For example, if Hot Pluggable > Field is enabled, Linux sets the memory as movable memory. > > 2. use boot option > This is our proposal. New boot option can specify memory range to use > as movable memory. Isn't this just moving the work to the user? To pick good values for the movable areas, they need to know how the memory lines up across node boundaries ... because they need to make sure to allow some non-movable memory allocations on each node so that the kernel can take advantage of node locality. So the user would have to read at least the SRAT table, and perhaps more, to figure out what to provide as arguments. Since this is going to be used on a dynamic system where nodes might be added and removed - the right values for these arguments might change from one boot to the next. So even if the user gets them right on day 1, a month later when a new node has been added, or a broken node removed the values would be stale. -Tony
RE: [PATCH v2 0/5] Add movablecore_map boot option
> The other bit is that if you really really want high reliability, memory > mirroring is the way to go; it is the only way you will be able to > hotremove memory without having to have a pre-event to migrate the > memory away from the affected node before the memory is offlined. Some platforms don't support cross-node mirrors ... but we still want to be able to remove a node. -Tony
RE: [PATCH v2 0/5] Add movablecore_map boot option
> If any significant percentage of memory is in ZONE_MOVABLE then the memory > hotplug people will have to deal with all the lowmem/highmem problems > that used to be faced by 32-bit x86 with PAE enabled. While these problems may still exist on large systems - I think it becomes harder to construct workloads that run into problems. In those bad old days a significant fraction of lowmem was consumed by the kernel ... so it was pretty easy to find meta-data intensive workloads that would push it over a cliff. Here we are talking about systems with say 128GB per node divided into 64GB moveable and 64GB non-moveable (and I'd regard this as a rather low-end machine). Unless the workload consists of zillions of tiny processes all mapping shared memory blocks, the percentage of memory allocated to the kernel is going to be tiny compared with the old 4GB days. -Tony
RE: [PATCH] pstore: Create a convenient mount point for pstore
> > Signed-off-by: Josh Boyer > > Acked-by: Kees Cook Queued for next merge window. Thanks. -Tony
RE: [PATCH v6 -next 0/2] make efivars/efi_pstore interrupt-safe
> Changelog > v5 -> v6 > - Rebase to a latest linux-next tree. > - Modify a comment from "efivar_update_sysfs_entry" to > "efivar_update_sysfs_entries" in include/linux/efi.h (Patch 2/2) Applied to my internal pstore topic branch - which feeds to linux-next. Note that my branch was based on 3.8-rc2 so I unwound the changes to match up with Linus' latest. Hope I didn't break anything in the process. -Tony
RE: linux-next: manual merge of the vfs tree with the ia64 tree
> Today's linux-next merge of the vfs tree got a conflict in > arch/ia64/kernel/palinfo.c between commit 40c275bd92b8 ("[IA64] Fix stack > overflow in create_palinfo_proc_entries") from the ia64 tree and commit > d8e904861a28 ("palinfo fixes") from the vfs tree. > > I fixed it up (arbitrarily choosing the vfs tree version) and can carry > the fix as necessary (no action is required). It looks like you picked the version from the ia64 tree - but it would have been better to pick Al's version. He wastes less space on the stack by only declaring cpustr[3+4+1] instead of cpustr[32], but more importantly he checks the return value from proc_mkdir(). I'll drop 40c275bd92b8 out of my tree so Al's can go in without a conflict. -Tony
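To make the cpustr[3+4+1] sizing concrete, here is a minimal userspace sketch. It assumes the directory name has the form "cpu" followed by a decimal CPU number (the format string is an assumption, not quoted from either patch), and `format_cpu_dir` is a hypothetical helper. The 8 bytes hold "cpu", up to four digits, and the terminating NUL, and the return value is checked rather than ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "cpu" (3) + up to 4 decimal digits (4) + terminating NUL (1) */
#define CPUSTR_LEN (3 + 4 + 1)

/*
 * Hypothetical helper: format the per-cpu directory name into out.
 * Returns 0 on success, -1 if the name would not fit in len bytes.
 */
static int format_cpu_dir(char *out, size_t len, unsigned int cpu)
{
    int n = snprintf(out, len, "cpu%u", cpu);

    if (n < 0 || (size_t)n >= len)
        return -1;   /* truncated: caller should not use the name */
    return 0;
}
```

With this sizing, CPU numbers up to 9999 fit; anything larger is reported as an error instead of overflowing the buffer.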
RE: [PATCH 2/2] PCI/IA64: fix pci_dev->enable_cnt balance when doing pci hotplug
> But this patch mainly to fix the unbalanced dev->enable_cnt in IA64 which > will print WARNING Calltrace > in dmesg. Thanks for the explanation. > If you think it is valuable, I will try to improve resource assignment in > IA64 like other arch (eg arm, m68k, mips and sh..) > in another patch. Making the ia64 code more like the x86 code might help avoid such problems in the future (lots more people look at x86 than ia64 - if ours is the same, or very similar, then it is likely that changes made to x86 will be correct for ia64 too). Only you can decide how much this is worth to you and your company - perhaps there will be no more changes that break ia64 even with the code differences. Or perhaps it will be easier for you to just fix things as they break than to undertake a restructure of the code. -Tony
RE: [PATCH] x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
>> Tony mentioned that this patch worked fine for him. So could you >> kindly pick up this patch? > > Normally, Tony picks up the Intel side of MCE. Tony, want me to do it? I'll pick it up. Thanks. -Tony
[GIT PULL] x86/mce - clean up cmci_rediscover()
The following changes since commit 07961ac7c0ee8b546658717034fe692fd12eefa9:
  Linux 3.9-rc5 (2013-03-31 15:12:43 -0700)
are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-cmci_rediscover
for you to fetch changes up to 7a0c819d28f5c91955854e048766d6afef7c8a3d:
  x86/mce: Rework cmci_rediscover() to play well with CPU hotplug (2013-04-02 14:04:01 -0700)
Clean up cmci_rediscover code to fix problems found by Dave Jones
Srivatsa S. Bhat (1):
  x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
 arch/x86/include/asm/mce.h | 4 ++--
 arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 25 +
 3 files changed, 8 insertions(+), 23 deletions(-)
RE: [PATCH 1/2] efivars: Check max_size only if it is non-zero.
> Some (broken?) EFI implementations return always a MaximumVariableSize of 0, > check against max_size only if it is non-zero. The spec doesn't say that zero has any special meaning - so if an implementation returns max_size == 0 but lets you set a variable to a size > 0, then I don't think there is a need for parentheses or a "?" in this commit comment. But if Linux silently accepts such broken EFI, then there is no feedback loop to let EFI implementations know that they are broken. In other areas we have thrown out messages about firmware being broken ... perhaps: if (max_size == 0) printk_once("Broken EFI implementation is returning MaxVariableSize=0\n"); would help? After all there probably *is* a maximum size - but EFI isn't telling us what it is. -Tony
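The warn-once idea can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `size_ok` is a hypothetical helper, and a static flag plus fprintf() stands in for the kernel's printk_once(). When the firmware reports a maximum of 0 the size check is skipped, but the complaint is printed a single time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide whether a variable of var_size bytes may be
 * written, given the max_size reported by firmware.
 */
static bool size_ok(unsigned long max_size, unsigned long var_size)
{
    static bool warned;   /* stands in for printk_once() */

    if (max_size == 0) {
        if (!warned) {
            warned = true;
            fprintf(stderr,
                    "Broken EFI implementation is returning MaxVariableSize=0\n");
        }
        return true;      /* real limit unknown: skip the check */
    }
    return var_size <= max_size;
}
```

The message gives firmware vendors the feedback loop mentioned above without flooding the log on every write.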
RE: [PATCH] x86, amd, mce: Prevent potential cpu-online oops
+ if (WARN_ON_ONCE(!nb)) + goto out; + WARN_ON_ONCE() will drop a stack trace to the console - is that going to be useful? If you want a message perhaps: if (!nb) { printk_once("something interesting about not having access to north bridge\n"); goto out; } -Tony
[PATCH] staging/adt7316 Fix some 'interesting' string operations
Calling memcmp() to check the value of the first byte in a string is overkill. Just use buf[0] == '1' or buf[0] != '1' as appropriate. Signed-off-by: Tony Luck --- [Inspired by a rant on IRC about a different driver doing something similar] diff --git a/drivers/staging/iio/addac/adt7316.c b/drivers/staging/iio/addac/adt7316.c index 0b431bc..506b5a7 100644 --- a/drivers/staging/iio/addac/adt7316.c +++ b/drivers/staging/iio/addac/adt7316.c @@ -256,7 +256,7 @@ static ssize_t adt7316_store_enabled(struct device *dev, struct adt7316_chip_info *chip = iio_priv(dev_info); int enable; - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') enable = 1; else enable = 0; @@ -299,7 +299,7 @@ static ssize_t adt7316_store_select_ex_temp(struct device *dev, return -EPERM; config1 = chip->config1 & (~ADT7516_SEL_EX_TEMP); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') config1 |= ADT7516_SEL_EX_TEMP; ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG1, config1); @@ -495,7 +495,7 @@ static ssize_t adt7316_store_disable_averaging(struct device *dev, int ret; config2 = chip->config2 & (~ADT7316_DISABLE_AVERAGING); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') config2 |= ADT7316_DISABLE_AVERAGING; ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG2, config2); @@ -534,7 +534,7 @@ static ssize_t adt7316_store_enable_smbus_timeout(struct device *dev, int ret; config2 = chip->config2 & (~ADT7316_EN_SMBUS_TIMEOUT); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') config2 |= ADT7316_EN_SMBUS_TIMEOUT; ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG2, config2); @@ -597,7 +597,7 @@ static ssize_t adt7316_store_powerdown(struct device *dev, int ret; config1 = chip->config1 & (~ADT7316_PD); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') config1 |= ADT7316_PD; ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG1, config1); @@ -635,7 +635,7 @@ static ssize_t adt7316_store_fast_ad_clock(struct device *dev, int ret; config3 = chip->config3 & (~ADT7316_ADCLK_22_5); - if (!memcmp(buf, 
"1", 1)) + if (buf[0] == '1') config3 |= ADT7316_ADCLK_22_5; ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG3, config3); @@ -681,7 +681,7 @@ static ssize_t adt7316_store_da_high_resolution(struct device *dev, chip->dac_bits = 8; - if (!memcmp(buf, "1", 1)) { + if (buf[0] == '1') { config3 = chip->config3 | ADT7316_DA_HIGH_RESOLUTION; if (chip->id == ID_ADT7316 || chip->id == ID_ADT7516) chip->dac_bits = 12; @@ -731,7 +731,7 @@ static ssize_t adt7316_store_AIN_internal_Vref(struct device *dev, if ((chip->id & ID_FAMILY_MASK) != ID_ADT75XX) return -EPERM; - if (memcmp(buf, "1", 1)) + if (buf[0] != '1') config3 = chip->config3 & (~ADT7516_AIN_IN_VREF); else config3 = chip->config3 | ADT7516_AIN_IN_VREF; @@ -773,7 +773,7 @@ static ssize_t adt7316_store_enable_prop_DACA(struct device *dev, int ret; config3 = chip->config3 & (~ADT7316_EN_IN_TEMP_PROP_DACA); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') config3 |= ADT7316_EN_IN_TEMP_PROP_DACA; ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG3, config3); @@ -812,7 +812,7 @@ static ssize_t adt7316_store_enable_prop_DACB(struct device *dev, int ret; config3 = chip->config3 & (~ADT7316_EN_EX_TEMP_PROP_DACB); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') config3 |= ADT7316_EN_EX_TEMP_PROP_DACB; ret = chip->bus.write(chip->bus.client, ADT7316_CONFIG3, config3); @@ -1018,7 +1018,7 @@ static ssize_t adt7316_store_DA_AB_Vref_bypass(struct device *dev, return -EPERM; dac_config = chip->dac_config & (~ADT7316_VREF_BYPASS_DAC_AB); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') dac_config |= ADT7316_VREF_BYPASS_DAC_AB; ret = chip->bus.write(chip->bus.client, ADT7316_DAC_CONFIG, dac_config); @@ -1063,7 +1063,7 @@ static ssize_t adt7316_store_DA_CD_Vref_bypass(struct device *dev, return -EPERM; dac_config = chip->dac_config & (~ADT7316_VREF_BYPASS_DAC_CD); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1') dac_config |= ADT7316_VREF_BYPASS_DAC_CD; ret = chip->bus.write(chip->bus.client, ADT7316_DAC_CONFIG, 
dac_config); @@ -1982,7 +1982,7 @@ static ssize_t adt7316_set_int_enabled(struct device *dev, int ret; config1 = chip->config1 & (~ADT7316_INT_EN); - if (!memcmp(buf, "1", 1)) + if (buf[0] == '1')
RE: [PATCH 3/3] acpi, memory-hotplug: Support getting hotplug info from SRAT.
> I will post a patch to fix it. How about always keep node0 unhotpluggable ? Node 0 (or more specifically the node that contains memory <4GB) will be full of BIOS reserved holes in the memory map. It probably isn't removable even if Linux thinks it is. Someday we might have a smart BIOS that can relocate itself to another node - but for now making node0 unhotpluggable looks to be a plausible interim move. Ultimately we'd like to be able to remove any node (just not all of them at the same time ... just like we can now offline any cpu - but not all of them together). -Tony
RE: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
> assume first cpu only have 1G ram, and other 31 socket will have bunch of ram That doesn't seem to be a very realistic assumption. Can you even still buy 1G DIMMs for servers? I'd think that a minimum would be to have each of four channels populated with a 4G DIMM - so 16GB on first cpu. But even that feels rather low. I think that making sure that the system can boot is good (and maybe it should ignore/override[*] parameters that would prevent booting). But let's be realistic about the cases we actually have to deal with (before somebody comes and talks about systems with just 16MB). -Tony [*] with some noisy warnings in the console log
RE: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
> b. it will be freed to slub before run time. > like init code and initrd disk. If this is a problem - I'd be inclined to disable the code that frees it. It's only a few hundred KB of code, and possibly a few MB of initrd. Too small to worry about on a hot pluggable server. > In that case, so they should just boot system with numa=off. But we will still care about NUMA locality. -Tony
RE: [PATCH V3] ia64/mm: fix a bad_page bug when crash kernel booting
> In efi_init() memory aligns in IA64_GRANULE_SIZE(16M). If set > "crashkernel=1024M-:600M" Is this where the real problem begins? Should we insist that users provide crashkernel parameters rounded to GRANULE boundaries? -Tony
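To see why 600M is awkward with a 16M granule: 600M is 37.5 granules, so it cannot start and end on granule boundaries. A small sketch of the rounding arithmetic, using a local GRANULE_SIZE constant as a stand-in for IA64_GRANULE_SIZE (the helper names are hypothetical, and the constant must be a power of two for the masking to work):

```c
#include <assert.h>
#include <stdint.h>

#define MB           (1024ULL * 1024ULL)
#define GRANULE_SIZE (16 * MB)   /* stand-in for IA64_GRANULE_SIZE (16M) */

/* Non-zero if x sits exactly on a granule boundary. */
static int granule_aligned(uint64_t x)
{
    return (x & (GRANULE_SIZE - 1)) == 0;
}

/* Largest granule boundary that is <= x. */
static uint64_t granule_round_down(uint64_t x)
{
    return x & ~(GRANULE_SIZE - 1);
}

/* Smallest granule boundary that is >= x. */
static uint64_t granule_round_up(uint64_t x)
{
    return granule_round_down(x + GRANULE_SIZE - 1);
}
```

Under these assumptions a 600M request rounds down to 592M or up to 608M, while the 1024M threshold is already aligned.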
[GIT PULL] pstore patches for 3.9 merge window
The following changes since commit d1c3ed669a2d452cacfb48c2d171a1f364dae2ed:
  Linux 3.8-rc2 (2013-01-02 18:13:21 -0800)
are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-pstore
for you to fetch changes up to fb0af3f2b1b613e5ea75426d454c7e5b1d1eef49:
  pstore: Create a convenient mount point for pstore (2013-02-12 13:07:22 -0800)
A few fixes to reduce places where pstore might hang a system in the crash path. Plus a new mountpoint (/sys/fs/pstore ... makes more sense than /dev/pstore).
Josh Boyer (1):
  pstore: Create a convenient mount point for pstore
Seiji Aguchi (4):
  pstore: Avoid deadlock in panic and emergency-restart path
  efi_pstore: Avoid deadlock in non-blocking paths
  efivars: Disable external interrupt while holding efivars->lock
  efi_pstore: Introducing workqueue updating sysfs
 Documentation/ABI/testing/pstore | 10 +--
 drivers/firmware/efivars.c | 180 +--
 fs/pstore/inode.c | 18 +++-
 fs/pstore/platform.c | 35 ++--
 include/linux/efi.h | 3 +-
 include/linux/pstore.h | 6 ++
 6 files changed, 192 insertions(+), 60 deletions(-)
[GIT PULL] Fix ia64 build breakage
The following changes since commit 2ef14f465b9e096531343f5b734cffc5f759f4a6:
  Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2013-02-21 18:06:55 -0800)
are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-fix-ia64-build
for you to fetch changes up to bc681593b588786e6326b3e5f78ccc1683e2269c:
  sched: move RR_TIMESLICE from sysctl.h to rt.h (2013-02-22 09:20:11 -0800)
Fix ia64 build
Clark Williams (1):
  sched: move RR_TIMESLICE from sysctl.h to rt.h
 include/linux/sched/rt.h | 6 ++
 include/linux/sched/sysctl.h | 6 --
 2 files changed, 6 insertions(+), 6 deletions(-)
RE: [PATCH RFC] x86/mce: Move MCE sysfs attributes out of the per-cpu location
> Note: I'm not sure if it's ok to change sysfs entries and this does break > userspace tools that depend on the current path for some of these attributes. > So, they will need to be updated to use the new path. However, if we ever get > to a point where cpu0 can be offlined, these tools will need to be updated > anyway (as they mostly hardcode machinecheck0 currently) Linus clarified his "never break user space" edict at the kernel summit on Monday. Paraphrasing: If nobody notices, or nobody complains, then we can make changes. But if anyone does complain, then the patch gets reverted. So if you want to do this, the right approach would be to change the utilities that use this to look in the new location for these sysfs files first, and fall back to looking in the old per-cpu place. Next (or in parallel) have the kernel provide both interfaces. Wait a long[1] time so that most people have updated utilities. Delete the per-cpu interfaces from the kernel. Delete the per-cpu references from the utilities. -Tony [1] Long enough that there are no complaints. At least a year, probably two or more.
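The first step (utilities look in the new location, then fall back to the old per-cpu place) might look roughly like this in a userspace tool. It is a sketch only: `first_readable` is a hypothetical helper, and the paths in the usage note are placeholders, since the thread does not say where the new attributes would live.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the first path in candidates[] that can be
 * opened for reading, or NULL if none can. Callers list the new location
 * first and the old per-cpu location last.
 */
static const char *first_readable(const char *const candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        FILE *f = fopen(candidates[i], "r");

        if (f) {
            fclose(f);
            return candidates[i];
        }
    }
    return NULL;
}
```

A tool would call it with something like { "<new global path>/check_interval", "<old path>/machinecheck0/check_interval" } (both placeholders), so it keeps working on old kernels, on kernels that provide both interfaces, and on future kernels that have dropped the per-cpu files.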
RE: [PATCH RESEND] memory hotplug: fix a double register section info bug
> This is an unusual configuration but it's not unheard of. PPC64 in rare > (and usually broken) configurations can have one node span another. Tony > should know if such a configuration is normally allowed on Itanium or if > this should be considered a platform bug. Tony? We definitely have platforms where the physical memory on node 0 that we skipped to leave physical address space for PCI mem mapped devices gets tagged back at the very top of memory, after other nodes. E.g. A 2-node system with 8G on each might look like this:
0-2G     RAM on node 0
2G-4G    PCI map space
4G-8G    RAM on node 0
8G-16G   RAM on node 1
16G-18G  RAM on node 0
Is this the situation that we are talking about? Or something different? -Tony
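The layout above can be checked mechanically. The sketch below encodes the example map (the struct and function names are hypothetical, not kernel APIs) and shows that node 0's span, from its lowest to its highest address, encloses all of node 1 even though each node holds 8G of RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One RAM range, in gigabytes, covering [start_gb, end_gb). */
struct mem_range {
    uint64_t start_gb, end_gb;
    int node;
};

/* The 2-node example from the message; the 2G-4G PCI hole is not RAM. */
static const struct mem_range example_map[] = {
    {  0,  2, 0 },
    {  4,  8, 0 },
    {  8, 16, 1 },
    { 16, 18, 0 },
};

/* Compute [*lo, *hi) spanned by a node; returns 0 if it has no memory. */
static int node_span(const struct mem_range *map, size_t n, int node,
                     uint64_t *lo, uint64_t *hi)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if (map[i].node != node)
            continue;
        if (!found || map[i].start_gb < *lo)
            *lo = map[i].start_gb;
        if (!found || map[i].end_gb > *hi)
            *hi = map[i].end_gb;
        found = 1;
    }
    return found;
}

/* Total RAM actually present on a node, in gigabytes. */
static uint64_t node_total_gb(const struct mem_range *map, size_t n, int node)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        if (map[i].node == node)
            total += map[i].end_gb - map[i].start_gb;
    return total;
}

/* Non-zero if the address spans of nodes a and b overlap. */
static int spans_overlap(const struct mem_range *map, size_t n, int a, int b)
{
    uint64_t lo_a, hi_a, lo_b, hi_b;

    if (!node_span(map, n, a, &lo_a, &hi_a) ||
        !node_span(map, n, b, &lo_b, &hi_b))
        return 0;
    return lo_a < hi_b && lo_b < hi_a;
}
```

Any code that assumes a node's span contains only that node's memory would see node 1's ranges twice in this configuration.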
RE: x86/non-x86: percpu, node ids, apic ids x86.git fixup
> this i believe builds an implicit dependency between the mca_asm.o > position within the image and the ia64_mca_data percpu variable it > accesses - it relies on the immediate 22 addressing mode that has 4MB of > scope. Per chance, the .config you sent creates a 14MB image, and the > percpu variables moved too far away for the linker to be able to fulfill > this constraint. Sounds very plausible. > The workaround is to define PER_CPU_ATTRIBUTES to link percpu variables > back into the .percpu section on UP too - which ia64 links specially > into its vmlinux.lds. But ultimately i think the better solution would > be to remove this dependency between arch/ia64/kernel/mca_asm.S and the > position of the percpu data. Yup. That fixes the build ... the resulting binary doesn't boot though :-( I just realized that it has been a while since I tried booting a UP kernel ... so the problem may be unrelated bitrot elsewhere. Overall you are right that the mca_asm.S code should not be dependent on the relative location of the data objects. I'll start digging on why this doesn't boot ... but you might as well send the fixes so far upstream to Linus so that the SMP fix is available (which is all anyone really cares about ... there are very, very few UP ia64 systems in existence). Acked-by: Tony Luck <[EMAIL PROTECTED]> -Tony
RE: x86/non-x86: percpu, node ids, apic ids x86.git fixup
> I'll start digging on why this doesn't boot ... but you might as well > send the fixes so far upstream to Linus so that the SMP fix is available Well a pure 2.6.24 version compiled with CONFIG_SMP=n booted just fine, so the breakage is recent ... and more than likely related to this change. I've only had a casual dig at the failing case ... kernel dies in memset() as called from kmem_cache_alloc() with the address being written as 0x40117b48 (which is off in the virtual address space range used by users ... not a kernel address). I'll dig some more tomorrow. -Tony
RE: x86/non-x86: percpu, node ids, apic ids x86.git fixup
> hm, as far as i could check, on ia64 UP the .percpu section link > difference was the only ia64 difference i could find out of those > changes. Could you try to copy a 2.6.24 include/asm-generic/percpu.h, > include/asm-ia64.h and include/linux/percpu.h into your current tree, > and see whether that boots? If yes, then it's the percpu changes. The > patch below does this on top of very latest -git - and it builds fine > with your UP config with a crosscompiler. Applied that patch and UP kernel built ok, and then crashed in the same place with the memset() to a user-looking address from kmem_cache_alloc(). So the percpu changes are innocent ... something else since 2.6.24 is to blame. Only 5749 commits :-) I'll start bisecting. -Tony
RE: x86/non-x86: percpu, node ids, apic ids x86.git fixup
> So the percpu changes are innocent ... something else since 2.6.24 is > to blame. Only 5749 commits :-) I'll start bisecting. 12 bisections later ... nothing! I think I got lost in the maze. Bisection #5 had a crash, but it looked to be a very different crash (and looked to happen later than the bug I was hunting). So I marked that as "good" on the theory that it looked like this bug wasn't in the kernel. Same thing happened at bisection #9. But I ended up with: commit bfada697bd534d2c16fd07fbef3a4924c4d4e014 Author: Pavel Emelyanov <[EMAIL PROTECTED]> Date: Sun Dec 2 00:57:08 2007 +1100 [IPV4]: Use ctl paths to register devinet sysctls Which just looks too improbable to be the cause of the UP crash. Git won't revert it out from top of tree automatically so I can't easily test whether some weird magic means that this is the buggy commit. Perhaps the issue is another offset of object X in kernel w.r.t. object Y ... and so the good/bad choices in the bisection are actually pretty random depending on how much code is stuffed between X & Y at each bisection point. -Tony
RE: x86/non-x86: percpu, node ids, apic ids x86.git fixup
> Applied that patch and UP kernel built ok, and then crashed in the > same place with the memset() to a user-looking address from kmem_cache_alloc() > > So the percpu changes are innocent ... something else since 2.6.24 is > to blame. Only 5749 commits :-) I'll start bisecting. The bisection narrowed in on an innocent patch in ipv4 space. Meanwhile the rush of patches continues. When I retested yesterday when Linus' HEAD was 8af03e782... the CONFIG_SMP=n kernel worked perfectly. So maybe it was fixed? Or maybe the bug depends on the relative location of various bits of code/data and as the kernel grows and shrinks with incoming changes the problem comes and goes :-( -Tony
RE: [PATCH] MAINTAINERS: Add myself as the SWIOTLB maintainer.
> Now that I've an IA64 box on top of the other boxes > (IBM with Calgary-X, Intel VT-d, AMD Vi, and AMD GART - that > can use SWIOTLB as fallback) I can reliably do regression > testing. Acked-by: Tony Luck
RE: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
> First of all, I do think I was answering your question. As I said > before, if an online cpu == dying here, there must be something wrong. > Am I right here ? Yes - but there is a fuzzy line over where it is good to check for "something wrong" or whether to trust that the caller of the function knew what they were doing. For example we trust that "dying" is a valid cpu number. If we were super-paranoid that someone might change the code and call us with a bad argument, we might add: BUG_ON(dying < 0 || dying >= MAX_NR_CPUS); This would certainly help debug the case if someone did make a bogus change ... but I think it is clear that this test is way past the fuzzy line and into pointless. Back to the case in question: do we think there is a credible case where the "dying" cpu can show up in our "for_each_online_cpu()" loop? The original author of the code was worried enough to make a test, but thought that the appropriate action was to silently skip it. You want to add a WARN_ON, which will cause users who read the console logs to worry, but that most users will never see. -Tony
RE: [PATCH 02/26] pstore: add flags
> I wonder if the default should be to not show headers, and to add this > flag to the backends that want the pstore-added header. I think the > more common case going forward will be without headers since > backends should arguably be storing metadata themselves. Perhaps just add the headings when pstore breaks a dump into pieces because of a back-end size limitation. I.e. if there is only one piece, then no headings. If there are two or more, include a heading to aid with putting the pieces together later. -Tony
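The "heading only when the dump is split" rule can be sketched in a few lines of userspace C. Everything here is illustrative: the function names and the "Part N of M" wording are hypothetical rather than taken from pstore, and the sketch ignores that a heading itself consumes space in the record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of records needed to store len bytes when each holds at most max. */
static size_t piece_count(size_t len, size_t max)
{
    return (len + max - 1) / max;
}

/*
 * Write the heading for piece `part` (1-based) of `total` into out.
 * Returns the heading length, or 0 with an empty string when no heading
 * is wanted (a single piece) or when it would not fit in outlen bytes.
 */
static size_t piece_heading(char *out, size_t outlen, size_t part, size_t total)
{
    int n;

    if (outlen)
        out[0] = '\0';
    if (total <= 1)
        return 0;             /* one piece: nothing to reassemble */

    n = snprintf(out, outlen, "Part %zu of %zu\n", part, total);
    if (n < 0 || (size_t)n >= outlen) {
        if (outlen)
            out[0] = '\0';
        return 0;
    }
    return (size_t)n;
}
```

A dump that fits in one record is stored untouched, while a dump split across records carries just enough metadata to put the pieces back in order.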
RE: [PATCH 011/193] arch/ia64: remove CONFIG_EXPERIMENTAL
> This config item has not carried much meaning for a while now and is > almost always enabled by default. As agreed during the Linux kernel > summit, remove it. Acked-by: Tony Luck [ditto for parts 012 and 013 of 193]
RE: [PATCH 0/5] Rework MCA configuration handling code, v2
> Third round, incorporating feedback from the last time. Paste one of these onto each piece: Acked-by: Tony Luck Acked-by: Tony Luck Acked-by: Tony Luck Acked-by: Tony Luck Acked-by: Tony Luck -Tony
RE: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
> Just curious -- can you reproduce the same problem with > CONFIG_PRINTK_TIME as I'm seeing? Yes I can reproduce this (on latest Linus tree). System dies with no console output ... looks like the boot cpu may have taken a machine check (it isn't responding to my debugger). -Tony
RE: Strange hang on ia64 with CONFIG_PRINTK_TIME=y
> I guess sched_init() is too early... it does seem really strange to > me, but I just double checked with Ingo's patch and it does indeed > hang. The slow way to make progress is just to go through > start_kernel() line-by-line and enable cpu_clock() at each stage, and > see where it stops hanging. I'll give that a shot as a background > process (my ia64 box takes quite a while to boot, so each test takes a > long time but requires very little of my attention). We *ought* to be safe after cpu_init() ... which is called from setup_arch(), which is several calls before sched_init(). Thanks for looking at this though ... my ability to test just went away for a while ... some lab re-organization means all my systems just got powered off and removed from their rack so the rack can be moved to a new location. -Tony
RE: [PATCH 1/5] driver core / ACPI: Move ACPI support to core device and driver types
On 10/31/2012 11:42 AM, Rafael J. Wysocki wrote: > I wonder if the x86 and/or ia64 maintainers have any reservations? Can you elaborate on the "tested by mika" that you put into the 0/5 message. Especially w.r.t. ia64. Compile tested? Boot tested? Ran with some new device that uses the ACPI enumeration provided by this series? Nothing in the concept or code scares me ... but I'd like to know that it actually works :-) -Tony
RE: [PATCH 1/5] driver core / ACPI: Move ACPI support to core device and driver types
> By "tested" I mean "run with some new devices that use the ACPI enumeration > provided here, on x86". Sorry for being too vague. Do you or Mika have access to an ia64 box to test? If not, can you suggest some way that I could exercise this code w/o the new devices? Or at least reassure myself that all is benign in a system full of old devices. -Tony
RE: [PATCH 1/5] driver core / ACPI: Move ACPI support to core device and driver types
> The BIOSes of currently available ia64 systems don't contain ACPI nodes whose > IDs will match the IDs of the new devices (ie. the ones that are going to be > added to acpi_platform_device_ids[]), so for ia64 it should be sufficient to > test that code as is (ie. without any new devices in the system). Ok - built cleanly on ia64. Boots too. Just one new console message: ACPI: bus type platform registered that seems pretty harmless. Acked-by: Tony Luck
RE: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts
> That is correct, unfortunately. That information is not available to > software in all cases. Maybe APEI could be used for that DIMM location > mapping through simple tables instead of letting it fumble the error > handling path. Not much hope for "simple"[1] tables. There is also a timing issue on systems with rank sparing, memory mirroring etc. ... you need to decode to the DIMM at the time the error happened. If you wait until later, then the system may have switched over to the spare rank or mirror ... and then your decode will point at the new target, rather than the old. -Tony [1] Consider a 4 cpu-socket machine with 4 channels per socket and three DIMMs per channel - so there are 48 sockets on the motherboard. Then some lab monkey takes a box of random 1, 2, 4, 8 GB DIMMs and fills most of the sockets. BIOS will somehow make sense out of this and interleave where it finds matching speeds across pairs/quads of channels (though size need not match ... if you have a 2G and 4G DIMM you may get interleaving for the matching part, then non-interleaved for the "extra" 2G).
RE: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts
> Right, but at least in the csrow case, we still can compute back the > csrow even with the interleaving, after we know how it is done exactly > (on which address bits, etc). I think this should be doable on Intel > controllers too but I don't know. No. Architecturally all Intel provides is the physical address in MCi_ADDR. To do anything with that you are into per-system space, and the registers that define the mappings are not necessarily available to OS code ... sometimes they are, and sometimes they are even documented in places where Mauro can use them to write an EDAC driver ... but there are no guarantees. -Tony
RE: [PATCH] debug: Do not permit CONFIG_DEBUG_STACK_USAGE=y on IA64 or PARISC
> I agree with this. Most of it looks easily fixable, but how would I > enable the fix for ia64? For PA it's simple: I'll just use > CONFIG_STACK_GROWSUP, but that won't work for you. ia64 has an ugly chicken vs. egg build dependency. When trying to build our asm-offsets.h file (to get #define constants for various structure sizes and offsets in a format that is usable in assembly code) we get: include/linux/sched.h:2539: error: 'IA64_TASK_SIZE' undeclared (first use in this function) Which is sad because IA64_TASK_SIZE is one of the #defines that asm-offsets.h is trying to produce. Which is why I just threw up my hands in despair and said "!IA64" for this option. -Tony
RE: [PATCH] pstore: avoid recursive spinlocks in the oops_in_progress case
> And my plan was to get rid of the fact that backends touch pstore->buf > directly. Backends would always receive anonymous 'buf' pointer (we > already have write_buf callback that does exactly this), and thus it It feels like we are just shuffling the lock problem from one place to another. In the panic case we have to use a pre-allocated buffer (hoping that we can allocate one seems to be a foolish plan). So we'd need a lock around use of that buffer somewhere - whether it is in the panic code, the pstore generic code, or the back-end driver. Can you describe where you'd like to end up? -Tony
RE: [PATCH 2/3] ext4: introduce ext4_error_remove_page
> If we go back to first principles, what do we want to do? We want the > system administrator to know that a file might be potentially > corrupted. And perhaps, if a program tries to read from that file, it > should get an error. If we have a program that has that file mmap'ed > at the time of the error, perhaps we should kill the program with some > kind of signal. But to force a reboot of the entire system? Or to > remounte the file system read-only? That seems to be completely > disproportionate for what might be 2 or 3 bits getting flipped in a > page cache for a file. I think that we know that the file *is* corrupted, not just "potentially". We probably know the location of the corruption to cache-line granularity. Perhaps better on systems where we have access to ecc syndrome bits, perhaps worse ... we do have some errors where the low bits of the address are not known. I'm in total agreement that forcing a reboot or fsck is unhelpful here. But what should we do? We don't want to let the error be propagated. That could cause a cascade of more failures as applications make bad decisions based on the corrupted data. Perhaps we could ask the filesystem to move the file to a top-level "corrupted" directory (analogous to "lost+found") with some attached metadata to help recovery tools know where the file came from, and the range of corrupted bytes in the file? We'd also need to invalidate existing open file descriptors (or less damaging - flag them to avoid the corrupted area??). Whatever we do, it needs to be persistent across a reboot ... the lost bits are not going to magically heal themselves. We already have code to send SIGBUS to applications that have the corrupted page mmap(2)'d (see mm/memory-failure.c). Other ideas? 
-Tony
RE: [PATCH 2/3] ext4: introduce ext4_error_remove_page
> Well, we could set a new attribute bit on the file which indicates > that the file has been corrupted, and this could cause any attempts to > open the file to return some error until the bit has been cleared. That sounds a lot better than renaming/moving the file. > This would persist across reboots. The only problem is that system > administrators might get very confused (at least at first, when they > first run a kernel or a distribution which has this feature enabled). Yes. This would require some education. But new attributes have been added in the past (e.g. immutable) that caused confusion to users and tools that didn't know about them. > Application programs could also get very confused when any attempt to > open or read from a file suddenly returned some new error code (EIO, > or should we designate a new errno code for this purpose, so there is > a better indication of what the heck was going on?) EIO sounds wrong ... but it is perhaps the best of the existing codes. Adding a new one is also challenging too. > Also, if we just log the message in dmesg, if the system administrator > doesn't find the "this file is corrupted" bit right away This is pretty much a given. Nobody will see the message in the console log until it is far too late. > I'm not sure it's worth it to go to these extents, but I could imagine > some customers wanting to have this sort of information. Do we know > what their "nice to have" / "must have" requirements might be? 18 years ago Intel rather famously attempted to sell users on the idea that a rare divide error that sometimes gave the wrong answer could be ignored. Before my time at Intel, but it is still burned into the corporate psyche that customers really don't like to get the wrong answers from their computers. Whether it is worth it may depend on the relative frequency of data being corrupted this way, compared to all the other ways that it might get messed up. 
If it were a thousand times more likely that data got silently corrupted on its path to media, sitting spinning on the media, and then back off the drive again - then all this fancy stuff wouldn't make any real difference. I have no data on the relative error rates of memory and i/o - so I can't answer this. -Tony
RE: [PATCH 2/3] ext4: introduce ext4_error_remove_page
> What I would recommend is adding a > > #define FS_CORRUPTED_FL 0x0100 /* File is corrupted */ > > ... and which could be accessed and cleared via the lsattr and chattr > programs. Good - but we need some space to save the corrupted range information too. These errors should be quite rare, so one range per file should be enough. New file systems should plan to add space in their on-disk format. The corruption isn't going to go away across a reboot. -Tony
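To make the "flag plus one range per file" idea concrete, here is a rough userspace sketch. The `struct corrupt_info` layout and the helper names are hypothetical (no filesystem stores this today as far as this thread says); only the `FS_CORRUPTED_FL` name and value come from the quoted proposal.

```c
#include <stdbool.h>
#include <stdint.h>

#define FS_CORRUPTED_FL 0x0100	/* value as quoted in the thread */

/* Hypothetical per-file record: the flag plus a single corrupted range. */
struct corrupt_info {
	uint32_t flags;		/* FS_CORRUPTED_FL set while the range is valid */
	uint64_t start;		/* first corrupted byte offset in the file */
	uint64_t len;		/* number of corrupted bytes */
};

/* Record the damaged range and set the flag (would be persisted on disk). */
static void mark_corrupted(struct corrupt_info *ci, uint64_t start, uint64_t len)
{
	ci->start = start;
	ci->len = len;
	ci->flags |= FS_CORRUPTED_FL;
}

/* What an administrator clearing the attribute would amount to. */
static void clear_corrupted(struct corrupt_info *ci)
{
	ci->flags &= ~(uint32_t)FS_CORRUPTED_FL;
	ci->start = 0;
	ci->len = 0;
}

/* True if a read of 'count' bytes at 'off' overlaps the corrupted range. */
static bool read_hits_corruption(const struct corrupt_info *ci,
				 uint64_t off, uint64_t count)
{
	if (!(ci->flags & FS_CORRUPTED_FL) || count == 0 || ci->len == 0)
		return false;
	return off < ci->start + ci->len && ci->start < off + count;
}
```

With the range recorded, a filesystem could fail only the reads that touch the damaged bytes instead of refusing every open of the file; whether that is worth the on-disk space is the open question above.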
[GIT PULL] Fix a cmci discovery problem
The following changes since commit 8f0d8163b50e01f398b14bcd4dc039ac5ab18d64:

  Linux 3.7-rc3 (2012-10-28 12:24:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-tangchen

for you to fetch changes up to 85b97637bb40a9f486459dd254598759af9c3d50:

  x86/mce: Do not change worker's running cpu in cmci_rediscover(). (2012-10-30 14:38:12 -0700)

Fix problem in CMCI rediscovery code that was illegally migrating worker threads to other cpus.

Tang Chen (1):
      x86/mce: Do not change worker's running cpu in cmci_rediscover().

 arch/x86/kernel/cpu/mcheck/mce_intel.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)
RE: [GIT PULL] EFI changes for v3.11
>> >> Tony Luck (1): >> [IA64] sim: Add casts to avoid assignment warnings >> >> arch/ia64/hp/sim/boot/fw-emu.c | 20 ++-- >> 1 file changed, 10 insertions(+), 10 deletions(-) > > I don't see this commit in Linus' tree so presumably Tony is still > seeing these warnings. Correct - I see 10 warnings about "assignment makes pointer from integer" when building Linus' tree (HEAD = d2b4a646). My patch doesn't appear to be in linux-next either (next-20130708). I had hoped to have this patch follow the same path as the one that changed the types and introduced the warnings ... but since that didn't work perhaps I should just ask Linus to pull it from my ia64 tree. -Tony
RE: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report
> Nope, this is a just-in-case thing. I think you or Tony asked to have > this in a previous discussion so that we're covered if firmware starts > acting up. Other than that, I'm ok if this is left out. I'm struggling to think of a case where this would help. It implies that we are on a running system, and we somehow notice that the BIOS is telling us to take some pages offline - and that we know better than the BIOS that we'd like to just ignore any more such messages from the BIOS. But we still leave the BIOS in charge of logging the errors and keeping track of the thresholds. I'm happy with just the acpi=nocmcff to avoid a BIOS that does weird stuff. Or do you think we might still have to deal with a string of APEI messages? -Tony
RE: BUG: key ffff880c1148c478 not in .data! (V3.10.0)
> What would be a reasonable maximum limit for the number of memory > controllers, on a -EX machine? Westmere-EX has one memory controller per socket ... and there are glueless systems up to 8 sockets. So 8 there. Not sure if any OEM is building larger machines with a node controller (SGI? Not sure if they build their behemoths from -EP or -EX parts). Ivy Bridge ups the ante with two memory controllers on a socket. So plan on doubling soon. -Tony
RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
This is good - but the real solution is to stop poisoning entire huge pages ... they should be broken into 4K pages and just one 4K page should be poisoned. Naoya Horiguchi: I thought that you were looking at this problem some months ago. Any progress? -Tony
RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
>>Sorry, I have no meaningful progress on this. Splitting hugepages is not >>a trivial operation, and introduce more complexity on hugetlbfs code. >>I don't hit on any usecase of it rather than memory failure, so I'm not >>sure that it's worth doing now. > > Agreed. ;-) Agreed that huge pages should be split - or that it is not worth splitting them? Actually I wonder how useful huge pages still are - transparent huge pages may give most of the benefits without having to modify applications to use them. Plus the kernel does know how to split them when an error occurs (which I care about more than most people). -Tony
RE: [RESEND PATCH v2 1/4] mm/hwpoison: fix traverse hugetlbfs page to avoid printk flood
> Transparent huge pages are not helpful for DB workload which there is a lot > of > shared memory Hmm. Perhaps they should be. If a database allocates most[1] of the memory on a machine to a shared memory segment - that *ought* to be a candidate for using transparent huge pages. Now that we have them they seem a better choice (much more flexibility) than hugetlbfs. -Tony [1] I've been told that it is normal to configure over 95% of physical memory to the shared memory region to run a particular transaction based benchmark with one commercial data base application.
[GIT PULL] pstore/compression fixes
The following changes since commit e831cbfc1ad843b5542cc45f777e1a00b73c0685:

  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux (2013-09-11 08:36:03 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-pstore

for you to fetch changes up to 802e4c6f5887205eda110c6bfb90c9bfa93dc8a7:

  pstore: Remove the messages related to compression failure (2013-09-16 09:28:29 -0700)

Three pstore fixes related to compression:
1) Better adjustment of size of compression buffer (was too big for EFIVARS backend, resulting in compression failure)
2) Use zlib_inflateInit2 instead of zlib_inflateInit
3) Don't print messages about compression failure. They will waste space that may better be used to log console output leading to the crash.

Aruna Balakrishnaiah (3):
      pstore: Adjust buffer size for compression for smaller registered buffers
      pstore: Use zlib_inflateInit2 instead of zlib_inflateInit
      pstore: Remove the messages related to compression failure

 fs/pstore/platform.c | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)
RE: [RFC PATCH v2 04/11] pstore: Add compression support to pstore
> The reason behind compression failure is the size of big_oops_buf which is too > big for efivars case. I will do some experiments with different kind of texts > for buffer size 1024 to check if 100/53 suits for all the cases. ... > Yes this can be changed to zlib_inflateInit2(). Original patch series was just pulled by Linus ... so we'll need a patch on top of current Linus git tree to fix these issues. But let's make sure that efivars, erst, etc. are all happy with the changes we make before I ask Linus to pull another pstore piece. Thanks -Tony
RE: [PATCH] lockref: remove cpu_relax() again
> *If* however the cpu_relax() makes sense on other platforms maybe we could > add something like we have already with "arch_mutex_cpu_relax()": I'll do some more measurements on ia64. During my first tests cpu_relax() seemed to be a big win - but I only ran "./t" a couple of times. Later (with the cpu_relax() in place) I ran a bunch more iterations, and found that the variation from run to run is much larger with lockref. The mean score is 60% higher, but the standard deviation is an order of magnitude bigger (enough that one run out of 20 with lockref scored lower than the pre-lockref kernel). I think this is expected ... cmpxchg is a free-for-all - and sometimes poor placement across the four socket system might cause short term starvation to a thread while threads on another socket monopolize the cache line. -Tony
RE: [PATCH] lockref: remove cpu_relax() again
> And there can't be any livelock, since by definition somebody else
> _did_ make progress. In fact, adding the cpu_relax() probably just
> makes things much less fair - once somebody else raced on you, the
> cpu_relax() now makes it more likely that _another_ cpu does so too.
>
> That said, let's see what Tony's numbers are.

Data from 20 runs of "./t"

3.11 + Linus enabling patches, but ia64 not enabled (commit bc08b449ee14a from Linus tree):
  mean   3469553.80
  min    3367709.00
  max    3494154.00
  stddev 43613.722742

Now add ia64 enabling (including the cpu_relax()):
  mean   5509067.15      // nice boost
  min    3191639.00      // worst case is worse than worst case before we made the change
  max    6508629.00
  stddev 793243.943875   // much more variation from run to run

Comment out the cpu_relax():
  mean   2185864.40      // this sucks
  min    2141242.00
  max    2286505.00
  stddev 40847.960152    // but it consistently sucks

So Linus is right that the cpu_relax() makes things less fair ... but without it performance sucks so much that I don't want to use the clever cmpxchg at all - I'm much better off without it!

This may be caused by Itanium hyper-threading (SOEMT - switch on event multi-threading) where the spinning thread means that its buddy retires no instructions until h/w times it out and forces a switch. But that's just a guess - losing the cacheline to whoever made the change that caused the cmpxchg to fail should also force a thread switch. -Tony
RE: [PATCH] lockref: remove cpu_relax() again
> Also, it strikes me that ia64 has tons of different versions of > cmpxchg, and the one you use by default is the one with "acquire" > semantics Not "tons", just two. You can ask for "acquire" or "release" semantics, there is no relaxed option. Worse still - early processor implementations actually just ignored the acquire/release and did a full fence all the time. Unfortunately this meant a lot of badly written code that used .acq when they really wanted .rel became legacy out in the wild - so when we made a cpu that strictly did the .acq or .rel ... all that code started breaking - so we had to back-pedal and keep the "legacy" behavior of a full fence :-( -Tony
RE: [PATCH] lockref: remove cpu_relax() again
> That said, another thing that strikes me is that you have 32 CPU > threads, and the stupid test-program I sent out had MAX_THREADS set to > 16. Did you change that? Because if not, then some of the extreme > performance profile might be about how the threads get scheduled on > your machine (HT threads vs full cores etc). I'll try to get new numbers with 32 threads[*] - but even if they look good, I'd be upset about the 16 thread case being worse with the cmpxchg/no-cpu-relax case than the original code. -Tony [*] probably not till tomorrow
RE: [PATCH] lockref: remove cpu_relax() again
>> Worse still - early processor implementations actually just ignored >> the acquire/release and did a full fence all the time. Unfortunately >> this meant a lot of badly written code that used .acq when they really >> wanted .rel became legacy out in the wild - so when we made a cpu >> that strictly did the .acq or .rel ... all that code started breaking - so >> we had to back-pedal and keep the "legacy" behavior of a full fence :-( > > Ugh. Can you try what happens with the weaker release-semantics > performance-wise for that code? Do it *just* for the lockref code.. No. I can change the Linux code to say "cmpxchg.rel" here ... but the h/w will do exactly the same thing it did when I had "cmpxchg.acq". -Tony
RE: [PATCH 1/3] pstore: Adjust buffer size for compression for smaller registered buffers
- big_oops_buf_sz = (psinfo->bufsize * 100) / 45;
+ big_oops_buf_sz = (psinfo->bufsize * 100) / cmpr;

Tested on an ERST backed system. Seems to be working: we save a little less information per ERST record than before this change (uncompressed size goes down from ~17500 to ~16400 bytes), but this patch switched the denominator from 45 to 48 (for ERST), so that seems plausible. Seiji: let me know how the efivars tests go. -Tony
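The "plausible" check can be done with the expression from the patch itself. This is only a worked example: the function name is made up, and the 7875-byte record size is a hypothetical value picked so that the old divisor of 45 gives the reported ~17500 bytes.

```c
#include <stddef.h>

/*
 * Same arithmetic as the patch: how much uncompressed text to stage for a
 * backend record of 'bufsize' bytes, assuming it compresses to roughly
 * 'cmpr' percent of its original size.
 */
static size_t big_oops_buf_size(size_t bufsize, unsigned int cmpr)
{
	return (bufsize * 100) / cmpr;
}
```

With the assumed 7875-byte record, a divisor of 45 gives 17500 bytes and a divisor of 48 gives 16406 bytes. The ratio is 45/48 = 0.9375 whatever the record size, which matches the observed drop from ~17500 to ~16400.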
RE: [PATCH v2] pstore: Adjust buffer size for compression for smaller registered buffers
+ default: + cmpr = 60; + break; + } Is this the right "default"? It may be a good choice for a backend with a really tiny buffer (1 ... 999). But less good for a (theoretical) backend with a larger buffer (10001 ... infinity and beyond). Which are you trying to catch here? -Tony
RE: [PATCH] pstore/ram: (really) fix undefined usage of rounddown_pow_of_two
>> Previous attempt to fix was b042e47491ba5f487601b5141a3f1d8582304170 >> >> Suggested use of is_power_of_2() was bogus because is_power_of_2(0) is >> false (documented behaviour). >> >> Signed-off-by: Maxime Bizon > > Yes, excellent point. :) > > Acked-by: Kees Cook Applied. Thanks. -Tony
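For anyone wondering why zero is the awkward case, here is a small userspace sketch. `is_power_of_2()` below uses the same test as the kernel helper; `rounddown_pow2_safe()` is a hypothetical stand-in written for this note (it is not the kernel's rounddown_pow_of_two(), and it is not the fix that was applied).

```c
#include <stdbool.h>

/* Same test as the kernel's is_power_of_2(): zero is NOT a power of two. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical stand-in: round down to a power of two, but return 0 for
 * an input of 0 instead of leaving the result undefined.
 */
static unsigned long rounddown_pow2_safe(unsigned long n)
{
	unsigned long p = 1;

	if (n == 0)
		return 0;
	while (p <= n / 2)	/* compare against n/2 so p << 1 cannot overflow */
		p <<= 1;
	return p;
}
```

A guard of the form "if not a power of two, round down" therefore still sends 0 into the rounding step, because `is_power_of_2(0)` is false; the zero case has to be handled explicitly.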
[GIT PULL] pstore changes for 3.12
The following changes since commit b36f4be3de1b123d8601de062e7dbfc904f305fb:

  Linux 3.11-rc6 (2013-08-18 14:36:53 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-pstore

for you to fetch changes up to 3bd11cf56e4d9c9a79c0c1a4ebe381c674ec9709:

  pstore/ram: (really) fix undefined usage of rounddown_pow_of_two (2013-08-30 15:57:01 -0700)

Big part of this is the addition of compression to the generic pstore layer so that all backends can use the pitiful amounts of storage they control more effectively. Three other small fixes/cleanups too.

Aruna Balakrishnaiah (11):
      powerpc/pseries: Remove (de)compression in nvram with pstore enabled
      pstore: Add new argument 'compressed' in pstore write callback
      pstore/Kconfig: Select ZLIB_DEFLATE and ZLIB_INFLATE when PSTORE is selected
      pstore: Add compression support to pstore
      pstore: Introduce new argument 'compressed' in the read callback
      pstore: Add decompression support to pstore
      pstore: Add file extension to pstore file if compressed
      powerpc/pseries: Read and write to the 'compressed' flag of pstore
      erst: Read and write to the 'compressed' flag of pstore
      efi-pstore: Read and write to the 'compressed' flag of pstore
      pstore/ram: Read and write to the 'compressed' flag of pstore

Dan Carpenter (1):
      pstore: d_alloc_name() doesn't return an ERR_PTR

Maxime Bizon (1):
      pstore/ram: (really) fix undefined usage of rounddown_pow_of_two

Wei Yongjun (1):
      acpi/apei/erst: Add missing iounmap() on error in erst_exec_move_data()

 arch/powerpc/platforms/pseries/nvram.c | 112 -
 drivers/acpi/apei/erst.c               |  25 +++-
 drivers/firmware/efi/efi-pstore.c      |  27 -
 fs/pstore/Kconfig                      |   2 +
 fs/pstore/inode.c                      |  10 +-
 fs/pstore/internal.h                   |   5 +-
 fs/pstore/platform.c                   | 212 ++---
 fs/pstore/ram.c                        |  47 ++--
 include/linux/pstore.h                 |   6 +-
 9 files changed, 306 insertions(+), 140 deletions(-)
[PATCH] lockref: Relax in cmpxchg loop
While we are likely to succeed and break out of this loop, it isn't guaranteed. We should be power and thread friendly if we do have to go around for a second (or third, or more) attempt.

Signed-off-by: Tony Luck
---
diff --git a/lib/lockref.c b/lib/lockref.c
index 7819c2d..9d76f40 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -19,6 +19,7 @@
 		if (likely(old.lock_count == prev.lock_count)) {	\
 			SUCCESS;					\
 		}							\
+		cpu_relax();						\
 	}								\
 } while (0)

--
1.8.1.4
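For readers without lib/lockref.c to hand, the shape of the loop can be sketched in userspace with C11 atomics. Everything here is an illustration under stated assumptions: `counter_get()` is a made-up name, it bumps a bare counter rather than a combined lock+count word, and `relax()` is only a compiler barrier standing in for cpu_relax(), which is architecture-specific in the kernel.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_relax(): just a compiler barrier here. */
static inline void relax(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Increment '*count' with a compare-and-swap retry loop.
 * Returns how many times the loop had to go around again.
 */
static unsigned int counter_get(_Atomic uint64_t *count)
{
	uint64_t old = atomic_load_explicit(count, memory_order_relaxed);
	unsigned int retries = 0;

	while (!atomic_compare_exchange_weak_explicit(count, &old, old + 1,
						       memory_order_acquire,
						       memory_order_relaxed)) {
		/*
		 * The exchange failed and 'old' was refreshed with the
		 * current value. Be polite before trying again.
		 */
		relax();
		retries++;
	}
	return retries;
}
```

The first attempt usually succeeds, so the relax step costs nothing on the fast path; it only runs when another thread changed the word first (or the weak exchange failed spuriously).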
RE: [PATCH 01/11] random: don't feed stack data into pool when interrupt regs NULL
> In this case fast_mix would use two uninitialized ints from the stack
> and mix it into the pool.

Is the concern here that an attacker might know (or be able to control) what is on the stack - and so get knowledge of what is being mixed into the pool?

> In this case set the input to 0.

And the fix is to guarantee that everyone knows what is being mixed in? (!)

Wouldn't it be better to adjust the "nbytes" parameter to fast_mix(..., ..., sizeof (input)); to only mix in the part of input[] that we successfully initialized? Untested patch below.

Signed-off-by: Tony Luck
---
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 7737b5bd26af..5c4ec0abb702 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -745,16 +745,19 @@ void add_interrupt_randomness(int irq, int irq_flags)
 	struct pt_regs *regs = get_irq_regs();
 	unsigned long now = jiffies;
 	__u32 input[4], cycles = get_cycles();
+	int nbytes;

 	input[0] = cycles ^ jiffies;
 	input[1] = irq;
+	nbytes = 2 * sizeof(input[0]);
 	if (regs) {
 		__u64 ip = instruction_pointer(regs);
 		input[2] = ip;
 		input[3] = ip >> 32;
+		nbytes += 2 * sizeof(input[0]);
 	}

-	fast_mix(fast_pool, input, sizeof(input));
+	fast_mix(fast_pool, input, nbytes);

 	if ((fast_pool->count & 1023) &&
 	    !time_after(now, fast_pool->last + HZ))
[PATCH] X86 ACPI: Use #ifdef not #if for CONFIG_X86 check
Fix a build warning on ia64:

include/linux/acpi.h:437:5: warning: "CONFIG_X86" is not defined

Signed-off-by: Tony Luck
---
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4f42332..f70f18d 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -434,7 +434,7 @@ void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
 acpi_status acpi_os_prepare_sleep(u8 sleep_state,
 				  u32 pm1a_control, u32 pm1b_control);

-#if CONFIG_X86
+#ifdef CONFIG_X86
 void arch_reserve_mem_area(acpi_physical_address addr, size_t size);
 #else
 static inline void arch_reserve_mem_area(acpi_physical_address addr,
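The difference between the two directives is easy to show outside the kernel. In this sketch `CONFIG_DEMO_ARCH` is a hypothetical symbol that is deliberately left undefined, playing the role CONFIG_X86 plays on an ia64 build.

```c
/* CONFIG_DEMO_ARCH is hypothetical and intentionally never defined. */

#if CONFIG_DEMO_ARCH		/* undefined name is treated as 0; gcc -Wundef warns */
#define VIA_IF 1
#else
#define VIA_IF 0
#endif

#ifdef CONFIG_DEMO_ARCH		/* only asks "is it defined?"; no warning */
#define VIA_IFDEF 1
#else
#define VIA_IFDEF 0
#endif

static int via_if(void)    { return VIA_IF; }
static int via_ifdef(void) { return VIA_IFDEF; }
```

Both branches pick the same code here, which is why the original line built correctly; the only difference is the -Wundef noise, so the change is a warning fix rather than a behaviour change.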
RE: boot failure on i7-3317u in Samsung 900x3c
> STATUS be23110a MCGSTATUS 5 The SDM can help here. See volume 3A, section 15.9.2 "Compound Error Codes". The low 16 bits of the status in this case are 0001 0001 0000 1010 (0x110a). This tells us that you have a cache error in L2 cache severe enough that the processor has begun filtering further error reports. -Tony
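A small decoder makes the bit-picking explicit. This is a sketch based on the "000F 0001 RRRR TTLL" cache hierarchy layout from the SDM's compound error code table; the struct and function names are invented for this note, and only that one error class is handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a cache hierarchy compound error code: 000F 0001 RRRR TTLL */
struct cache_err {
	bool filtered;		/* F (bit 12): correction report filtering */
	unsigned int rrrr;	/* request type; 0 means generic error */
	unsigned int tt;	/* 0 = instruction, 1 = data, 2 = generic */
	unsigned int ll;	/* 0 = L0, 1 = L1, 2 = L2, 3 = generic */
};

/* Returns false if 'code' is not a cache hierarchy error. */
static bool decode_cache_err(uint16_t code, struct cache_err *e)
{
	/* Bits 15:13 must be 000 and bits 11:8 must be 0001; F is free. */
	if ((code & 0xef00) != 0x0100)
		return false;

	e->filtered = (code >> 12) & 1;
	e->rrrr = (code >> 4) & 0xf;
	e->tt = (code >> 2) & 0x3;
	e->ll = code & 0x3;
	return true;
}
```

Feeding it 0x110a (the low 16 bits of the quoted status) gives F set, a generic request and transaction type, and level L2, which is the reading given above.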
RE: [RFC PATCH 0/3] mca_config stuff
> Therefore, I can toggle the bits in the mce code with mca_cfg.. > When defining accessing them through the device attributes in sysfs, I > use a new macro DEVICE_BIT_ATTR which gets the corresponding bit number > of that same bit in the bitfield. This gives only one function which > operates on a bitfield instead of a single function per bit in the > bitfield. Is this true across all architectures? I know that pa-risc instructions that operate on bitfields use "0" to operate on the high order bit rather than the low order one. I don't recall whether this spills over into the compiler. If it did, then you'd have to have different #defines for the bit numbers[1]. For this specific use case it wouldn't matter because you are just using it in x86 code. But device_store_bit() and device_show_bit() are in generic code - so they must be able to work across all architectures. -Tony [1] Or fix the store/show bit functions to transform the bit numbers from "little-bitian" to "big-bitian" on architectures that count the other way.
RE: [RFC PATCH 3/3] Convert mce_disabled
struct mca_config { - u64 dont_log_ce : 1, -#define MCA_CFG_DONT_LOG_CE0 - __resv1 : 63; + u64 dont_log_ce : 1, +#define MCA_CFG_DONT_LOG_CE 0 + mca_disabled: 1, +#define MCA_CFG_MCA_DISABLED 1 + __resv1 : 62; }; If we do head in this direction - I don't think it is useful to change just one bit on each commit. We should batch in larger groups. -Tony
RE: [PATCH 3/3] HWPOISON, hugetlbfs: fix RSS-counter warning
if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) { - if (PageAnon(page)) + if (PageHuge(page)) + ; + else if (PageAnon(page)) dec_mm_counter(mm, MM_ANONPAGES); else dec_mm_counter(mm, MM_FILEPAGES); This style minimizes the "diff" ... but wouldn't it be nicer to say: if (!PageHuge(page)) { old code in here } -Tony
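The suggested shape, spelled out as a stand-alone sketch. The types and the function name below are stand-ins invented for this example so that it compiles outside the kernel; they are not the real struct page, mm counters or rmap code.

```c
#include <stdbool.h>

/* Stand-ins for the kernel's RSS counter indices. */
enum { MM_FILEPAGES, MM_ANONPAGES, NR_MM_COUNTERS };

/* Stand-ins for struct page and struct mm_struct. */
struct fake_page { bool huge; bool anon; };
struct fake_mm { long counters[NR_MM_COUNTERS]; };

/* RSS accounting when unmapping a poisoned page; huge pages are skipped. */
static void account_poisoned_unmap(struct fake_mm *mm, const struct fake_page *page)
{
	if (!page->huge) {
		/* The old code goes here, unchanged. */
		if (page->anon)
			mm->counters[MM_ANONPAGES]--;
		else
			mm->counters[MM_FILEPAGES]--;
	}
}
```

The diff is a little larger because the old lines are re-indented, but the empty ";" statement disappears and the intent (huge pages do not touch these counters) is stated once.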
RE: [PATCH 1/3] HWPOISON, hugetlbfs: fix warning on freeing hwpoisoned hugepage
> This patch fixes the warning from __list_del_entry() which is triggered > when a process tries to do free_huge_page() for a hwpoisoned hugepage. Ultimately it would be nice to avoid poisoning huge pages. Generally we know the location of the poison to a cache line granularity (but sometimes only to a 4K granularity) ... and it is rather inefficient to take an entire 2M page out of service. With 1G pages things would be even worse!! It also makes life harder for applications that would like to catch the SIGBUS and try to take their own recovery actions. Losing more data than they really need to will make it less likely that they can do something to work around the loss. Has anyone looked at how hard it might be to have the code in memory-failure.c break up a huge page and only poison the 4K that needs to be taken out of service? -Tony
RE: [PATCH v3] x86/mce: Honour bios-set CMCI threshold
> What's wrong with userspace tools parsing /proc/cmdline and seeing that > mce_bios_cmci_threshold has been set since this is the only way to set > it anyway? The argument might be on the command line, but may have been rejected because the BIOS didn't set the thresholds? So then you'd have to look at the command line, *and* check /var/log/messages to make sure we hadn't printed the message saying the BIOS was unsupportive. BUT ... I don't think that knowing this is sufficient. A userspace tool would want to know what value had been set for each bank. So if it really wants to do something interesting, just knowing that "bios set some thresholds" doesn't sound like enough information. BUT (squared) do you even really need to know that thresholds were set? You could look at bits {52:38} in the MCi_STATUS information for the bank to see how many corrected errors had been logged. -Tony
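A minimal sketch of that last suggestion, assuming only what the reply states: the corrected error count sits in bits 52:38 of MCi_STATUS, a 15-bit field. The function name is made up for illustration.

```c
#include <stdint.h>

/* Extract the corrected error count from bits 52:38 of an MCi_STATUS value. */
static inline unsigned int mci_status_corrected_count(uint64_t status)
{
	return (unsigned int)((status >> 38) & 0x7fff);	/* 52 - 38 + 1 = 15 bits */
}
```

A tool that samples this field for each bank can see how many corrected errors were logged without knowing whether the BIOS programmed a threshold.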
RE: [PATCH v3] x86/mce: Honour bios-set CMCI threshold
> @Tony: I'll send it upwards soonish in case there are no objections. > This way no stable backport will be needed. Acked-by: Tony Luck
RE: new execve/kernel_thread design
> Surprisingly enough, ia64 one seems to work on actual hardware; I have sent > Tony an incremental patch cleaning copy_thread() up, waiting for results of > testing that on SMP box. Tiny bit faster than plain 3.7-rc1. lmbench3 reports fork+execve test at between 558 to 567 usec with the new code, compared with 562-572 usec with the old. -Tony
RE: [PATCH v2 2/2] Do not change worker's running cpu in cmci_rediscover().
> In this case, the following BUG_ON in try_to_wake_up_local() will be > triggered: > BUG_ON(rq != this_rq()); Logically this looks OK - what is the test case to trigger this? I've done a moderate amount of testing of cpu online/offline while injecting corrected errors (when testing the CMCI storm patches) ... but didn't see this problem. -Tony
RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online
> This patch skips taking a psinfo->buf_lock when just one cpu is online > because stopped cpus turn to offline via smp_send_stop() > in some architectures like x86, powerpc or arm64. That seems an impressive list of preconditions. So for this to help we need to have taken all but one cpu offline, then be in some code that is holding the pstore lock and get hit by an NMI which causes us to recurse into the pstore code. Can all these things really happen (did you run into this problem on a real system?). Or is this just a theoretical problem? Ugly (but practical) hacks might be OK to solve real problems. But do we really want them to fix problems that actually never happen? -Tony
RE: [RFC][PATCH] pstore: Skip spinlock when just one cpu is online
> But you are assuming that kmsg_dump is perfect and it isn't, in which case > by putting kmsg_dump in the kdump path, you actually may be blocking kdump > from working. I think the concern is that kdump isn't perfect, so sometimes we don't get a good dump from it. In those cases it would have been nice to have a pstore log of the original problem. But ... I don't see an answer to this problem. Adding more code just increases the number of possible places we can fail (especially as we are executing in a state where we know that things are all messed up ... the first kernel panic'd because something bad happened that we didn't know how to fix). A boot argument might help - so we can force use of pstore in cases where kdump is failing (or prevent use of pstore in cases where it seems to be preventing us getting to kdump ... I don't have a preference). BUT this would only be useful if we had a repeatable problem so that we could switch to the other mode ... and it seems likely that the kinds of problems that cause pstore or kdump to fail would be weird cases that are not very repeatable :-( -Tony
[GIT PULL] ACPI5 error injection fix
The following changes since commit b69f0859dc8e633c5d8c06845811588fe17e68b3: Linux 3.7-rc8 (2012-12-03 11:22:37 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-einj-fix-for-acpi5 for you to fetch changes up to 112f1fc08d0b3f81c594af617d88c0db6ce0873c: ACPI, APEI, EINJ: Add missed ACPI5 support for error trigger table (2012-12-07 11:50:02 -0800) Trivial fix for error injection code using ACPI5 version of EINJ Chen Gong (1): ACPI, APEI, EINJ: Add missed ACPI5 support for error trigger table drivers/acpi/apei/einj.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GIT PULL] pstore fixes for 3.8 merge window
The following changes since commit 9489e9dcae718d5fde988e4a684a0f55b5f94d17: Linux 3.7-rc7 (2012-11-25 17:59:19 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git tags/please-pull-pstore_mevent for you to fetch changes up to f94ec0c0594ef73ab3a2f1f32735aca8ddaf65e2: efi_pstore: Add a format check for an existing variable name at erasing time (2012-11-26 16:08:37 -0800) Patch series to allow EFI variable backend to pstore to hold multiple records. Seiji Aguchi (7): efi_pstore: Check remaining space with QueryVariableInfo() before writing data efi_pstore: Add a logic erasing entries to an erase callback efi_pstore: Remove a logic erasing entries from a write callback to hold multiple logs efi_pstore: Add ctime to argument of erase callback efi_pstore: Add a sequence counter to a variable name efi_pstore: Add a format check for an existing variable name at reading time efi_pstore: Add a format check for an existing variable name at erasing time drivers/acpi/apei/erst.c | 16 ++--- drivers/firmware/efivars.c | 163 +++- fs/pstore/inode.c |7 +- fs/pstore/internal.h |2 +- fs/pstore/platform.c | 13 ++-- fs/pstore/ram.c|9 ++- include/linux/efi.h|1 + include/linux/pstore.h |6 +- 8 files changed, 144 insertions(+), 73 deletions(-)
RE: [PATCH v5 0/7] efi_pstore: multiple event logging support
v4 -> v5 - Rebase to 3.7-rc5 - Add count to an argument of a write callback executed in pstore_console_write() to build successfully in case where CONFIG_PSTORE_CONSOLE=y is specified. (Patch 5/7) Applied. It's in my git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git next tree now so should pick up into linux-next by tomorrow or the day after. -Tony
RE: [PATCH 1/1] arch Kconfig: remove references to IRQ_PER_CPU
> But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig > files where it was defined or selected. It's completely unused so remove > the remaining references. Acked-by: Tony Luck [Hope someone picks up this whole patch ... otherwise I can take the ia64 hunk]
RE: Fwd: PROBLEM: Random kernel panic & system freeze when watching video
>I had to build the latest mcelog from kernel.org and it tells you a >little bit more: it is an internal parity error. I don't know, though, >what errors reported in bank 2 pertain to on this cpu model - Intel >should know :). Intel is a big place ... we didn't document which bank reports which structure in the public data sheet ... so I'd have to find some internal docs for that. Not sure it would help though as the MC2STATUS register value 0xb205 just says "internal error" - without any extra bits in the "mscod" subfield. This is the "default:" at the end of the internal "switch" where all the usual suspects have been eliminated and we just know that something went wrong. One possibility is a thermal problem (since I presume that video decode is making the CPU work hard). Does the machine get noticeably warmer, or does the fan kick up to high speed when this happens? Another (harder to isolate) is power supply ... if the CPU is drawing more power during video decode, then you might see a brownout where voltage drops too low. This could cause all sorts of internal problems. How predictable is the problem? If you power on the system from cold (unused for a few minutes) and immediately start watching a video (same video in multiple tests) ... do you see the hang/crash at the same point in the movie? Or are there large variations. If we have a plain s/w bug doing something evil that trips the problem, I'd expect a high level of repeatability. If you have a thermal problem then you may be able to accelerate it by partially blocking the fan exhaust. Electrical ... I suppose it might be different on AC power vs. battery. -Tony
[GIT PULL] Use perf/event tracing to report PCI Express advanced errors
The following changes since commit d1c3ed669a2d452cacfb48c2d171a1f364dae2ed: Linux 3.8-rc2 (2013-01-02 18:13:21 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-aer-trace for you to fetch changes up to 2cced2d95961acd318e9395578a60ee424d9db80: aerdrv: Cleanup log output for AER (2013-01-03 14:35:41 -0800) Use perf/event tracing to report PCI Express advanced errors. Lance Ortiz (3): aerdrv: Trace Event for PCI Express Advanced Error Reporting aerdrv: Enhanced AER logging aerdrv: Cleanup log output for AER drivers/acpi/apei/cper.c | 19 ++-- drivers/pci/pcie/aer/aerdrv_errprint.c | 63 ++ include/linux/aer.h|4 +- include/trace/events/ras.h | 77 4 files changed, 129 insertions(+), 34 deletions(-) create mode 100644 include/trace/events/ras.h
RE: [PATCH v9 3/3] aerdrv: Cleanup log output for AER
I sent a "please pull" to Ingo/Peter/Thomas about an hour ago ... if they push back (or ignore) we can fold your ack and nit-picks into another version. > s/elimiating/eliminating/ above. Ugh ... nobody spotted this one ("many eyes" really does work!) > I remove the "v1-v2" notes when I merge patches because I don't think > they're useful any more. But if Tony applies these, he can use his > judgment. Yes - I do too. In fact it is easier on the maintainer if this sort of meta-commentary goes *after* the "---" in the patch. Then it is available for review (where it is most helpful), but tools will automatically drop it when applying the patch. -Tony
[PATCH] ia64: Make sure interrupts enabled when we "safe_halt()"
In commit d166991234347215dc23fc9dc15a63a83a1a54e1 idle: Implement generic idle function Thomas Gleixner cleaned up many things but perturbed some fragile code that was keeping ia64 alive. So we started seeing: WARNING: at kernel/cpu/idle.c:94 cpu_idle_loop+0x360/0x380() and other unpleasantness like system hangs during boot. We really shouldn't ever halt with interrupts disabled. Signed-off-by: Tony Luck --- Please fold into the same branch as the generic idle changes. diff --git a/arch/ia64/include/asm/irqflags.h b/arch/ia64/include/asm/irqflags.h index 2b68d85..1bf2cf2 100644 --- a/arch/ia64/include/asm/irqflags.h +++ b/arch/ia64/include/asm/irqflags.h @@ -89,6 +89,7 @@ static inline bool arch_irqs_disabled(void) static inline void arch_safe_halt(void) { + arch_local_irq_enable(); ia64_pal_halt_light(); /* PAL_HALT_LIGHT */ }
RE: [PATCH 0/5] ACPI / scan: Make it possible to use the container hotplug with other scan handlers
>> Tony promised me to test those patches on his box, so we'll know for sure >> in a while. Tested this series - and the box boots just fine with no unexpected messages. But I should note that this box doesn't have anything that is hot pluggable, so I couldn't test hotplug (which seems to be deeply involved with things that this patch is touching). Of course that means that I haven't been testing hotplug - so it might have been broken for years and I'd never have noticed. -Tony
RE: [GIT PULL] Some error injection fixes to queue for 3.11
>> Pulled, thanks Tony! >> >> Len, are you fine with this route [tip:x86/ras tree] for the >> drivers/acpi/apei/einj.c changes? > > Yes, the RAS guys basically own that code. These patches also got picked up by Rafael and are in his ACPI tree too. I think the patches were applied identically, so there should not be any merge conflicts when this all comes back together in the 3.11 merge window. Rafael already had a chat about who will take future apei changes so that we won't have this happen again. -Tony
RE: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
> Interesting, why? Why would we even need such an option? My impression > is, if ACPI tells us FF, MCE code doesn't poll those banks anymore. So > where do the duplicated reports come from? The option is only disabling the Linux side of firmware first ... the BIOS will still be doing it and generating records to feed to the OS using APEI. So Linux may see the error in a bank and report it, and BIOS may report the same error. Though I'd expect that to be rare as whoever saw it first would most likely clear the bank before the other could see it. I asked for the option because I'm nervous about just skipping some banks on the say-so of the BIOS ... what if the BIOS did something wrong. This option gives us a way to return to the way things were before this patch. These parts are now looking good ... but we still need to tackle what Linux does when it does get the CPER record. I suspect we need to preserve the existing "fake an mcelog entry with just the address" on old platforms, but need to do something smarter on new ones. -Tony
RE: [PATCH v4 0/8] Nvram-to-pstore
> You need to mount pstore to access the files. > > # mkdir /dev/pstore > # mount -t pstore - /dev/pstore > > to unmount > > # umount /dev/pstore > > References: http://lwn.net/Articles/421297/ Note that /dev/pstore has fallen out of fashion as the mount point ... we now (since 3.9) suggest /sys/fs/pstore > Documentation/ABI/testing/pstore This file was updated with the new location. -Tony
RE: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
> Why, fill out struct mce and do mce_log(mce) does not suffice? There is (or should be) a lot more interesting stuff in the CPER than just the address. Stuff that we don't have fields for in the existing mcelog structure. We also need to treat filtered records from modern APEI implementations a bit differently from the old stuff. The original user of this code was Westmere-EX, which used it as a workaround for a missing address in MCi_ADDR for corrected errors. So in that scenario we had every error being reported and the mcelog(8) daemon doing the threshold analysis to decide when to take action. In this new modern world - Naveen wants to have the BIOS decide the threshold, so we'd like Linux to take some action as soon as it sees just one CPER. -Tony
RE: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
>> There is (or should be) > > Ha! Oh ye of little faith - I'm sure the BIOS will get this right this time :-) > Ok, seriously: so the situation should still be fine, FF reported errors > get the CPER format while the rest, the "old" MCE format. > > cper.c is doing printk so I'm guessing it would need to get its own > tracepoint and carry that to userspace. Yes - a tracepoint is the right answer here for all the new stuff. > Concerning the RAS daemon, Robert and I are making good progress so once > we have the persistent events in perf, we can read that tracepoint in > userspace and do whatever we want with the error info. Mauro has a rasdaemon in progress git://git.fedorahosted.org/rasdaemon.git just picks up perf/events and logs to a sqlite database. >> In this new modern world - Naveen wants to have the BIOS decide the >> threshold, so we'd like Linux to take some action as soon as it sees >> just one CPER. > > Why would Linux have to intervene if it is doing FF - wasn't the deal > behind Firmware First for the firmware to get the error first and handle > accordingly? Because Linux can do runtime things that the BIOS can't - like offline a 4K page. Idea here is that BIOS does whatever the OEM thinks is the right level of thresholding - not bothering the OS with petty details of random corrected errors that mean nothing. But if there is some repeated error (like a stuck bit) then the BIOS can provide a CPER to the OS telling it that it would be a good idea to stop using that page. And this is where the semantics of a CPER change between the original WSM-EX implementation ... where Linux expects to see all the errors and do its own thresholding, only taking a page offline if it sees a lot of CPERs that refer to the same page; and now - where the BIOS does the counting and tells Linux just once to take the page offline. -Tony
RE: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
> Ok, where is that semantics? What in a CPER record does say "this error > should tell you that you need to offline the containing page and I'm > telling you this exactly only once"? Error Severity 0, i.e. Recoverable? Naveen - this one is for you (or for your BIOS team). Can you get us a sample CPER that you plan to provide when the BIOS decides that its threshold has been exceeded? How will it be different from what old WSM-EX platforms were sending to us? Hopefully the answer is encoded in the CPER record and not in some code we have to put in Linux to say "if (IBMplatform) do_thing_1(); else ... " > Ok, we're talking about the S in RAS now. Do we have error recovery > strategies specified anywhere? Are they per-platform or generic? Is this > CPER strategy above, for example, only valid for some platforms or for > all APEI-using hardware? mcelog(8) daemon has been doing this for years ... but it used the "predictive failure analysis" buzzwords that were popular way back then (today the marketing people seem to prefer "self healing" ). Whatever the name, the concept is the same ... take some set of corrected event reports and infer from them that something worse may happen soon, and use that information to try to avoid the (possibly) impending crash. > Questions over questions... Questions are good - they help fill out gaps -Tony
RE: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
> The above question about what to do *without* going to userspace and > back is maybe more interesting and we'd need a clean design there... > we'll see. Yes - this case (where the BIOS did all the threshold math and made the decision) should be one where Linux kernel could just implement the action directly. Perhaps controlled by a knob to say whether we really trust the BIOS that much. But we will also have cases where a smart user agent can correlate data from multiple sources to identify the real root cause (e.g. some temperature anomalies around the same time as some memory errors that occur at 10am on the third Tuesday each month -> cause is air conditioner maintenance guy that shuts down the a/c for 10 minutes to change the filter). I'll leave writing an agent that smart as an exercise for the concerned data center manager :-) -Tony
[PATCH] [IA64] sim: Add casts to avoid assignment warnings
Pointers in the efi_runtime_services_t structure now have type "void *" (formerly they were "unsigned long"). So we now see a bunch of warnings like this: arch/ia64/hp/sim/boot/fw-emu.c:293: warning: assignment makes pointer from integer without a cast Add (void *) casts to the 10 affected lines to make the build quiet again. Signed-off-by: Tony Luck --- Boris, Matt - Can you add this patch to the same tree that commit 43ab0476a648053e5998bf081f47f215375a4502 [linux-next id] efi: Convert runtime services function ptrs is in so that it will follow along behind it. Thanks. arch/ia64/hp/sim/boot/fw-emu.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/arch/ia64/hp/sim/boot/fw-emu.c b/arch/ia64/hp/sim/boot/fw-emu.c index 271f412..87bf9ad 100644 --- a/arch/ia64/hp/sim/boot/fw-emu.c +++ b/arch/ia64/hp/sim/boot/fw-emu.c @@ -290,16 +290,16 @@ sys_fw_init (const char *args, int arglen) efi_runtime->hdr.signature = EFI_RUNTIME_SERVICES_SIGNATURE; efi_runtime->hdr.revision = EFI_RUNTIME_SERVICES_REVISION; efi_runtime->hdr.headersize = sizeof(efi_runtime->hdr); - efi_runtime->get_time = __pa(&fw_efi_get_time); - efi_runtime->set_time = __pa(&efi_unimplemented); - efi_runtime->get_wakeup_time = __pa(&efi_unimplemented); - efi_runtime->set_wakeup_time = __pa(&efi_unimplemented); - efi_runtime->set_virtual_address_map = __pa(&efi_unimplemented); - efi_runtime->get_variable = __pa(&efi_unimplemented); - efi_runtime->get_next_variable = __pa(&efi_unimplemented); - efi_runtime->set_variable = __pa(&efi_unimplemented); - efi_runtime->get_next_high_mono_count = __pa(&efi_unimplemented); - efi_runtime->reset_system = __pa(&efi_reset_system); + efi_runtime->get_time = (void *)__pa(&fw_efi_get_time); + efi_runtime->set_time = (void *)__pa(&efi_unimplemented); + efi_runtime->get_wakeup_time = (void *)__pa(&efi_unimplemented); + efi_runtime->set_wakeup_time = (void *)__pa(&efi_unimplemented); + efi_runtime->set_virtual_address_map = (void *)__pa(&efi_unimplemented); + efi_runtime->get_variable = (void *)__pa(&efi_unimplemented); + efi_runtime->get_next_variable = (void *)__pa(&efi_unimplemented); + efi_runtime->set_variable = (void *)__pa(&efi_unimplemented); + efi_runtime->get_next_high_mono_count = (void *)__pa(&efi_unimplemented); + efi_runtime->reset_system = (void *)__pa(&efi_reset_system); efi_tables->guid = SAL_SYSTEM_TABLE_GUID; efi_tables->table = __pa(sal_systab); -- 1.8.1.4
RE: [PATCH v2 2/2] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
> - Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a > flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it > looks like we could consider this as an indication to offline the page; > though I am not sure if/how this relates to the threshold value above. This one seems to make sense ... the flag description sounds like exactly what we want - I won't feel embarrassed explaining to people why Linux takes action when it sees a record like this. -Tony
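A rough sketch of how the two CPER semantics discussed in this thread could be told apart, under loudly stated assumptions: the "Error Threshold Exceeded" flag is taken to be bit 3 of the section flags (my reading of the UEFI section descriptor layout, worth checking against the spec), and the struct, macro and function names are invented for this example. It is not code from the patch series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of 'Error Threshold Exceeded' in the section flags. */
#define SEC_FLAG_THRESHOLD_EXCEEDED	(UINT32_C(1) << 3)

/* Hypothetical per-page bookkeeping for platforms that report every error. */
struct page_err_state {
	unsigned int count;
};

/*
 * Decide whether a corrected-error record should lead to offlining a page.
 * Flag set: the BIOS already did the counting, so act on this one record.
 * Flag clear: old behaviour, count records and apply the OS threshold.
 */
static bool should_offline_page(struct page_err_state *st, uint32_t sec_flags,
				unsigned int os_threshold)
{
	if (sec_flags & SEC_FLAG_THRESHOLD_EXCEEDED)
		return true;

	st->count++;
	return st->count >= os_threshold;
}
```

With the decision keyed off the record itself, no platform-specific test is needed in the kernel, which is the outcome hoped for earlier in the thread.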
RE: [PATCH v3] aerdrv: Move cper_print_aer() call out of interrupt context
> + /* > + * TODO: This function needs to be re-written so that it's output > + * matches the output of aer_print_error(). Right now, the output > + * is formatted very differently. > + */ So we have this big "TODO" comment sitting there very prominently ... which Linus is bound to ask about if I ask him to pull this into 3.10-rcX ... what's the impact of this? What should I say when he asks why should he pull this fix into 3.10 when there is still some work to do? Is matching the output no big deal and can wait for some future, while moving the pci bits to the work function needs to go in now? -Tony
[GIT PULL] Fix aer error logging
The following changes since commit e4aa937ec75df0eea0bee03bffa3303ad36c986b: Linux 3.10-rc3 (2013-05-26 16:00:47 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-aertracefix for you to fetch changes up to 37448adfc7ce0d6d5892b87aa8d57edde4126f49: aerdrv: Move cper_print_aer() call out of interrupt context (2013-05-30 10:51:20 -0700) Can't call pci_get_domain_bus_and_slot() from interrupt context Lance Ortiz (1): aerdrv: Move cper_print_aer() call out of interrupt context drivers/acpi/apei/cper.c | 18 -- drivers/acpi/apei/ghes.c | 4 +++- drivers/pci/pcie/aer/aerdrv_core.c | 5 - drivers/pci/pcie/aer/aerdrv_errprint.c | 4 ++-- include/linux/aer.h| 5 +++-- 5 files changed, 12 insertions(+), 24 deletions(-)
RE: [PATCH] efi, pstore: Cocci spatch "memdup.spatch"
> Who wants to pick this one up? Tony? Sure - I'll take it. -Tony
RE: [PATCH 3/3] powerpc/pseries: Support compression of oops text via pstore
> Introducing headersize in pstore_write() API would need changes at > multiple places where it's being called. The idea is to move the > compression support to pstore infrastructure so that other platforms > could also make use of it. Any thoughts on the back/forward compatibility as we switch to compressed pstore data? E.g. imagine I have a system installed with some Linux distribution with a kernel too old to know about compressed pstore. I use that machine to run the latest kernels that do compression ... and one fine day one of them crashes hard - logging in compressed form to pstore. Now I boot my distro kernel to pick up the pieces ... what do I see in /sys/fs/pstore/*? Some compressed files? Can I read them with some tool? This is somewhat of a corner case - but not completely unrealistic ... I'd at least like to be reassured that the old kernel won't choke when it sees the compressed blobs. -Tony
RE: [PATCH] x86/MCE: Update MCE severity condition check
> The SDM talks about "non-affected" logical processors, but perhaps we > can call this an "unaffected" thread? "unaffected" sounds a bit more natural (but close enough to the wording in the SDM that people should see the connection). -Tony
RE: [PATCH v2 1/2] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
+/*
+ * Indicates MCA banks controlled by the current cpu for CMCI. Note that this
+ * can change when a cpu is offlined or brought online since some MCA banks
+ * are shared across cpus. When a cpu is offlined, cmci_clear() disables CMCI
+ * on all banks owned by the cpu and clears this bitfield. At this point,
+ * cmci_rediscover() kicks in and a different cpu may end up taking
+ * ownership of some of the shared MCA banks that were previously owned
+ * by the offlined cpu.
+ */
 static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);

Maybe an extra sentence or two at the beginning to say *why* we need this. E.g.

/*
 * CMCI can be delivered to multiple cpus that share a machine check bank
 * so we need to designate a single cpu to process errors logged in each bank
 * in the interrupt handler (otherwise we would have many races and potential
 * double reporting of the same error).
 */
...

-Tony
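To make the ownership idea in the comment above concrete, here is a rough user-space sketch. It is not the kernel code: the sketch_* names, the fixed 4-cpu/8-bank topology and the plain arrays are all made up for illustration, standing in for the per-cpu mce_banks_owned bitmap and for what cmci_clear() and cmci_rediscover() are described as doing. It shows only the principle that each shared bank has at most one owner, and that orphaned banks get picked up when the owner goes offline.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS  4
#define SKETCH_NR_BANKS 8

/* Hypothetical topology: cpus 0-1 share banks 0-3, cpus 2-3 share banks 4-7. */
static const uint32_t banks_visible[SKETCH_NR_CPUS] = { 0x0f, 0x0f, 0xf0, 0xf0 };

/* Bit b set in banks_owned[cpu] means that cpu handles CMCI for bank b. */
static uint32_t banks_owned[SKETCH_NR_CPUS];
static int cpu_online[SKETCH_NR_CPUS];

/* Return the cpu that currently owns a bank, or -1 if nobody owns it. */
int sketch_owner(int bank)
{
	assert(bank >= 0 && bank < SKETCH_NR_BANKS);
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (banks_owned[cpu] & (UINT32_C(1) << bank))
			return cpu;
	return -1;
}

/* Claim every bank this cpu can see that no cpu owns yet. */
void sketch_discover(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	for (int bank = 0; bank < SKETCH_NR_BANKS; bank++) {
		uint32_t bit = UINT32_C(1) << bank;

		if ((banks_visible[cpu] & bit) && sketch_owner(bank) < 0)
			banks_owned[cpu] |= bit;
	}
}

/* Bring a cpu online and let it claim any unowned banks it shares. */
void sketch_online(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	cpu_online[cpu] = 1;
	sketch_discover(cpu);
}

/* Drop this cpu's ownership, then let the remaining cpus rediscover. */
void sketch_offline(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	cpu_online[cpu] = 0;
	banks_owned[cpu] = 0;
	for (int other = 0; other < SKETCH_NR_CPUS; other++)
		if (cpu_online[other])
			sketch_discover(other);
}

/* Accessor for the ownership bitmap of one cpu. */
uint32_t sketch_owned_mask(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return banks_owned[cpu];
}
```

Calling sketch_online() for cpus 0..3 in order leaves cpu 0 owning banks 0-3 and cpu 2 owning banks 4-7; a later sketch_offline(0) hands banks 0-3 to cpu 1. The real code has to do this under a lock and by probing hardware, which the sketch ignores.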
[GIT PULL] for tip x86/ras branch - queue for 3.11
The following changes since commit 9e895ace5d82df8929b16f58e9f515f6d54ab82d:

  Linux 3.10-rc7 (2013-06-22 09:47:31 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git tags/please-pull-mce-bitmap-comment

for you to fetch changes up to 0644414e62561f0ba1bea7c5ba6a94cc50dac3e3:

  mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem (2013-06-25 13:53:27 -0700)

Better comments so we understand our existing machine check bank
bitmaps - prelude to adding another bitmap soon.

Naveen N. Rao (1):
      mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem

 arch/x86/kernel/cpu/mcheck/mce.c       |  5 ++++-
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 12 ++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)