> I fixed it up (see below) and can carry the fix as necessary.
I rebased the series onto 3.7-rc7 (using the same merge fix that you did) ...
so you shouldn't see the merge error next time.
-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message
> 1. use firmware information
> According to ACPI spec 5.0, SRAT table has memory affinity structure
> and the structure has Hot Pluggable Field. See "5.2.16.2 Memory
> Affinity Structure". If we use the information, we might be able to
> specify movable memory by firmware. For example, if
> The other bit is that if you really really want high reliability, memory
> mirroring is the way to go; it is the only way you will be able to
> hotremove memory without having to have a pre-event to migrate the
> memory away from the affected node before the memory is offlined.
Some platforms do
> If any significant percentage of memory is in ZONE_MOVABLE then the memory
> hotplug people will have to deal with all the lowmem/highmem problems
> that used to be faced by 32-bit x86 with PAE enabled.
While these problems may still exist on large systems - I think it becomes
harder to constru
> > Signed-off-by: Josh Boyer
>
> Acked-by: Kees Cook
Queued for next merge window. Thanks.
-Tony
> Changelog
> v5 -> v6
> - Rebase to a latest linux-next tree.
> - Modify a comment from "efivar_update_sysfs_entry" to
> "efivar_update_sysfs_entries" in include/linux/efi.h (Patch 2/2)
Applied to my internal pstore topic branch - which feeds to linux-next. Note
that my branch was based
> Today's linux-next merge of the vfs tree got a conflict in
> arch/ia64/kernel/palinfo.c between commit 40c275bd92b8 ("[IA64] Fix stack
> overflow in create_palinfo_proc_entries") from the ia64 tree and commit
> d8e904861a28 ("palinfo fixes") from the vfs tree.
>
> I fixed it up (arbitrarily choos
> But this patch mainly to fix the unbalanced dev->enable_cnt in IA64 which
> will print WARNING Calltrace in dmesg.
Thanks for the explanation.
> If you think it is valuable, I will try to improve resource assignment in
> IA64 like other arch (eg arm, m68k, mips and sh..) in another patch.
>> Tony mentioned that this patch worked fine for him. So could you
>> kindly pick up this patch?
>
> Normally, Tony picks up the Intel side of MCE. Tony, want me to do it?
I'll pick it up. Thanks.
-Tony
The following changes since commit 07961ac7c0ee8b546658717034fe692fd12eefa9:
Linux 3.9-rc5 (2013-03-31 15:12:43 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
tags/please-pull-cmci_rediscover
for you to fetch changes up to 7a0c819d2
> Some (broken?) EFI implementations return always a MaximumVariableSize of 0,
> check against max_size only if it is non-zero.
The spec doesn't say that zero has any special meaning - so if an implementation
returns max_size == 0 but lets you set a variable to a size > 0, then I don't
think ther
+ if (WARN_ON_ONCE(!nb))
+ goto out;
+
WARN_ON_ONCE() will drop a stack trace to the console - is that going to be
useful?
If you want a message perhaps:
if (!nb) {
printk_once("something interesting about not having
a
Calling memcmp() to check the value of the first byte in a string is overkill.
Just use buf[0] == '1' or buf[0] != '1' as appropriate.
Signed-off-by: Tony Luck
---
[Inspired by a rant on IRC about a different driver doing something similar]
diff --git a/drivers/staging/iio/addac/adt7316.c
b/d
> I will post a patch to fix it. How about always keep node0 unhotpluggable ?
Node 0 (or more specifically the node that contains memory <4GB) will be
full of BIOS reserved holes in the memory map. It probably isn't removable
even if Linux thinks it is. Someday we might have a smart BIOS that can
> assume first cpu only have 1G ram, and other 31 socket will have bunch of ram
That doesn't seem to be a very realistic assumption. Can you even still buy 1G
DIMMs for servers? I'd think that a minimum would be to have each of four
channels populated with a 4G DIMM - so 16GB on first cpu. But e
> b. it will be freed to slub before run time.
> like init code and initrd disk.
If this is a problem - I'd be inclined to disable the code that frees it. It's
only a few hundred KB of code, and possibly a few MB of initrd. Too small to
worry about on a hot pluggable server.
> In that ca
> In efi_init() memory aligns in IA64_GRANULE_SIZE(16M). If set
> "crashkernel=1024M-:600M"
Is this where the real problem begins? Should we insist that users provide
crashkernel parameters rounded to GRANULE boundaries?
-Tony
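As a rough illustration of the rounding question above, here is a minimal
sketch of what "rounded to GRANULE boundaries" would mean for the quoted
600M request. It assumes only the 16M granule size given in the quoted text;
the helper names are hypothetical and this is not the kernel's parsing code.

```python
# Sketch only: sizes are in MiB; 16 is the IA64_GRANULE_SIZE quoted above.
GRANULE_MIB = 16


def is_granule_aligned(size_mib, granule_mib=GRANULE_MIB):
    """Return True if size_mib is a whole number of granules."""
    return size_mib % granule_mib == 0


def round_up_to_granule(size_mib, granule_mib=GRANULE_MIB):
    """Round size_mib up to the next multiple of granule_mib."""
    return ((size_mib + granule_mib - 1) // granule_mib) * granule_mib


if __name__ == "__main__":
    requested = 600  # from "crashkernel=1024M-:600M"
    print(is_granule_aligned(requested))   # 600 is 37.5 granules
    print(round_up_to_granule(requested))  # next granule multiple
```

With a 16M granule, 600M is not aligned; the next aligned size is 608M.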
The following changes since commit d1c3ed669a2d452cacfb48c2d171a1f364dae2ed:
Linux 3.8-rc2 (2013-01-02 18:13:21 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
tags/please-pull-pstore
for you to fetch changes up to fb0af3f2b1b613e
The following changes since commit 2ef14f465b9e096531343f5b734cffc5f759f4a6:
Merge branch 'x86-mm-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2013-02-21 18:06:55
-0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linu
> Note: I'm not sure if it's ok to change sysfs entries and this does break
> userspace tools that depend on the current path for some of these attributes.
> So, they will need to be updated to use the new path. However, if we ever get
> to a point where cpu0 can be offlined, these tools will need
> This is an unusual configuration but it's not unheard of. PPC64 in rare
> (and usually broken) configurations can have one node span another. Tony
> should know if such a configuration is normally allowed on Itanium or if
> this should be considered a platform bug. Tony?
We definitely have platf
> this i believe builds an implicit dependency between the mca_asm.o
> position within the image and the ia64_mca_data percpu variable it
> accesses - it relies on the immediate 22 addressing mode that has 4MB of
> scope. Per chance, the .config you sent creates a 14MB image, and the
> percpu v
> I'll start digging on why this doesn't boot ... but you might as well
> send the fixes so far upstream to Linus so that the SMP fix is available
Well a pure 2.6.24 version compiled with CONFIG_SMP=n booted just fine, so
the breakage is recent ... and more than likely related to this change.
I'v
> hm, as far as i could check, on ia64 UP the .percpu section link
> difference was the only ia64 difference i could find out of those
> changes. Could you try to copy a 2.6.24 include/asm-generic/percpu.h,
> include/asm-ia64.h and include/linux/percpu.h into your current tree,
> and see whethe
> So the percpu changes are innocent ... something else since 2.6.24 is
> to blame. Only 5749 commits :-) I'll start bisecting.
12 bisections later ... nothing! I think I got lost in the
maze. Bisection #5 had a crash, but it looked to be a very
different crash (and looked to happen later than
> Applied that patch and UP kernel built ok, and then crashed in the
> same place with the memset() to a user-looking address from kmem_cache_alloc()
>
> So the percpu changes are innocent ... something else since 2.6.24 is
> to blame. Only 5749 commits :-) I'll start bisecting.
The bisection na
> Now that I've an IA64 box on top of the other boxes
> (IBM with Calgary-X, Intel VT-d, AMD Vi, and AMD GART - that
> can use SWIOTLB as fallback) I can reliably do regression
> testing.
Acked-by: Tony Luck
> First of all, I do think I was answering your question. As I said
> before, if an online cpu == dying here, there must be something wrong.
> Am I right here ?
Yes - but there is a fuzzy line over where it is good to check for "something
wrong" or whether to trust that the caller of the function
> I wonder if the default should be to not show headers, and to add this
> flag to the backends that want the pstore-added header. I think the
> more common case going forward will be without headers since
> backends should arguably be storing metadata themselves.
Perhaps just add the headings whe
> This config item has not carried much meaning for a while now and is
> almost always enabled by default. As agreed during the Linux kernel
> summit, remove it.
Acked-by: Tony Luck
[ditto for parts 012 and 013 of 193]
> Third round, incorporating feedback from the last time.
Paste one of these onto each piece:
Acked-by: Tony Luck
Acked-by: Tony Luck
Acked-by: Tony Luck
Acked-by: Tony Luck
Acked-by: Tony Luck
-Tony
> Just curious -- can you reproduce the same problem with
> CONFIG_PRINTK_TIME as I'm seeing?
Yes I can reproduce this (on latest Linus tree). System
dies with no console output ... looks like the boot cpu
may have taken a machine check (it isn't responding to my
debugger).
-Tony
> I guess sched_init() is too early... it does seem really strange to
> me, but I just double checked with Ingo's patch and it does indeed
> hang. The slow way to make progress is just to go through
> start_kernel() line-by-line and enable cpu_clock() at each stage, and
> see where it stops hangin
On 10/31/2012 11:42 AM, Rafael J. Wysocki wrote:
> I wonder if the x86 and/or ia64 maintainers have any reservations?
Can you elaborate on the "tested by mika" that you put into the 0/5
message. Especially w.r.t. ia64. Compile tested? Boot tested? Ran with
some new device that uses the ACPI enumer
> By "tested" I mean "run with some new devices that use the ACPI enumeration
> provided here, on x86". Sorry for being too vague.
Do you or Mika have access to an ia64 box to test. If not, can you suggest
some way that I could exercise this code w/o the new devices. Or at least
reassure myself
> The BIOSes of currently available ia64 systems don't contain ACPI nodes whose
> IDs will match the IDs of the new devices (ie. the ones that are going to be
> added to acpi_platform_device_ids[]), so for ia64 it should be sufficient to
> test that code as is (ie. without any new devices in the sy
> That is correct, unfortunately. That information is not available to
> software in all cases. Maybe APEI could be used for that DIMM location
> mapping through simple tables instead of letting it fumble the error
> handling path.
Not much hope for "simple"[1] tables. There is also a timings iss
> Right, but at least in the csrow case, we still can compute back the
> csrow even with the interleaving, after we know how it is done exactly
> (on which address bits, etc). I think this should be doable on Intel
> controllers too but I don't know.
No. Architecturally all Intel provides is the p
> I agree with this. Most of it looks easily fixable, but how would I
> enable the fix for ia64? For PA it's simple: I'll just use
> CONFIG_STACK_GROWSUP, but that won't work for you.
ia64 has an ugly chicken vs. egg build dependency. When trying to build our
asm-offsets.h
file (to get #define
> And my plan was to get rid of the fact that backends touch pstore->buf
> directly. Backends would always receive anonymous 'buf' pointer (we
> already have write_buf callback that does exactly this), and thus it
It feels like we are just shuffling the lock problem from one place
to another. In
> If we go back to first principles, what do we want to do? We want the
> system administrator to know that a file might be potentially
> corrupted. And perhaps, if a program tries to read from that file, it
> should get an error. If we have a program that has that file mmap'ed
> at the time of
> Well, we could set a new attribute bit on the file which indicates
> that the file has been corrupted, and this could cause any attempts to
> open the file to return some error until the bit has been cleared.
That sounds a lot better than renaming/moving the file.
> This would persist across re
> What I would recommend is adding a
>
> #define FS_CORRUPTED_FL 0x0100 /* File is corrupted */
>
> ... and which could be accessed and cleared via the lsattr and chattr
> programs.
Good - but we need some space to save the corrupted range information
too. These errors should be
The following changes since commit 8f0d8163b50e01f398b14bcd4dc039ac5ab18d64:
Linux 3.7-rc3 (2012-10-28 12:24:48 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
tags/please-pull-tangchen
for you to fetch changes up to 85b97637bb40a9f4
>>
>> Tony Luck (1):
>> [IA64] sim: Add casts to avoid assignment warnings
>>
>> arch/ia64/hp/sim/boot/fw-emu.c | 20 ++--
>> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> I don't see this commit in Lin
> Nope, this is a just-in-case thing. I think you or Tony asked to have
> this in a previous discussion so that we're covered if firmware starts
> acting up. Other than that, I'm ok if this is left out.
I'm struggling to think of a case where this would help. It implies that
we are on a running
> What would be a reasonable maximum limit for the number of memory
> controllers, on a -EX machine?
Westmere-EX has one memory controller per socket ... and there are glueless
systems up to 8 sockets. So 8 there. Not sure if any OEM is building larger
machines with a node controller (SGI? Not
This is good - but the real solution is to stop poisoning entire huge pages ...
they should be broken into 4K pages and just one 4K page should be poisoned.
Naoya Horiguchi: I thought that you were looking at this problem some months
ago. Any progress?
-Tony
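To put numbers on the suggestion above, here is a minimal sketch of the
arithmetic involved in narrowing poison from a huge page to one 4K page. It
assumes a 2MB huge page (the message does not state a size), and the function
names and addresses are hypothetical, not taken from the memory-failure code.

```python
# Sketch only: 2MB huge pages are an assumption; 4K comes from the message.
BASE_PAGE_SIZE = 4 * 1024
HUGE_PAGE_SIZE = 2 * 1024 * 1024


def subpages_per_huge_page():
    """Number of 4K pages that make up one huge page."""
    return HUGE_PAGE_SIZE // BASE_PAGE_SIZE


def poisoned_subpage(huge_page_base, bad_addr):
    """Index of the single 4K page inside the huge page that holds bad_addr."""
    offset = bad_addr - huge_page_base
    if not 0 <= offset < HUGE_PAGE_SIZE:
        raise ValueError("address is outside this huge page")
    return offset // BASE_PAGE_SIZE


if __name__ == "__main__":
    base = 0x40000000  # hypothetical huge page base address
    print(subpages_per_huge_page())
    print(poisoned_subpage(base, base + 0x3040))
```

Under these assumptions, poisoning the whole huge page takes 512 4K pages out
of service where one would do.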
>>Sorry, I have no meaningful progress on this. Splitting hugepages is not
>>a trivial operation, and introduce more complexity on hugetlbfs code.
>>I don't hit on any usecase of it rather than memory failure, so I'm not
>>sure that it's worth doing now.
>
> Agreed. ;-)
Agreed that huge pages shou
> Transparent huge pages are not helpful for DB workload which there is a lot
> of shared memory
Hmm. Perhaps they should be. If a database allocates most[1] of the memory on a
machine to a shared memory segment - that *ought* to be a candidate for using
transparent huge pages. Now that we h
The following changes since commit e831cbfc1ad843b5542cc45f777e1a00b73c0685:
Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux (2013-09-11 08:36:03
-0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.gi
> The reason behind compression failure is the size of big_oops_buf which is too
> big for efivars case. I will do some experiments with different kind of texts
> for buffer size 1024 to check if 100/53 suits for all the cases.
...
> Yes this can be changed to zlib_inflateInit2().
Original patch
> *If* however the cpu_relax() makes sense on other platforms maybe we could
> add something like we have already with "arch_mutex_cpu_relax()":
I'll do some more measurements on ia64. During my first tests cpu_relax()
seemed to be a big win - but I only ran "./t" a couple of times. Later (with
> And there can't be any livelock, since by definition somebody else
> _did_ make progress. In fact, adding the cpu_relax() probably just
> makes things much less fair - once somebody else raced on you, the
> cpu_relax() now makes it more likely that _another_ cpu does so too.
>
> That said, let's
> Also, it strikes me that ia64 has tons of different versions of
> cmpxchg, and the one you use by default is the one with "acquire"
> semantics
Not "tons", just two. You can ask for "acquire" or "release" semantics,
there is no relaxed option.
Worse still - early processor implementations actu
> That said, another thing that strikes me is that you have 32 CPU
> threads, and the stupid test-program I sent out had MAX_THREADS set to
> 16. Did you change that? Because if not, then some of the extreme
> performance profile might be about how the threads get scheduled on
> your machine (HT t
>> Worse still - early processor implementations actually just ignored
>> the acquire/release and did a full fence all the time. Unfortunately
>> this meant a lot of badly written code that used .acq when they really
>> wanted .rel became legacy out in the wild - so when we made a cpu
>> that stri
- big_oops_buf_sz = (psinfo->bufsize * 100) / 45;
+ big_oops_buf_sz = (psinfo->bufsize * 100) / cmpr;
Tested on an ERST backed system. Seems to be working (we save a little less
information per ERST record than before this change (uncompressed size goes
down from ~17500 to ~16400 by
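For readers following the sizing formula in the diff above, here is a minimal
sketch of the same integer arithmetic. The 1024-byte bufsize is a hypothetical
value chosen for illustration, and the function name is made up; only the
formula and the divisor 45 come from the quoted patch.

```python
def big_oops_buf_size(bufsize, cmpr):
    """Mirror (bufsize * 100) / cmpr using C-style truncating division.

    cmpr is the assumed compressed size as a percentage of the original,
    so a larger cmpr gives a smaller uncompressed buffer.
    """
    return (bufsize * 100) // cmpr


if __name__ == "__main__":
    bufsize = 1024  # hypothetical backend record size in bytes
    for cmpr in (45, 60):
        print(cmpr, big_oops_buf_size(bufsize, cmpr))
```

Raising the divisor shrinks the uncompressed buffer, which fits the report
above that slightly less information is saved per record.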
+ default:
+ cmpr = 60;
+ break;
+ }
Is this the right "default"? It may be a good choice for a backend with a
really tiny buffer (1 ... 999). But less good for a (theoretical) backend with a
larger buffer (10001 ... infinity and beyond). Which are you
>> Previous attempt to fix was b042e47491ba5f487601b5141a3f1d8582304170
>>
>> Suggested use of is_power_of_2() was bogus because is_power_of_2(0) is
>> false (documented behaviour).
>>
>> Signed-off-by: Maxime Bizon
>
> Yes, excellent point. :)
>
> Acked-by: Kees Cook
Applied. Thanks.
-Tony
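The point in the quoted commit message about is_power_of_2(0) can be seen in a
short sketch. The first function follows the usual definition of the kernel
helper as I understand it (non-zero, and no bits in common with n - 1); the
second is a hypothetical bare mask test shown only for contrast. Neither is
copied from the patch under discussion.

```python
def is_power_of_2(n):
    """True only for non-zero values with a single bit set."""
    return n != 0 and (n & (n - 1)) == 0


def bare_mask_check(n):
    """Hypothetical open-coded test; note that it also accepts zero."""
    return (n & (n - 1)) == 0


if __name__ == "__main__":
    for value in (0, 1, 4096, 4097):
        print(value, is_power_of_2(value), bare_mask_check(value))
```

The two tests agree for every non-zero value and differ only at zero, so
code that must accept a zero size cannot rely on is_power_of_2() alone.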
--
The following changes since commit b36f4be3de1b123d8601de062e7dbfc904f305fb:
Linux 3.11-rc6 (2013-08-18 14:36:53 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
tags/please-pull-pstore
for you to fetch changes up to 3bd11cf56e4d9c
While we are likely to succeed and break out of this loop, it isn't
guaranteed. We should be power and thread friendly if we do have to
go around for a second (or third, or more) attempt.
Signed-off-by: Tony Luck
---
diff --git a/lib/lockref.c b/lib/lockref.c
index 7819c2d..9d76f40 100644
---
> In this case fast_mix would use two uninitialized ints from the stack
> and mix it into the pool.
Is the concern here that an attacker might know (or be able to control) what
is on the stack - and so get knowledge of what is being mixed into the pool?
> In this case set the input to 0.
And
Fix a build warning on ia64:
include/linux/acpi.h:437:5: warning: "CONFIG_X86" is not defined
Signed-off-by: Tony Luck
---
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4f42332..f70f18d 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -434,7 +434,7 @@ void acpi_
> STATUS be23110a MCGSTATUS 5
The SDM can help here. See volume 3A, section 15.9.2 "Compound Error Codes".
The low 16 bits of the status in this case are 0001 0001 0000 1010
This tells us that you have a cache error in L2 cache severe enough that the
processor has begun filtering further
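The decoding described above can be sketched in a few lines. This assumes the
SDM's cache hierarchy compound error code layout, 000F 0001 RRRR TTLL, with F
as the filtering bit; the function name and dictionary labels are my own, and
only the low 32 bits quoted in the report (be23110a) are used.

```python
# Sketch only: field tables follow the SDM "Compound Error Codes" encoding
# for cache hierarchy errors as I understand it.
TRANSACTION_TYPE = {0b00: "instruction", 0b01: "data", 0b10: "generic"}
CACHE_LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "generic"}
REQUEST = {
    0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
    5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP",
}


def decode_cache_error(status):
    """Decode the low 16 bits of a status value as a cache hierarchy error.

    Returns None if the code does not match the 000F 0001 RRRR TTLL pattern.
    """
    code = status & 0xFFFF
    if (code & 0xEF00) != 0x0100:  # ignore the F bit when matching
        return None
    return {
        "filtering": bool(code & 0x1000),
        "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
        "transaction": TRANSACTION_TYPE.get((code >> 2) & 0x3, "reserved"),
        "level": CACHE_LEVEL[code & 0x3],
    }


if __name__ == "__main__":
    print(decode_cache_error(0xBE23110A))
```

For the quoted value the low 16 bits are 0x110a: the filtering bit is set, the
request type is the generic ERR, and the level field selects L2.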
> Therefore, I can toggle the bits in the mce code with mca_cfg..
> When defining accessing them through the device attributes in sysfs, I
> use a new macro DEVICE_BIT_ATTR which gets the corresponding bit number
> of that same bit in the bitfield. This gives only one function which
> operates on a
struct mca_config {
- u64 dont_log_ce : 1,
-#define MCA_CFG_DONT_LOG_CE0
- __resv1 : 63;
+ u64 dont_log_ce : 1,
+#define MCA_CFG_DONT_LOG_CE 0
+ mca_disabled: 1,
+#define MCA_CFG_MCA_DISABLED 1
+ __resv1 : 62;
};
If we
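The idea in the quoted text of one show/store routine driven by a bit number
can be sketched as below. The two bit numbers are the #define values from the
diff; the function names are hypothetical, and the sketch assumes bit 0 is the
least significant bit of the 64-bit word, which C does not guarantee for
bit-fields.

```python
# Bit numbers taken from the #defines in the quoted struct.
MCA_CFG_DONT_LOG_CE = 0
MCA_CFG_MCA_DISABLED = 1


def show_bit(cfg, bit):
    """Return the value (0 or 1) of one bit in the config word."""
    return (cfg >> bit) & 1


def store_bit(cfg, bit, value):
    """Return a copy of cfg with the given bit set or cleared."""
    mask = 1 << bit
    return (cfg | mask) if value else (cfg & ~mask)


if __name__ == "__main__":
    cfg = 0
    cfg = store_bit(cfg, MCA_CFG_MCA_DISABLED, 1)
    print(show_bit(cfg, MCA_CFG_MCA_DISABLED), show_bit(cfg, MCA_CFG_DONT_LOG_CE))
```

One pair of helpers then serves every flag, with the attribute carrying only
its bit number.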
if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
- if (PageAnon(page))
+ if (PageHuge(page))
+ ;
+ else if (PageAnon(page))
dec_mm_counter(mm, MM_ANONPAGES);
else
> This patch fixes the warning from __list_del_entry() which is triggered
> when a process tries to do free_huge_page() for a hwpoisoned hugepage.
Ultimately it would be nice to avoid poisoning huge pages. Generally we know the
location of the poison to a cache line granularity (but sometimes only
> What's wrong with userspace tools parsing /proc/cmdline and seeing that
> mce_bios_cmci_threshold has been set since this is the only way to set
> it anyway?
The argument might be on the command line, but may have been rejected
because the BIOS didn't set the thresholds? So then you'd have to lo
> @Tony: I'll send it upwards soonish in case there are no objections.
> This way no stable backport will be needed.
Acked-by: Tony Luck
> Surprisingly enough, ia64 one seems to work on actual hardware; I have sent
> Tony an incremental patch cleaning copy_thread() up, waiting for results of
> testing that on SMP box.
Tiny bit faster than plain 3.7-rc1. lmbench3 reports fork+execve test at between
558 to 567 usec with the new code,
> In this case, the following BUG_ON in try_to_wake_up_local() will be
> triggered:
> BUG_ON(rq != this_rq());
Logically this looks OK - what is the test case to trigger this? I've done a
moderate amount of testing of cpu online/offline while injecting corrected
errors (when testing the CMCI s
> This patch skips taking a psinfo->buf_lock when just one cpu is online
> because stopped cpus turn to offline via smp_send_stop()
> in some architectures like x86, powerpc or arm64.
That seems an impressive list of preconditions. So for this to
help we need to have taken all but one cpu offline
> But you are assuming that kmsg_dump is perfect and it isn't, in which case
> by putting kmsg_dump in the kdump path, you actually may be blocking kdump
> from working.
I think the concern is that kdump isn't perfect, so sometimes we don't get a
good dump from it. In those cases it would have be
The following changes since commit b69f0859dc8e633c5d8c06845811588fe17e68b3:
Linux 3.7-rc8 (2012-12-03 11:22:37 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
tags/please-pull-einj-fix-for-acpi5
for you to fetch changes up to 112f1f
The following changes since commit 9489e9dcae718d5fde988e4a684a0f55b5f94d17:
Linux 3.7-rc7 (2012-11-25 17:59:19 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
tags/please-pull-pstore_mevent
for you to fetch changes up to f94ec0c0
v4 -> v5
- Rebase to 3.7-rc5
- Add count to an argument of a write callback executed in
pstore_console_write() to build successfully in case where
CONFIG_PSTORE_CONSOLE=y is specified. (Patch 5/7)
Applied. It's in my
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git next
> But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig
> files where it was defined or selected. It's completely unused so remove
> the remaining references.
Acked-by: Tony Luck
[Hope someone picks up this whole patch ... otherwise I can take the ia64 hunk]
>I had to build the latest mcelog from kernel.org and it tells you a
>little bit more: it is an internal parity error. I don't know, though,
>what errors reported in bank 2 pertain to on this cpu model - Intel
>should know :).
Intel is a big place ... we didn't document which bank reports
which st
The following changes since commit d1c3ed669a2d452cacfb48c2d171a1f364dae2ed:
Linux 3.8-rc2 (2013-01-02 18:13:21 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
tags/please-pull-aer-trace
for you to fetch changes up to 2cced2d95961acd
I sent a "please pull" to Ingo/Peter/Thomas about an hour ago ... if they
push back (or ignore) we can fold your ack and nit-picks into another
version.
> s/elimiating/eliminating/ above.
Ugh ... nobody spotted this one ("many eyes" really does work!)
> I remove the "v1-v2" notes when I merge pat
In commit d166991234347215dc23fc9dc15a63a83a1a54e1
idle: Implement generic idle function
Thomas Gleixner cleaned up many things but perturbed some
fragile code that was keeping ia64 alive. So we started
seeing:
WARNING: at kernel/cpu/idle.c:94 cpu_idle_loop+0x360/0x380()
and other unpleasantn
>> Tony promised me to test those patches on his box, so we'll know for sure
>> in a while.
Tested this series - and the box boots just fine with no unexpected messages.
But I should note that this box doesn't have anything that is hot pluggable, so
I couldn't test hotplug (which seems to be dee
>> Pulled, thanks Tony!
>>
>> Len, are you fine with this route [tip:x86/ras tree] for the
>> drivers/acpi/apei/einj.c changes?
>
> Yes, the RAS guys basically own that code.
These patches also got picked up by Rafael and are in his ACPI tree
too. I think the patches were applied identically, so
> Interesting, why? Why would we even need such an option? My impression
> is, if ACPI tells us FF, MCE code doesn't poll those banks anymore. So
> where do the duplicated reports come from?
The option is only disabling the Linux side of firmware first ... the BIOS
will still be doing it and gener
> You need to mount pstore to access the files.
>
> # mkdir /dev/pstore
> # mount -t pstore - /dev/pstore
>
> to unmount
>
> # umount /dev/pstore
>
> References: http://lwn.net/Articles/421297/
Note that /dev/pstore has fallen out of fashion as the mount point ... we now
(since 3.9) suggest /
> Why, fill out struct mce and do mce_log(mce) does not suffice?
There is (or should be) a lot more interesting stuff in the CPER than just the
address. Stuff that we don't have fields for in the existing mcelog structure.
We also need to treat filtered records from modern APEI implementations
>> There is (or should be)
>
> Ha!
Oh ye of little faith - I'm sure the BIOS will get this right this time :-)
> Ok, seriously: so the situation should still be fine, FF reported errors
> get the CPER format while the rest, the "old" MCE format.
>
> cper.c is doing printk so I'm guessing it woul
> Ok, where is that semantics? What in a CPER record does say "this error
> should tell you that you need to offline the containing page and I'm
> telling you this exactly only once"? Error Severity 0, i.e. Recoverable?
Naveen - this one is for you (or for your BIOS team). Can you get us a sample
> The above question about what to do *without* going to userspace and
> back is maybe more interesting and we'd need a clean design there...
> we'll see.
Yes - this case (where the BIOS did all the threshold math and made the
decision) should be one where Linux kernel could just implement the ac
Pointers in the efi_runtime_services_t structure now have type
"void *" (formerly they were "unsigned long"). So we now see a
bunch of warnings like this:
arch/ia64/hp/sim/boot/fw-emu.c:293: warning: assignment makes pointer from integer without a cast
Add (void *) casts to the 10 affected lines
> - Two, the Generic Error Data Entry (aka UEFI Section Descriptor) has a
> flag which indicates 'Error Threshold Exceeded'. From the UEFI spec, it
> looks like we could consider this as an indication to offline the page;
> though I am not sure if/how this relates to the threshold value above.
> + /*
> + * TODO: This function needs to be re-written so that it's output
> + * matches the output of aer_print_error(). Right now, the output
> + * is formatted very differently.
> + */
So we have this big "TODO" comment sitting there very prominently ... which
Linus i
The following changes since commit e4aa937ec75df0eea0bee03bffa3303ad36c986b:
Linux 3.10-rc3 (2013-05-26 16:00:47 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
tags/please-pull-aertracefix
for you to fetch changes up to 37448adfc7ce
> Who wants to pick this one up? Tony?
Sure - I'll take it.
-Tony
> Introducing headersize in pstore_write() API would need changes at
> multiple places where it's being called. The idea is to move the
> compression support to pstore infrastructure so that other platforms
> could also make use of it.
Any thoughts on the back/forward compatibility as we switch to c
> The SDM talks about "non-affected" logical processors, but perhaps we
> can call this an "unaffected" thread?
"unaffected" sounds a bit more natural (but close enough to the wording in
the SDM that people should see the connection).
-Tony
+/*
+ * Indicates MCA banks controlled by the current cpu for CMCI. Note that this
+ * can change when a cpu is offlined or brought online since some MCA banks
+ * are shared across cpus. When a cpu is offlined, cmci_clear() disables CMCI
+ * on all banks owned by the cpu and clears this bitfield.
The following changes since commit 9e895ace5d82df8929b16f58e9f515f6d54ab82d:
Linux 3.10-rc7 (2013-06-22 09:47:31 -1000)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
tags/please-pull-mce-bitmap-comment
for you to fetch changes up to 06444